How I Manage My Own Data Storage
July 28th, 2008
I’ve been (seriously) playing with computers since I was about 12 years old, so for the last 10 years or so, one thing I’ve always tried to stay conscious of is how reliable (or unreliable) my computer is. No one likes it when their computer breaks (not even someone who is good at fixing them). For me, this concern comes from the stress of losing data that, in hindsight, may not have been all that important. But it was a data loss nonetheless, and no one likes it when that happens.
So when I was in high school, my first strategy was to put my OS on a different drive than my user data. I figured the system drive would see many more reads and writes than the data drive because of paging and other activity, so it would be under more stress than a data drive that gets used less heavily. I kept this strategy for several years, and I was lucky enough to replace drives before they failed (as all drives eventually do).
But when I got to Case, I was exposed to a ridiculously fast fiber-optic network that made it even easier to fill my hard drive with junk I probably didn’t need. I needed a storage solution that met a few objectives:
- It had to provide a relatively large amount of storage.
- It had to provide some level of fault tolerance.
- It had to run on Linux, since I was tired of switching between Linux and Windows Server 2003. I chose Linux and I was sticking with it.
So I built a 1.2TB software RAID5 array. There’s a picture missing on that page; I’ll see if I can locate it or take another. I also have some other items to add to that page, including benchmarks and other info. The RAID array is terrific. It has survived one and a half drive failures (one legitimate drive failure, plus half a failure that I simulated by yanking a disk while the system was running). Rebuilding the array takes several hours, but that’s to be expected. More on that when I update that post.
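The reason a RAID5 array can lose a drive and keep going is parity: for each stripe, one block holds the XOR of the data blocks, so any single missing block can be recomputed by XOR-ing everything that survived. The Python sketch below illustrates that idea only. It keeps parity in one fixed position for simplicity (real RAID5 rotates parity across the drives), and the four-disk stripe and block contents are made up for illustration; it is not how the Linux md driver is actually implemented. It also hints at why rebuilds take hours: every surviving disk has to be read end to end to regenerate the lost one.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks: list[bytes]) -> bytes:
    # The parity block for a stripe is the XOR of all its data blocks.
    return xor_blocks(*data_blocks)


def rebuild_missing(surviving_blocks: list[bytes]) -> bytes:
    # XOR-ing every surviving block (data plus parity) yields the missing one,
    # because x ^ x == 0 cancels out everything that is still present.
    return xor_blocks(*surviving_blocks)


# A hypothetical stripe spread over four data disks, plus one parity block.
stripe = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = make_parity(stripe)

# Simulate losing the third disk and rebuilding its block from the rest.
survivors = stripe[:2] + stripe[3:] + [parity]
recovered = rebuild_missing(survivors)
print(recovered)  # b'CCCC'
```

The same arithmetic shows the limit of RAID5: with two blocks missing from a stripe, the XOR no longer has a unique answer, which is why a second failure during a rebuild is fatal.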
So of course, even with local fault tolerance, I want an off-site backup service. At first I was using Dreamhost, but they tried to extort a bunch of money out of me to use my storage space for backups. It was really shady. But it did force me to look elsewhere, and I’m glad I did. I wanted a cross-platform solution (given the heterogeneous nature of my computer collection), so I selected JungleDisk, which uses Amazon’s S3 web service for data storage. I pay a pittance (maybe a couple of dollars per month, if that) to store about 20GB of data. Backups happen nightly (or however often you choose).
I use JungleDisk to back up all of the documents I’ve written in college, all of the code I’ve written, all of my digital photos, my encrypted password database, and a dump of my MySQL databases. Everything is encrypted before it even leaves my computer. The client runs right on my file server and just works. It’s an absolutely terrific product. My only complaint (a feature request, really) is that I wish it would e-mail me a copy of its log file every time a backup runs. That way I’d know whether it succeeded and what was transferred.
I foresee keeping this setup for quite a while.
Updates: I added an extra drive to the RAID array. A few quick terminal commands later, I’m up to 1.5TB. Also, JungleDisk added a reporting option: you can get an e-mail when a backup job completes, but if you want details you have to pay for their monthly service.
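The jump from 1.2TB to 1.5TB follows from how RAID5 allocates space: with n equal-sized drives, one drive’s worth of capacity goes to parity, so usable space is (n - 1) × drive size. The post doesn’t state the drive size; assuming equal drives, five 300GB drives growing to six is the combination that matches both figures, so the sketch below uses that as an explicit assumption (and decimal units, 1TB = 1000GB, as drive makers quote them). The function name is hypothetical.

```python
def raid5_usable_gb(num_drives: int, drive_size_gb: float) -> float:
    """Usable RAID5 capacity: one drive's worth of space is spent on parity."""
    if num_drives < 3:
        raise ValueError("RAID5 needs at least three drives")
    return (num_drives - 1) * drive_size_gb


# Assumed drive size, inferred from the 1.2TB -> 1.5TB figures, not stated in the post.
DRIVE_GB = 300

before = raid5_usable_gb(5, DRIVE_GB)
after = raid5_usable_gb(6, DRIVE_GB)
print(before, after)  # 1200 1500
```

Each added drive contributes its full size to usable space, since the parity overhead stays fixed at one drive regardless of array width.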