How I Manage My Own Data Storage

I’ve been (seriously) playing with computers since I was about 12 years old.  So in the last 10 years or so, one of the things I’ve always tried to remain conscious of is how reliable (or unreliable) my computer is.  No one likes it when their computer breaks (not even someone who is talented at fixing them).  For me, this concern has been brought on by the stress of losing data that, in hindsight, may not have been all that important.  But, it was a data loss nonetheless and no one likes it when it happens.

So when I was in high school, my first strategy was to put my OS on a different drive than my user data. I figured that the system drive would see many more reads and writes than the data drive, because of paging and other things. So, it would be more prone to stress than a data drive that gets used less vigorously. I kept this strategy for several years. I was lucky enough to replace drives before they failed (as all drives do).

But, when I got to Case, I was exposed to a ridiculously fast fiber optic network that made it even easier to fill my hard drive with junk I probably don’t need.  I needed a storage solution that met a few objectives:

  • It had to provide a relatively large amount of available storage.
  • It had to provide some level of fault-tolerance.
  • It had to run on Linux, since I was tired of switching between Linux and Windows Server 2003.  I chose Linux and I was sticking with it.

So I built a 1.2TB software RAID5 array. There’s a pic missing on that page; I’ll see if I can locate it or take another. I also have some other items to add to that page, including benchmarks and other info. The RAID array is terrific. It’s survived one and a half drive failures (one legitimate drive failure and half a drive failure that I simulated by yanking a disk while it was on). Rebuilding the array takes several hours, but that’s to be expected. More on that when I update that post.

So of course, even with local data fault-tolerance, I want to have an off-site backup service. At first I was using Dreamhost, but they tried to extort a bunch of money out of me to use my storage space for backups. It was really shady. But it did force me to look elsewhere, and I’m glad I did. I wanted a cross-platform solution (given the heterogeneous nature of my current computer collection). So I selected JungleDisk, which uses Amazon’s S3 web service for data storage. I pay a pittance (maybe a couple of dollars per month, if that) to store about 20GB of data. Backups happen nightly (or however often you select).

I use JungleDisk to back up all of the documents I’ve written in college, all of the code I’ve written, all of my digital photos, my encrypted password database, and a dump of my MySQL databases. Everything is encrypted before it even leaves my computer. The client runs right on my file server and just works. It’s an absolutely terrific product. My only complaint (a feature request, really) is that I wish it would e-mail me a copy of its log file every time a backup occurs. That way I would know whether it was successful and what was transferred.

I foresee keeping this setup for quite a while.

Updates: I added an extra drive to the RAID array. A few quick terminal commands later and I’m up to 1.5TB. Also, JungleDisk added a reporting option. You can get an e-mail when a backup job completes, but if you want details you have to pay for their monthly service.
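The jump from 1.2TB to 1.5TB fits how RAID5 capacity works: one drive's worth of space goes to parity, so usable space is (number of drives - 1) × drive size. Here is a minimal Python sketch of that arithmetic. The drive size (300 GB) and the drive counts (5, then 6) are assumptions inferred from the two totals; the post never states them, and `raid5_usable_gb` is a hypothetical helper name.

```python
def raid5_usable_gb(num_drives: int, drive_gb: int) -> int:
    """Usable RAID5 capacity: one drive's worth of space holds parity."""
    if num_drives < 3:
        raise ValueError("RAID5 needs at least 3 drives")
    return (num_drives - 1) * drive_gb


# Assumption: 300 GB drives, inferred from the 1.2TB -> 1.5TB jump
# after adding a single drive. The post does not give the real size.
DRIVE_GB = 300

before = raid5_usable_gb(5, DRIVE_GB)  # assumed original array
after = raid5_usable_gb(6, DRIVE_GB)   # assumed array after the extra drive

print(f"5 drives: {before} GB usable")
print(f"6 drives: {after} GB usable")
```

Under those assumptions, each added drive contributes its full size to usable space, since the parity overhead stays at one drive no matter how wide the array gets.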

How fast is the Case Network, really?

This is a question Colin and I thought about a few months ago. Well, there’s a two-part answer to that. Our network is entirely gigabit fiber end-to-end. So, theoretically I should be able to get 1000 Mbits/sec between two machines. However, with network IO overhead and other hardware limitations, it seems the best I can do is:

alex@tardis:~$ iperf -s -w 100K
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size:   200 KByte (WARNING: requested   100 KByte)
------------------------------------------------------------
[  4] local 129.22.56.170 port 5001 connected with 129.22.57.154 port 53101
[  4]  0.0-10.0 sec  1.08 GBytes    928 Mbits/sec

So about 930 Mbits/sec between two machines on the same network switch. I had to tune the TCP window a little bit to get that, but that’s still not bad. Of course real-world transfers are much slower for a number of reasons: protocol overhead (HTTP, FTP, Samba, etc.), IO speeds on hard drives, and so on. For example, even though I can achieve nearly 930 Mbits/sec between my two Linux machines using raw socket connections (where the source of the data on both machines is in memory), Samba transfers seem to top out around 250 Mbits/sec, due to Samba limitations and probably some hardware limitations as well.
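The two figures on iperf's result line (1.08 GBytes in 10 seconds, 928 Mbits/sec) use different unit conventions, which is easy to miss. A minimal Python sketch of the conversion, assuming iperf's "GBytes" means 2^30 bytes and its "Mbits" means 10^6 bits; `mbits_per_sec` is a hypothetical helper name:

```python
GIBIBYTE = 2**30  # assumption: iperf's "GBytes" are 2**30 bytes


def mbits_per_sec(num_bytes: float, seconds: float) -> float:
    """Convert a byte count and a duration to megabits (10**6 bits) per second."""
    return num_bytes * 8 / seconds / 1e6


# Values taken from the iperf result line: 1.08 GBytes in 10.0 seconds.
rate = mbits_per_sec(1.08 * GIBIBYTE, 10.0)

print(f"{rate:.0f} Mbits/sec")              # matches the 928 Mbits/sec iperf reported
print(f"{rate / 1000:.1%} of gigabit line rate")
```

With those assumptions the numbers agree, which puts the test at roughly 93% of the theoretical 1000 Mbits/sec.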

Measuring internet bandwidth is a little trickier. Case is connected to a few ‘internets’. Obviously, we are connected to the same internet that everyone else in the world uses (the ‘commodity’ internet). Our ISP for that is OneCommunity and our SLA gives us 450 Mbits/sec (upstream and downstream). Since our internal LAN behaves, essentially, as one giant network switch (and since we also don’t do any significant packet shaping), every end user can consume as much of that bandwidth as possible. In this case, data transfer speeds are more dependent on the source you’re downloading from (or the destination you’re uploading to). We’re also connected to internet2 (i2), which is a private fiber optic network for higher education institutions and research labs (like Los Alamos National Laboratory).

So, measuring i2 bandwidth can be accomplished by downloading a file hosted on an i2 server:

alex@tardis:/sata-raid5/downloads$ wget http://ftp.ussg.iu.edu/linux/ubuntu-releases/hardy/ubuntu-8.04.1-desktop-i386.iso
--17:40:46--  http://ftp.ussg.iu.edu/linux/ubuntu-releases/hardy/ubuntu-8.04.1-desktop-i386.iso
           => `ubuntu-8.04.1-desktop-i386.iso'
Resolving ftp.ussg.iu.edu... 156.56.247.193
Connecting to ftp.ussg.iu.edu|156.56.247.193|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 728,221,696 (694M) [application/octet-stream]

100%[=================================================================================>] 728,221,696    5.24M/s    ETA 00:00

17:42:44 (5.91 MB/s) - `ubuntu-8.04.1-desktop-i386.iso' saved [728221696/728221696]

So 6 MB/s is roughly 48 Mbits/sec.
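As a sanity check on that back-of-the-envelope number, the rate can be recomputed from the byte count and the start and end timestamps in the wget output. This is a minimal Python sketch, assuming wget's "MB" means 2^20 bytes and that the download did not cross midnight; `transfer_rate` is a hypothetical helper name.

```python
from datetime import datetime


def transfer_rate(num_bytes: int, start: str, end: str) -> tuple[float, float]:
    """Return (MiB/s, Mbits/s) for a transfer between two HH:MM:SS timestamps.

    Assumes both timestamps fall on the same day.
    """
    fmt = "%H:%M:%S"
    elapsed = (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()
    bytes_per_sec = num_bytes / elapsed
    return bytes_per_sec / 2**20, bytes_per_sec * 8 / 1e6


# Byte count and timestamps copied from the Ubuntu ISO download.
mib_s, mbit_s = transfer_rate(728_221_696, "17:40:46", "17:42:44")

print(f"{mib_s:.2f} MiB/s, {mbit_s:.1f} Mbits/sec")
```

The timestamps only have one-second resolution, so the result lands slightly off wget's own 5.91 MB/s, but it is in the same ballpark as the rough 48 Mbits/sec estimate (a little over 49 when the binary megabyte is accounted for).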

Let’s download something from a known fast commodity internet source (one of Apple’s servers that serves movie trailers):

alex@tardis:/sata-raid5/downloads$ wget http://movies.apple.com/movies/sony_pictures/quantum_of_solace/quantum_of_solace-tlr1_h1080p.mov
--17:55:22--  http://movies.apple.com/movies/sony_pictures/quantum_of_solace/quantum_of_solace-tlr1_h1080p.mov
           => `quantum_of_solace-tlr1_h1080p.mov'
Resolving movies.apple.com... 192.5.110.40, 192.5.110.39
Connecting to movies.apple.com|192.5.110.40|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 141,490,583 (135M) [video/quicktime]

100%[=================================================================================>] 141,490,583    6.74M/s    ETA 00:00

17:55:39 (8.05 MB/s) - `quantum_of_solace-tlr1_h1080p.mov' saved [141490583/141490583]

So that’s pretty fast, even faster than i2. But I had forgotten to look up the host (movies.apple.com) in the first place, so I ran tracepath:

alex@tardis:/sata-raid5/downloads$ tracepath movies.apple.com
 1:  tardis.STUDENT.CWRU.Edu (129.22.56.170)                0.118ms pmtu 1500
 1:  129.22.56.2 (129.22.56.2)                              0.415ms
 2:  10.2.0.50 (10.2.0.50)                                  0.446ms
 3:  cwru2-fa4-0-0.cwru.edu (192.5.109.1)                   0.681ms
 4:  a192-5-110-40.deploy.akamaitechnologies.com (192.5.110.40)   0.915ms reached
     Resume: pmtu 1500 hops 4 back 4

and was reminded that we have our own private Akamai cache. That explains why the 135MB trailer downloaded in 17 seconds. The Akamai cache is on our internal network, but outside our firewall, which is why it’s not even faster. Let’s try another server (the Diggnation podcast distribution server):

alex@tardis:/sata-raid5/downloads$ wget http://www.podtrac.com/pts/redirect.mov/bitcast-a.bitgravity.com/revision3/web/diggnation/0160/diggnation--0160--2008-07-24joshv--hd.h264.mov
--17:59:27--  http://www.podtrac.com/pts/redirect.mov/bitcast-a.bitgravity.com/revision3/web/diggnation/0160/diggnation--0160--2008-07-24joshv--hd.h264.mov
           => `diggnation--0160--2008-07-24joshv--hd.h264.mov'
Resolving www.podtrac.com... 69.16.233.67
Connecting to www.podtrac.com|69.16.233.67|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://bitcast-a.bitgravity.com/revision3/web/diggnation/0160/diggnation--0160--2008-07-24joshv--hd.h264.mov [following]
--17:59:27--  http://bitcast-a.bitgravity.com/revision3/web/diggnation/0160/diggnation--0160--2008-07-24joshv--hd.h264.mov
           => `diggnation--0160--2008-07-24joshv--hd.h264.mov'
Resolving bitcast-a.bitgravity.com... 208.67.237.237
Connecting to bitcast-a.bitgravity.com|208.67.237.237|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 716,434,724 (683M) [video/quicktime]

100%[=================================================================================>] 716,434,724    4.38M/s    ETA 00:00

18:02:21 (3.92 MB/s) - `diggnation--0160--2008-07-24joshv--hd.h264.mov' saved [716434724/716434724]

About 4 MB/sec, which equals 32 Mbits/sec.
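To put the three downloads side by side, here is a minimal Python sketch that pulls the average rate out of each wget summary line and converts it to Mbits/sec. It assumes wget's "MB/s" means 2^20 bytes per second; the dictionary labels and the `reported_mbits` helper are hypothetical names chosen for illustration.

```python
import re

# Final summary lines copied from the three wget runs.
SUMMARIES = {
    "i2 mirror": "17:42:44 (5.91 MB/s) - `ubuntu-8.04.1-desktop-i386.iso' saved [728221696/728221696]",
    "Akamai cache": "17:55:39 (8.05 MB/s) - `quantum_of_solace-tlr1_h1080p.mov' saved [141490583/141490583]",
    "commodity host": "18:02:21 (3.92 MB/s) - `diggnation--0160--2008-07-24joshv--hd.h264.mov' saved [716434724/716434724]",
}

RATE_RE = re.compile(r"\(([\d.]+) MB/s\)")


def reported_mbits(summary: str) -> float:
    """Extract wget's average rate and convert it to Mbits/sec (10**6 bits)."""
    match = RATE_RE.search(summary)
    if match is None:
        raise ValueError("no 'MB/s' rate found in summary line")
    # Assumption: wget's MB is 2**20 bytes.
    return float(match.group(1)) * 2**20 * 8 / 1e6


# Fastest source first.
ranked = sorted(SUMMARIES, key=lambda name: reported_mbits(SUMMARIES[name]), reverse=True)
for name in ranked:
    print(f"{name:15s} {reported_mbits(SUMMARIES[name]):5.1f} Mbits/sec")
```

Counting the binary megabyte nudges each figure a few percent above the simple "multiply by 8" estimates, but the ordering is the same: the local Akamai cache first, then i2, then the commodity host.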

Of course there are a few problems with these tests. For one, they suffer from a pretty obvious selection bias: I purposely picked hosts that I knew ahead of time were fast enough to show off the network speed. But I think the question is not “How fast is the Case network on average?” but “How fast can you make the Case network go?” The external hosts I tested are among the fastest I know of.

More information: Network traffic stats, which contains near real-time information on network traffic as well as IP routing info, topology maps, etc. Some links on that page may require a Case Network ID.
