
Building cheap cloud storage – the Backblaze way!



I recently read this article by cloud backup provider Backblaze on how to build a cheap cloud storage device: 67 TB of storage for under 8000 USD, to be exact.

Backblaze provides unlimited backups to individuals for a mere 5 USD/month, and it is really interesting to read about how they are coping with the demand.

Here is a brief summary of the most interesting parts of their article.

Use of cheap RAID controllers
It’s interesting that they don’t use expensive hardware RAID controllers. Their “Syba SD-SA2PEX-2IR” controllers cost only 35 dollars apiece and leave the CPU to handle the RAID work. This is a smart trade-off for cloud storage, because a good CPU costs far less than four or more full-featured hardware RAID controllers. (They use an Intel Core 2 in their rig, but since the post is a few months old, a Core i5/i7 would probably be a better choice today.)

RAID structure
Using 45 drives in each server, Backblaze divides them into three RAID6 volumes of 15 drives each. Every volume can survive two disk failures, so a server tolerates up to six failed drives in the best case (two per volume). The interesting part here is the “threshold” for the maximum number of disks in a RAID6 volume, as estimated by Backblaze: with every disk added to a volume, it becomes more likely that several drives fail in quick succession, before the failed ones have been replaced and rebuilt.
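To put some rough numbers on this, here is a minimal Python sketch. It assumes that the 67 TB figure is raw capacity, that drives fail independently, and that each drive has a made-up 1% chance of failing within one rebuild window; none of these assumptions come from Backblaze, and the function names are my own. It only illustrates the trend, not their actual estimate.

```python
from math import comb

DRIVES_PER_SERVER = 45   # from the article
VOLUMES = 3              # three RAID6 volumes per server
PARITY_PER_VOLUME = 2    # RAID6 spends two drives' worth of space on parity
RAW_TB = 67              # assumption: the quoted 67 TB is raw capacity


def usable_fraction(drives_per_volume, parity=PARITY_PER_VOLUME):
    """Share of a volume's raw space that is left for data."""
    return (drives_per_volume - parity) / drives_per_volume


def prob_more_than_two_failures(n, p):
    """Chance that more than two of n drives fail in the same window.

    Assumes independent failures with probability p per drive
    (a simplification; real drives in one chassis are not independent).
    """
    at_most_two = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(3))
    return 1 - at_most_two


drives_per_volume = DRIVES_PER_SERVER // VOLUMES
usable_tb = RAW_TB * usable_fraction(drives_per_volume)
print(f"{drives_per_volume} drives per volume, "
      f"about {usable_tb:.1f} TB usable before file system overhead")

# Hypothetical per-drive failure probability within one rebuild window.
P_FAIL = 0.01
for n in (6, 15, 45):
    print(f"{n:2d} drives in one volume: "
          f"P(more than 2 fail) = {prob_more_than_two_failures(n, P_FAIL):.6f}")
```

The risk of losing a volume grows quickly with its size, while the parity overhead shrinks, which is presumably why 15 drives per volume is a compromise rather than one big 45-drive RAID6.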

Tomcat backbone for communication
I mostly put this in to further dispel the idea that Java is “slow”. This is a great choice of platform, which must have lowered development costs considerably compared to an alternative implementation, such as modifying Apache or writing a custom daemon to handle the communication.

Encrypted communication
I am still a bit puzzled by this. I guess the encryption is supposed to protect against snooping on their internal network, but the data is already encrypted on the end users’ personal computers before upload, so this measure seems a bit unnecessary, especially given the added CPU usage. I’d be glad to hear other ideas on this, so if you know, leave a comment!

JFS File system
JFS is a stable file system with low CPU utilization and fast file lookups. Read this in-depth file system benchmark for more information.

And that’s all. If you have any questions or other ideas, don’t hesitate to leave a comment!