As you may have read in our previous posts (here, here and here), one of the largest pieces of our data infrastructure at Localytics is the petabyte-scale Vertica analytics database that we host on Amazon Web Services. We've been relying on Amazon's Elastic Block Store (EBS) as the storage solution for this database for more than a year now. EBS lets you provision virtual block devices and attach them to your compute instances as regular drives, which effectively decouples the size of your computing resources from their storage.
EBS has been a good choice for us because of the flexibility it allows in the operation of our warehouse. One feature that we really take advantage of is the ability to quickly snapshot and restore the contents of these virtual drives to and from S3. In addition to being an excellent backup/recovery solution, it also allows us to replicate the entire contents of our data warehouse in order to scale or benchmark new features.
Recently, we had the opportunity to spin up a warehouse with Amazon's shiny new SC1 magnetic-backed EBS volume type. Amazon already has three different storage types: the "standard" magnetic-backed storage, a general-purpose SSD-backed storage solution called GP2, and a provisioned IOPS SSD solution called io1. The SC1 type is a member of their new SC/ST family of magnetic-backed volume types that are designed specifically for data warehouse use cases such as ours. They offer a lower price point, stronger performance consistency guarantees and "burst" optimization for the long sequential reads that column-oriented databases such as Vertica really love. SC1, specifically, is optimized for colder storage.
EBS standard magnetic has worked pretty well for us in the past. Over the summer, we flirted with GP2 volumes and found that we really liked the consistency the newer generation of SSD-backed volumes offered, but we couldn't justify the price as our data volume has grown exponentially.
For a quick background, here's our current cluster setup and requirements:
- r3.4xlarge memory optimized instances
- 1 TB EBS standard magnetic volumes in a RAID-0 arrangement (1 TB is the max size of a standard volume)
- Data striped across all nodes in the cluster
- Continuous trickle load of real time data
Based on a month of performance testing, SC1/ST1 hits the sweet spot for us because it's designed with our use case in mind and offers an excellent price-to-performance tradeoff.
Here's a side-by-side comparison of the performance distribution of the two volume types serving a 24-hour sample of read/write queries that Vertica serves in a median time of about 1 second on standard EBS:
Type | Data Size | 50th pct (ms) | 75th pct (ms) | 90th pct (ms) | 95th pct (ms) | 99th pct (ms) |
---|---|---|---|---|---|---|
SC1 | Small | 1,382 | 2,344 | 3,923 | 5,372 | 9,014 |
SC1 | Large | 1,572 | 2,748 | 4,922 | 7,304 | 22,251 |
Std | Small | 798 | 1,433 | 2,593 | 3,823 | 9,697 |
Std | Large | 1,097 | 2,205 | 4,501 | 7,845 | 23,575 |
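
To make the table easier to read at a glance, here is a minimal Python sketch that divides each SC1 latency by the matching standard-volume latency. The numbers are copied from the table above; the names `LATENCY_MS` and `sc1_to_std_ratio` are ours for illustration and are not part of any tooling described in this post.

```python
# Latency percentiles in milliseconds, copied from the table above.
PERCENTILES = ["p50", "p75", "p90", "p95", "p99"]
LATENCY_MS = {
    ("SC1", "Small"): [1382, 2344, 3923, 5372, 9014],
    ("SC1", "Large"): [1572, 2748, 4922, 7304, 22251],
    ("Std", "Small"): [798, 1433, 2593, 3823, 9697],
    ("Std", "Large"): [1097, 2205, 4501, 7845, 23575],
}


def sc1_to_std_ratio(data_size):
    """Return {percentile: SC1 latency / Std latency} for one data size.

    A ratio above 1 means SC1 was slower than standard at that percentile;
    a ratio below 1 means SC1 was faster.
    """
    sc1 = LATENCY_MS[("SC1", data_size)]
    std = LATENCY_MS[("Std", data_size)]
    return {p: round(s / d, 2) for p, s, d in zip(PERCENTILES, sc1, std)}


if __name__ == "__main__":
    for size in ("Small", "Large"):
        print(size, sc1_to_std_ratio(size))
```

Read this way, the sample shows SC1 trailing standard volumes at the median, with the gap narrowing toward the tail and SC1 coming out slightly ahead at the 99th percentile for both data sizes.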