Amazon’s DynamoDB is a great technology: it’s easy to set up and provides all the benefits of an infinitely scalable distributed database without the operational overhead. Unfortunately, DynamoDB can also be very expensive for high-volume, high-throughput storage. This post will show how we were able to tweak our deduplication schema to drastically decrease our storage costs.
At Localytics we take in several billion datapoints each day. Each datapoint represents a user action taken within an application, such as purchasing an item or leveling up in a game. Most of our datapoints come from mobile devices, which may lose connection before a request is acknowledged, leading to repeated sends and duplicate data. In practice, poor wireless connections are common and duplicates may account for up to 10% of our incoming traffic. That means we see hundreds of millions of duplicate datapoints each day.
In principle, deduping is simple. Each time we receive a datapoint, check its UUID against a store of previously seen UUIDs. If the UUID is present, ignore the datapoint; if the UUID hasn’t been seen, add the UUID to the store and process the datapoint. The question is: how can we make deduping in DynamoDB cost effective while processing billions of datapoints each day?
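In code, the whole flow is just a guarded test-and-set. Here’s a minimal sketch with an in-memory set standing in for the UUID store (the real store, of course, is the DynamoDB table discussed below):

```python
# Minimal dedupe sketch: an in-memory set stands in for the UUID store.
seen_uuids = set()

def handle(datapoint):
    ...  # placeholder for real downstream processing

def process(datapoint):
    uuid = datapoint["uuid"]
    if uuid in seen_uuids:
        return            # duplicate: ignore the datapoint
    seen_uuids.add(uuid)  # record the UUID, then process
    handle(datapoint)
```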
If we take in 5 billion datapoints each day, we’ll need to do at least that many daily reads and/or writes against the table. Additionally, we’ll need to store at least 150 billion UUIDs (30 days × 5 billion) to give us 30 days of dedupe history. Why do we need such a large dedupe window? If an upload fails while a user is closing an app, it won’t be resent until the user reopens the app, which could be days or weeks later.
The read/write costs for the table will be fixed relative to the volume of incoming data because we need to do a read and/or write for each incoming datapoint. However, we will show how storage costs can be controlled with some clever tweaks to table setup.
Naive Schema
If we take the most naive approach and simply store each UUID in Dynamo, we’ll have a table that looks something like this:
UUID |
---|
aaaaaaaabbbbccccddddeeeeeeeeeeee |
aaaaaaaabbbbccccdddd222222222222 |
aaaaaaaabbbbccccdddd777777777777 |
... |
Each row contains a single, indexed, 32-character (128-bit) UUID. When deduping an incoming datapoint, we can take advantage of Dynamo’s Conditional Writes to run the check and write in a single operation. This approach is obviously impractical because it provides no way to age out data over time. Before we address the aging issue, let’s look at the storage costs assuming we take in 5 billion datapoints each day.
Dynamo stores all non-binary data (including numeric types) as UTF-8 strings. UUIDs consist of characters in the standard ASCII range so each of the 32 characters will take up one byte of storage. In addition to the 32 bytes per UUID, there is a 100 byte storage cost per index in the table, making each row in the table account for 132 bytes. Without removing any old data, after one year of collecting 5 billion datapoints each day we’d expect to have 1.8 trillion entries in the table, or 238 TB. At a cost of $0.25/GB/month, the naive table implementation will cost us $60,100/month after one year.
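For concreteness, the check-and-write against this naive table might look like the following boto3 sketch (the table name dedupe and attribute name uuid are illustrative, not prescribed):

```python
import boto3

dynamodb = boto3.client("dynamodb")

def is_new_datapoint(uuid_hex):
    """Return True if the UUID is new, False if it's a duplicate.

    Sketch only: table name "dedupe" and attribute name "uuid" are
    illustrative; "uuid" is assumed to be the table's hash key.
    """
    try:
        dynamodb.put_item(
            TableName="dedupe",
            Item={"uuid": {"S": uuid_hex}},
            # Conditional write: succeed only if no row with this key exists.
            ConditionExpression="attribute_not_exists(#u)",
            ExpressionAttributeNames={"#u": "uuid"},
        )
        return True
    except dynamodb.exceptions.ConditionalCheckFailedException:
        return False  # UUID already present: duplicate
```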
Naive Schema + Aging Out
Obviously we don’t want our table to keep growing indefinitely; we need some kind of age-out scheme. DynamoDB does not yet offer any kind of TTL on its tables, but we can add a unix timestamp attribute to each entry and run a daily job that deletes all entries older than 30 days.
In this case our table might look something like this:
UUID | Timestamp |
---|---|
aaaaaaaabbbbccccddddeeeeeeeeeeee | 1482148833 |
aaaaaaaabbbbccccdddd222222222222 | 1482158856 |
aaaaaaaabbbbccccdddd777777777777 | 1482141133 |
... | ... |
Now the number of entries in our table is capped at 150 billion. Each entry grows by 10 bytes to account for the 10-character timestamp, making it 142 bytes, so the table size will be roughly 21.3 TB, costing $5,450/month. This gets us roughly an order of magnitude in storage savings, though we’re ignoring the increased read/write cost of the daily cleanup. We can still do significantly better, and without the extra work of manual cleanup.
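The cleanup job itself could be a scan-and-delete along these lines (a boto3 sketch; the conditional write from before would simply also set a Timestamp attribute, and the scan over 150 billion rows is precisely the hidden read cost mentioned above):

```python
import time

import boto3

dynamodb = boto3.client("dynamodb")

def daily_cleanup(table="dedupe", max_age_days=30):
    """Delete all entries older than the dedupe window (sketch)."""
    cutoff = int(time.time()) - max_age_days * 24 * 60 * 60
    paginator = dynamodb.get_paginator("scan")
    pages = paginator.paginate(
        TableName=table,
        # "Timestamp" is aliased because TIMESTAMP is a DynamoDB reserved word.
        FilterExpression="#ts < :cutoff",
        ExpressionAttributeNames={"#ts": "Timestamp"},
        ExpressionAttributeValues={":cutoff": {"N": str(cutoff)}},
    )
    for page in pages:
        for item in page["Items"]:
            dynamodb.delete_item(TableName=table, Key={"uuid": item["uuid"]})
```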
Prefix/Suffix Schema
In the previous schema, 100 out of every 142 bytes came from the index, making the index responsible for 70% of the storage cost. Storage costs for our data will be fixed based on the number of UUIDs, but if we can limit the size of the index we can create significant savings.
We can limit the index size by taking advantage of Dynamo’s Set Types. Instead of giving each UUID its own row, we take the first n bits of each UUID (more on choosing n below) and use that prefix as a key that maps to the set of corresponding suffixes. Now our table looks like this:
Prefix | Suffix Set |
---|---|
aaaaaaaab | {bbbccccddddeeeeeeeeeeee, bbbccccdddd222222222222, bbbccccdddd777777777777, ... } |
bbbbbbbbc | {cccaaaaddddeeeeeeeeeeee, cccaaaadddd222222222222, cccaaaadddd777777777777, ... } |
ccccccccd | {dddaaaaddddeeeeeeeeeeee, dddaaaadddd222222222222, dddaaaadddd777777777777, ... } |
... | ... |
Dynamo’s conditional writes again allow us to do a single test-and-set operation, conditionally inserting a new suffix into the suffix set. The indexed prefix structure limits the number of rows in the table to 2^n. Say n is 33: stored as hex, those 33 bits occupy the first 9 characters of the UUID (the ninth character carries only one significant bit), and the table will only ever have 2^33, roughly 8.6 billion, rows. Thus, even without age-out, after one year we are still only paying the 100-byte index cost 8.6 billion times instead of 1.8 trillion.
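In boto3, that test-and-set could look like the sketch below. Note that for simplicity it splits at the 9-character boundary (36 bits) instead of masking down to exactly 33 bits, and the table and attribute names are illustrative:

```python
import boto3

dynamodb = boto3.client("dynamodb")

def is_new_datapoint(uuid_hex, prefix_chars=9):
    """Test-and-set against the prefix/suffix schema (sketch)."""
    prefix, suffix = uuid_hex[:prefix_chars], uuid_hex[prefix_chars:]
    try:
        dynamodb.update_item(
            TableName="dedupe",
            Key={"prefix": {"S": prefix}},
            # ADD creates the string set if the row or set doesn't exist yet.
            UpdateExpression="ADD #sfxs :new",
            # Fail the write, flagging a duplicate, if the suffix is present.
            ConditionExpression="NOT contains(#sfxs, :sfx)",
            ExpressionAttributeNames={"#sfxs": "suffixes"},
            ExpressionAttributeValues={
                ":new": {"SS": [suffix]},
                ":sfx": {"S": suffix},
            },
        )
        return True
    except dynamodb.exceptions.ConditionalCheckFailedException:
        return False
```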
Now the index size for the table is fixed at 8.6 billion × 100 bytes, roughly 860 GB. With a year’s worth of data, the storage size is 9 bytes for each of the 8.6 billion prefixes (72 GB) plus 1.8 trillion 23-byte suffixes (38.6 TB). Only 2% of the storage volume is now in the index, down from 70%. After a year, with 1.8 trillion UUIDs stored, this table would cost $9,860/month: significant savings over the naive implementation, but still double the storage cost of the naive schema with aging out.
Prefix/Suffix Schema + Aging Out
This approach saves a good amount of space and money, but the suffix sets will keep growing as we dedupe more data, and we still have no way to age any of it out. To fix that, we give each row a separate suffix-set column per month: incoming UUIDs are written to the current month’s column, checked against both the current and previous months’ columns, and the column from two months back is deleted whenever it’s present. This keeps at least 30 days of dedupe history while letting older data age out.
The table looks something like this:
Prefix | Nov_2016 | Dec_2016 | Jan_2017 |
---|---|---|---|
aaaaaaaab | {bbbccccddddeeeeeeeeeeee, bbbccccdddd222222222222, ... } | { bbbccccdddd777777777777, ... } | |
bbbbbbbbc | {cccaaaaddddeeeeeeeeeeee, cccaaaadddd222222222222, ... } | { cccaaaadddd777777777777, ... } | |
ccccccccd | {dddaaaaddddeeeeeeeeeeee, dddaaaadddd777777777777, ... } | { dddaaaadddd222222222222, ... } | |
... | ... | ... | ... |
For example, on January 1, 2017, we’ll insert a UUID into the suffix set of the Jan_2017 column only if it doesn’t already exist in the Dec_2016 or Jan_2017 suffix sets. As part of the same operation, we’ll also delete the Nov_2016 column if it’s present on the row. All of these operations touch the same row, so they can be done as a single conditional write in Dynamo.
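Here’s what that combined operation might look like in boto3 (a sketch: the month columns are hard-coded where a real implementation would derive them from the clock):

```python
import boto3

dynamodb = boto3.client("dynamodb")

def is_new_datapoint(uuid_hex, cur="Jan_2017", prev="Dec_2016", old="Nov_2016"):
    """Insert into the current month's set, drop the expired column, and
    flag a duplicate if the UUID is in either live column (sketch)."""
    prefix, suffix = uuid_hex[:9], uuid_hex[9:]
    try:
        dynamodb.update_item(
            TableName="dedupe",
            Key={"prefix": {"S": prefix}},
            # One write: add to the current month, remove the expired month
            # (REMOVE of a missing attribute is a harmless no-op).
            UpdateExpression="ADD #cur :new REMOVE #old",
            # Duplicate if the suffix appears in either live month column.
            ConditionExpression="NOT (contains(#cur, :sfx) OR contains(#prev, :sfx))",
            ExpressionAttributeNames={"#cur": cur, "#prev": prev, "#old": old},
            ExpressionAttributeValues={
                ":new": {"SS": [suffix]},
                ":sfx": {"S": suffix},
            },
        )
        return True
    except dynamodb.exceptions.ConditionalCheckFailedException:
        return False
```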
Now we’re both aging data out and limiting our index by splitting the UUIDs into prefixes and suffixes. The last remaining question: how long should the prefixes be? Dynamo bills writes in 1KB units, and any item larger than 1KB costs at least two write units, so it’s worth keeping each item in the table under 1KB. The randomness of UUIDs ensures an even distribution of suffixes across prefixes, so with two live month columns the average number of suffixes per row is 2 * (# of UUIDs per month) / 2^n.
To ensure the vast majority of items stay under 1KB, we aim to keep the average size under 800 bytes, leaving headroom for rows that draw more suffixes than average. With 23-character suffixes, that’s about 35 suffixes per row over two months, so we want n such that (300,000,000,000 / 2^n) < 35, where 300 billion is two months of UUIDs. That makes n = 33 and puts a hard limit of 8.6 billion on the number of rows in the table.
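That arithmetic is easy to sanity-check:

```python
import math

# Sanity check of the prefix-length math: ~150 billion UUIDs per month,
# two live month columns, and a budget of ~35 suffixes per row
# (35 suffixes * 23 bytes ≈ 800 bytes).
uuids_per_month = 150 * 10**9
budget = 35

n = math.ceil(math.log2(2 * uuids_per_month / budget))
print(n)                           # 33
print(2 * uuids_per_month / 2**n)  # ~34.9 average suffixes per row
print(f"{2**n:,}")                 # 8,589,934,592 rows maximum
```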
As before, our index is 860 GB and our 8.6 billion 9-byte prefixes take up 72 GB, but now we only ever store two months’ worth of 23-byte suffixes, 6.27 TB. This brings the total cost of this implementation to $1,824/month.
Summary
Finally, we can compare the costs of all four approaches and see that we’ve reduced storage costs by almost two orders of magnitude over the naive approach.
Naive | Naive + Age Out | Prefix/Suffix | Prefix/Suffix + Age Out |
---|---|---|---|
238 TB | 21.3 TB | 38.5 TB | 7.13 TB |
$60,100 / month | $5,450 / month | $9,860 / month | $1,824 / month |
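For the curious, the entire cost model fits in a few lines. This sketch uses decimal GB/TB and the $0.25/GB/month price throughout, while the figures above round differently in places, so its output lands in the same ballpark rather than matching to the dollar:

```python
# Back-of-the-envelope storage cost model (decimal GB/TB, $0.25/GB/month).
PRICE_PER_GB = 0.25
PER_DAY = 5 * 10**9
YEAR = 365 * PER_DAY   # ~1.8 trillion UUIDs retained without age-out
WINDOW = 30 * PER_DAY  # 150 billion UUIDs per 30-day window
ROWS = 2**33           # prefix rows in the prefix/suffix schemas

scenarios = {
    "Naive":                   YEAR * (32 + 100),
    "Naive + Age Out":         WINDOW * (32 + 10 + 100),
    "Prefix/Suffix":           ROWS * (9 + 100) + YEAR * 23,
    "Prefix/Suffix + Age Out": ROWS * (9 + 100) + 2 * WINDOW * 23,
}

for name, size in scenarios.items():
    print(f"{name}: {size / 10**12:.1f} TB, ${size / 10**9 * PRICE_PER_GB:,.0f}/month")
```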
In summary, if you want to save on storage costs with Dynamo, you should ask yourself:
- Can I take advantage of conditional writes?
- Can I bound my table to limit index size?
- Can I take advantage of sets?
- Can I build the table to allow unneeded data to age out?
- Can I structure my data so most records stay just under the 1KB write unit size?