Learning About Dynamo

Some things I've learned about Dynamo recently.

Fantastic video by Alex DeBrie. It starts with a very elementary explanation of Dynamo as a whole and then drills down into some examples of modelling one-to-many relationships. I personally found the middle to later parts super useful, since one of the things I feel I'm lacking most when working with Dynamo is application patterns for data modelling.
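
One common way to model a one-to-many relationship in a single table (not necessarily the exact approach from the video) is to give the parent and its children the same PK and tell them apart by SK prefix. Below is a minimal in-memory sketch of that idea, not real SDK code: the `table` array and `query` helper are hypothetical stand-ins for a DynamoDB table and a Query with `PK = :pk AND begins_with(SK, :prefix)`, and the `CUSTOMER#`/`ORDER#` key formats are assumptions for illustration.

```typescript
// Hypothetical item shape: every item has a PK and SK plus arbitrary attributes.
interface Item {
  PK: string;
  SK: string;
  [attr: string]: unknown;
}

// The parent (customer profile) and children (orders) share a PK,
// so they live in the same item collection. Key formats are assumptions.
const table: Item[] = [
  { PK: "CUSTOMER#alice", SK: "PROFILE", name: "Alice" },
  { PK: "CUSTOMER#alice", SK: "ORDER#2022-01-05#o1", total: 30 },
  { PK: "CUSTOMER#alice", SK: "ORDER#2022-02-11#o2", total: 12 },
  { PK: "CUSTOMER#bob", SK: "PROFILE", name: "Bob" },
];

// Mimics a Query with "PK = :pk AND begins_with(SK, :prefix)":
// exact match on the PK, prefix match on the SK, sorted by SK ascending.
function query(pk: string, skPrefix = ""): Item[] {
  return table
    .filter((item) => item.PK === pk && item.SK.startsWith(skPrefix))
    .sort((a, b) => (a.SK < b.SK ? -1 : a.SK > b.SK ? 1 : 0));
}

// One request can fetch the customer together with all of their orders...
const everything = query("CUSTOMER#alice");
// ...or only the orders, already in date order because of the SK layout.
const orders = query("CUSTOMER#alice", "ORDER#");

console.log(everything.length); // 3
console.log(orders.map((o) => o.SK));
```

Putting the date right after `ORDER#` in the SK is the design choice that makes "orders in date order" fall out of the sort order for free.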

I've heard this stated before by other sources, but my favorite quote from the video has to be:

"Schema-less does not mean that there is no schema enforcement. But rather that you have to enforce the schema at the application layer".

This concept is easy to miss or skip while building Dynamo-backed applications. Not having your storage system force you to define the shape of your data is liberating, but it does not come without a cost. Having seen fully schemaless applications in action, I can say that more often than not you end up with a pile of scripts moving abstract bits of data back and forth between an API and Dynamo, with little to no evidence of what it is that's actually moving. Telling head from tail then becomes an arduous task.
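
As a rough illustration of "enforcing the schema at the application layer", here is a minimal sketch of a runtime check that runs before anything is written to the table. The `User` shape, `isUser`, and `toUserItem` are hypothetical names chosen for this example; in practice you might reach for a validation or ORM-style library instead of hand-writing this.

```typescript
// Hypothetical "User" entity: Dynamo will store whatever attributes we hand it,
// so this type plus the guard below are where the shape is actually enforced.
interface User {
  businessId: string;
  userId: string;
  email: string;
  createdAt: string; // ISO-8601 timestamp
}

// Runtime type guard: TypeScript types are erased at runtime, so data coming
// from an API (or read back from the table) has to be checked explicitly.
function isUser(value: unknown): value is User {
  if (typeof value !== "object" || value === null) return false;
  const { businessId, userId, email, createdAt } = value as Record<string, unknown>;
  return (
    typeof businessId === "string" &&
    typeof userId === "string" &&
    typeof email === "string" &&
    email.includes("@") &&
    typeof createdAt === "string" &&
    !Number.isNaN(Date.parse(createdAt))
  );
}

// Validate before writing, so malformed data never reaches the table.
function toUserItem(input: unknown): User {
  if (!isUser(input)) {
    throw new Error(`Invalid user: ${JSON.stringify(input)}`);
  }
  return input;
}

const user = toUserItem({
  businessId: "b1",
  userId: "u1",
  email: "a@example.com",
  createdAt: "2022-01-12T00:00:00.000Z",
});
console.log(user.userId); // u1
```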

To which you might say, "Well, what do you do then?" From my reading, it appears we end up back at Object Oriented Programming 🎉. DynamoDB Toolbox seems to be a moderately popular library for accomplishing this in Node. I've also talked with a friend who makes heavy use of it, and he had no complaints. A newer player in the space is TypeDORM, which takes heavy inspiration from TypeORM and uses DynamoDB Toolbox under the hood. I've yet to use it personally, but it's definitely on my short list to try out.

Whatever your runtime environment, you're most likely going to need to define your entities in your codebase and then use some combination of third-party libraries and AWS libraries to hydrate those definitions for use by APIs or other services.
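
To make "entity definitions in your codebase" a bit more concrete, here is a minimal hand-rolled sketch. It is not the API of DynamoDB Toolbox, TypeDORM, or any other library; the `Order` class, its key formats, and the `entityType` attribute are all hypothetical. It assumes items have already been unmarshalled into plain JS values (for example by the AWS SDK's document client).

```typescript
// Raw, already-unmarshalled item as it comes back from a read.
type RawItem = Record<string, unknown>;

// Hypothetical entity: the class owns both the domain shape and
// the mapping to and from the item layout stored in the table.
class Order {
  constructor(
    readonly customerId: string,
    readonly orderId: string,
    readonly total: number,
  ) {}

  // Serialise to the table's item layout (key formats are assumptions).
  toItem(): RawItem {
    return {
      PK: `CUSTOMER#${this.customerId}`,
      SK: `ORDER#${this.orderId}`,
      entityType: "Order",
      total: this.total,
    };
  }

  // Hydrate a raw item back into an Order, rejecting anything
  // that isn't shaped like one.
  static fromItem(item: RawItem): Order {
    const { PK, SK, entityType, total } = item;
    if (
      entityType !== "Order" ||
      typeof PK !== "string" || !PK.startsWith("CUSTOMER#") ||
      typeof SK !== "string" || !SK.startsWith("ORDER#") ||
      typeof total !== "number"
    ) {
      throw new Error(`Not an Order item: ${JSON.stringify(item)}`);
    }
    return new Order(
      PK.slice("CUSTOMER#".length),
      SK.slice("ORDER#".length),
      total,
    );
  }
}

const hydrated = Order.fromItem(new Order("alice", "o1", 30).toItem());
console.log(hydrated.customerId, hydrated.orderId, hydrated.total); // alice o1 30
```

Keeping the key-building logic inside the entity means the rest of the code never concatenates PK/SK strings by hand, which is roughly the job the libraries above take over for you.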

Another core concept that diverges from traditional RDBMS-backed systems is how you store and access constant values. While those systems don't require it, it's a common pattern to put constants in a table so that they can be resolved via joins at the database layer instead of in application code. In Dynamo this is a pretty major anti-pattern, since joins don't exist. You could probably hack it into a single-table design by storing the constants as items with, say, a PK of the constant group name and an SK of the constant key. But then you'd essentially have to do an in-memory join across multiple queries, when you could get the same result by defining a static object that holds those values and referencing it after you query for the item.
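
A minimal sketch of the static-object approach: the item stores only a short code, and the lookup happens in code after a single read. The `ORDER_STATUS` codes and labels and the `describeOrder` helper are hypothetical.

```typescript
// Constants live in code instead of as items in the table, so there is
// no second query and no in-memory "join". Codes and labels are hypothetical.
const ORDER_STATUS = {
  P: { label: "Pending", final: false },
  S: { label: "Shipped", final: false },
  D: { label: "Delivered", final: true },
} as const;

type OrderStatusCode = keyof typeof ORDER_STATUS;

interface OrderItem {
  orderId: string;
  status: OrderStatusCode; // only the short code is stored on the item
}

// After fetching the item (one GetItem/Query), resolve the constant
// with a plain object lookup.
function describeOrder(item: OrderItem): string {
  const status = ORDER_STATUS[item.status];
  return `${item.orderId}: ${status.label}${status.final ? " (final)" : ""}`;
}

console.log(describeOrder({ orderId: "o1", status: "S" })); // o1: Shipped
```

Typing `status` as `keyof typeof ORDER_STATUS` also means the compiler rejects unknown status codes, which a constants table could not do for you.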

A possible footgun you could accidentally build while working with Dynamo is using a common delimiter within your PKs and SKs. Context: when working with Dynamo, it's common to concatenate data points for use in one of your item's keys. For example, a user item might have a PK of the businessId it belongs to, and a concatenated SK of something like userId+createdAt. The footgun comes into play with whatever you use to separate those values in the SK. Let's say you're using UUIDs for the userId. UUIDs are typically delimited by dashes ("-"). If you also delimit the two values with a dash, you'll probably run into issues at some point, as your SK will look something like "1234-1234-1234-01-12-2022T000.0". It's less of a problem in this example because you can predetermine how many dashes each value contains. But in other cases, it's possible to delimit with a symbol that could unexpectedly appear in your data, which will cause issues when you try to split a key that contains more delimiters than you expected.
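
Here is a minimal sketch of building and parsing a composite SK with a delimiter that can't appear in the parts. Using `#` is a common convention rather than a DynamoDB rule, and `buildSk`/`parseSk` plus the sample UUID and timestamp are hypothetical.

```typescript
// "#" is assumed not to appear in any key part; UUIDs and ISO timestamps
// both contain "-", which is what makes "-" ambiguous as a delimiter.
const DELIMITER = "#";

// Refuse to build a key that could not be split back into its parts.
function buildSk(...parts: string[]): string {
  for (const part of parts) {
    if (part.includes(DELIMITER)) {
      throw new Error(`Key part contains delimiter "${DELIMITER}": ${part}`);
    }
  }
  return parts.join(DELIMITER);
}

// Split a key and check that it has exactly the number of parts we expect.
function parseSk(sk: string, expectedParts: number): string[] {
  const parts = sk.split(DELIMITER);
  if (parts.length !== expectedParts) {
    throw new Error(`Expected ${expectedParts} parts, got ${parts.length}: ${sk}`);
  }
  return parts;
}

const userId = "3f2b8c1e-9d4a-4e7b-8a61-2c5d9e0f1a7b"; // hypothetical UUID
const createdAt = "2022-01-12T00:00:00.000Z";

// With "-" as the delimiter, splitting yields 8 pieces instead of 2.
console.log(`${userId}-${createdAt}`.split("-").length); // 8

// With "#", the key round-trips into exactly the two original values.
const sk = buildSk(userId, createdAt);
console.log(parseSk(sk, 2));
```

Validating at build time is the important half: it turns "a value unexpectedly contained the delimiter" from a silent data problem into an error at write time.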

Something important to consider when designing your keys is the relative uniqueness of whatever value you assign as your PK. Since Dynamo hashes the PK to decide which partition an item is stored in, there's a Goldilocks zone: you want enough distinct PK values that your data (and traffic) is well dispersed, but still enough related items sharing each PK that you can query related things together. In other words, you don't want exactly one item per PK, but you also don't want everything under a single PK.
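
To picture the "don't put everything under one PK" side, here is a toy model. Dynamo's real hash function and partition count are internal and not something you control; this sketch uses the FNV-1a hash and a fixed count of 8 buckets purely as stand-ins, and the `BUSINESS#` key format is hypothetical.

```typescript
// Toy stand-in for the internal partition hash: 32-bit FNV-1a.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep as unsigned 32-bit
  }
  return hash >>> 0;
}

// How many of the toy partitions receive at least one item?
function partitionsUsed(pks: string[], partitionCount: number): number {
  const used = new Set<number>();
  for (const pk of pks) used.add(fnv1a(pk) % partitionCount);
  return used.size;
}

const ITEMS = 1000;
const PARTITIONS = 8;

// Every item under a single PK: all of it (and all traffic) hits one partition.
const singlePk = Array.from({ length: ITEMS }, () => "BUSINESS#all");
// One PK per business: 100 businesses with 10 related items each.
const perBusiness = Array.from({ length: ITEMS }, (_, i) => `BUSINESS#${i % 100}`);

console.log(partitionsUsed(singlePk, PARTITIONS)); // 1
console.log(partitionsUsed(perBusiness, PARTITIONS)); // 8
```

The per-business layout still keeps each business's 10 items together under one PK, which is the other half of the Goldilocks zone.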
