AWS DynamoDB Fundamentals | A Complete Guide
Huzaifa Asif
Engineering Lead | Solution Architect | Cloud Engineer | FinTech | SaaS | PaaS | AWS | Azure | GCP
Introduction
DynamoDB is a NoSQL database service created by Amazon that is known for consistently low-latency reads and writes. Unlike a relational database, it does not model data as tables linked by relations and joins. Instead, each item is stored under a unique key, so data can be saved in a JSON-like document form and retrieved quickly by looking up that key. DynamoDB is used in mobile applications, gaming, ad technology, and other applications that require a fast data layer.
DynamoDB can be adjusted to meet your read and write capacity needs in Provisioned Capacity mode, or you can use On-Demand mode, which requires little to no capacity planning.
List of DynamoDB Key Features
DynamoDB Data Types
DynamoDB supports the following data types:
Scalar
Multi-valued
Document
DynamoDB Table Structure
Partition Key, also known as the HASH key, is the simple primary key for identifying items in a table. Its value is used as input to an internal hash function, which determines the physical partition in which the item is stored. In a table whose primary key consists only of a partition key, no two items can have the same partition key value.
Partition Key and Sort Key, also referred to as a composite primary key, are composed of two attributes. The partition key is used as an input to an internal hash function which determines the physical storage of the item within DynamoDB. All items with the same partition key are stored together in sorted order by their sort key value.
It is possible for two items to have the same partition key value, but they must have different sort key values. In a table with a composite primary key, you can access any item directly by providing the respective partition and sort key values.
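The layout described above can be sketched with a toy in-memory model. This is only an illustration: DynamoDB's real hash function and partition count are internal, so the MD5 hash, the `NUM_PARTITIONS` constant, and the `ToyTable` class are all assumptions made for this example.

```python
import hashlib
from bisect import insort
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical; DynamoDB manages the real partition count


def partition_for(partition_key: str) -> int:
    """Map a partition key to a partition index using a stand-in hash (MD5)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


class ToyTable:
    """Toy composite-key table: items grouped by partition key, sorted by sort key."""

    def __init__(self):
        # partition index -> partition key -> sorted list of (sort_key, item)
        self.partitions = defaultdict(lambda: defaultdict(list))

    def put(self, pk, sk, item):
        rows = self.partitions[partition_for(pk)][pk]
        # Same (pk, sk) overwrites the existing item; otherwise insert in sort order.
        for i, (existing_sk, _) in enumerate(rows):
            if existing_sk == sk:
                rows[i] = (sk, item)
                return
        insort(rows, (sk, item))

    def get(self, pk, sk):
        """Direct access by full primary key."""
        for existing_sk, item in self.partitions[partition_for(pk)][pk]:
            if existing_sk == sk:
                return item
        return None

    def query(self, pk):
        """All items sharing a partition key, in sort-key order."""
        return [item for _, item in self.partitions[partition_for(pk)][pk]]
```

Two items with the same partition key land in the same partition and come back ordered by sort key, while writing the same key pair twice replaces the earlier item.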
A composite primary key provides additional flexibility when querying data. For example, if you supply only the partition key value, DynamoDB will retrieve all items with that key. You can also provide a value for the partition key and a range of values for the sort key to retrieve a subset of items with the same partition key. For example, a table of movies may have a composite primary key composed of the Producer and Title. You can access any movie in the table directly by providing the Producer and Title values for that item. You can also use the Producer value and a range of values for Title to retrieve a subset of movies by that producer.
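As a rough sketch of the movie example, the helper below builds the request parameters for a DynamoDB Query in the low-level API format. The function name `build_query_params` and the table name `Movies` are hypothetical, and the Producer and Title attributes are assumed to be strings.

```python
def build_query_params(table_name, producer, title_from=None, title_to=None):
    """Build keyword arguments for a Query on a (Producer, Title) composite key."""
    params = {
        "TableName": table_name,
        # The partition key condition is always an equality test.
        "KeyConditionExpression": "#p = :producer",
        "ExpressionAttributeNames": {"#p": "Producer"},
        "ExpressionAttributeValues": {":producer": {"S": producer}},
    }
    if title_from is not None and title_to is not None:
        # Optional sort key range narrows the result to a subset of titles.
        params["KeyConditionExpression"] += " AND #t BETWEEN :lo AND :hi"
        params["ExpressionAttributeNames"]["#t"] = "Title"
        params["ExpressionAttributeValues"][":lo"] = {"S": title_from}
        params["ExpressionAttributeValues"][":hi"] = {"S": title_to}
    return params


# All movies by one producer, then only titles in the range "A" to "M".
all_movies = build_query_params("Movies", "producer-a")
some_movies = build_query_params("Movies", "producer-a", "A", "M")
```

With an SDK such as boto3, a dictionary like this could be passed as `client.query(**params)`. The attribute name placeholders (`#p`, `#t`) avoid any clash with DynamoDB reserved words.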
Secondary Indexes
There are two types of indexes used in DynamoDB: Local Secondary Index (LSI) and Global Secondary Index (GSI).
LSI (Local Secondary Index)
GSI (Global Secondary Index)
DynamoDB Reads and Writes Consistency
DynamoDB can be configured to provide either Eventually Consistent Reads (default) or Strongly Consistent Reads on a per-call basis.
Eventually Consistent Reads respond immediately but may return stale data, because the read can be served by a copy that has not yet received the latest write. Generally, data copies become consistent within one second.
Strongly Consistent Reads guarantee that a read returns the most up-to-date version of the data, reflecting all prior successful writes, because the read is always served by the leader copy of the partition. This ensures that the result is never stale, although latency may be higher than with eventually consistent reads.
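The per-call choice comes down to a single request parameter. A minimal sketch, assuming a hypothetical `Movies` table and a hypothetical helper name `build_get_item_params`; `ConsistentRead` is the GetItem parameter that selects the read mode.

```python
def build_get_item_params(table_name, key, strongly_consistent=False):
    """Build keyword arguments for GetItem; reads are eventually consistent by default."""
    return {
        "TableName": table_name,
        "Key": key,
        # True requests a strongly consistent read for this call only.
        "ConsistentRead": strongly_consistent,
    }


key = {"Producer": {"S": "producer-a"}, "Title": {"S": "Alpha"}}
default_read = build_get_item_params("Movies", key)
strong_read = build_get_item_params("Movies", key, strongly_consistent=True)
```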
Capacity Modes
DynamoDB has two capacity modes, Provisioned and On-Demand. You can switch between these modes once every 24 hours.
1- Provisioned
Provisioned Throughput Capacity is the allocated capacity that your application is able to read from or write to a table or index per second. It is best for applications that have predictable or steady traffic and is measured in Read Capacity Units (RCUs) and Write Capacity Units (WCUs).
It is advisable to enable Auto Scaling in Provisioned capacity mode. This setting allows you to set a minimum and maximum capacity for your DynamoDB table. DynamoDB will automatically adjust the capacity between these values, and will throttle calls that exceed the maximum capacity for an extended period of time.
If you make requests that exceed the capacity that has been provisioned for you, the requests are throttled and fail with a ProvisionedThroughputExceededException. Throttling occurs when requests are blocked because the frequency of reads or writes is higher than the thresholds that have been set. For example, this can happen if you exceed the provisioned capacity, if partitions are splitting, or if there is a mismatch between the capacity of a table and that of its index.
2- On-Demand
On-Demand Capacity is the pay-per-request mode, which is especially suited for new or unpredictable workloads. The throughput is limited only by the default upper limits for a table, up to 40K RCUs and 40K WCUs. However, throttling may occur if traffic grows to more than double the previous peak within 30 minutes. Note also that On-Demand can become costly for sustained, high-volume traffic.
Calculating Read and Write Capacity Units
Calculating Read Capacity Unit (RCU):
A read capacity unit is equivalent to one strongly consistent read per second or two eventually consistent reads per second for an item of up to 4 KB in size.
How to calculate RCUs for strongly consistent reads
Example:
How to calculate RCUs for eventually consistent reads
Example:
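Both read calculations follow from the definition above and can be sketched in one helper. The function name `read_capacity_units` and the sample numbers are illustrative assumptions, not figures from a real workload.

```python
import math


def read_capacity_units(reads_per_second, item_size_kb, strongly_consistent=True):
    """Estimate the RCUs needed for a given read rate and item size."""
    # Each read is billed in 4 KB blocks, rounded up.
    units_per_read = math.ceil(item_size_kb / 4)
    total = reads_per_second * units_per_read
    if not strongly_consistent:
        # Eventually consistent reads cost half as much.
        total = math.ceil(total / 2)
    return total


# 10 reads per second of 6 KB items: 6 KB rounds up to two 4 KB blocks.
strong = read_capacity_units(10, 6)                                # 20 RCUs
eventual = read_capacity_units(10, 6, strongly_consistent=False)   # 10 RCUs
```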
Calculating Write Capacity Unit (WCU):
A write capacity unit is equivalent to one write operation per second, for an item that is up to 1 KB in size.
How to calculate Writes
Example:
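The write calculation can be sketched the same way. The function name `write_capacity_units` and the sample numbers are illustrative assumptions.

```python
import math


def write_capacity_units(writes_per_second, item_size_kb):
    """Estimate the WCUs needed for a given write rate and item size."""
    # Each write is billed in 1 KB blocks, rounded up.
    units_per_write = math.ceil(item_size_kb)
    return writes_per_second * units_per_write


# 10 writes per second of 2.5 KB items: 2.5 KB rounds up to three 1 KB blocks.
needed = write_capacity_units(10, 2.5)  # 30 WCUs
```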
DynamoDB Partitions
DynamoDB automatically splits large tables into smaller chunks of data called partitions to spread storage and throughput. A split occurs when a partition would exceed about 10 GB of data, or when the required throughput exceeds what a single partition can serve: 3,000 Read Capacity Units or 1,000 Write Capacity Units. In addition, DynamoDB may split a partition if it detects a hot partition, to distribute the RCUs and WCUs more evenly across the partitions.
Block Diagram For Partition
Partition Formula
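A commonly cited rule of thumb, built from the thresholds mentioned above (10 GB, 3,000 RCUs, and 1,000 WCUs per partition), can be sketched as follows. Treat it as an approximation only: DynamoDB does not expose its actual partition count, and the function name `estimate_partitions` is hypothetical.

```python
import math


def estimate_partitions(rcu, wcu, table_size_gb):
    """Rough partition estimate: the larger of the throughput and size requirements."""
    by_throughput = math.ceil(rcu / 3000 + wcu / 1000)
    by_size = math.ceil(table_size_gb / 10)
    # A table always has at least one partition.
    return max(by_throughput, by_size, 1)


# 7,500 RCUs and 3,000 WCUs need 6 partitions; 50 GB of data needs only 5.
estimate = estimate_partitions(7500, 3000, 50)  # 6
```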
DynamoDB Streams
A DynamoDB stream is an ordered sequence of the changes made to items in a DynamoDB table, similar to a transaction log. It can be enabled on a table to keep track of any modifications to its data items.
Your application must connect to a DynamoDB Streams endpoint and issue API requests in order to read and process a stream. Stream records are organized into shards which act as a container for multiple records. Shards are ephemeral and can split into multiple new shards automatically. When a stream is disabled, any open shards will be closed and the data will remain readable for 24 hours. It is important to process the parent shard before the child shard to ensure the stream records are in the correct order. The DynamoDB Streams Kinesis Adapter can handle this automatically.
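The parent-before-child rule can be sketched as a small ordering function over shard descriptions. The `ShardId` and `ParentShardId` fields match what a stream description returns for each shard; the function name `order_shards` and the sample shard IDs are hypothetical.

```python
def order_shards(shards):
    """Return shard IDs ordered so that every parent precedes its children."""
    by_id = {shard["ShardId"]: shard for shard in shards}
    ordered = []
    seen = set()

    def visit(shard_id):
        # Skip shards already handled and parents that are no longer listed.
        if shard_id in seen or shard_id not in by_id:
            return
        seen.add(shard_id)
        parent_id = by_id[shard_id].get("ParentShardId")
        if parent_id:
            visit(parent_id)  # emit the parent first
        ordered.append(shard_id)

    for shard in shards:
        visit(shard["ShardId"])
    return ordered


shards = [
    {"ShardId": "shard-c", "ParentShardId": "shard-b"},
    {"ShardId": "shard-a"},
    {"ShardId": "shard-b", "ParentShardId": "shard-a"},
]
processing_order = order_shards(shards)
```

A consumer would then read each shard to completion in this order before moving on to its children.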
This feature is useful in a range of scenarios, such as sending a welcome message when a new customer is added or notifying members when a message or picture is posted in a group chat. DynamoDB and DynamoDB Streams are accessed through separate endpoints, so an application that works with both tables and stream records must connect to each of them.
DynamoDB Accelerator
DAX is a fully managed, in-memory caching system for DynamoDB that runs in a cluster to provide write-through caching.
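The write-through idea can be illustrated with a toy cache in front of a plain dictionary. This is not DAX's implementation or API: the `WriteThroughCache` class is hypothetical, and the dictionary merely stands in for the DynamoDB table.

```python
class WriteThroughCache:
    """Toy write-through cache: every write goes to the store and the cache together."""

    def __init__(self, backing_store):
        self.backing_store = backing_store  # stands in for the DynamoDB table
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def put(self, key, item):
        # Write to the backing store first, then keep the cache in step with it.
        self.backing_store[key] = item
        self.cache[key] = item

    def get(self, key):
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        # Cache miss: read from the store and remember the result.
        self.misses += 1
        item = self.backing_store.get(key)
        if item is not None:
            self.cache[key] = item
        return item


table = {"movie-1": {"title": "Alpha"}}
dax_like = WriteThroughCache(table)
dax_like.get("movie-1")                      # miss, loaded from the table
dax_like.get("movie-1")                      # hit, served from memory
dax_like.put("movie-2", {"title": "Beta"})   # written to both table and cache
```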
When to use DAX
When Not to Use DAX
Summary
DynamoDB is a NoSQL database service created by Amazon for high-performance applications. It stores three copies of the data on solid-state drives (SSDs), replicated across three Availability Zones. It scales to virtually any table size and request rate and supports a variety of data types. DynamoDB offers two capacity modes, Provisioned and On-Demand, and has two types of indexes, Local Secondary Index (LSI) and Global Secondary Index (GSI). Additionally, DynamoDB Streams and DynamoDB Accelerator (DAX) are available for specific workloads.