Amazon DynamoDB — How it Reads/Writes Data Under the Hood

What is DynamoDB?

DynamoDB, offered by Amazon Web Services (AWS), is a fully managed NoSQL database designed for fast, consistent performance and seamless scalability. Its flexible data model supports both key-value and document structures. Security is a priority and includes network isolation. Its low-latency data access makes it a popular choice for applications that need to perform well at scale.


In this article, we look at how DynamoDB stores and retrieves data under the hood by walking through two of the requests it supports:

  1. GET Request
  2. PUT Request

Later on, we will also see how a table with multiple rows of data is stored in DynamoDB.

GET Request

When an application requests data from DynamoDB, the request travels over the network to DynamoDB. It does not matter whether it originates from a VPC, a public network, or an EC2 instance: DynamoDB handles the request in the same way and returns the requested data.

Simple GET Request to DynamoDB

After passing through the network, the request reaches a stateless component known as the Request Router. Request Routers are interchangeable, so it does not matter which one handles the request. The Request Router first verifies the requester's identity and permissions through the Authentication service. If the requester is authenticated and authorized, the request continues; otherwise, an error is returned indicating that the requester is unauthenticated or unauthorized.

Request Router

The Authentication Service used by DynamoDB is the same one used throughout AWS. Permissions are expressed as a policy written in JSON that specifies which actions the requester is and is not allowed to perform.
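To make the idea of a JSON policy concrete, here is a minimal sketch of how such a policy could be evaluated. The policy contents and the helper name is_allowed are hypothetical and only for illustration. The evaluation is deliberately simplified (an explicit Deny wins, anything not explicitly allowed is denied) and ignores resources and conditions, so it should not be read as the actual AWS evaluation logic.

```python
from fnmatch import fnmatchcase

# Hypothetical policy for illustration; the shape mirrors an IAM policy document.
POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:PutItem"],
            "Resource": "*",
        },
        {"Effect": "Deny", "Action": "dynamodb:DeleteTable", "Resource": "*"},
    ],
}


def _as_list(value):
    """A policy field may hold a single string or a list of strings."""
    return value if isinstance(value, list) else [value]


def is_allowed(policy, action):
    """Simplified check: explicit Deny wins, otherwise an Allow is required."""
    allowed = False
    for statement in policy["Statement"]:
        patterns = _as_list(statement["Action"])
        if any(fnmatchcase(action, pattern) for pattern in patterns):
            if statement["Effect"] == "Deny":
                return False
            allowed = True
    # Anything that no statement allows is denied by default.
    return allowed
```

With this policy, a GetItem or PutItem call would pass the check, while DeleteTable (explicitly denied) and Scan (never mentioned) would both be rejected.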

After successful authentication and authorization, the Request Router is ready to send the request to the storage nodes where the data lives. Before it can do that, it consults another service connected to it, the Partition Metadata System.

Partition Metadata System

The Partition Metadata System holds information about each partition, including which storage node is its leader. There are several storage nodes in each availability zone, a topic discussed later. To determine the leader among the storage nodes, DynamoDB uses the Paxos algorithm.

Storage Nodes

For a read, the Request Router forwards the request to one of the partition's storage nodes, which spreads the load, and then returns the requested data to the application.

Complete GET Request to DynamoDB

Because a read can be served by any of these storage nodes, DynamoDB does not guarantee by default that a read returns the most recent data. The Request Router may route the request to a node that has not yet received the latest write, so the response can be outdated. This is why DynamoDB's default reads are called Eventually Consistent Reads. In practice, the likelihood of reading stale data is low, and in most cases the data returned is up to date.
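The effect can be illustrated with a small in-memory simulation. This is a sketch under stated assumptions, not DynamoDB's implementation: the Replica class, the read function, and the node names are hypothetical. The consistent flag models the option of always reading from the leader, which is how DynamoDB's optional strongly consistent reads (the ConsistentRead parameter) are commonly described.

```python
import random


class Replica:
    """Hypothetical stand-in for one storage node holding a copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}


def read(replicas, leader, key, consistent=False, rng=random):
    """Eventually consistent reads may hit any replica; consistent reads use the leader."""
    node = leader if consistent else rng.choice(replicas)
    return node.data.get(key)


leader, peer_a, peer_b = Replica("leader"), Replica("peer-a"), Replica("peer-b")
replicas = [leader, peer_a, peer_b]

# Every replica starts with version 1 of the item.
for node in replicas:
    node.data["user#1"] = "v1"

# A newer write has reached the leader and one peer, but not yet peer_b.
leader.data["user#1"] = "v2"
peer_a.data["user#1"] = "v2"

# An eventually consistent read returns "v1" or "v2" depending on the node chosen.
maybe_stale = read(replicas, leader, "user#1")

# A read routed to the leader always sees the latest write.
latest = read(replicas, leader, "user#1", consistent=True)  # "v2"
```

Once peer_b catches up, every replica returns the same value again, which is the sense in which the read is eventually consistent.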

PUT Request

When an application wants to store data in DynamoDB, the process is similar to a GET request, but with some differences at the end. The request travels over the network, whether from a public network, a VPC, or an EC2 instance, and reaches the Request Router. As before, the Request Router sends it to the Authentication Service for authentication and authorization.

DynamoDB increases durability by replicating every write to two additional storage nodes, so each item ends up on three nodes. To keep latency low, DynamoDB returns a success response as soon as the data is stored on two of those nodes, without making the application wait for the last replica.

PUT Request to DynamoDB

For a storage node to become the leader, it must hold all the latest updates. This matters for operations such as a conditional put, where the leader must know the current value in order to evaluate the condition. Every PUT request is first directed to the leader node, which stores the data and then replicates it to its peer nodes.
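A conditional put is essentially a compare-and-set performed on the leader. The following is a minimal sketch, assuming a hypothetical LeaderNode class and ConditionFailed exception; replication to peers is left out to keep the focus on the comparison.

```python
class ConditionFailed(Exception):
    """Raised when the expected value does not match the leader's current value."""


class LeaderNode:
    """Hypothetical leader holding the latest committed value for each key."""

    def __init__(self):
        self.items = {}

    def conditional_put(self, key, new_value, expected):
        # The leader has every update, so this comparison uses the true current value.
        current = self.items.get(key)
        if current != expected:
            raise ConditionFailed(f"expected {expected!r}, found {current!r}")
        self.items[key] = new_value
        return new_value
```

If the same check ran on a peer that had not yet received the latest write, the comparison could succeed against a stale value, which is why the write path goes through the leader.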

Each storage node tracks the heartbeats of the other nodes. If a node's heartbeat stops, that node is assumed to have gone down. The remaining nodes then choose a new leader: they evaluate each other against the necessary criteria, and the node that satisfies them becomes the new leader.
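Heartbeat-based failure detection can be sketched in a few lines. The HeartbeatMonitor class and the timeout value are assumptions made for illustration; the sketch only covers detecting a silent node and does not model the leader election that follows. Timestamps are passed in explicitly so the behaviour is easy to follow.

```python
class HeartbeatMonitor:
    """Hypothetical tracker of when each peer node was last heard from."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.last_seen = {}

    def record(self, node, now):
        """Note that a heartbeat from `node` arrived at time `now` (in seconds)."""
        self.last_seen[node] = now

    def suspected_down(self, now):
        """Return the nodes whose last heartbeat is older than the timeout."""
        return sorted(
            node
            for node, seen_at in self.last_seen.items()
            if now - seen_at > self.timeout_seconds
        )
```

For example, with a 3-second timeout, a node last heard from at time 0 is reported as suspected down when checked at time 5, while a node heard from at time 4 is not.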

To achieve maximum availability, data must be stored in multiple availability zones. There are numerous request routers and storage nodes in each availability zone to handle incoming requests. When a request travels through the network, it is first directed to the nearest availability zone and then to a random request router; because request routers are stateless, it does not matter which one receives the request.


After reaching a request router, the request is forwarded to the leader storage node. Once the data is stored on the leader, it is replicated asynchronously to the peer nodes in other availability zones. When the data is stored on at least two nodes, including the leader, a successful response is sent back to the application, confirming that the data has been stored in DynamoDB.
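The two-of-three acknowledgement rule can be sketched as follows. StorageNode and put are hypothetical names, and for simplicity the peers are contacted one after another in the same call, whereas the replication described above happens asynchronously.

```python
class StorageNode:
    """Hypothetical storage node that can be marked unavailable."""

    def __init__(self, name, available=True):
        self.name = name
        self.available = available
        self.items = {}

    def store(self, key, value):
        """Store the value and return True, or return False if the node is down."""
        if not self.available:
            return False
        self.items[key] = value
        return True


def put(leader, peers, key, value, required_acks=2):
    """Write to the leader first, then replicate; succeed once enough nodes hold the data."""
    if not leader.store(key, value):
        return False  # without the leader the write cannot proceed
    acks = 1  # the leader's own copy counts as the first acknowledgement
    for peer in peers:
        if peer.store(key, value):
            acks += 1
    return acks >= required_acks
```

With one peer down, the write still succeeds because the leader and the remaining peer provide the two required copies. If both peers are down, or the leader itself is unavailable, the write is reported as failed.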

Let’s take a closer look at how the DynamoDB table is stored in these storage nodes.

Table

Imagine we have a table containing multiple rows of user information. Let’s examine how this data is distributed among the various storage nodes.

User Table in DynamoDB

DynamoDB applies an internal, undisclosed hash function to the partition key (part of each item's primary key) to produce a hash value. The function is deterministic: the same key always produces the same hash value, so an item can always be located again.

Hash of every Primary Key

Once the hash of each primary key has been generated, DynamoDB organizes the data based on the hashes and assigns each hash to a specific partition for storage.

User table partitions
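The mapping from key to partition can be sketched as below. Several things here are assumptions: DynamoDB's real hash function is not public, so SHA-256 is used as a stand-in; the 32-bit hash space, the equal-sized ranges, and the function names are illustrative only.

```python
import hashlib
from bisect import bisect_right

HASH_SPACE = 2 ** 32  # assumed size of the hash space for this sketch


def key_hash(partition_key):
    """Deterministic hash of the key; SHA-256 is a stand-in, not DynamoDB's function."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def make_boundaries(num_partitions):
    """Split the hash space into equal contiguous ranges and return each range's start."""
    step = HASH_SPACE // num_partitions
    return [i * step for i in range(num_partitions)]


def partition_for(partition_key, boundaries):
    """Return the index of the partition whose hash range contains the key's hash."""
    # bisect_right finds the last range whose start is <= the hash value.
    return bisect_right(boundaries, key_hash(partition_key)) - 1
```

Because the hash is deterministic, looking up the same key always lands on the same partition, which is what lets a request router find an item without scanning every storage node.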

DynamoDB creates partitions for the table and distributes them to storage nodes in each availability zone. The leader storage node for each partition is chosen by the Paxos algorithm, which runs among the storage nodes.


As discussed earlier, DynamoDB stores data across multiple storage nodes, which is what makes a read only eventually consistent: a GET request may reach a node that has not yet been updated and return stale data. However, this is a rare occurrence.

A PUT request is considered complete when data has been successfully stored in two of the nodes, thereby improving both latency and durability.

Summary

In this article, we explored how GET and PUT requests are processed. A GET request is sent over the network to a request router. After authentication and authorization, it is directed to a storage node, where the requested data is retrieved and returned to the application. The process for PUT requests is similar, but the request router must know the leader storage node, because the data is stored there first and then replicated to the peer nodes. Once the data has been stored on two nodes, the PUT request is considered complete. We also looked at how tables are stored in DynamoDB.

If this article provided insight into the inner workings of DynamoDB and was helpful to you, please consider giving it a clap.

Jose Quijada

Senior Software Engineer at P+

11 months ago

Exactly how does the storage node handle the read request? Is it a collection of "worker" threads that handle reads directed to a storage node? Is it a single worker per table on that storage node? Can you please expand/elaborate on that? Thank you.

Huzaifa Asif

Engineering Lead | Solution Architect | Cloud Engineer | FinTech | SaaS | PaaS | AWS | Azure | GCP

2 years ago

Great article


More articles by Asim Hafeez
