
Amazon DynamoDB — How it Reads/Writes Data Under the Hood

Asim Hafeez

Senior Software Engineer | Lead | AI | LLMs | System Design | Blockchain | AWS

Published: February 6, 2023

What is DynamoDB?

DynamoDB, offered by Amazon Web Services (AWS), is a fully managed NoSQL database designed for fast, consistent performance and seamless scalability. Its flexible data model supports both document and key-value structures, so it can store and retrieve a wide range of data. Security is a top priority, with features such as network isolation. Its low-latency data access makes it a popular choice for applications that need to perform well at scale.

In this article we look at how DynamoDB stores and retrieves data under the hood. To do that, we walk through two kinds of request that DynamoDB handles:

GET Request
PUT Request

Later on, we will also see how a table with multiple rows of data is stored in DynamoDB.

GET Request

When an application requests data from DynamoDB, the request travels over the network. Its origin does not matter: it may come from a VPC, a public network, or an EC2 instance, and DynamoDB handles it the same way in every case. The network forwards the request to DynamoDB, which responds with the requested data.

Figure: Simple GET request to DynamoDB

After passing through the network, the request reaches a stateless component known as the Request Router. It does not matter which Request Router handles the request, because they are interchangeable. The Request Router first checks the requester's identity and permissions with the Authentication Service. If the requester is authenticated and authorized, the request continues; otherwise, an error is returned indicating that the requester is unauthenticated or unauthorized.

The Authentication Service used by DynamoDB is the same one used throughout AWS. It evaluates a policy written in JSON that specifies which actions the requester is and is not allowed to perform.
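As an illustration, a policy of this kind might look like the sketch below, built here as a Python dictionary and serialized to JSON. The account ID, region, and table name are placeholders and do not come from the article; `dynamodb:GetItem` and `dynamodb:PutItem` are the IAM action names for the two request types discussed here.

```python
import json

# Hypothetical policy: allow reads and writes on a single table only.
# The account ID, region, and table name are placeholders.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:PutItem"],
            "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/Users",
        }
    ],
}

# Serialize to the JSON form in which policies are written.
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

With a policy like this, a requester could read and write items in the one named table, and any action not listed would be denied by default.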

After successful authentication and authorization, the Request Router is ready to send the request to the nodes where the data is stored. Before it can do so, it consults another service it is connected to, the Partition Metadata System.

The Partition Metadata System holds information about each partition, including which of its storage nodes is the leader. Each availability zone contains several storage nodes, a topic we return to later. To decide which storage node is the leader, DynamoDB runs the Paxos algorithm among the storage nodes.

To balance the load, the Request Router forwards the request to one of the storage nodes that holds the data, and then returns the requested data to the application.
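The GET path described above can be sketched as a small simulation. This is a toy model under stated assumptions, not DynamoDB's actual implementation: the class names, the set of allowed callers standing in for the Authentication Service, and the dictionary standing in for the Partition Metadata System are all hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass
class StorageNode:
    name: str
    data: dict = field(default_factory=dict)


@dataclass
class Partition:
    leader: StorageNode
    replicas: list  # every node holding a copy, including the leader


class RequestRouter:
    """Stateless: everything it needs is looked up per request."""

    def __init__(self, allowed_callers, partition_metadata):
        # Stand-in for the Authentication Service (hypothetical).
        self.allowed_callers = allowed_callers
        # Stand-in for the Partition Metadata System (hypothetical).
        self.partition_metadata = partition_metadata

    def get(self, caller, table, key):
        # Step 1: authentication and authorization.
        if caller not in self.allowed_callers:
            raise PermissionError("unauthorized")
        # Step 2: find the partition that owns the key.
        partition = self.partition_metadata[(table, key)]
        # Step 3: pick any replica to spread the read load.
        node = random.choice(partition.replicas)
        return node.data.get(key)


nodes = [StorageNode(n, {"user#1": {"name": "Alice"}}) for n in ("a", "b", "c")]
metadata = {("Users", "user#1"): Partition(leader=nodes[0], replicas=nodes)}
router = RequestRouter(allowed_callers={"app"}, partition_metadata=metadata)
print(router.get("app", "Users", "user#1"))
```

Because the router keeps no per-request state, any number of identical routers could be created from the same two lookups, which is what makes them interchangeable.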

Because the data is replicated across several storage nodes, a default read is not guaranteed to return the most recent write. The Request Router may send the request to a node that has not yet received the latest update, so the response can be out of date. This is why DynamoDB describes these reads as Eventually Consistent Reads. Stale reads are uncommon, though: they happen only while a replica is briefly lagging, for example because of a network delay, and in most cases the data returned is current.
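A small simulation makes the difference concrete. It assumes three replicas, one of which has not yet received the latest write; the node names and version numbers are invented for illustration. In the real API, a strongly consistent read is requested by setting the `ConsistentRead` parameter on a read call, and the sketch models that by always reading from the leader.

```python
import random

# Three copies of one item; replica "c" has not yet received the latest write.
replicas = {
    "a (leader)": {"version": 2},
    "b": {"version": 2},
    "c": {"version": 1},
}


def read(strongly_consistent=False, rng=random):
    """Return (node, item) for a simulated read."""
    if strongly_consistent:
        # The leader always holds the latest acknowledged write.
        node = "a (leader)"
    else:
        # Any replica may answer, including one that is lagging.
        node = rng.choice(sorted(replicas))
    return node, replicas[node]


print(read(strongly_consistent=True))
print(read())  # may return version 1 or version 2
```

Once replica "c" catches up, every read returns version 2, which is the "eventually" in eventual consistency.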

PUT Request

When an application wants to store data in DynamoDB, the process is similar to a GET request, with some differences at the end. The request is sent over the network, whether from a public network, a VPC, or an EC2 instance, and reaches a Request Router. As discussed above, the Request Router then calls the Authentication Service for authentication and authorization.

DynamoDB increases durability by keeping three copies of the data: when a write reaches a storage node, it is also replicated to two additional storage nodes. To keep latency low, DynamoDB does not make the caller wait for every copy to be written. It returns a success response as soon as it has confirmed that the data is stored on two of the three nodes.

For a storage node to become the leader, it must hold all the latest updates and modifications. This matters for operations such as a conditional put, where the leader must know the current value in order to compare against it. Every PUT request is first directed to the leader node, which stores the data, and replication to the other peer nodes follows.
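The reason a conditional put needs an up-to-date leader can be seen in a minimal sketch. The `Leader` class and `ConditionFailed` exception are hypothetical names for illustration; in the real API the condition is supplied as a `ConditionExpression` rather than a plain expected value.

```python
class ConditionFailed(Exception):
    """Raised when the stored value does not match the expected one."""


class Leader:
    def __init__(self):
        self.items = {}

    def conditional_put(self, key, new_value, expected):
        """Store new_value only if the current value equals expected."""
        current = self.items.get(key)
        if current != expected:
            raise ConditionFailed(f"expected {expected!r}, found {current!r}")
        self.items[key] = new_value
        return new_value


leader = Leader()
leader.conditional_put("stock", 10, expected=None)  # create only if absent
leader.conditional_put("stock", 9, expected=10)     # decrement only if still 10

try:
    leader.conditional_put("stock", 8, expected=10)  # stale expectation
except ConditionFailed as err:
    print("rejected:", err)
```

If the comparison ran on a node that was missing recent writes, the check could pass against an old value and silently overwrite newer data, which is why the comparison has to happen where the latest value is known.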

Each storage node monitors the heartbeats of the other nodes. If a node's heartbeat stops, the others assume it has gone down. If the failed node was the leader, the remaining nodes determine who will become the new leader by evaluating each other against the necessary criteria, and the node that satisfies them takes over.
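The heartbeat check and leader replacement can be sketched as follows. This is deliberately much simpler than Paxos and is not DynamoDB's actual election logic: the timeout value, the `NodeStatus` fields, and the rule "the live node with the newest data wins" are assumptions chosen to mirror the requirement that a leader must hold all the latest updates.

```python
from dataclasses import dataclass

HEARTBEAT_TIMEOUT = 3.0  # seconds; an assumed value for illustration


@dataclass
class NodeStatus:
    name: str
    last_heartbeat: float  # time at which the last heartbeat was received
    last_applied: int      # sequence number of the newest write this node holds


def alive(nodes, now):
    """Nodes whose most recent heartbeat is within the timeout."""
    return [n for n in nodes if now - n.last_heartbeat <= HEARTBEAT_TIMEOUT]


def pick_leader(nodes, now):
    """Choose the live node with the newest data; ties go to the lowest name."""
    candidates = alive(nodes, now)
    if not candidates:
        return None
    return min(candidates, key=lambda n: (-n.last_applied, n.name))


nodes = [
    NodeStatus("a", last_heartbeat=0.0, last_applied=42),  # old leader, now silent
    NodeStatus("b", last_heartbeat=9.0, last_applied=42),
    NodeStatus("c", last_heartbeat=9.5, last_applied=41),
]
new_leader = pick_leader(nodes, now=10.0)
print(new_leader.name)
```

In this run node "a" is treated as down because its last heartbeat is ten seconds old, and "b" is preferred over "c" because it holds the more recent write.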

To achieve maximum availability, data must be stored in multiple availability zones. Each availability zone has numerous request routers and storage nodes to handle incoming requests. When a request travels through the network, it is first directed to the nearest availability zone and then to a random request router. Because request routers are stateless, it does not matter which one receives the request.

After reaching a request router, the request is redirected to the leader storage node. Once the data is stored in the leader, it is then replicated to the other peer nodes in different availability zones through asynchronous connections. When the data is stored on at least two nodes, including the leader, a successful response is sent back to the application, confirming that the data has been stored in DynamoDB.
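The acknowledgement rule described above (leader first, then success once two nodes hold the data) can be sketched like this. The `Node` class and `put` function are hypothetical, and replication is modelled as a simple loop rather than real network calls.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)

    def store(self, key, value):
        """Store the value; return False if this node is unreachable."""
        if not self.up:
            return False
        self.data[key] = value
        return True


def put(leader, peers, key, value, required_acks=2):
    """Write to the leader first, then replicate; succeed once two nodes hold the data."""
    if not leader.store(key, value):
        raise RuntimeError("leader unavailable")
    acks = 1  # the leader's own copy counts
    for peer in peers:
        if peer.store(key, value):
            acks += 1
        if acks >= required_acks:
            # In this sketch remaining peers are skipped; a real system
            # would keep replicating to them in the background.
            return True
    return False


leader = Node("a")
peers = [Node("b", up=False), Node("c")]
print(put(leader, peers, "user#1", {"name": "Alice"}))
```

Here the write still succeeds with one peer down, because the leader and the remaining peer together provide the two required copies.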

Let’s take a closer look at how the DynamoDB table is stored in these storage nodes.

Table

Imagine we have a table containing multiple rows of user information. Let’s examine how this data is distributed among the various storage nodes.

DynamoDB applies an internal, undisclosed hash function to the partition key in each item's primary key to produce a hash value. The useful property of this hash function is that it is deterministic: the same key always produces the same hash value.

Once the hash of each primary key has been generated, DynamoDB organizes the data based on the hashes and assigns each hash to a specific partition for storage.
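A minimal sketch of this idea follows. Since DynamoDB's hash function is not public, SHA-256 is used purely as a stand-in, and the fixed partition count and modulo mapping are simplifying assumptions rather than DynamoDB's actual scheme.

```python
import hashlib

NUM_PARTITIONS = 4  # assumed; DynamoDB decides the real number itself


def partition_for(partition_key: str) -> int:
    """Map a key to a partition by hashing it (SHA-256 is a stand-in)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and fold it into a partition index.
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


for user_id in ["user#1", "user#2", "user#3"]:
    print(user_id, "->", partition_for(user_id))
```

The same key always lands on the same partition, so a later GET for that key can be routed to the partition that stored it without searching the others.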

DynamoDB creates partitions for the table and distributes them to storage nodes in each availability zone. The leader storage node for each partition is chosen by the Paxos algorithm, which runs among the storage nodes.

As discussed earlier, DynamoDB stores data across multiple storage nodes, which is what makes an Eventually Consistent Read possible. This occurs when a GET request is sent to a node that has not yet been updated, so stale data is returned. However, this is a rare occurrence.

A PUT request is considered complete when data has been successfully stored in two of the nodes, thereby improving both latency and durability.

Summary

In this article, we explored how GET and PUT requests are processed. A GET request is sent over the network to a request router. After authentication and authorization, it is directed to a storage node, where the requested data is retrieved and returned to the application. The process for PUT requests is similar, but the request router must know which storage node is the leader, because the data is stored there first and then replicated to peer nodes. Once the data has been stored on two nodes, the PUT request is considered complete. We also looked at how tables are stored in DynamoDB.

If this article provided insight into the inner workings of DynamoDB and was helpful to you, please consider giving it a clap.


Jose Quijada

Senior Software Engineer at P+

11 months ago

Exactly how does the storage node handle the read request? Is it a collection of "worker" threads that handle reads directed to a storage node? Is it a single worker per table on that storage node? Can you please expand/elaborate on that? Thank you.

Huzaifa Asif

2 years ago

Great article
