
AWS DynamoDB Fundamentals | A Complete Guide

Huzaifa Asif

Engineering Lead | Solution Architect | Cloud Engineer | FinTech | SaaS | PaaS | AWS | Azure | GCP

Published: February 6, 2023

Introduction

DynamoDB is a NoSQL database service created by Amazon that is renowned for its high performance. Unlike a relational database, it has no fixed schema and no joins between tables; each item is stored under a primary key and retrieved quickly by looking up that key. An item can hold nested, JSON-like data. DynamoDB is used in mobile applications, gaming, ad technology, and other applications that require a fast data layer.

DynamoDB can be adjusted to meet your read and write capacity needs in Provisioned Capacity mode, or you can use On-Demand mode, which requires little to no capacity planning.

List of DynamoDB Key Features

DynamoDB replicates data across three Availability Zones in a given region, utilizing solid-state drives (SSDs) to store three copies of the data.
Tables are composed of rows which are referred to as Items, and each Item is made up of Attributes which are displayed as columns.
Read and write throughput scales horizontally with no fixed upper limit on table size, and all data is stored on solid-state drives.
Data stored in DynamoDB can be securely backed up to Amazon S3 for long-term storage.
DynamoDB integrates with other AWS services such as Amazon EMR (Elastic MapReduce), AWS Data Pipeline, and Amazon Kinesis. This lets users take advantage of the scalability and performance of these services to process and manage data quickly and efficiently.
The pricing model for Amazon DynamoDB is pay-per-use, meaning that customers pay only for the throughput and storage they actually use, rather than paying for resources they don’t need.
Security and access control can be managed using the AWS Identity and Access Management (IAM) service. This service helps to create and manage users, groups, and permissions to control access to AWS resources.
For enterprise use, DynamoDB offers a service level agreement (SLA), monitoring through Amazon CloudWatch, and private connectivity through VPC endpoints.

DynamoDB Data Types

DynamoDB supports the following data types:

Scalar

Number (including both integer and floating point)
String
Binary
Boolean
Null

Multi-valued

String Set ["Steph", "Klay", "Kevin"]
Number Set [23, 12, -3, 34.3]
Binary Set ["uKASDB", "ASDDSFFF"]
Binary values must be base64-encoded before being sent to DynamoDB. A binary set stores unique binary values, which makes it useful for representing a collection of small binary objects such as images or documents.

Document

List (DynamoDB List data type is a data type that allows users to store an ordered collection of items, similar to an array.)
Map (DynamoDB Map data type is an unordered collection of key-value pairs that can be used to store and retrieve data)
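
To make the type list concrete, here is a minimal sketch of how Python values could be mapped to the typed JSON form that DynamoDB's low-level API uses (type descriptors such as S, N, B, SS, L and M). The function names are hypothetical and the mapping is a simplification; AWS SDKs ship their own serializers.

```python
import base64
from decimal import Decimal


def to_attribute_value(value):
    """Encode a Python value in DynamoDB's typed JSON form (simplified sketch)."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):  # check before numbers: bool is an int subclass
        return {"BOOL": value}
    if isinstance(value, (int, float, Decimal)):
        return {"N": str(value)}  # numbers travel as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, bytes):
        return {"B": base64.b64encode(value).decode("ascii")}
    if isinstance(value, (set, frozenset)):
        return _encode_set(value)
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")


def _encode_set(values):
    """Encode a homogeneous set; output is sorted only to make it deterministic."""
    if not values:
        raise ValueError("DynamoDB sets cannot be empty")
    if all(isinstance(v, str) for v in values):
        return {"SS": sorted(values)}
    if all(isinstance(v, bytes) for v in values):
        return {"BS": sorted(base64.b64encode(v).decode("ascii") for v in values)}
    if all(isinstance(v, (int, float, Decimal)) and not isinstance(v, bool) for v in values):
        return {"NS": sorted(str(v) for v in values)}
    raise TypeError("a set must hold only strings, only numbers, or only binary values")
```

For example, `to_attribute_value({"Steph", "Klay", "Kevin"})` yields a String Set, while a Python list becomes a List whose elements are encoded one by one.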

DynamoDB Table Structure

Partition Key, also known as the HASH key, is the simplest form of primary key for identifying items in a table. The partition key value is used as input to an internal hash function, which determines the physical partition in which the item is stored. In a table whose primary key consists of only a partition key, no two items can have the same partition key value.
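
The idea of hashing a key to pick a partition can be sketched in a few lines. DynamoDB's actual hash function is internal and not public, so the MD5 digest and the `partition_for` helper below are stand-ins chosen only for illustration.

```python
import hashlib


def partition_for(partition_key: str, partition_count: int) -> int:
    """Map a partition key value to a partition index with a stable hash.

    MD5 is an assumption for this sketch, not DynamoDB's real hash function.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count
```

The same key always lands on the same partition, which is what allows a lookup by key to go straight to the right storage node.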

Partition Key and Sort Key, also referred to as a composite primary key, are composed of two attributes. The partition key is used as an input to an internal hash function which determines the physical storage of the item within DynamoDB. All items with the same partition key are stored together in sorted order by their sort key value.

It is possible for two items to have the same partition key value, but they must have different sort key values. In a table with a composite primary key, you can access any item directly by providing the respective partition and sort key values.

A composite primary key provides additional flexibility when querying data. For example, if you supply only the partition key value, DynamoDB will retrieve all items with that key. You can also provide a value for the partition key and a range of values for the sort key to retrieve a subset of items with the same partition key. For example, a table of movies may have a composite primary key composed of the Producer and Title. You can access any movie in the table directly by providing the Producer and Title values for that item. You can also use the Producer value and a range of values for Title to retrieve a subset of movies by that producer.
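
The access patterns described above (direct lookup by both keys, all items for one partition key, and a sort-key range within one partition key) can be modelled with a small in-memory structure. This is a toy sketch, not how DynamoDB is implemented; the class and method names are hypothetical.

```python
import bisect
from collections import defaultdict


class CompositeKeyTable:
    """Toy in-memory table keyed by (partition key, sort key)."""

    def __init__(self):
        # partition key -> sorted list of sort keys, plus (pk, sk) -> item
        self._sort_keys = defaultdict(list)
        self._items = {}

    def put(self, pk, sk, item):
        if (pk, sk) not in self._items:
            bisect.insort(self._sort_keys[pk], sk)
        self._items[(pk, sk)] = item  # writing the same full key overwrites

    def get(self, pk, sk):
        """Direct access by partition key and sort key."""
        return self._items.get((pk, sk))

    def query(self, pk, sk_from=None, sk_to=None):
        """Items for pk in sort-key order, optionally within [sk_from, sk_to]."""
        keys = self._sort_keys.get(pk, [])
        lo = 0 if sk_from is None else bisect.bisect_left(keys, sk_from)
        hi = len(keys) if sk_to is None else bisect.bisect_right(keys, sk_to)
        return [self._items[(pk, sk)] for sk in keys[lo:hi]]
```

With Producer as the partition key and Title as the sort key, `query("Some Producer")` returns every movie by that producer in title order, and `query("Some Producer", "B", "M")` narrows that to titles in the given range.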

Secondary Indexes

There are two types of Indexes used in DynamoDB: Local Secondary Index (LSI) and Global Secondary Index (GSI)

LSI (Local Secondary index)

Supports strongly consistent or eventually consistent reads.
Can only be created together with the base table and cannot be modified or deleted without deleting the table.
Composite key only (partition key plus sort key).
Maximum of 10 GB of data per partition key value (item collection).
Shares capacity units with the base table.
Must have the same Partition Key (PK) as the base table.

GSI (Global Secondary Index)

Offers only eventually consistent reads, but can be created, modified, or deleted at any time.
Supports both Simple and Composite keys.
Can use any scalar attribute as its Partition Key (PK) or Sort Key (SK).
No size restriction per partition key value.
Has its own capacity settings, not shared with the base table.
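
As an illustration of the differences listed above, the sketch below shows a table definition with one LSI and one GSI, written as a plain Python dictionary that uses the parameter names of the CreateTable API. The table is hypothetical: Producer and Title follow the movies example, while ReleaseYear, Genre, the index names and the capacity values are invented for this sketch.

```python
def partition_key_of(key_schema):
    """Return the attribute name with KeyType HASH from a KeySchema list."""
    return next(k["AttributeName"] for k in key_schema if k["KeyType"] == "HASH")


# Hypothetical definition; attribute names beyond Producer/Title are assumptions.
table_definition = {
    "TableName": "Movies",
    "AttributeDefinitions": [
        {"AttributeName": "Producer", "AttributeType": "S"},
        {"AttributeName": "Title", "AttributeType": "S"},
        {"AttributeName": "ReleaseYear", "AttributeType": "N"},
        {"AttributeName": "Genre", "AttributeType": "S"},
    ],
    "KeySchema": [
        {"AttributeName": "Producer", "KeyType": "HASH"},
        {"AttributeName": "Title", "KeyType": "RANGE"},
    ],
    # LSI: same partition key as the table, different sort key.
    "LocalSecondaryIndexes": [{
        "IndexName": "ByProducerAndYear",
        "KeySchema": [
            {"AttributeName": "Producer", "KeyType": "HASH"},
            {"AttributeName": "ReleaseYear", "KeyType": "RANGE"},
        ],
        "Projection": {"ProjectionType": "ALL"},
    }],
    # GSI: its own key (simple here) and its own capacity settings.
    "GlobalSecondaryIndexes": [{
        "IndexName": "ByGenre",
        "KeySchema": [{"AttributeName": "Genre", "KeyType": "HASH"}],
        "Projection": {"ProjectionType": "KEYS_ONLY"},
        "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
    }],
    "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
}


def lsi_keys_are_valid(definition):
    """Check that every LSI is composite and shares the table's partition key."""
    base_pk = partition_key_of(definition["KeySchema"])
    return all(
        partition_key_of(lsi["KeySchema"]) == base_pk and len(lsi["KeySchema"]) == 2
        for lsi in definition.get("LocalSecondaryIndexes", [])
    )
```

Assuming boto3 is installed and credentials are configured, a dictionary shaped like this could be passed to a DynamoDB client as `create_table(**table_definition)`.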

DynamoDB Reads and Writes Consistency

DynamoDB can be configured to provide either Eventually Consistent Reads (default) or Strongly Consistent Reads on a per-call basis.

Eventually Consistent Reads may return stale data, because the response might not reflect a recently completed write, but the result is available immediately. Generally, all copies of the data become consistent within one second.

Strongly Consistent Reads return the most up-to-date version of the data, reflecting every write that succeeded before the read, because the read is always served by the leader replica of the partition. Latency may be higher than with eventually consistent reads.
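
The difference between the two read modes can be shown with a toy model of one leader copy and lagging follower copies. This is only an illustration under simplifying assumptions (explicit `replicate()` calls, reads pinned to one follower); the class is hypothetical and does not reflect DynamoDB internals. In the real API the per-call switch is the `ConsistentRead` parameter.

```python
class ReplicatedItem:
    """Toy model: one leader copy plus followers that lag until replicate() runs."""

    def __init__(self, value, follower_count=2):
        self.leader = value
        self.followers = [value] * follower_count

    def write(self, value):
        self.leader = value  # followers catch up later

    def replicate(self):
        self.followers = [self.leader] * len(self.followers)

    def read(self, consistent_read=False, follower=0):
        # Strongly consistent reads are served by the leader copy.
        if consistent_read:
            return self.leader
        # Eventually consistent reads may hit a copy that is still behind.
        return self.followers[follower]
```

After a write and before replication, `read(consistent_read=True)` sees the new value while a default read can still return the old one.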

Capacity Modes

DynamoDB has two capacity modes, Provisioned and On-Demand. You can switch between these modes once every 24 hours.

1- Provisioned

Provisioned Throughput Capacity is the allocated capacity that your application is able to read or write from a table or index per second. It is best for applications that have predictable or steady traffic and is measured in Read Capacity Units (RCUs) and Write Capacity Units (WCUs).

Auto Scaling should be enabled with Provisioned capacity mode. This setting allows you to set a minimum and maximum capacity for your DynamoDB table. DynamoDB will automatically adjust the capacity between these values, and will throttle calls that exceed the maximum capacity for an extended period of time.

If you make requests that exceed the capacity that has been provisioned for you, you will get a ProvisionedThroughputExceededException (throttling). Throttling occurs when requests are rejected because the frequency of reads or writes is higher than the thresholds that have been set. For example, this can happen if you exceed the provisioned capacity, if partitions are splitting, or if the capacity of an index does not match the capacity of its table.
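
A common way to cope with throttling is to retry with exponential backoff and jitter. The sketch below uses a local stand-in exception and a hypothetical `call_with_backoff` helper so that it runs without any AWS dependency; the AWS SDKs already apply their own retry logic, so this only illustrates the technique.

```python
import random
import time


class ProvisionedThroughputExceeded(Exception):
    """Local stand-in for the throttling error an SDK would raise."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.05, sleep=time.sleep):
    """Retry operation() on throttling, waiting longer (with jitter) each time."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ProvisionedThroughputExceeded:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Full jitter: wait a random time up to base_delay * 2^attempt.
            sleep(random.uniform(0, base_delay * 2 ** attempt))
```

The `sleep` parameter is injectable so that the waiting can be replaced in tests; in normal use the default `time.sleep` applies.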

2- On-Demand

On-Demand Capacity is a pay-per-request mode that is especially suited to new or unpredictable workloads. Throughput is limited only by the default upper limits for a table, up to 40K RCUs and 40K WCUs. However, throttling may occur if traffic grows to more than double the previous peak within 30 minutes. Note also that On-Demand can become costly under sustained high traffic, because every request is billed individually.


Calculating Read and Write Capacity Units

Calculating Read Capacity Unit (RCU):

A read capacity unit is equivalent to one strongly consistent read per second or two eventually consistent reads per second for an item of up to 4 KB in size.

How to calculate RCUs for strongly consistent reads

Round the item size up to the nearest multiple of 4 KB
Divide the rounded size by 4
Multiply the result by the number of reads per second

Example:

20 reads at 40KB per item. (40/4) x 20 = 200 RCUs
5 reads at 6KB per item. (8/4) x 5 = 10 RCUs

How to calculate RCUs for eventually consistent reads

Round the item size up to the nearest multiple of 4 KB
Divide the rounded size by 4
Multiply by the number of reads per second
Divide the result by 2
Round up to the nearest whole number

Example:

20 reads at 40KB per item. ( (40/4) x 20 ) / 2 = 100 RCUs
15 reads at 10KB per item. ( (12/4) x 15 ) / 2 = 23 RCUs
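
The two RCU procedures above can be written as one small function. This is a sketch of the arithmetic described in this section; the function and parameter names are hypothetical.

```python
import math


def read_capacity_units(item_size_kb, reads_per_second, strongly_consistent=True):
    """RCUs needed for a given item size (KB) and read rate."""
    units_per_read = math.ceil(item_size_kb / 4)  # each unit covers up to 4 KB
    total = units_per_read * reads_per_second
    if not strongly_consistent:
        total = math.ceil(total / 2)  # eventually consistent reads cost half
    return total
```

Calling it with the figures from the examples reproduces the same results, for instance `read_capacity_units(10, 15, strongly_consistent=False)` for 15 eventually consistent reads of 10 KB items.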

Calculating Write Capacity Unit (WCU):

A write capacity unit is equivalent to one write operation per second, for an item that is up to 1 KB in size.

How to calculate Writes

Round the item size up to the nearest whole KB
Multiply by the number of writes per second

Example:

20 writes at 40KB per item. 40 x 20 = 800 WCUs
78 writes at 1KB per item. 1 x 78 = 78 WCUs
34 writes at 500 BYTES per item. 1 x 34 = 34 WCUs
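
The write calculation follows the same pattern. In this sketch the item size is given in bytes so that the 500-byte case can be expressed directly; the function name is hypothetical and 1 KB is assumed to be 1,024 bytes.

```python
import math


def write_capacity_units(item_size_bytes, writes_per_second):
    """WCUs needed: one unit per started KB of item size, per write."""
    units_per_write = max(1, math.ceil(item_size_bytes / 1024))  # assumes 1 KB = 1024 bytes
    return units_per_write * writes_per_second
```

An item smaller than 1 KB still consumes a full write capacity unit, which is why 34 writes of 500 bytes cost 34 WCUs.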

DynamoDB Partitions

DynamoDB automatically splits tables into smaller chunks of data called Partitions so that storage and throughput can be spread across servers. A single partition holds up to about 10GB of data and supports up to 3000 Read Capacity Units or 1000 Write Capacity Units, so DynamoDB adds partitions when a table grows beyond these limits. In addition, DynamoDB may split a partition if it detects a hot partition issue to try and evenly distribute the RCUs and WCUs across the Partitions.
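
Using the thresholds mentioned above, a rough partition count can be estimated as follows. This is a commonly cited rule of thumb rather than an official formula: DynamoDB manages partitions internally and the real number may differ. The function name is hypothetical.

```python
import math


def estimate_partitions(table_size_gb, rcus, wcus):
    """Rough estimate: the larger of the size-based and throughput-based counts."""
    by_size = math.ceil(table_size_gb / 10)               # about 10 GB per partition
    by_throughput = math.ceil(rcus / 3000 + wcus / 1000)  # 3000 RCU / 1000 WCU per partition
    return max(by_size, by_throughput, 1)
```

For a 25 GB table with modest throughput the size term dominates, while a small table provisioned with 6000 RCUs and 2000 WCUs is driven by the throughput term.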

[Figure: Block diagram for partition]

[Figure: Partition formula]

DynamoDB Streams

DynamoDB streams provide a log of changes to a table, similar to a transaction log.

A DynamoDB stream is a sequence of changes made to items in an Amazon DynamoDB table. It can be activated to keep track of any modifications to data items in the table.

Your application must connect to a DynamoDB Streams endpoint and issue API requests in order to read and process a stream. Stream records are organized into shards which act as a container for multiple records. Shards are ephemeral and can split into multiple new shards automatically. When a stream is disabled, any open shards will be closed and the data will remain readable for 24 hours. It is important to process the parent shard before the child shard to ensure the stream records are in the correct order. The DynamoDB Streams Kinesis Adapter can handle this automatically.
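
The parent-before-child rule can be sketched as a simple ordering over shard descriptions. The `ShardId` and `ParentShardId` field names match what a stream description returns; the `order_shards` helper itself is hypothetical, and in practice the DynamoDB Streams Kinesis Adapter takes care of this.

```python
def order_shards(shards):
    """Return shards so that every parent appears before its children."""
    by_id = {s["ShardId"]: s for s in shards}
    ordered, seen = [], set()

    def visit(shard):
        if shard["ShardId"] in seen:
            return
        parent = by_id.get(shard.get("ParentShardId"))
        if parent is not None:  # the parent may have expired and be absent
            visit(parent)
        seen.add(shard["ShardId"])
        ordered.append(shard)

    for shard in shards:
        visit(shard)
    return ordered
```

Processing shards in the returned order means a child shard's records are only read after its parent shard has been handled.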

This feature is useful in a range of scenarios, such as sending welcome messages to new customers or propagating new messages or pictures to the members of a group chat. DynamoDB and DynamoDB Streams use separate endpoints, so an application that works with both a table and its stream must connect to each of them.

DynamoDB Accelerator

DynamoDB Accelerator (DAX) is a fully managed, in-memory cache for DynamoDB that runs as a cluster and provides write-through caching.

Reads are eventually consistent.
Requests to the cluster are evenly distributed among its nodes.
DAX can dramatically reduce read response times to microseconds.
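
Write-through caching means every write goes to the table and the cache together, so later reads of that item can be served from memory. The toy class below illustrates the pattern with a plain dictionary standing in for the table; it is a hypothetical sketch, not the DAX client API.

```python
class WriteThroughCache:
    """Toy write-through cache in front of a dict that stands in for the table."""

    def __init__(self, store):
        self.store = store
        self.cache = {}
        self.store_reads = 0  # counts reads that had to go to the table

    def put(self, key, item):
        self.store[key] = item  # write to the table first...
        self.cache[key] = item  # ...then keep the cached copy in step

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        self.store_reads += 1   # cache miss: fall through to the table
        item = self.store.get(key)
        if item is not None:
            self.cache[key] = item
        return item
```

Repeated reads of the same key touch the table only once, which is why read-intensive workloads benefit most from this kind of cache.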

When to use DAX

Applications that need the quickest response times.
Applications that read items regularly
Apps that are read-intensive.

When Not to Use DAX

Applications that require strongly consistent reads.
Applications that do not need microsecond read response times
Write-intensive applications or those with minimal read activity
If you don’t need DAX, consider ElastiCache as an alternative

Summary

DynamoDB is a NoSQL database technology created by Amazon for high-performance applications. It replicates data across three Availability Zones and stores three copies of the data using solid-state drives (SSDs). It is infinitely scalable and supports a variety of data types. DynamoDB offers two capacity modes, Provisioned and On-Demand, and has two types of indexes, Local Secondary Index (LSI) and Global Secondary Index (GSI). Additionally, DynamoDB Streams and DynamoDB Accelerator (DAX) are available for specific workloads.
