Advanced Filtering Techniques With DynamoDB

Welcome to the 12th edition of Excelling With DynamoDB!

In this week’s issue I’ll guide you through some advanced filtering techniques in DynamoDB and explain how they can be used to perform powerful data filtering.

Here’s the breakdown of what I’ll discuss:

  • How filtering in DynamoDB differs from filtering in other NoSQL databases
  • The 2 methods of filtering in DynamoDB and how they compare
  • When to apply which filtering method

In SQL databases, filtering is straightforward. You have an expressive syntax and can add any number of filter conditions by specifying WHERE clauses.

In SQL, you can join tables and filter on the value of the join.

Simply declaring “WHERE age > 20” will instantly filter your table items.

DynamoDB is radically different with filters.

DynamoDB, being a key-value database, lets you query in 2 different ways. One is highly efficient; the other is a little less so, but more flexible.

Compared to SQL databases, DynamoDB’s recommended filtering is much faster and more scalable; let’s see why.

The 2 Methods Of Filtering

  1. Primary key filtering - this is done by cleverly designing our partition and sort keys
  2. Filter Expressions - filtering applied on the server side, but only after all items in a partition have been read

Let’s understand these 2 methods in detail.

Primary Key Filtering

Primary key filtering is the technique that differs most from other databases, and it has the steepest learning curve for developers new to DynamoDB.

This filtering method, while more complex, remains the most efficient and highly scalable.

It involves carefully modelling our data, using carefully designed partition and sort key combinations to enable efficient filtering.

This is best understood with an example.

Say we wish to design a movies database. We wouldn’t design it the same way we would for a relational database. Instead, based on our required data access patterns, we will do 2 things:

  1. Use our base table for our main access pattern, which is usually something like “get a movie by movieID”.
  2. Create a global secondary index for our other access patterns like “get movies by year, by genre and by title”.

To satisfy our data access patterns above we need to model our data accordingly.

A typical movie item in our base movies table will look like this:

[Image: an example movie item in the base table, with movieID as the partition key]

The data model here isn’t in any way special; it simply allows you to query for a single movie by providing a movieID.
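As a rough sketch of the base-table access pattern (attribute names like movieID, title and genre are illustrative assumptions, not a real schema), a plain-Python stand-in for the table and its key lookup might look like:

```python
# Hypothetical movie item in the base table; movieID is the partition key.
# Attribute names are assumptions for illustration only.
movies_table = {
    "tt15398776": {
        "movieID": "tt15398776",
        "title": "Oppenheimer",
        "genre": "history",
        "year": 2023,
    },
}

def get_movie(movie_id):
    """Simulates GetItem: a direct key lookup, with no scanning involved."""
    return movies_table.get(movie_id)

print(get_movie("tt15398776")["title"])  # Oppenheimer
```

The point is simply that a lookup by the full primary key touches exactly one item, which is why it is the cheapest and fastest access pattern DynamoDB offers.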

But take a look at the next data model for our global secondary index:

It’s radically different: the year is now the partition key, and the sort key is the composite string “genreTitle”.

We model the data in this way for 2 reasons:

First, we define the year as the partition key because it is a low-cardinality key (a limited number of distinct values). This also allows us to get all movies for a given year.

Second, we define “genreTitle” as the sort key which contains strings of a particular movie’s genre and title delimited by ‘#’ characters.
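To make the composite sort key concrete, here is a minimal sketch of how such a “genreTitle” string could be built (the lowercasing is an assumption, to keep prefix matches case-insensitive; the article does not specify it):

```python
def build_genre_title(genre, title):
    """Builds the '#'-delimited composite sort key described above,
    e.g. '#history#oppenheimer'.
    Lowercasing is an assumed convention, not mandated by DynamoDB."""
    return f"#{genre.lower()}#{title.lower()}"

print(build_genre_title("History", "Oppenheimer"))  # #history#oppenheimer
```

Writing the key this way means every query prefix - genre alone, or genre plus title - is itself a valid begins_with argument.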

This is where the filtering takes place.

Let’s demonstrate how we can perform powerful filtering with this data modelling and satisfy our data access patterns defined earlier.

To get all movies for a given year we can run the following pseudo query on our global secondary index:

year = 2023 AND begins_with(genreTitle, '#')

This query will find the “2023” partition and get all items whose ‘genreTitle’ (sort key) attribute begins with “#” - which is all movies released in 2023.
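The mechanics of that query can be simulated in plain Python (this is an in-memory sketch of what DynamoDB does internally, not real API code; the sample items are invented for illustration):

```python
def query_gsi(items, year, sort_key_prefix):
    """Simulates a Query on the GSI: select the 'year' partition, then keep
    only items whose genreTitle sort key begins with the given prefix.
    A real Query never reads items outside the chosen partition."""
    return [
        item for item in items
        if item["year"] == year and item["genreTitle"].startswith(sort_key_prefix)
    ]

sample = [
    {"year": 2023, "genreTitle": "#history#oppenheimer"},
    {"year": 2023, "genreTitle": "#comedy#barbie"},
    {"year": 1999, "genreTitle": "#scifi#the matrix"},
]

# year = 2023 AND begins_with(genreTitle, '#')  ->  every 2023 movie
print(len(query_gsi(sample, 2023, "#")))  # 2
```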

On a side note, even though this could return many items, keep in mind that DynamoDB will always return a maximum of 1MB of data per request. This is why Alex Debrie often says:

“DynamoDB won’t let you write a query that won’t scale”

This is also a design which makes DynamoDB powerful and highly scalable.

Now the above query returned a base filter of items for a given year only. But say we want to perform an even deeper filter.

Here’s what we can do.

We’ll filter for movies from 2023 that also belong to the history genre.

Here is our filtered pseudo query:

year = 2023 AND begins_with(genreTitle, '#history')

This would return items “Troy” and “Oppenheimer”.

We can further filter the results by providing the movie title:

year = 2023 AND begins_with(genreTitle, '#history#oppenheimer')

This would of course return the item “Oppenheimer”.
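The progressive narrowing above boils down to matching ever longer prefixes against the sort key. A small sketch, reusing the article’s “Troy” and “Oppenheimer” example items (the key strings are invented to match the queries shown):

```python
items_2023 = [
    "#history#troy",
    "#history#oppenheimer",
    "#comedy#barbie",
]

def filter_by_prefix(sort_keys, prefix):
    """Each longer prefix narrows the result set, exactly like supplying
    begins_with a more and more specific string."""
    return [key for key in sort_keys if key.startswith(prefix)]

print(filter_by_prefix(items_2023, "#history"))               # both history movies
print(filter_by_prefix(items_2023, "#history#oppenheimer"))   # just Oppenheimer
```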

We’ve now satisfied all of our access patterns. We can now easily get:

  • all movies from a given year
  • all movies from a given genre
  • a movie by a given title

Additionally, if we wanted to filter even deeper and had, for example, a TV shows table, we could apply filters down to the season and episode level.

All we need to modify for this TV shows table is to add the season and episode numbers to our sort key string, e.g. '#fantasy#game of thrones#s1#e1'.

Then we can further filter for tv shows by providing a string for a particular season:

yearStart = 2011 AND begins_with(genreTitleSeasonEpisode, '#fantasy#game of thrones#s1')

Now we are filtering on the year (“yearStart” for TV shows), the genre, the title and the season (we can also filter on the episode if we need to, by appending #e1 to the filter string).

The above query would return all episodes that are in “Game of Thrones” season 1.
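The same prefix mechanics extend naturally to the season and episode segments. A sketch, assuming the '#genre#title#sN#eN' key format inferred from the query above:

```python
def build_show_key(genre, title, season, episode):
    """Extends the composite sort key with season and episode segments,
    e.g. '#fantasy#game of thrones#s1#e1'. The exact format is an
    assumption based on the pseudo query in the article."""
    return f"#{genre}#{title}#s{season}#e{episode}"

# Two seasons of three episodes each -> six episode items.
episodes = [build_show_key("fantasy", "game of thrones", s, e)
            for s in (1, 2) for e in (1, 2, 3)]

# begins_with(..., '#fantasy#game of thrones#s1') -> only season 1 episodes
season_one = [k for k in episodes if k.startswith("#fantasy#game of thrones#s1")]
print(len(season_one))  # 3
```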


The examples we’ve seen are all examples of filtering using the primary keys of our table/secondary indexes.

Every filter query we have performed above is efficient and highly scalable.

This means even if we had millions or billions of items in our movies or TV shows table, the latency would be the same as if we had 10 items - consistently in the double-digit millisecond range.

The second type of filtering we’ll take a look at are Filter Expressions. These are less efficient but more flexible as we’ll see.

Filter Expressions

A Filter Expression is filtering that is applied to the results of a Query or Scan operation, after the operation has read the items.

This is the filtering method most of you coming from other NoSQL databases would be familiar with.

With Filter Expressions you can provide a query expression like year = 2023; DynamoDB reads all items in the given “2023” partition, then applies your filters and returns the matching items.

There are two problems with this “traditional” filtering in DynamoDB.

First, DynamoDB must read every item in the partition; only once it has read all those items does it apply your filters. This naturally adds latency to your query.

Second, since DynamoDB charges you for the data it reads, you are effectively billed for all items scanned, not just the filtered items that are returned.

When you perform any query against your DynamoDB table, the response contains your data along with two attributes: “ScannedCount” and “Count”.

ScannedCount is the number of items DynamoDB had to scan, regardless of your filter.

Count is the number of items DynamoDB returned after your filter.

In terms of costs, you are charged for the ScannedCount number not the Count number.

This is why primary key filtering is more efficient and cost-effective - here the ScannedCount will be the same as the Count.
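The ScannedCount/Count gap is easy to see in a simulation (plain Python standing in for the DynamoDB behavior; the sample items and the rating attribute are invented for illustration):

```python
def query_with_filter(partition_items, filter_fn):
    """Simulates a Query with a filter expression: every item in the
    partition is read (ScannedCount), the filter runs afterwards, and
    only the survivors are returned (Count). You pay for ScannedCount."""
    scanned_count = len(partition_items)
    results = [item for item in partition_items if filter_fn(item)]
    return {"Items": results, "Count": len(results), "ScannedCount": scanned_count}

# 100 hypothetical movies; only those rated 8 or higher pass the filter.
partition = [{"title": f"movie {i}", "rating": i % 10} for i in range(100)]
resp = query_with_filter(partition, lambda item: item["rating"] >= 8)

print(resp["ScannedCount"])  # 100 -> what you are billed for
print(resp["Count"])         # 20  -> what you actually get back
```

With primary key filtering, by contrast, only matching items are ever read, so the two numbers are equal.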

However, Filter Expressions are not all doom and gloom. They provide you with more flexibility than primary key filtering.

Recall that with primary key filtering, we are limited to filtering on attributes that we pre-define and cannot modify after creating our table or index - such as genre, title and season.

With Filter Expressions we can filter our query on any attribute we like. We can filter on genre or season in any order, unlike primary key filtering where the prefix order is fixed - allowing for more expressive filtering.

Filter Expression Caveats

There is one caveat, however. Because of DynamoDB’s 1MB limit, filter expressions become ineffective once the data that must be read to find your matches exceeds 1MB.

To understand this limitation, I’ve used an example from Alex Debrie’s DynamoDB Guide:

“However, the key point to understand is that the Query and Scan operations will return a maximum of 1MB of data, and this limit is applied in step 1, before the filter expression is applied.

Imagine your music table was 1GB in size, but the songs that were platinum were only 100KB in size. You might expect a single Scan request to return all the platinum songs, since it is under the 1MB limit. However, since the filter expression is not applied until after the items are read, your client will need to page through 1000 requests to properly scan your table. Many of these requests will return empty results as all non-matching items have been filtered out.” [1]

The bottom line is to be careful with filter expressions, as a single request often will not retrieve everything you expected.
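Alex Debrie’s 1GB example can be reproduced with a small paging simulation (the sizes and the match rate are invented to mirror the quote; real page boundaries depend on actual item sizes):

```python
def scan_with_filter(total_items, item_size_kb, match_fn, page_limit_kb=1024):
    """Simulates paging through a Scan: each request reads up to ~1MB of
    items first, then applies the filter, so sparse matches still cost a
    full pass over the table."""
    per_page = page_limit_kb // item_size_kb
    requests, matches = 0, 0
    for start in range(0, total_items, per_page):
        requests += 1
        page = range(start, min(start + per_page, total_items))
        matches += sum(1 for i in page if match_fn(i))
    return requests, matches

# 1,024,000 items of 1KB (~1GB); only every 10,000th item matches the filter.
requests, matches = scan_with_filter(1_024_000, 1, lambda i: i % 10_000 == 0)
print(requests)  # 1000 requests, despite the handful of matching items
```

Most of those 1000 requests return few or no items, yet every one of them is billed and adds latency.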

When To Apply Which Filtering Method

The question now remains: when should you apply each filtering method?

According to Alex Debrie, you should avoid Filter Expressions whenever you can.

I tend to stick to primary key filtering the strong majority of the time and recommend using Filter Expressions only in the following 2 cases:

  • When you have an important access pattern which cannot be satisfied with the primary keys of your table or index
  • When your table contains a limited and low number of items, and latency or cost is not a concern.

One thing to keep in mind is that DynamoDB will charge you for each global secondary index. If you have already created several GSIs it may be more efficient to use a filter expression, especially if the first point above applies.

Conclusion

Filtering in DynamoDB is performed differently than in most NoSQL databases. Understanding how DynamoDB works is crucial to designing effective and optimized filters on your tables.

To write performant filtering, you must first work on carefully modelling your table data and designing your primary keys intelligently.

In this article, we went through the 2 types of filtering in DynamoDB: primary key filtering, a powerful and efficient technique that optimizes for latency and cost; and Filter Expressions, which are mostly recommended for smaller tables where latency and cost are not a concern.


If you enjoyed this post, please consider subscribing and sharing the newsletter with your network: https://www.dhirubhai.net/newsletters/7181386750357893121/


My name is Uriel Bitton and I hope you learned something in this edition of Excelling With DynamoDB.

You can share the article with your network to help others learn as well.

If you enjoyed this you can subscribe to my newsletter on Medium to get my latest articles on DynamoDB and serverless by email.

My blog website is coming soon, so stay tuned for that.

I hope to see you in next week’s edition!

Uriel


References

  1. The DynamoDB Guide. Alex Debrie. https://www.dynamodbguide.com.
