Difference between local and global indexes in DynamoDB
Omar Ismail
Senior Software Engineer @ Digitinary | Java & Spring Expert | AWS & Microservices Architect | FinTech & Open Banking Innovator | Digital Payments Expert | Top 200 IT Content Creator in Jordan | 40K+
Why Secondary Indexes
AWS DynamoDB, being a NoSQL database, doesn't support queries such as a SELECT with a condition on an arbitrary (non-key) attribute, such as the following query.
SELECT * FROM Users
WHERE email='[email protected]';
Note that when doing the query above with an SQL database, a query optimizer evaluates available indexes to see if any index can fulfill the query.
It is possible to obtain the same query result using the DynamoDB Scan operation. However, a Scan reads every item in the table, which is slower than a Query operation that reads only the items stored under a specific key. Imagine having to look for a book in a library by going through possibly every book, versus knowing which shelf the book is on.
Thus, there is a need for another table or data structure that stores data with different primary keys and maps a subset of attributes from this base table. This other table is called a secondary index and is managed by AWS DynamoDB. When items are added, modified, or deleted in the base table, associated secondary indexes will be updated to reflect the changes.
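To make the idea concrete, here is a minimal in-memory sketch of a base table with one secondary index that is updated on every write. It is a toy model, not DynamoDB itself: the TinyTable class, its method names, and the example.com addresses are all made up for illustration.

```python
from collections import defaultdict


class TinyTable:
    """Toy stand-in for a table with one secondary index (hypothetical, not DynamoDB)."""

    def __init__(self, key_attr, index_attr):
        self.key_attr = key_attr        # primary key attribute of the base table
        self.index_attr = index_attr    # attribute the secondary index is keyed on
        self.items = {}                 # primary key -> item
        self.index = defaultdict(set)   # index key -> primary keys of matching items

    def put(self, item):
        # Overwrite semantics: drop any old index entry before adding the new one.
        self.delete(item[self.key_attr])
        self.items[item[self.key_attr]] = item
        if self.index_attr in item:
            self.index[item[self.index_attr]].add(item[self.key_attr])

    def delete(self, key):
        old = self.items.pop(key, None)
        if old is not None and self.index_attr in old:
            self.index[old[self.index_attr]].discard(key)

    def scan(self, attr, value):
        # Reads every item in the table, like a Scan with a filter.
        examined = 0
        matches = []
        for item in self.items.values():
            examined += 1
            if item.get(attr) == value:
                matches.append(item)
        return matches, examined

    def query_index(self, value):
        # Reads only the items filed under one index key.
        keys = self.index.get(value, set())
        return [self.items[k] for k in sorted(keys)], len(keys)


users = TinyTable(key_attr="id", index_attr="email")
for i in range(1000):
    users.put({"id": i, "email": f"user{i}@example.com"})

print(users.scan("email", "user42@example.com")[1])   # 1000 items examined
print(users.query_index("user42@example.com")[1])     # 1 item examined
```

Both calls find the same single user, but the scan examines all 1000 items while the index lookup examines one.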
Global (GSI) vs Local (LSI) Secondary Indexes
AWS DynamoDB supports two types of indexes: Global Secondary Index (GSI) and Local Secondary Index (LSI).
A global secondary index is an index that has a partition key and an optional sort key that can be different from the base table's primary key. It is deemed "global" because queries on the index can access the data across different partitions of the base table. It can be viewed as a different table that contains attributes based on the base table.
A local secondary index is an index that must have the same partition key as the base table but a different sort key. It is considered "local" because every partition of a local secondary index is bounded by the same partition key value of the base table. It enables queries that return a partition's items in a different sort order, based on the index's sort key attribute.
A local secondary index lets a Query operation retrieve either several items that share a partition key value but have different sort key values, or the items that match a specific partition key value and sort key value.
Secondary Index Examples
Check out the following GSI and LSI examples to get an idea of when to use which.
GSI Example
Consider this table, which has Uuid as its primary key (partition key) plus UserId and Data attributes.
| Uuid(Partition Key) | UserId | Data |
With this base table key schema, the table can answer queries that retrieve the data for a given Uuid. However, to get all data for a user ID, it would have to run a Scan and keep only the items that have a matching UserId.
To be able to get all data for a user efficiently, you can use a global secondary index that has UserId as its primary key (partition key). Using this index, you can do a query to retrieve all data for a user.
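As a rough sketch, the GSI above could be declared and queried with request parameters like the following. The dictionaries follow the shape accepted by the create_table and query calls of the boto3 low-level DynamoDB client. The table name UserData, the index name UserIdIndex, the string attribute types, and on-demand billing are assumptions made for the example, not details from the table above.

```python
def gsi_table_params():
    """Parameters shaped for boto3's DynamoDB client.create_table (names are hypothetical)."""
    return {
        "TableName": "UserData",  # assumed table name
        "AttributeDefinitions": [
            # Only attributes used in a key schema are declared here.
            {"AttributeName": "Uuid", "AttributeType": "S"},
            {"AttributeName": "UserId", "AttributeType": "S"},
        ],
        "KeySchema": [{"AttributeName": "Uuid", "KeyType": "HASH"}],
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "UserIdIndex",  # assumed index name
                "KeySchema": [{"AttributeName": "UserId", "KeyType": "HASH"}],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        # Assumes on-demand billing, so no ProvisionedThroughput blocks are needed.
        "BillingMode": "PAY_PER_REQUEST",
    }


def query_by_user_params(user_id):
    """Parameters shaped for client.query against the GSI instead of the base table."""
    return {
        "TableName": "UserData",
        "IndexName": "UserIdIndex",
        "KeyConditionExpression": "UserId = :uid",
        "ExpressionAttributeValues": {":uid": {"S": user_id}},
    }
```

With boto3 installed and AWS credentials configured, these could be passed as boto3.client("dynamodb").create_table(**gsi_table_params()) and boto3.client("dynamodb").query(**query_by_user_params("some-user-id")).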
LSI Example
A Local Secondary Index enables a different sorting order of the same list of items, because an LSI uses the same partition key as the base table but a different sort key. Consider this table that uses a composite key: UserId as partition key, ArticleName as sort key, and other attributes: DateCreated and Data.
|UserId(Partition Key) | ArticleName(Sort Key) | DateCreated | Data|
With this base table key schema, the table can answer queries that retrieve all the articles for a specific user sorted by name (query by UserId). However, to retrieve all the articles associated with a user sorted by date created, you would have to retrieve all the articles first and then sort them.
With a local secondary index that has UserId as its partition key and DateCreated as its sort key, you can retrieve a user’s articles sorted by date created.
|UserId(Partition Key) | DateCreated(Sort Key) | ArticleName | Data|
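The LSI above might be declared and queried roughly as follows. This is a sketch of request parameters in the shape accepted by the boto3 DynamoDB client. The table name Articles, the index name ByDateCreated, on-demand billing, and storing DateCreated as an ISO 8601 string (so that string order matches date order) are assumptions for the example.

```python
def lsi_table_params():
    """Parameters shaped for client.create_table; an LSI must be declared at creation time."""
    return {
        "TableName": "Articles",  # assumed table name
        "AttributeDefinitions": [
            {"AttributeName": "UserId", "AttributeType": "S"},
            {"AttributeName": "ArticleName", "AttributeType": "S"},
            # Assumes ISO 8601 strings such as "2024-01-31T10:00:00Z".
            {"AttributeName": "DateCreated", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "UserId", "KeyType": "HASH"},
            {"AttributeName": "ArticleName", "KeyType": "RANGE"},
        ],
        "LocalSecondaryIndexes": [
            {
                "IndexName": "ByDateCreated",  # assumed index name
                "KeySchema": [
                    # Same partition key as the base table, different sort key.
                    {"AttributeName": "UserId", "KeyType": "HASH"},
                    {"AttributeName": "DateCreated", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }


def newest_articles_params(user_id, limit=10):
    """Parameters shaped for client.query: one user's articles, newest first."""
    return {
        "TableName": "Articles",
        "IndexName": "ByDateCreated",
        "KeyConditionExpression": "UserId = :uid",
        "ExpressionAttributeValues": {":uid": {"S": user_id}},
        "ScanIndexForward": False,  # descending order of the index sort key
        "Limit": limit,
    }
```

Setting ScanIndexForward to False returns items in descending sort key order, so the sorting happens in the index rather than in application code.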
Summary — Which One Should You Use?
In short, use DynamoDB Global Secondary Index when you need to support querying non-primary key attributes of a table.
And use a DynamoDB Local Secondary Index when you need to support querying items with different sorting orders of attributes.
Check out the How To Create AWS DDB Secondary Indexes article to learn how to create secondary indexes.
Here is the formal definition from the documentation:
Global secondary index: an index with a hash and range key that can be different from those on the table. A global secondary index is considered "global" because queries on the index can span all of the data in a table, across all partitions.
Local secondary index: an index that has the same hash key as the table, but a different range key. A local secondary index is "local" in the sense that every partition of a local secondary index is scoped to a table partition that has the same hash key.
However, the differences go way beyond the possibilities in terms of key definitions. Find below some important factors that will directly impact the cost and effort for maintaining the indexes:
Local Secondary Indexes consume throughput from the table. When you query records via the local index, the operation consumes read capacity units from the table. When you perform a write operation (create, update, delete) in a table that has a local index, there will be two write operations, one for the table and another for the index. Both operations will consume write capacity units from the table.
Global Secondary Indexes have their own provisioned throughput. When you query the index, the operation consumes read capacity from the index. When you perform a write operation (create, update, delete) in a table that has a global index, there will be two write operations, one for the table and another for the index*.
*When defining the provisioned throughput for the Global Secondary Index, make sure you pay special attention to the following requirements:
In order for a table write to succeed, the provisioned throughput settings for the table and all of its global secondary indexes must have enough write capacity to accommodate the write; otherwise, the write to the table will be throttled.
Local Secondary Indexes can only be created when you are creating the table. There is no way to add a Local Secondary Index to an existing table, and once you create the index you cannot delete it.
Global Secondary Indexes can be created along with the table or added to an existing table. Deleting an existing Global Secondary Index is also allowed.
Local Secondary Indexes support eventual or strong consistency, whereas Global Secondary Indexes only support eventual consistency.
Local Secondary Indexes allow retrieving attributes that are not projected into the index (although with additional cost: performance and consumed capacity units). With a Global Secondary Index you can only retrieve the attributes projected into the index.
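Two of the points above, adding or deleting a GSI on an existing table and requesting consistent reads, can be sketched as request parameters in the shape accepted by update_table and query on the boto3 DynamoDB client. The helper names, the index_type label, and the default KEYS_ONLY projection are assumptions for illustration. The ValueError check is a local guard in this sketch, not something boto3 raises for you before sending the request.

```python
def add_gsi_params(table, index_name, hash_attr, attr_type="S", throughput=None):
    """Parameters shaped for client.update_table to add a GSI to an existing table."""
    create = {
        "IndexName": index_name,
        "KeySchema": [{"AttributeName": hash_attr, "KeyType": "HASH"}],
        "Projection": {"ProjectionType": "KEYS_ONLY"},
    }
    if throughput is not None:
        # A GSI on a provisioned-mode table carries its own capacity settings.
        read_units, write_units = throughput
        create["ProvisionedThroughput"] = {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        }
    return {
        "TableName": table,
        "AttributeDefinitions": [{"AttributeName": hash_attr, "AttributeType": attr_type}],
        "GlobalSecondaryIndexUpdates": [{"Create": create}],
    }


def delete_gsi_params(table, index_name):
    """Parameters shaped for client.update_table to delete an existing GSI."""
    return {
        "TableName": table,
        "GlobalSecondaryIndexUpdates": [{"Delete": {"IndexName": index_name}}],
    }


def index_query_params(table, index_name, key_condition, values, index_type, consistent=False):
    """Parameters shaped for client.query; index_type ('LSI' or 'GSI') is a label for this sketch."""
    if consistent and index_type == "GSI":
        # Global secondary indexes only offer eventually consistent reads.
        raise ValueError("strongly consistent reads are not supported on a GSI")
    return {
        "TableName": table,
        "IndexName": index_name,
        "KeyConditionExpression": key_condition,
        "ExpressionAttributeValues": values,
        "ConsistentRead": consistent,
    }
```

There is no equivalent helper for Local Secondary Indexes because they can only be declared in the create_table request.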
Special Consideration about the Uniqueness of the Keys Defined to Secondary Indexes:
In a Local Secondary Index, the range key value DOES NOT need to be unique for a given hash key value. The same applies to Global Secondary Indexes: the key values (hash and range) DO NOT need to be unique.