RocksDB: One Tool for Achieving the Lowest Latency

Before we begin: if you are looking to use RocksDB in your project, you are in the right place. This article covers the must-know features of RocksDB. It is a quick summary and does not dive into intricate details, just to set expectations.

Let's begin now...

Client-facing applications demand lightning-fast, real-time responses. Accessing data over the network slows the response time by at least a network round trip plus I/O. Applications that cannot afford to lose even a fraction of a millisecond have to do something about network latency, since it is the dominating factor.

  1. One option is to optimize the network, which is no easy task.
  2. A second option is to choose an architecture that eliminates the network altogether and moves the database close to the application.


In a client-server architecture where the database is hosted on an SSD, nearly half of the time is consumed by network latency, making it a significant bottleneck. The relative impact of this latency can vary based on hardware and network configurations. One potential solution to mitigate this issue is to implement a faster network.

Client-Server Architecture

Another solution to the problem is to remove the network from the architecture completely and move the storage next to the application.

Architecture that eliminates the network

Will this model fit every application perfectly? The answer is no. You have to make a conscious decision based on your application's use case.

What is RocksDB not?

  • Not distributed
  • No failover
  • Not highly available: if the machine dies, you lose your data (with ephemeral storage)


Write Request Path

  • Write Ahead Log (WAL): When a write request (put, delete, or update) is made, the data is first written to the Write Ahead Log (WAL) to ensure durability. This log helps in recovery in case of a crash.
  • MemTable: After writing to the WAL, the data is added to the active MemTable, which is an in-memory sorted table. This allows for fast writes and reads of recent data.
  • Flush to SST Files: When the MemTable becomes full, it is frozen and written to disk as an immutable SST file. This process is called flushing. Because the MemTable is already sorted, the data is written out in key order, in a format optimized for read performance.
  • Compaction: Over time, multiple SST files accumulate. RocksDB periodically merges these files through a process called compaction. Compaction reduces the number of SST files, reclaims space from deleted entries, and maintains read efficiency by organizing data into levels.

Write path
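
The four steps above can be sketched as a toy model in plain Python. This is not the RocksDB API: the class name `ToyLSM`, the `memtable_limit` parameter, and the use of lists and dicts in place of real files are all assumptions made for illustration.

```python
class ToyLSM:
    """Toy model of the write path: WAL -> MemTable -> SST flush -> compaction."""

    def __init__(self, memtable_limit=3):
        self.wal = []          # append-only log of (key, value) records
        self.memtable = {}     # in-memory buffer of recent writes
        self.sst_files = []    # immutable sorted lists, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.wal.append((key, value))    # 1. log first, so a crash can be replayed
        self.memtable[key] = value       # 2. then update the MemTable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # 3. write the MemTable contents in key order as an immutable "SST file"
        self.sst_files.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.wal = []                    # flushed data no longer needs the log

    def compact(self):
        # 4. merge all SST files; entries from newer files overwrite older ones
        merged = {}
        for sst in self.sst_files:       # oldest to newest
            merged.update(sst)
        self.sst_files = [sorted(merged.items())]


db = ToyLSM(memtable_limit=2)
for k, v in [("a", 1), ("b", 2), ("a", 3), ("c", 4)]:
    db.put(k, v)
db.compact()
print(db.sst_files)   # [[('a', 3), ('b', 2), ('c', 4)]]
```

Two flushes produce two small files, and compaction folds them into one file in which only the newest value of "a" survives.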

Read Request Path

  • Active MemTable: The read operation starts by checking the MemTable, an in-memory data structure where recent writes are stored.
  • Immutable MemTables: If the key is not found in the active MemTable, RocksDB checks any immutable MemTables, which are previous MemTables that have not yet been flushed to disk.
  • Block Cache: If the key is not found in the MemTables, the block cache is checked. The block cache stores frequently accessed blocks of data from SST files (Sorted String Tables).
  • SST Files: If the key is not in the block cache, the read operation proceeds to the SST files on disk.
  • Bloom Filters: For each SST file, RocksDB uses Bloom filters to quickly determine whether the key might be present in the file before reading its data blocks, which helps avoid unnecessary reads.

Read Path
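
The lookup order above can be sketched in plain Python. This is a toy model, not the RocksDB API: `ToyBloom`, `get`, and the dict-based "SST files" are hypothetical names, the Bloom filter sizing is arbitrary, and the block cache is left out to keep the sketch short.

```python
import hashlib


class ToyBloom:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, keys, bits=64, hashes=3):
        self.bits, self.hashes, self.mask = bits, hashes, 0
        for key in keys:
            for pos in self._positions(key):
                self.mask |= 1 << pos

    def _positions(self, key):
        # Derive `hashes` bit positions for the key from salted SHA-256 digests.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def might_contain(self, key):
        return all((self.mask >> pos) & 1 for pos in self._positions(key))


def get(key, memtable, immutable_memtables, sst_files):
    """Search from newest to oldest source; return (value, where_found)."""
    if key in memtable:
        return memtable[key], "memtable"
    for imm in immutable_memtables:        # newest first
        if key in imm:
            return imm[key], "immutable memtable"
    for bloom, data in sst_files:          # newest first
        if not bloom.might_contain(key):
            continue                       # definitely absent: skip the file
        if key in data:
            return data[key], "sst"
    return None, "not found"


sst_data = {"k1": "old", "k2": "v2"}
sst_files = [(ToyBloom(sst_data), sst_data)]
print(get("k1", {"k1": "new"}, [], sst_files))   # ('new', 'memtable')
print(get("k2", {}, [{"k3": "v3"}], sst_files))  # ('v2', 'sst')
```

The first lookup never touches the SST file because the newer value in the MemTable wins.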


Can we create tables in RocksDB like in other relational databases?

RocksDB has column families, each of which is a logical partition of key-value pairs accessed through a column family handle. You can create multiple column families and insert or read key-value pairs in each. So yes, in a way, column families serve a similar purpose to tables.

For intricate details - refer

What happens if the machine crashes while the application is running?

If the machine is lost and its storage is ephemeral, you will lose the database. You need your own way to back up the database to durable storage. One approach is to back up the database at some interval and, after a crash, restore it on application startup. The application needs to be resilient enough to handle missing data and restart from the last consistent state.

To know more details about backup - refer

How to retrieve data from RocksDB?

Here are several ways to retrieve data from RocksDB:

  1. Get Operation - The get operation retrieves a value for a specific key.
  2. MultiGet Operation - The multiGet operation retrieves values for multiple keys at once.
  3. Iterators - Iterators allow sequential access to the key-value pairs in the database.
  4. Prefix Seek - Prefix Seek allows efficient range queries by seeking to a specified prefix.
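
To make the four access patterns concrete, here is a minimal sketch over a sorted in-memory store in plain Python. It is not the RocksDB API: `ToySortedStore` and its method names are hypothetical stand-ins that only mimic the behavior described above.

```python
import bisect


class ToySortedStore:
    """Toy key-value store that keeps its keys in byte-wise sorted order."""

    def __init__(self, items):
        self.data = dict(items)
        self.keys = sorted(self.data)

    def get(self, key):
        return self.data.get(key)                    # single-key lookup

    def multi_get(self, keys):
        return [self.data.get(k) for k in keys]      # several keys in one call

    def iterate(self, start=b""):
        i = bisect.bisect_left(self.keys, start)     # seek to first key >= start
        for key in self.keys[i:]:
            yield key, self.data[key]

    def prefix_seek(self, prefix):
        for key, value in self.iterate(prefix):
            if not key.startswith(prefix):
                break                                # left the prefix range
            yield key, value


store = ToySortedStore({b"user:1": b"ann", b"user:2": b"bob", b"order:9": b"book"})
print(store.get(b"user:1"))                   # b'ann'
print(store.multi_get([b"user:2", b"nope"]))  # [b'bob', None]
print(list(store.prefix_seek(b"user:")))      # [(b'user:1', b'ann'), (b'user:2', b'bob')]
```

Because the keys are sorted, the prefix scan can seek straight to the first matching key and stop at the first key that no longer matches.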

Does RocksDB support Indexing?

RocksDB does not natively support secondary indexes, but it is flexible enough to let developers build and manage their own indexing mechanisms tailored to their application's requirements. A sample custom index is shown below.

Index key/value payloads are lightweight compared to the key/value pairs of the real dataset.

Key: prefix,k1,v1     Value: v1
Key: prefix,k1,v2     Value: v2
Key: prefix,k1,v3     Value: v3

// Index
Key: index_prefix,v1     Value: prefix,k1,v1
Key: index_prefix,v2     Value: prefix,k1,v2
Key: index_prefix,v3     Value: prefix,k1,v3        
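
A minimal sketch of maintaining such an index, using a plain Python dict in place of the database. The function names `put_with_index` and `lookup_by_value` are hypothetical, and the key layout simply follows the example above. In a real RocksDB application you would probably want to write the data entry and the index entry atomically (for example, in one write batch) so they cannot drift apart.

```python
def put_with_index(store, k, v):
    """Write the primary record and an index entry that points back at it."""
    data_key = f"prefix,{k},{v}"
    store[data_key] = v                        # primary record
    store[f"index_prefix,{v}"] = data_key      # index entry: value -> primary key


def lookup_by_value(store, v):
    """Use the index to find the primary key, then fetch the record."""
    data_key = store.get(f"index_prefix,{v}")
    if data_key is None:
        return None
    return data_key, store[data_key]


store = {}
for value in ("v1", "v2", "v3"):
    put_with_index(store, "k1", value)

print(lookup_by_value(store, "v2"))   # ('prefix,k1,v2', 'v2')
```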

What is the key ordering in RocksDB?

RocksDB organizes keys in lexicographical (byte-wise) order by default. This ordering is based on the binary representation of the keys. Here's a more detailed breakdown:

Byte-wise Lexicographical Order: Keys are compared byte by byte. This means that a shorter key comes before a longer key when the shorter key is a prefix of the longer one. For example, the key "abc" comes before the key "abcd".

Numerical Sorting: When keys are numeric, they are still compared based on their byte representation. Therefore, the string "10" will come before the string "2" because in lexicographical order, "1" is less than "2", and the first byte determines the order.

Custom Comparators: While the default is lexicographical order, RocksDB allows you to define custom comparators if you need a different sorting order for your keys. This can be useful for implementing specific application logic.
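
The numeric-sorting pitfall is easy to reproduce in plain Python, because Python compares `bytes` objects byte by byte as well. The sketch below also shows one common workaround that needs no custom comparator: encoding numbers as fixed-width big-endian integers. The helper name `encode_u64` is an assumption for illustration.

```python
import struct

# Numbers stored as text sort byte-wise, not numerically.
text_keys = [str(n).encode() for n in (2, 10, 1)]
print(sorted(text_keys))   # [b'1', b'10', b'2']


def encode_u64(n):
    """Encode n as 8 big-endian bytes so byte order matches numeric order."""
    return struct.pack(">Q", n)


binary_keys = [encode_u64(n) for n in (2, 10, 1)]
decoded = [struct.unpack(">Q", k)[0] for k in sorted(binary_keys)]
print(decoded)             # [1, 2, 10]
```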

Does RocksDB support transactions?

Yes, it does support transactions.

For more details - refer

Does RocksDB experience performance degradation in any scenario?

Yes, it does get slower in some scenarios. This comes from practical experience using RocksDB in a real-world project.

  1. As the size of the database grows, reads become slower. Pruning and creating indexes might be potential workarounds, depending on the use case.
  2. Frequent deletes slow down iterators. In RocksDB, a delete creates a tombstone, and tombstones don't disappear until bottom-level compaction happens. Some reads need to scan a lot of tombstones, which is inefficient.
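
The tombstone cost can be illustrated with a toy scan in plain Python. This is not RocksDB code: the `TOMBSTONE` marker and the `scan` function are hypothetical, and the list of pairs stands in for an iterator over sorted data.

```python
TOMBSTONE = object()   # marker standing in for a deleted key


def scan(entries, limit):
    """Return up to `limit` live pairs and the number of entries examined."""
    live, examined = [], 0
    for key, value in entries:      # entries are in key order
        examined += 1
        if value is TOMBSTONE:
            continue                # deleted key: skipped, but still costs a step
        live.append((key, value))
        if len(live) == limit:
            break
    return live, examined


# 1000 deleted keys sit in front of the single live key.
entries = [(f"key{i:04d}", TOMBSTONE) for i in range(1000)]
entries.append(("key1000", "alive"))

live, examined = scan(entries, limit=1)
print(live, examined)   # [('key1000', 'alive')] 1001
```

Fetching a single live row required stepping over every tombstone in front of it.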

How can you manage the increasing database size in RocksDB?

This really depends on the use case. In our case, we simply pruned the database, which kept its size in check. We moved RocksDB data to secondary storage for query purposes, and every client query was served from that secondary storage. This allowed us to prune the primary storage (RocksDB) and keep only the active dataset.

How can you handle scenarios involving frequent deletions?

Again, the answer varies largely with the use case. We triggered compaction immediately after deleting a set of keys, which helped retain performance to a large extent.

Compaction in RocksDB is a process that reorganizes data on disk to optimize storage utilization and improve read and write performance.

As data is written to RocksDB, old data versions and deleted data accumulate on disk. Compaction helps reclaim this space by removing obsolete data and consolidating live data.
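
Here is a minimal sketch of what compaction does to old versions and deleted keys, in plain Python. It is a toy model, not RocksDB code: `compact` and `TOMBSTONE` are hypothetical names, and because the sketch merges every file at once it behaves like a full, bottom-level compaction, which is the case where tombstones can be dropped safely. In RocksDB itself, manual compaction is exposed through `CompactRange` in the C++ API.

```python
TOMBSTONE = None   # marker standing in for a deleted key


def compact(sst_files):
    """Merge SST files (oldest first), dropping obsolete versions and tombstones."""
    merged = {}
    for sst in sst_files:
        merged.update(sst)      # entries from newer files overwrite older ones
    return sorted((k, v) for k, v in merged.items() if v is not TOMBSTONE)


old = [("a", 1), ("b", 2), ("c", 3)]
new = [("a", 10), ("b", TOMBSTONE)]   # "a" was updated, "b" was deleted
print(compact([old, new]))            # [('a', 10), ('c', 3)]
```

After the merge, later scans no longer step over the tombstone for "b" or the stale value of "a".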

How to avoid memory leaks?

Close iterators, and any resources associated with them, as soon as you are done with them.

What is Compaction in RocksDB?

Compaction in RocksDB is a process that reorganizes data on disk to optimize storage utilization and improve read and write performance. Here are the primary functions and benefits of compaction in RocksDB:

  1. Reclaiming Space
  2. Merging Data
  3. Sorting and Organizing Data
  4. Reducing Space Amplification (note that compaction itself adds write amplification)
  5. Improving Read Performance

Where to find RocksDB performance benchmarks

https://github.com/facebook/rocksdb/wiki/Performance-Benchmarks

Who is using RocksDB in the industry?

https://github.com/facebook/rocksdb/wiki/RocksDB-Users-and-Use-Cases


Reference

  1. https://www.youtube.com/watch?v=V_C-T5S-w8g
  2. https://engineering.fb.com/2013/11/21/core-infra/under-the-hood-building-and-open-sourcing-rocksdb/
  3. https://github.com/facebook/rocksdb/wiki


Amit Wadhe

Arcesium | Walmart | Morgan Stanley | Yodlee
