Speedy DolphinDB – Why is DolphinDB So Fast?
Linxiao Ma
Financial Data Architect | Hands-on Engineer | PhD | CFA Candidate | Distributed Database Expert | DolphinDB UK Rep. | Tech Blogger | Insane Coder
*More articles can be found on my blog: https://dataninjago.com
What makes me buy into DolphinDB:
In my last blog post, I showcased a cross-exchange arbitrage example to demonstrate just how easy it is to develop with DolphinDB, a perfect way to highlight its "Friendly" side. The more I have explored DolphinDB since I started using it, the more I have grown to love it. In this post and the next few, I'll explore the key traits of DolphinDB that make me confident it will play a significant role in the financial real-time database space.
I’ve found that DolphinDB is the kind of product that practically sells itself once people dive deeper into it. Therefore, I’ll be recommending additional resources along the way, which I hope will help you explore DolphinDB more thoroughly.
This blog post highlights the “Speedy” trait of DolphinDB. Performance is a core consideration when adopting a real-time financial database, where fast data processing and low-latency queries are essential. Today, a real-time financial database must excel under extreme conditions, capable of processing massive volumes of data in very short timeframes. In this blog post, I will first present the results of several DolphinDB benchmark performance tests, and then walk you through the key features of DolphinDB that enable this level of performance.
Some Numbers First
*The test setup and detailed results can be found in the “Details” links.
Query Performance – DolphinDB vs Spark (Details)
Factor Calculation – DolphinDB vs Pandas (Details)
IoT Data Queries – DolphinDB TSDB vs DolphinDB OLAP vs ClickHouse (Details)
Low Latency ROC Ranking (ROC calc + Ranking) – DolphinDB Stream Engine (Details)
Low Latency Feature Engineering – DolphinDB Stream Engine (Details)
Real-Time Fixed Income Pricing & Risk Management – DolphinDB (Details in Chinese)
Query Performance – DolphinDB vs InfluxDB (Details in Chinese)
Why Is DolphinDB So Fast?
1) Optimised TSDB Engine
DolphinDB uses an LSM-tree structure to handle very high write workloads in real time. Unlike B+ trees, which must rebalance during writes and therefore incur random disk writes, an LSM-tree writes data sequentially and buffers it in memory. This makes writes much faster and more efficient.
On the read side, DolphinDB uses several techniques to optimize performance, such as sorting, indexing, and in-memory caching. When data is being written, DolphinDB first sorts it based on the specified sortColumns, then stores it in a sorted buffer within the cache engine before flushing it to disk. During read operations, DolphinDB checks the cache engine first, performing a sequential scan in the write buffer or a binary search in the sorted buffer. On disk, records with the same sort key are organised by the time column and stored together in blocks, which serve as the fundamental units for both querying and compression. DolphinDB retrieves the relevant data blocks by fetching the index of the related sort keys, which contains the block address offsets.
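To make the write and read paths above concrete, here is a toy Python sketch (not DolphinDB code; all class and parameter names are invented for illustration) of an LSM-style store with an unsorted write buffer, a sorted buffer, and sorted "on-disk" blocks located through a sparse index:

```python
import bisect

class MiniTSDB:
    """Toy LSM-style store: unsorted write buffer -> sorted buffer -> blocks."""

    def __init__(self, buffer_size=2, block_size=4):
        self.write_buffer = []    # recent appends, unsorted (sequential writes)
        self.sorted_buffer = []   # sorted by key, waiting to be flushed
        self.blocks = []          # sorted blocks ("on disk")
        self.index = []           # first key of each block (sparse index)
        self.buffer_size = buffer_size
        self.block_size = block_size

    def append(self, key, value):
        self.write_buffer.append((key, value))
        if len(self.write_buffer) >= self.buffer_size:
            # sort the write buffer and merge it into the sorted buffer
            self.sorted_buffer = sorted(self.sorted_buffer + self.write_buffer)
            self.write_buffer = []
            if len(self.sorted_buffer) >= self.block_size:
                self._flush()

    def _flush(self):
        # record the block's first key in the index, then "write" the block
        self.index.append(self.sorted_buffer[0][0])
        self.blocks.append(self.sorted_buffer)
        self.sorted_buffer = []

    def get(self, key):
        # 1) sequential scan of the recent, unsorted write buffer
        for k, v in reversed(self.write_buffer):
            if k == key:
                return v
        # 2) binary search in the sorted buffer
        j = bisect.bisect_left(self.sorted_buffer, (key,))
        if j < len(self.sorted_buffer) and self.sorted_buffer[j][0] == key:
            return self.sorted_buffer[j][1]
        # 3) binary search the sparse index, then the block it points to
        i = bisect.bisect_right(self.index, key) - 1
        if i >= 0:
            block = self.blocks[i]
            j = bisect.bisect_left(block, (key,))
            if j < len(block) and block[j][0] == key:
                return block[j][1]
        return None

db = MiniTSDB(buffer_size=2, block_size=4)
for k, v in [(5, "a"), (1, "b"), (7, "c"), (3, "d"), (2, "e")]:
    db.append(k, v)
```

The sketch collapses many real details (compression, time-column ordering within a sort key, concurrent flushing), but it shows why reads stay cheap: every structure after the small write buffer is sorted, so lookups are sequential scans or binary searches rather than random probes.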
Recommended Reading:
2) Multi-Model Database
DolphinDB’s multi-model database architecture boosts query performance by tailoring storage and querying mechanisms to the specific data model and usage scenario. DolphinDB supports multiple storage engines, including the OLAP and TSDB engines benchmarked above.
Recommended Reading:
3) C++ Native
The core of DolphinDB, including its underlying architecture, query engine, and compute-intensive functions, is written in C++. This design choice allows DolphinDB to achieve low-level optimisation of system resources and provides fine-grained control over memory management. By using C++, DolphinDB gains several advantages over interpreted or virtual-machine-based languages (such as Java or Python): code compiles to native machine instructions, there are no garbage-collection pauses, and memory layout can be controlled precisely for cache-friendly access.
Recommended Reading:
4) Distributed, Parallel Processing
DolphinDB is a distributed database that splits data into partitions and stores them across multiple nodes in a cluster. This not only provides scalability for handling massive datasets but also supports parallel processing, allowing query workloads to be distributed across multiple nodes in the cluster.
DolphinDB supports distributed SQL query execution, implemented using the underlying map-reduce model. Partition pruning is applied based on the filter clause, enhancing query performance by processing only the relevant partitions. Additionally, analytics algorithms built into DolphinDB, including machine learning algorithms, are also designed to run in a distributed manner.
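The interplay of partition pruning and map-reduce execution can be sketched in a few lines of Python. This is a conceptual illustration, not DolphinDB code; the table, dates, and prices are invented:

```python
from datetime import date

# partitions keyed by trade date; each holds (symbol, price) rows
partitions = {
    date(2024, 1, 2): [("AAPL", 185.0), ("MSFT", 370.0)],
    date(2024, 1, 3): [("AAPL", 186.5), ("MSFT", 372.0)],
    date(2024, 1, 4): [("AAPL", 184.0), ("MSFT", 371.0)],
}

def query_avg(symbol, start, end):
    # partition pruning: only partitions inside the date filter are touched
    relevant = [rows for d, rows in partitions.items() if start <= d <= end]
    # map: each partition computes a partial (sum, count); in a distributed
    # database this step runs in parallel on the nodes owning the partitions
    partials = [
        (sum(p for s, p in rows if s == symbol),
         sum(1 for s, _ in rows if s == symbol))
        for rows in relevant
    ]
    # reduce: combine the partial aggregates into the final average
    total, count = map(sum, zip(*partials))
    return total / count

avg = query_avg("AAPL", date(2024, 1, 2), date(2024, 1, 3))
```

Note that the partition for 2024-01-04 is never read: pruning happens before any data is scanned, which is where most of the speed-up comes from on date-partitioned market data.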
Recommended Reading:
5) Vectorisation
Vectorisation in DolphinDB is a technique that allows the database to process multiple data elements simultaneously by leveraging modern CPU instruction sets designed for data parallelism (SIMD). As a core feature of DolphinDB, vectorisation is supported by most of its built-in functions, including moving window functions. This enables DolphinDB to perform operations on entire columns (vectors) of data in one go, rather than processing data row by row. The result is a significant boost in performance, especially on large datasets, as it minimises per-row overhead and maximises CPU efficiency.
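The difference between row-by-row and column-at-a-time execution is easy to see with NumPy standing in for DolphinDB's vectorised built-ins (a rough analogy only; the column names and values are invented):

```python
import numpy as np

prices = np.array([10.0, 10.5, 11.0, 10.8, 11.2])
volumes = np.array([100, 200, 150, 120, 180])

# row-by-row (interpreted-loop style): one dispatch per element
turnover_loop = []
for p, v in zip(prices, volumes):
    turnover_loop.append(p * v)

# vectorised: a single operation over the whole column, letting the
# runtime use tight native loops and SIMD instructions
turnover_vec = prices * volumes
```

Both produce the same column, but the vectorised form pays the interpreter's dispatch cost once per column rather than once per row, which is exactly the effect described above.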
Recommended Reading:
6) Query Optimisation
DolphinDB features a cost-based query optimiser that evaluates the costs of different execution paths and selects the most efficient one for each query. The optimiser is particularly effective for distributed queries, ensuring optimal performance in a distributed environment.
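The idea behind cost-based optimisation can be sketched as follows. This is a generic toy model, not DolphinDB's actual optimiser; the statistics, plan names, and cost formulas are invented for illustration:

```python
# hypothetical statistics for a date-partitioned table
stats = {
    "rows_per_partition": 1_000_000,
    "partitions": 365,
    "partitions_matching_filter": 5,   # after partition pruning
    "index_selectivity": 0.01,         # fraction of rows an index probe returns
}

def cost_full_scan(s):
    return s["rows_per_partition"] * s["partitions"]

def cost_pruned_scan(s):
    return s["rows_per_partition"] * s["partitions_matching_filter"]

def cost_index_lookup(s):
    # a fixed probe cost per partition plus the selected rows
    return s["partitions_matching_filter"] * (
        1000 + s["rows_per_partition"] * s["index_selectivity"])

plans = {
    "full_scan": cost_full_scan,
    "pruned_scan": cost_pruned_scan,
    "index_lookup": cost_index_lookup,
}

# the optimiser estimates each candidate plan's cost and picks the cheapest
best = min(plans, key=lambda name: plans[name](stats))
```

A real optimiser models many more factors (network transfer between nodes, memory, parallelism), but the principle is the same: enumerate candidate execution paths, estimate their costs from statistics, and execute the cheapest.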
Recommended Reading:
7) In-Memory Computing
DolphinDB provides a built-in, lightweight in-memory computing engine that processes data directly in memory, minimising the need for disk I/O operations and resulting in significant improvements in execution speed. DolphinDB offers a range of pre-built in-memory tables, including keyed, indexed, stream, cached, and MVCC tables. In-memory tables are particularly crucial for stream processing, which demands low-latency, real-time computing. DolphinDB also provides partitioned in-memory tables that leverage the parallel computing capabilities of multi-core CPUs.
In addition, starting from version 3.0, DolphinDB introduced an in-memory OLTP engine, enabling it to handle use cases that require ultra-low latency, high concurrency, strong consistency, and ACID transactions, such as in trading platforms.
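Of the in-memory table types mentioned above, the keyed table is the easiest to illustrate: the key column is backed by a hash index, so inserting a row with an existing key becomes an in-place update. A minimal Python sketch (not DolphinDB code; the class and column names are invented):

```python
class KeyedTable:
    """Toy keyed in-memory table: a hash index on the key column turns
    inserts with an existing key into in-place updates (upserts)."""

    def __init__(self, key_col):
        self.key_col = key_col
        self.rows = {}  # key -> row dict, held entirely in memory

    def upsert(self, row):
        self.rows[row[self.key_col]] = row

    def get(self, key):
        return self.rows.get(key)

quotes = KeyedTable("sym")
quotes.upsert({"sym": "AAPL", "bid": 184.9, "ask": 185.1})
quotes.upsert({"sym": "AAPL", "bid": 185.0, "ask": 185.2})  # updates in place
```

This is why keyed tables suit stream processing: a live quote table stays one row per symbol, with O(1) lookups and no disk I/O on the hot path.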
Recommended Reading:
8) Optimised OOTB Data Structures, Functions, and Modules
DolphinDB comes with a wide range of optimised, out-of-the-box (OOTB) data structures, functions, modules, and plug-ins, such as Array Vectors, optimised matrices, rewritten factors, and time-series functions, all specifically engineered to handle large-scale data computations with high efficiency. These data structures are essential to DolphinDB’s performance and speed, particularly when working with time-series data, real-time analytics, and extensive datasets.
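Array vectors are a good example of why these structures are fast: a column where each row holds a variable-length array (say, multiple quote levels) can be stored as one flat buffer plus row offsets, instead of one small object per row. A conceptual Python sketch (not DolphinDB's implementation; names are invented):

```python
class ArrayVector:
    """Toy array vector: variable-length arrays per row, stored as one
    contiguous buffer plus row offsets rather than per-row containers."""

    def __init__(self):
        self.values = []     # flat, contiguous storage for all elements
        self.offsets = [0]   # offsets[i]..offsets[i+1] delimit row i

    def append(self, arr):
        self.values.extend(arr)
        self.offsets.append(len(self.values))

    def row(self, i):
        return self.values[self.offsets[i]:self.offsets[i + 1]]

bids = ArrayVector()
bids.append([100.0, 99.9, 99.8])  # three bid levels for row 0
bids.append([101.0, 100.9])       # two bid levels for row 1
```

The contiguous layout keeps whole columns cache-friendly and lets vectorised functions sweep across every element of every row in a single pass.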
Recommended Reading:
9) Just-In-Time (JIT) Support
While DolphinDB itself is implemented in C++, its scripting language is interpreted: a script written by a developer or analyst is parsed into a syntax tree, which is then executed recursively. This can be slow for computations that cannot be vectorised, so DolphinDB provides JIT compilation, which significantly improves execution speed in those cases.
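The cost of recursive tree walking, and what JIT removes, can be shown with a toy expression interpreter in Python (a conceptual sketch only; DolphinDB's JIT compiles to native code, not to another script):

```python
import operator

# a toy syntax tree for the expression (x + 1) * 2
tree = ("mul", ("add", ("var", "x"), ("const", 1)), ("const", 2))

OPS = {"add": operator.add, "mul": operator.mul}

def interpret(node, env):
    # recursive tree walking: every node pays a dispatch cost on every call
    kind = node[0]
    if kind == "const":
        return node[1]
    if kind == "var":
        return env[node[1]]
    return OPS[kind](interpret(node[1], env), interpret(node[2], env))

def jit_compile(node):
    # "compile" the tree once into a flat function, so the per-node
    # dispatch cost is paid a single time instead of on every evaluation
    def emit(n):
        kind = n[0]
        if kind == "const":
            return repr(n[1])
        if kind == "var":
            return n[1]
        sym = {"add": "+", "mul": "*"}[kind]
        return f"({emit(n[1])} {sym} {emit(n[2])})"
    return eval(f"lambda x: {emit(node)}")

f = jit_compile(tree)
```

Calling `interpret` inside a hot loop re-traverses the tree on every row, whereas the compiled `f` runs straight through, which is precisely the gap JIT closes for non-vectorised DolphinDB scripts.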
Recommended Reading:
10) Efficient Inter-Node Communication (RDMA)
Starting from DolphinDB version 3.00.1, RDMA (Remote Direct Memory Access) technology is natively supported, enabling the full utilisation of RDMA’s performance benefits. This is especially advantageous in scenarios that require high-speed, low-latency communication, such as distributed data processing and real-time analytics. By bypassing the operating system’s kernel and allowing direct memory access between servers, RDMA significantly reduces latency and improves throughput, making DolphinDB even more efficient for large-scale, performance-sensitive applications.
Recommended Reading: