登录查看更多内容

Speedy DolphinDB – Why is DolphinDB So Fast?

Linxiao Ma

Financial Data Architect | Hands-on Engineer | PhD | CFA Candidate | Distributed Database Expert | DolphinDB UK Rep. | Tech Blogger | Insane Coder

发布日期: 2024年11月24日

+ 关注

*More articles can be found from my blog site - https://dataninjago.com

What makes me buy into DolphinDB:

In my last blog post, I showcased a cross-exchange arbitrage example to demonstrate just how easy it is to develop with DolphinDB — a perfect way to highlight its “Friendly” side. Ever since I started using DolphinDB, the more I explored it, the more I grew to love it. In this post and the next few, I’ll exploit the key traits of DolphinDB that make me confident it will play a significant role in the financial real-time database space.

I’ve found that DolphinDB is the kind of product that practically sells itself once people dive deeper into it. Therefore, I’ll be recommending additional resources along the way, which I hope will help you explore DolphinDB more thoroughly.

This blog post highlights the “Speedy” trait of DolphinDB. Performance is a core consideration when adopting a real-time financial database, where fast data processing and low-latency queries are essential. Today, a real-time financial database must excel under extreme conditions, capable of processing massive volumes of data in very short timeframes. In this blog post, I will first present the results of several DolphinDB benchmark performance tests, and then walk you through the key features of DolphinDB that enable this level of performance.

Some Numbers First

*The test setup and detailed results can be found in the “Details” links.

Query Performance – DolphinDB vs Spark (Details)

Single user queries – DolphinDB is 200x faster than Spark.
Multi-user concurrent queries – DolphinDB’s advantage increases linearly as the number of users grows.
In-memory computing – DolphinDB is 8x faster in average than Spark

Factor Calculation – DolphinDB vs Pandas (Details)

Calculate Alpha 98 (101 Formulaic Alphas) – DolphinDB is 1000x faster than Pandas
mcorr function – DolphinDB is 200x faster than Pandas

IoT Data Queries – DolphinDB TSDB vs DolphinDB OLAP vs ClickHouse (Details)

The DolphinDB OLAP engine and ClickHouse are neck and neck
The DolphinDB TSDB engine is faster for all test cases with an average 8x faster

Low Latency ROC Ranking (ROC calc + Ranking) – DolphinDB Stream Engine (Details)

per stock – 0.2 ms
1650 stocks – 3.45 ms

Low Latency Feature Engineering – DolphinDB Stream Engine (Details)

1 security (201 records) – 22 ms
20 securities (4020 records) – 163 ms
50 securities (10054 records) – 386 ms

Real-Time Fixed Income Pricing & Risk Management – DolphinDB (Details in Chinese)

Calculate DirtyPrice, AccrInt, Duration, Convexity, VaR, Cond VaR for 10,000 symbols with 1M rows data – around 200 ms !
PiecewiseLinFit function – DolphinDB is 6.6x faster than Python
kroghInterpolate function – DolphinDB is 192x faster than Python
Nelson-Siegel-Svensson(NSS) model – DolphinDB is 321x faster than Python

Query Performance – DolphinDB vs InfluxDB (Details in Chinese)

by time, symbol – DolphinDB is 1.2x faster than InfluxDB
agg (max/min) – DolphinDB is 1.6x faster than InfluxDB
by date, time range, symbol, price range – DolphinDB is same as InfluxDB
avg change by date, time range, bid/ask, symbol – DolphinDB is 183x faster than InfluxDB
avg price by date range, time range, symbol – DolphinDB is 71x faster than InfluxDB
avg price per day per symbol by date range, time range – DolphinDB is 208x faster than InfluxDB

Why DolphinDB is so Fast?

1) Optimised TSDB Engine

DolphinDB uses the LSM-Tree structure to handle super high write workloads in real-time. Unlike B+ trees, which need to rebalance the tree during writes and cause random writes to disk, the LSM-Tree works by writing data sequentially and using in-memory write buffers. This setup allows for much faster and more efficient writes.

On the read side, DolphinDB uses several techniques to optimize performance, such as sorting, indexing, and in-memory caching. When data is being written, DolphinDB first sorts it based on the specified sortColumns, then stores it in a sorted buffer within the cache engine before flushing it to disk. During read operations, DolphinDB checks the cache engine first, performing a sequential scan in the write buffer or a binary search in the sorted buffer. On disk, records with the same sort key are organised by the time column and stored together in blocks, which serve as the fundamental units for both querying and compression. DolphinDB retrieves the relevant data blocks by fetching the index of the related sort keys, which contains the block address offsets.

Recommended Reading:

Unveiling the TSDB Engine Behind the Extreme Performance of DolphinDB (Video – Chinese)
TSDB Storage Engine
A Detailed Explanation of DolphinDB TSDB Storage Engine (in Chinese)
Designing Data-Intensive Applications

2) Multi-Model Database

DolphinDB’s multi-model database architecture boosts query performance by tailoring storage and querying mechanisms to the specific data model and use scenarios. DolphinDB currently supports the following storage engines:

TSDB – Optimised for time-series processing scenarios
OLAP – Optimised for large-scale, multi-dimensional data analysis
VectorDB – Based on TSDB engine, optimised for fast retrieval with massive datasets scenarios, such as search engines, AI generative models
PKEY – Key-value storage engine, optimised for real-time updates scenarios
TextDB – a document retrieval engine, optimised for storing and querying text data

Recommended Reading:

DolphinDB Multi-Model Database

领英推荐

Understanding News Crawlers: Stay Informed on the Go

Forage AI 7 个月前

Best Amibroker Datafeeds for Realtime & Tick Data

Global Datafeeds 4 个月前

2022 - 120 million views later

Genuine Impact 2 年前

3) C++ Native

The core of DolphinDB, including its underlying architecture, query engine, and compute-intensive functions, is written in C++. This design choice allows DolphinDB to achieve low-level optimisation of system resources and provides fine-grained control over memory management. By using C++, DolphinDB benefits from several advantages over interpreted or virtual machine-based languages (such as Java or Python), such as:

Compiled to native machine code that the processor can execute, which leads to faster execution times compared to interpreted languages.
Direct memory control and reduced memory overhead. Direct memory control enables DolphinDB to optimise memory usage according to specific needs and avoids the overhead of garbage collection in other languages.
Predictable Low-latency execution. The lack of an interpreter or virtual machine allows DolphinDB to execute instructions with minimal delay. Since there is no virtual machine or interpreter involved, the execution behaviour is more predictable.
Native parallel and multi-threading support. C++ provides advanced features like multi-threading and parallelism, enabling DolphinDB to take full advantage of multi-core processors.
Integration with system-level libraries. C++ can interact directly with system-level libraries and hardware, providing the ability to utilise platform-specific optimisations.

Recommended Reading:

4) Distributed, Parallel Processing

DolphinDB is a distributed database that splits data into partitions and stores them across multiple nodes in a cluster. This not only provides scalability for handling massive datasets but also supports parallel processing, allowing query workloads to be distributed across multiple nodes in the cluster.

DolphinDB supports distributed SQL query execution, implemented using the underlying map-reduce model. Partition pruning is applied based on the filter clause, enhancing query performance by processing only the relevant partitions. Additionally, analytics algorithms built into DolphinDB, including machine learning algorithms, are also designed to run in a distributed manner.

Recommended Reading:

DolphinDB – Distributed Computing
DolphinDB – Distributed Database Overview
DolphinDB – Working with Partitioned In-memory Tables
Distributed Query Optimization

5) Vectorisation

Vectorisation in DolphinDB is a technique that allows the database to process multiple data elements simultaneously by leveraging modern CPU instruction sets designed for parallel processing. As a core feature of DolphinDB, Vectorisation is supported by most of its built-in functions, including moving window functions. This enables DolphinDB to perform operations on entire columns (vectors) of data in one go, rather than processing data row-by-row. The result is a significant boost in performance, especially when handling large datasets, as it minimizes the overhead of repetitive operations and maximizes CPU efficiency.

Recommended Reading:

6. Query Optimisation

DolphinDB features a cost-based query optimiser that evaluates the costs of different execution paths and selects the most efficient one for each query. The optimiser is particularly effective for distributed queries, ensuring optimal performance in a distributed environment.

Recommended Reading:

DolphinDB Query Execution Plan

7. In-Memory Computing

DolphinDB provides a built-in, lightweight in-memory computing engine that processes data directly in memory, minimising the need for disk I/O operations and resulting in significant improvements in execution speed. DolphinDB offers a range of pre-built in-memory tables, including keyed, indexed, stream, cached, and MVCC tables. In-memory tables are particularly crucial for stream processing, which demands low-latency, real-time computing. DolphinDB also provides partitioned in-memory tables that leverage the parallel computing capabilities of multi-core CPUs.

In addition, starting from version 3.0, DolphinDB introduced an in-memory OLTP engine, enabling it to handle use cases that require ultra-low latency, high concurrency, strong consistency, and ACID transactions, such as in trading platforms.

Recommended Reading:

8. Optimised OOTB Data Structure, Functions, and Modules

DolphinDB comes with a wide range of optimised, out-of-the-box (OOTB) data structures, functions, modules, and plug-ins, such as Array Vectors, optimised matrices, rewritten factors, and time-series functions, all specifically engineered to handle large-scale data computations with high efficiency. These data structures are essential to DolphinDB’s performance and speed, particularly when working with time-series data, real-time analytics, and extensive datasets.

Recommended Reading:

9. Just-In-Time (JIT) Support

While DolphinDB is implemented in C++, the programming language of DolphinDB is an interpreted language that parses DolphinDB scripts developed by an end developer or analyst into a syntax tree, which is then executed recursively. This process can be time-consuming for non-vectorised computations. Therefore, DolphinDB implements JIT, which significantly improves the execution speed.

Recommended Reading:

Just-in-time (JIT) Compilation
IV and Greeks Calculation using JIT (in Chinese)

10. Support Efficient Inter-node Communication Technology

Starting from DolphinDB version 3.00.1, RDMA (Remote Direct Memory Access) technology is natively supported, enabling the full utilisation of RDMA’s performance benefits. This is especially advantageous in scenarios that require high-speed, low-latency communication, such as distributed data processing and real-time analytics. By bypassing the operating system’s kernel and allowing direct memory access between servers, RDMA significantly reduces latency and improves throughput, making DolphinDB even more efficient for large-scale, performance-sensitive applications.

Recommended Reading:

要查看或添加评论，请登录

Linxiao Ma的更多文章

Coding towards CFA (44) – General Residual Income Model for Equity Valuation

2025年1月30日

Coding towards CFA (44) – General Residual Income Model for Equity Valuation

*More articles can be found from my blog site - https://dataninjago.com The Residual Income Model is one of the equity…
Coding towards CFA (43) – Equity Risk Premium Estimates using Forward-Looking Approach

2025年1月27日

Coding towards CFA (43) – Equity Risk Premium Estimates using Forward-Looking Approach

*More articles can be found from blog site - https://dataninjago.com In the previous blog post, we explored the…
Coding towards CFA (42) – Equity Risk Premium Estimates using Historical Approach

2025年1月26日

Coding towards CFA (42) – Equity Risk Premium Estimates using Historical Approach

*More articles can be found in my blog site - https://dataninjago.com In this blog post, I will begin exploring the…
Coding towards CFA (41) – Cobb-Douglas Production Function and Neoclassical Growth Model

2025年1月25日

Coding towards CFA (41) – Cobb-Douglas Production Function and Neoclassical Growth Model

*More articles can be found in my blog site - https://dataninjago.com This is another code-less blog post in my “Coding…
Coding towards CFA (40) – FX Carry Trade

2025年1月24日

Coding towards CFA (40) – FX Carry Trade

*More articles can be found from this blog site - https://dataninjago.com As discussed in the previous blog post, under…
Coding towards CFA (39) – International Parity Conditions

2025年1月22日

Coding towards CFA (39) – International Parity Conditions

*More articles can be found from my blog site - https://dataninjago.com This is the first code-less blog post in my…
Coding towards CFA (38) – Mark-to-Market of Forex Forward Contract

2025年1月21日

Coding towards CFA (38) – Mark-to-Market of Forex Forward Contract

*More articles can be found from my blog site - https://dataninjago.com Mark-to-Market (MTM) is the process of valuing…
Coding towards CFA (37) – Triangular Arbitrage in Forex Trading

2025年1月20日

Coding towards CFA (37) – Triangular Arbitrage in Forex Trading

*More articles can be found from my blog site - https://dataninjago.com Triangular arbitrage is a strategy used to…
Coding towards CFA (36) – Performance Attribution with Brinson Model in DolphinDB and Python

2025年1月19日

Coding towards CFA (36) – Performance Attribution with Brinson Model in DolphinDB and Python

*More articles can be found from this blog site - https://dataninjago.com Performance attribution is discussed in the…
Coding towards CFA (35) – The Monte Carlo Method of VaR Estimation

2025年1月17日

Coding towards CFA (35) – The Monte Carlo Method of VaR Estimation

*More articles can be found from my blog site - https://dataninjago.com In the previous blog post, we explored the…

See all articles

Speedy DolphinDB – Why is DolphinDB So Fast?

Linxiao Ma

Financial Data Architect | Hands-on Engineer | PhD | CFA Candidate | Distributed Database Expert | DolphinDB UK Rep. | Tech Blogger | Insane Coder

Some Numbers First

Why DolphinDB is so Fast?

领英推荐

Linxiao Ma的更多文章

社区洞察

其他会员也浏览了

1486 Labs Notes?

Small Data

Dirty data is the mother of all screw-ups

Weekly Release Notes: Week 17, 2023

UPDATE II: Hello world!

Ironbeam Platform Update: Market-By-Order (MBO) Data Now Available

Why we do what we do.

Generate ALPHA across all asset classes

The future of the modern data stack in 2023

Harnessing the Power of Real-Time Data: The New Frontier for Brokers

Some Numbers First

Why DolphinDB is so Fast?

领英推荐

Linxiao Ma的更多文章

Coding towards CFA (44) – General Residual Income Model for Equity Valuation

Coding towards CFA (43) – Equity Risk Premium Estimates using Forward-Looking Approach

Coding towards CFA (42) – Equity Risk Premium Estimates using Historical Approach

Coding towards CFA (41) – Cobb-Douglas Production Function and Neoclassical Growth Model

Coding towards CFA (40) – FX Carry Trade

Coding towards CFA (39) – International Parity Conditions

Coding towards CFA (38) – Mark-to-Market of Forex Forward Contract

Coding towards CFA (37) – Triangular Arbitrage in Forex Trading

Coding towards CFA (36) – Performance Attribution with Brinson Model in DolphinDB and Python

Coding towards CFA (35) – The Monte Carlo Method of VaR Estimation

社区洞察

其他会员也浏览了

1486 Labs Notes?

Small Data

Dirty data is the mother of all screw-ups

Weekly Release Notes: Week 17, 2023

UPDATE II: Hello world!

Ironbeam Platform Update: Market-By-Order (MBO) Data Now Available

Why we do what we do.

Generate ALPHA across all asset classes

The future of the modern data stack in 2023

Harnessing the Power of Real-Time Data: The New Frontier for Brokers