Storage and I/O Tuning in Linux for Ultra-Low Latency Trading


In my previous articles, I covered CPU tuning, network optimization with Solarflare NICs, and memory tuning for ultra-low latency trading. Storage and I/O tuning is often overlooked next to those three, but it is just as crucial: hardware interrupts can stack up behind a poorly tuned disk, and a single disk-bound thread can stall others and introduce delays.

When optimizing for performance, ensure data integrity is not compromised, as compliance in trading and exchange systems is legally mandatory.


Use Fast Storage:

The backbone of a low-latency system is fast storage. NVMe SSDs offer significantly lower latency than SATA SSDs or HDDs: they attach to the CPU directly over the PCIe bus, cutting read/write times and sustaining high throughput. Consult your vendors and read their manuals regularly to stay current on the latest drives and firmware features; in this space that is a necessity, not just a recommendation.


Filesystem Selection:

XFS and EXT4 are the most suitable filesystems due to their low overhead and efficient journaling. Journaling maintains filesystem consistency by logging changes before applying them, allowing quick recovery after a crash. On EXT4, mount with data=writeback for the lowest journaling overhead if you can tolerate the risk of stale data after a crash, and adjust the journal commit interval (the commit= mount option) to balance performance against data safety.
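
As a minimal sketch, here is how those options could be applied via the mount(2) system call; the device and mount point are hypothetical, and the same option string can go in /etc/fstab to persist across reboots:

    /* Minimal sketch: mount an EXT4 data volume with writeback journaling
       and a longer commit interval. Device and mount point are hypothetical;
       requires root. Build: cc -o lowlat_mount lowlat_mount.c */
    #include <stdio.h>
    #include <sys/mount.h>

    int main(void)
    {
        /* data=writeback: journal metadata only (faster, weaker ordering).
           commit=15: flush the journal every 15s instead of the default 5s.
           MS_NOATIME: skip access-time updates, saving metadata writes. */
        if (mount("/dev/nvme0n1p1", "/data", "ext4", MS_NOATIME,
                  "data=writeback,commit=15") != 0) {
            perror("mount");
            return 1;
        }
        return 0;
    }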

Other Filesystems:

  • ZFS: Offers advanced caching (ARC and L2ARC) but requires careful tuning for low latency.
  • Btrfs: Provides features like copy-on-write and built-in RAID but may introduce latency.

EXT4 and XFS remain the most reliable choices for trading systems.


I/O Scheduler Tuning:

Linux I/O schedulers manage read/write queue processing:

  • none: Bypasses scheduling entirely; ideal for NVMe devices, which handle queuing in hardware, and minimizes latency.
  • mq-deadline: Enforces deadlines on I/O requests to cap latency spikes; suitable for modern multi-queue SATA/SAS storage.
  • bfq: Allocates I/O bandwidth fairly across processes; intended for multi-tenant environments rather than low-latency work.

For ultra-low latency, use none for NVMe and mq-deadline for other storage types.
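
To apply this, write the scheduler name into the device's sysfs queue attribute. A minimal sketch in C with a hypothetical device name (the shell equivalent is echo none > /sys/block/nvme0n1/queue/scheduler); the setting does not survive a reboot, so production systems usually persist it with a udev rule:

    /* Minimal sketch: select the "none" scheduler for an NVMe device via
       sysfs. Device name is hypothetical; requires root. */
    #include <stdio.h>

    int main(void)
    {
        FILE *f = fopen("/sys/block/nvme0n1/queue/scheduler", "w");
        if (!f) { perror("fopen"); return 1; }
        fputs("none", f);   /* write "mq-deadline" instead for SATA/SAS devices */
        fclose(f);
        return 0;
    }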


Direct I/O and Buffer Bypass:

Use Direct I/O (O_DIRECT) to bypass the kernel page cache and avoid unnecessary data copying between user space and kernel space. This approach is highly beneficial for applications requiring predictable, low-latency I/O. Note that O_DIRECT imposes alignment requirements: buffers, file offsets, and transfer sizes must be aligned to the device's logical block size, or reads and writes fail with EINVAL.
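
A minimal sketch of a direct-I/O read, assuming a hypothetical file path and a 4096-byte logical block size; note how the buffer, offset, and size all honor the alignment rule:

    /* Minimal sketch: a direct-I/O read that bypasses the page cache.
       File path is hypothetical; 4096-byte logical blocks are assumed. */
    #define _GNU_SOURCE             /* for O_DIRECT */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define BLOCK 4096

    int main(void)
    {
        int fd = open("/data/ticks.bin", O_RDONLY | O_DIRECT);
        if (fd < 0) { perror("open"); return 1; }

        void *buf;
        if (posix_memalign(&buf, BLOCK, BLOCK) != 0) {  /* aligned buffer */
            close(fd);
            return 1;
        }

        ssize_t n = pread(fd, buf, BLOCK, 0);  /* offset 0 is block-aligned */
        if (n < 0) perror("pread");            /* EINVAL usually means misalignment */

        free(buf);
        close(fd);
        return 0;
    }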


Disk Write Caching:

Enable disk write caching to boost write performance, but understand the trade-off: the drive acknowledges writes while the data still sits in its volatile cache, so a power failure can lose it. For critical data, back the cache with battery-backed controllers or a UPS. Needless to say: DO NOT LOSE DATA in the process of improving performance.
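
One application-level complement (not covered above, but standard practice) is to fsync(2) the records you cannot afford to lose: on modern Linux filesystems, fsync flushes both the page cache and the drive's volatile write cache. A minimal sketch with a hypothetical log path:

    /* Minimal sketch: force a critical write to stable storage with fsync(2).
       Path and record are hypothetical. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/data/order.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
        if (fd < 0) { perror("open"); return 1; }

        const char *rec = "ORDER 42 FILLED\n";   /* hypothetical record */
        if (write(fd, rec, strlen(rec)) < 0) perror("write");

        if (fsync(fd) != 0) perror("fsync");     /* blocks until data is durable */
        close(fd);
        return 0;
    }

Reserve fsync for records you genuinely cannot lose; every call costs a device cache flush.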


Reduce Dirty Page Ratios:

Tuning the Linux virtual memory system can reduce I/O spikes:

Lower vm.dirty_background_ratio and vm.dirty_ratio so dirty pages are written back to disk earlier and more often. This prevents large write bursts and keeps I/O latency consistent. I covered this in more depth in my memory tuning article.
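
As a minimal sketch, the same change can be made from C by writing to /proc/sys; the 3%/5% values are illustrative starting points, not tuned recommendations (the shell equivalent is sysctl -w vm.dirty_background_ratio=3):

    /* Minimal sketch: lower the dirty-page thresholds via /proc/sys.
       Values are illustrative; requires root. */
    #include <stdio.h>

    /* hypothetical helper: write one value to a /proc/sys entry */
    static int set_sysctl(const char *path, const char *value)
    {
        FILE *f = fopen(path, "w");
        if (!f) { perror(path); return -1; }
        fputs(value, f);
        fclose(f);
        return 0;
    }

    int main(void)
    {
        /* start background writeback early, at 3% dirty pages ... */
        set_sysctl("/proc/sys/vm/dirty_background_ratio", "3");
        /* ... and throttle writers before bursts grow past 5% */
        set_sysctl("/proc/sys/vm/dirty_ratio", "5");
        return 0;
    }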


Asynchronous I/O (AIO):

Leverage asynchronous I/O to allow non-blocking read/write operations. By decoupling I/O completion from application threads, AIO can enhance throughput and lower latency, especially for applications handling many concurrent I/O requests.
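
On modern kernels (5.1+), io_uring is the lowest-overhead asynchronous interface, with POSIX AIO (aio_read) as the portable fallback. A minimal io_uring sketch, assuming liburing is installed (build with -luring); the file path and sizes are hypothetical:

    /* Minimal sketch: one non-blocking read via io_uring (kernel 5.1+).
       Assumes liburing; build: cc aio.c -luring. Path is hypothetical. */
    #include <fcntl.h>
    #include <liburing.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/data/ticks.bin", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        struct io_uring ring;
        io_uring_queue_init(8, &ring, 0);        /* 8-entry submission queue */

        char buf[4096];
        struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
        io_uring_prep_read(sqe, fd, buf, sizeof buf, 0);
        io_uring_submit(&ring);                  /* kick off the read ... */

        /* ... the thread is free to do other work here ... */

        struct io_uring_cqe *cqe;
        io_uring_wait_cqe(&ring, &cqe);          /* harvest the completion */
        printf("read returned %d bytes\n", cqe->res);
        io_uring_cqe_seen(&ring, cqe);

        io_uring_queue_exit(&ring);
        close(fd);
        return 0;
    }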


Align Partitions Properly:

Improper partition alignment forces the drive into extra read-modify-write operations, hurting performance. Align partitions to the disk's physical block size (typically 4K or 8K); modern partitioning tools default to 1 MiB boundaries, which satisfies any common block size.
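
To check what you are aligning to, the logical and physical block sizes can be queried with the BLKSSZGET and BLKPBSZGET ioctls; a minimal sketch with a hypothetical device (blockdev --getpbsz does the same from the shell):

    /* Minimal sketch: query a device's logical and physical block sizes,
       the values partitions should be aligned to. Device is hypothetical;
       requires read access to the block device. */
    #include <fcntl.h>
    #include <linux/fs.h>       /* BLKSSZGET, BLKPBSZGET */
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/dev/nvme0n1", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        int logical = 0;
        unsigned int physical = 0;
        ioctl(fd, BLKSSZGET, &logical);    /* logical sector size */
        ioctl(fd, BLKPBSZGET, &physical);  /* physical block size */

        printf("logical: %d bytes, physical: %u bytes\n", logical, physical);
        close(fd);
        return 0;
    }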


I/O Affinity and NUMA:

As with CPU and memory tuning, bind I/O threads to the NUMA node local to the storage device to prevent cross-node latency. Tools like numactl can pin storage processes to a specific NUMA node.
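
As a minimal sketch, the same binding can be done in-process with libnuma (build with -lnuma); node 0 here is an assumption, and the node actually local to a device can be read from its numa_node attribute in sysfs. From the command line, numactl --cpunodebind=0 --membind=0 ./app achieves the same:

    /* Minimal sketch: pin an I/O thread and its allocations to one NUMA
       node with libnuma. Node 0 is a hypothetical choice. */
    #include <numa.h>
    #include <stdio.h>

    int main(void)
    {
        if (numa_available() < 0) {
            fprintf(stderr, "NUMA not supported on this system\n");
            return 1;
        }

        int node = 0;               /* hypothetical: the storage-local node */
        numa_run_on_node(node);     /* restrict this thread to that node's CPUs */
        numa_set_preferred(node);   /* prefer that node's memory for allocations */

        /* ... storage I/O issued from here stays NUMA-local ... */
        return 0;
    }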


Final Thoughts:

Do not lose data. In my experience, optimizing storage and I/O configurations is essential to achieving ultra-low-latency trading performance. By deploying fast storage, selecting the right filesystem, using direct I/O, and reducing memory pressure, you can minimize I/O-related latency. Combined with CPU, network, and memory tuning, these practices ensure a well-rounded low-latency trading infrastructure.

Jose Baez

IT professional with over 13 years of experience in financial and government sectors, specializing in cloud technologies, system administration, support, infrastructure and database management.

3mo

Thanks for the article. Every component where data is being transferred between devices can cause bottlenecks. So, all the devices need to be included as part of the tuning process.

Jeremy Lucid

KDB+/Q Consultant

5mo

When it comes to publishing your data from the engine to a message broker, what protocol is best for fast serialisation? Protobuf or avro or something else?
