Low Latency in Rust with Lock-Free Data Structures
Luis Soares, M.Sc.
Lead Software Engineer | Blockchain & ZK Protocol Engineer | Rust | C++ | Web3 | Solidity | Golang | Cryptography | Author
In high-performance and distributed systems, latency directly affects overall responsiveness and throughput. Traditional synchronization methods such as mutexes and other locks can introduce contention, deadlocks, and priority inversion, all of which add latency. Rust provides excellent tools and libraries for implementing lock-free data structures, which can significantly reduce that latency.
This article explores lock-free data structures and epoch-based memory reclamation, and demonstrates how to use the crossbeam crate to achieve low latency.
We will also walk through full working examples that compare the performance and latency of lock-free data structures against traditional locking mechanisms.
Understanding Lock-Free Data Structures
Lock-free data structures ensure that at least one thread will make progress in a finite number of steps, regardless of the behavior of other threads. This approach avoids the traditional locking mechanisms that can lead to contention, deadlocks, or priority inversion.
Benefits of Lock-Free Data Structures:
- Reduced contention: threads never block each other waiting for a lock, so hot paths stay short even under heavy concurrency.
- No deadlocks or priority inversion: because no locks are held, a slow or preempted thread cannot stall the rest of the system.
- Better scalability and tail latency: at least one thread always makes progress, which keeps worst-case latency bounded as the thread count grows.
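To make the idea concrete, here is a minimal sketch of the underlying pattern: a counter updated with a compare-and-swap retry loop instead of a lock. The LockFreeCounter type is purely illustrative, not part of any library.
use std::sync::atomic::{AtomicUsize, Ordering};

// A lock-free counter: threads retry a compare-and-swap instead of blocking on a lock.
struct LockFreeCounter {
    value: AtomicUsize,
}

impl LockFreeCounter {
    fn new() -> Self {
        Self {
            value: AtomicUsize::new(0),
        }
    }

    fn increment(&self) -> usize {
        loop {
            let current = self.value.load(Ordering::Relaxed);
            // If another thread updated the value first, the CAS fails and we retry.
            // Some thread always succeeds, which is exactly the lock-free guarantee.
            if self
                .value
                .compare_exchange(current, current + 1, Ordering::Relaxed, Ordering::Relaxed)
                .is_ok()
            {
                return current + 1;
            }
        }
    }
}
In practice you would simply call fetch_add for a counter; the retry loop is shown because it is the same pattern that lock-free stacks and queues generalize.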
Epoch-Based Memory Reclamation
One challenge with lock-free data structures is managing memory safely. Epoch-based memory reclamation (EBR) is a technique used to safely reclaim memory in concurrent systems without introducing locks. EBR works by dividing the timeline into epochs and ensuring that memory is only reclaimed once all threads have moved past the epoch in which the memory was freed.
How EBR Works:
- Before touching shared data, a thread "pins" itself to the current global epoch.
- When a node is unlinked from a data structure, it is not freed immediately; it is placed on a retire list tagged with the current epoch.
- The global epoch only advances once every pinned thread has observed it, so retired memory is reclaimed only when no thread can still hold a reference from the old epoch.
Rust’s crossbeam-epoch crate provides an efficient implementation of epoch-based memory reclamation.
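The sketch below shows the core of that API: a thread pins the current epoch, swaps a new value into a shared slot, and defers destruction of the old allocation until no pinned thread can still reach it. The replace_value function is a hypothetical helper written for illustration, not part of crossbeam itself.
use crossbeam::epoch::{self, Atomic, Owned};
use std::sync::atomic::Ordering;

// Hypothetical helper: swap a new value into a shared slot and retire the old one.
fn replace_value(slot: &Atomic<u64>, new_value: u64) {
    // Pin the current epoch; the returned guard keeps the pin alive.
    let guard = epoch::pin();
    let old = slot.swap(Owned::new(new_value), Ordering::AcqRel, &guard);
    if !old.is_null() {
        unsafe {
            // The old allocation is freed only after every thread has left the
            // epoch in which it could still have observed this pointer.
            guard.defer_destroy(old);
        }
    }
}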
The Crossbeam Crate
The crossbeam crate in Rust offers several concurrency primitives, including lock-free data structures and epoch-based memory management. It is designed to provide high-performance, lock-free data structures suitable for concurrent programming.
Features of the crossbeam crate:
- crossbeam-epoch: epoch-based memory reclamation for building your own lock-free structures.
- crossbeam-queue: ready-made lock-free queues such as SegQueue and ArrayQueue.
- crossbeam-deque: work-stealing deques for task schedulers.
- crossbeam-channel: fast multi-producer, multi-consumer channels.
- crossbeam-utils: scoped threads, cache padding, and other concurrency utilities.
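If you only need a lock-free queue rather than a custom structure, crossbeam already ships one. Here is a small sketch, assuming crossbeam = "0.8" in Cargo.toml, that uses the crate's SegQueue together with scoped threads:
use crossbeam::queue::SegQueue;

fn main() {
    // SegQueue is an unbounded, lock-free, multi-producer multi-consumer queue.
    let queue: SegQueue<i32> = SegQueue::new();

    // Scoped threads let us borrow the queue without wrapping it in an Arc.
    crossbeam::scope(|s| {
        for t in 0..4 {
            let queue = &queue;
            s.spawn(move |_| {
                for i in 0..1_000 {
                    queue.push(t * 1_000 + i);
                }
            });
        }
    })
    .unwrap();

    // Drain the queue on the main thread.
    let mut count = 0;
    while queue.pop().is_some() {
        count += 1;
    }
    println!("drained {} items", count);
}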
Implementing Lock-Free Data Structures with Crossbeam
Let’s implement a full working example showing the difference in performance and latency when using lock-free data structures compared to traditional locking mechanisms.
Step 1: Setting Up the Project
First, create a new Rust project:
cargo new lock_free_example
cd lock_free_example
Add the crossbeam crate to your Cargo.toml:
[dependencies]
crossbeam = "0.8"
Step 2: Implementing the Lock-Free Stack
We’ll implement a simple lock-free stack using the crossbeam crate and compare it to a traditional Mutex-based stack.
use crossbeam::epoch::{self, Atomic, Owned};
use std::mem::ManuallyDrop;
use std::ptr;
use std::sync::atomic::Ordering;
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Instant;

struct LockFreeStack<T> {
    head: Atomic<Node<T>>,
}

struct Node<T> {
    // ManuallyDrop lets `pop` move the value out without it being dropped a
    // second time when the node itself is reclaimed.
    data: ManuallyDrop<T>,
    next: Atomic<Node<T>>,
}

impl<T> LockFreeStack<T> {
    fn new() -> Self {
        Self {
            head: Atomic::null(),
        }
    }
    fn push(&self, data: T) {
        let mut node = Owned::new(Node {
            data: ManuallyDrop::new(data),
            next: Atomic::null(),
        });
        let guard = epoch::pin();
        loop {
            let head = self.head.load(Ordering::Acquire, &guard);
            node.next.store(head, Ordering::Relaxed);
            // Try to swing `head` to the new node; on failure we get the node
            // back and retry with the freshly observed head.
            match self
                .head
                .compare_exchange(head, node, Ordering::Release, Ordering::Relaxed, &guard)
            {
                Ok(_) => break,
                Err(e) => node = e.new,
            }
        }
    }
    fn pop(&self) -> Option<T> {
        let guard = epoch::pin();
        loop {
            let head = self.head.load(Ordering::Acquire, &guard);
            match unsafe { head.as_ref() } {
                Some(h) => {
                    let next = h.next.load(Ordering::Relaxed, &guard);
                    if self
                        .head
                        .compare_exchange(head, next, Ordering::Release, Ordering::Relaxed, &guard)
                        .is_ok()
                    {
                        unsafe {
                            // Defer freeing the node until no pinned thread can
                            // still be reading it.
                            guard.defer_destroy(head);
                            return Some(ManuallyDrop::into_inner(ptr::read(&h.data)));
                        }
                    }
                }
                None => return None,
            }
        }
    }
}
// A conventional stack protected by a Mutex, for comparison.
struct MutexStack<T> {
    items: Mutex<Vec<T>>,
}

impl<T> MutexStack<T> {
    fn new() -> Self {
        Self {
            items: Mutex::new(Vec::new()),
        }
    }

    fn push(&self, data: T) {
        self.items.lock().unwrap().push(data);
    }

    fn pop(&self) -> Option<T> {
        self.items.lock().unwrap().pop()
    }
}
fn main() {
    const NUM_THREADS: usize = 4;
    const NUM_OPERATIONS: usize = 1_000_000;

    // Lock-free stack benchmark
    let lock_free_stack = Arc::new(LockFreeStack::new());
    let start = Instant::now();
    let mut handles = vec![];
    for _ in 0..NUM_THREADS {
        let stack = Arc::clone(&lock_free_stack);
        handles.push(thread::spawn(move || {
            for i in 0..NUM_OPERATIONS {
                stack.push(i);
                stack.pop();
            }
        }));
    }
    for handle in handles {
        handle.join().unwrap();
    }
    let duration = start.elapsed();
    println!("Lock-Free Stack Time: {:?}", duration);

    // Mutex-based stack benchmark
    let mutex_stack = Arc::new(MutexStack::new());
    let start = Instant::now();
    let mut handles = vec![];
    for _ in 0..NUM_THREADS {
        let stack = Arc::clone(&mutex_stack);
        handles.push(thread::spawn(move || {
            for i in 0..NUM_OPERATIONS {
                stack.push(i);
                stack.pop();
            }
        }));
    }
    for handle in handles {
        handle.join().unwrap();
    }
    let duration = start.elapsed();
    println!("Mutex Stack Time: {:?}", duration);
}
Step 3: Running the Example
Run the project to see the performance difference:
cargo run --release
You should see output similar to:
Lock-Free Stack Time: 1.2s
Mutex Stack Time: 2.4s
Implementing a Lock-Free Queue with Crossbeam
Let’s extend our example by implementing a lock-free queue using the crossbeam crate, and compare its performance with a traditional Mutex-based queue.
Step 1: Implementing the Lock-Free Queue
use crossbeam::epoch::{self, Atomic, Owned};
use std::collections::VecDeque;
use std::mem::MaybeUninit;
use std::sync::atomic::Ordering;
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Instant;

struct Node<T> {
    // The sentinel node never carries a value, so the payload stays
    // uninitialized until `enqueue` writes it.
    data: MaybeUninit<T>,
    next: Atomic<Node<T>>,
}

struct LockFreeQueue<T> {
    head: Atomic<Node<T>>,
    tail: Atomic<Node<T>>,
}

impl<T> LockFreeQueue<T> {
    fn new() -> Self {
        // Both head and tail start out pointing at an empty sentinel node.
        let sentinel = Owned::new(Node {
            data: MaybeUninit::uninit(),
            next: Atomic::null(),
        });
        let guard = epoch::pin();
        let sentinel = sentinel.into_shared(&guard);
        Self {
            head: Atomic::from(sentinel),
            tail: Atomic::from(sentinel),
        }
    }
    fn enqueue(&self, data: T) {
        let node = Owned::new(Node {
            data: MaybeUninit::new(data),
            next: Atomic::null(),
        });
        let guard = epoch::pin();
        let node = node.into_shared(&guard);
        loop {
            let tail = self.tail.load(Ordering::Acquire, &guard);
            let next = unsafe { tail.deref() }.next.load(Ordering::Acquire, &guard);
            if next.is_null() {
                // The tail really is the last node: try to link the new node after it.
                if unsafe { tail.deref() }
                    .next
                    .compare_exchange(next, node, Ordering::Release, Ordering::Relaxed, &guard)
                    .is_ok()
                {
                    // Swing the tail forward; if this fails, another thread already did it.
                    let _ = self.tail.compare_exchange(
                        tail,
                        node,
                        Ordering::Release,
                        Ordering::Relaxed,
                        &guard,
                    );
                    break;
                }
            } else {
                // The tail is lagging behind: help advance it and retry.
                let _ = self.tail.compare_exchange(
                    tail,
                    next,
                    Ordering::Release,
                    Ordering::Relaxed,
                    &guard,
                );
            }
        }
    }
    fn dequeue(&self) -> Option<T> {
        let guard = epoch::pin();
        loop {
            let head = self.head.load(Ordering::Acquire, &guard);
            let tail = self.tail.load(Ordering::Acquire, &guard);
            let next = unsafe { head.deref() }.next.load(Ordering::Acquire, &guard);
            if head == tail {
                if next.is_null() {
                    // The sentinel has no successor: the queue is empty.
                    return None;
                }
                // Help a stalled enqueue by advancing the tail, then retry.
                let _ = self.tail.compare_exchange(
                    tail,
                    next,
                    Ordering::Release,
                    Ordering::Relaxed,
                    &guard,
                );
            } else if let Some(next_ref) = unsafe { next.as_ref() } {
                if self
                    .head
                    .compare_exchange(head, next, Ordering::Release, Ordering::Relaxed, &guard)
                    .is_ok()
                {
                    unsafe {
                        // Read the value out of the new sentinel and retire the old one.
                        let data = next_ref.data.as_ptr().read();
                        guard.defer_destroy(head);
                        return Some(data);
                    }
                }
            }
        }
    }
}
// A conventional FIFO queue protected by a Mutex, for comparison.
struct MutexQueue<T> {
    queue: Mutex<VecDeque<T>>,
}

impl<T> MutexQueue<T> {
    fn new() -> Self {
        Self {
            queue: Mutex::new(VecDeque::new()),
        }
    }

    fn enqueue(&self, data: T) {
        self.queue.lock().unwrap().push_back(data);
    }

    fn dequeue(&self) -> Option<T> {
        self.queue.lock().unwrap().pop_front()
    }
}
fn main() {
    const NUM_THREADS: usize = 4;
    const NUM_OPERATIONS: usize = 1_000_000;

    // Lock-free queue benchmark
    let lock_free_queue = Arc::new(LockFreeQueue::new());
    let start = Instant::now();
    let mut handles = vec![];
    for _ in 0..NUM_THREADS {
        let queue = Arc::clone(&lock_free_queue);
        handles.push(thread::spawn(move || {
            for i in 0..NUM_OPERATIONS {
                queue.enqueue(i);
                queue.dequeue();
            }
        }));
    }
    for handle in handles {
        handle.join().unwrap();
    }
    let duration = start.elapsed();
    println!("Lock-Free Queue Time: {:?}", duration);

    // Mutex-based queue benchmark
    let mutex_queue = Arc::new(MutexQueue::new());
    let start = Instant::now();
    let mut handles = vec![];
    for _ in 0..NUM_THREADS {
        let queue = Arc::clone(&mutex_queue);
        handles.push(thread::spawn(move || {
            for i in 0..NUM_OPERATIONS {
                queue.enqueue(i);
                queue.dequeue();
            }
        }));
    }
    for handle in handles {
        handle.join().unwrap();
    }
    let duration = start.elapsed();
    println!("Mutex Queue Time: {:?}", duration);
}
Step 2: Running the Example
Run the project to see the performance difference:
cargo run --release
You should see output similar to:
Lock-Free Queue Time: 1.0s
Mutex Queue Time: 2.5s
Whether you’re working on real-time systems, high-frequency trading platforms, or any other application requiring low latency and high throughput, lock-free data structures and the crossbeam crate provide the tools you need to achieve your performance goals.
Wanna talk? Leave a comment or drop me a message!
All the best,
Luis Soares