Why C++ Threads Matter Despite the Existence of POSIX Threads

When the C++ standards committee introduced built-in language support for threading in C++11, many developers, myself included, asked, "Why do we need C++ threads when POSIX threads have served us well for years?" The question was whether the added complexity and runtime cost justified moving from traditional POSIX threads to a new C++ threading model. After some exploration, I’ve come to understand the advantages C++ threads offer, and I'd like to share my insights with those who may have pondered the same question.

Disclaimer: This article is intended as a quick and dirty introduction rather than a fully polished guide. The code snippets here are just for illustration and may require refinement for real-world use. Treat them only as high-level pseudo-code :)

Memory Barriers: The Backbone of Thread Safety

To understand the value of C++ threading, it’s essential to first grasp memory barriers, a crucial concept in multithreaded programming. Modern CPUs, particularly ARM processors, often use out-of-order execution to improve performance. In such architectures, instructions may be executed in a different order than written in code, allowing the CPU to make full use of its pipelines. While this optimization boosts speed, it can cause issues in multithreaded environments, as the order of operations in memory may not match the expected program flow.

For instance, consider a shared variable flag used as a signal between two threads. In one thread, we set up some data, then set flag to 1 to signal the data is ready. In another thread, we check flag to see if the data can be read. Without memory barriers, the compiler or CPU may reorder these instructions, leading to unpredictable behavior.

Example Without Memory Barriers

// Thread 1: Writer
data = 42;       // Step 1: Write data
flag = 1;        // Step 2: Signal data is ready

// Thread 2: Reader
if (flag == 1) {
    // Step 3: Check if data is ready
    use(data);   // Step 4: Use the data
}        

Without memory barriers, an ARM CPU may reorder these operations:

  • The CPU could set flag = 1 before data = 42, leading Thread 2 to access data before it’s actually ready.
  • Because the hardware reorders instructions at runtime, writing them in the correct order in source code does not guarantee the order in which they become visible in memory.

Solution: Using Memory Barriers

Memory barriers ensure that the order of operations is preserved across threads. By enforcing specific points in the code where memory operations cannot be reordered, memory barriers protect against these hazards, especially on out-of-order architectures like ARM.

Here’s how we can use C++ std::atomic to enforce barriers automatically:

#include <atomic>

std::atomic<int> data{0};
std::atomic<int> flag{0};

// Thread 1: Writer
data.store(42, std::memory_order_relaxed);   // Write data
flag.store(1, std::memory_order_release);    // Signal data is ready

// Thread 2: Reader
if (flag.load(std::memory_order_acquire) == 1) {
    int result = data.load(std::memory_order_relaxed);
    use(result);   // Use the data safely
}        

Here,

  • std::memory_order_release on flag in Thread 1 ensures that all prior writes (such as data = 42) complete before flag is updated.
  • std::memory_order_acquire on flag in Thread 2 ensures that no subsequent reads or writes are reordered before the load of flag, so data is guaranteed to be valid by the time it is used.

By adding these memory barriers, C++ std::atomic makes sure the code works as expected across different CPU architectures without manual intervention.

1. C++ Standard Memory Model and Atomics

One of the fundamental reasons for adding threading support in C++ was to introduce a standardized memory model. Before C++11, threading in C++ was largely unregulated, and developers often relied on platform-specific solutions like POSIX threads. C++11's std::atomic brought a standardized, cross-platform approach to atomic operations, enabling portable code with built-in memory barriers that ensure visibility and ordering of operations across threads.

In C++:

#include <atomic>

std::atomic<int> shared_data{0};  // Atomic variable

void increment() {
    shared_data.fetch_add(1, std::memory_order_relaxed);
}

Here, the atomic operation guarantees that concurrent increments to shared_data are free of data races and that no updates are lost; stronger memory orders additionally insert the barriers needed to order surrounding operations. POSIX, on the other hand, lacks an inherent memory model, leaving developers to handle barriers themselves.

2. C11 Standard for Pure C Projects

For projects written purely in C, the C11 standard offers a workaround with <stdatomic.h>, which provides atomic operations similar to C++. This addition is especially useful for developers who want to avoid C++ runtime dependencies but still require thread-safe operations.

Example in C11:

#include <stdatomic.h>

atomic_int shared_data = 0;

void increment() {
    atomic_fetch_add(&shared_data, 1);  // Atomic increment
}        

While <stdatomic.h> narrows the gap between POSIX threads and C++ threads, it is often unavailable in legacy C environments, where developers must rely on compiler-specific intrinsics or manual memory barriers.

3. POSIX Threads with Compiler Intrinsics

For environments where C11 isn’t available, GCC and Clang provide atomic built-ins, such as __sync_fetch_and_add, allowing POSIX threads to manage atomicity and memory synchronization. Though this approach can achieve thread-safe operations, it depends on compiler-specific extensions, which may reduce portability.

Example with GCC/Clang built-ins:

#include <stdio.h>
#include <pthread.h>

int shared_data = 0;   /* the __sync builtins provide atomicity; volatile is unnecessary */

void* increment(void* arg) {
    (void)arg;
    __sync_fetch_and_add(&shared_data, 1);   /* atomic increment with a full barrier */
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("shared_data = %d\n", shared_data);   /* prints 2 */
    return 0;
}

4. Manual Memory Barriers: The Cost of Low-Level Control

In some minimal systems, direct memory barrier instructions are the only option. However, managing these barriers manually is complex and architecture-specific, requiring expertise with assembly instructions. For instance:

  • On x86, a full memory barrier can be added with asm volatile ("mfence" ::: "memory");.
  • On ARM, __asm__ volatile("dmb ish" : : : "memory"); provides similar functionality.

While this approach provides maximum control and minimal runtime cost, it is error-prone and difficult to maintain.

5. Why C++ Threads Are Worth the Overhead

C++ threads offer a streamlined, standardized way to handle threading and synchronization across platforms, reducing the need for low-level management of memory barriers. The abstraction provided by std::thread and std::atomic simplifies development and ensures that cross-platform code behaves consistently.

For pure C projects or legacy systems, options like C11’s <stdatomic.h>, compiler intrinsics, and manual barriers provide alternatives, but these solutions require careful handling. C++ threads, on the other hand, wrap these complexities, allowing developers to focus on functionality rather than intricate synchronization details.

In summary:

  • C++ threads offer portability and simplicity with a built-in memory model.
  • C11 <stdatomic.h> bridges the gap in pure C projects.
  • Compiler intrinsics provide thread safety in environments without C11.
  • Manual memory barriers remain an option for low-level control but demand expertise.

C++ threading models, while adding runtime complexity, answer the need for a standardized, cross-platform approach to multithreading. After exploring these layers of threading in C and C++, I now appreciate why the C++ committee included them. By hiding complexities and offering a reliable memory model, C++ threads make multithreading both safer and more accessible across diverse platforms.

#CPlusPlus #Threading #POSIX #Multithreading #Programming #Concurrency

Muqaddas Iqbal

Top brand expert | Social media marketing expert @70xvenue | Social media management, graphic design

3 weeks

Great insights, Deepesh! Your analysis on the necessity of C++ threading is both thought-provoking and timely. Looking forward to engaging more on this vital topic!

Venkatesh Ummadi Setty

Senior Software Engineer at Tata Elxsi

3 weeks

Useful tips

Patrick BRUNET

Industrial and embedded software developer, C, C++, Qt, C#...

3 weeks

Explicit synchronization using specific primitives is mandatory anyway. At least due to caching, but also for human understanding...
