C++ Core Guidelines: Rules for Concurrency and Parallelism
This is a cross-post from www.ModernesCpp.com.
C++11 is the first C++ standard that deals with concurrency. The basic building block for concurrency is a thread; therefore, most of the rules are explicitly about threads. This changed dramatically with C++17.
With C++17 we got the parallel algorithms of the Standard Template Library (STL). That means most of the algorithms of the STL can be executed sequentially, in parallel, or vectorised. For the curious reader: I have already written two posts about the parallel STL. The post Parallel Algorithms of the Standard Template Library explains the execution policies which you can use to run an existing algorithm sequentially, in parallel, or in parallel with vectorisation. C++17 also added new algorithms which are meant to run in parallel or vectorised. Here are the details: C++17: New Parallel Algorithms of the Standard Template Library.
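For example, an existing algorithm such as std::sort only needs an execution policy as its first argument to run in parallel. Here is a minimal sketch, assuming a compiler and standard library that already ship the parallel algorithms:

#include <algorithm>
#include <execution>
#include <vector>

int main(){
    std::vector<int> v{7, 3, 5, 1, 9};
    std::sort(v.begin(), v.end());                              // sequential, as always
    std::sort(std::execution::seq, v.begin(), v.end());         // sequential
    std::sort(std::execution::par, v.begin(), v.end());         // parallel
    std::sort(std::execution::par_unseq, v.begin(), v.end());   // parallel and vectorised
}
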
The concurrency story in C++ goes on. With C++20 we can hope for extended futures, coroutines, transactions, and more. From the bird's eye view, the concurrency facilities of C++11 and C++14 are only the implementation details on which the higher abstractions of C++17 and C++20 are based. Here is a series of posts about our concurrent future in C++20.
That being said, the rules are mainly about threads because neither GCC, Clang, nor MSVC has fully implemented the parallel algorithms of the STL. There can be no best practices written for features that are not available (parallel STL) or not even standardised.
This is the first rule to keep in mind when you read the rules: these rules are about the multithreading facilities available in C++11 and C++14. The second rule to keep in mind is that multithreading is very challenging. This means the rules want to give guidance to novices and not to the experts in this domain. The rules about the memory model will follow in a future post.
Now, let's start and dive into the first rule.
CP.1: Assume that your code will run as part of a multi-threaded program
I was astonished when I read this rule the first time. Why should I optimise for the special case? To make it clear: this rule is mainly about code that is used in libraries, not in the application. And experience shows that library code is often reused. This means that, by assuming multithreading, you optimise for the general case, which is fine.
To make the point of the rule clear, here is a small example.
double cached_computation(double x)
{
    static double cached_x = 0.0;                       // (1)
    static double cached_result = COMPUTATION_OF_ZERO;  // (2)
    double result;
    if (cached_x == x)                                   // (1)
        return cached_result;                            // (2)
    result = computation(x);
    cached_x = x;                                        // (1)
    cached_result = result;                              // (2)
    return result;
}
The function cached_computation is totally fine if it runs in a single-threaded environment. This does not hold for a multithreaded environment, because the static variables cached_x (1) and cached_result (2) can be used simultaneously by many threads and are modified during their usage. The C++11 standard adds multithreading semantics to static variables with block scope such as cached_x and cached_result: static variables with block scope are initialised in C++11 in a thread-safe way.
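This guarantee is, for example, the reason why the so-called Meyers Singleton is thread-safe since C++11. Here is a small sketch; the class MySingleton is only an illustration:

#include <thread>

class MySingleton{
public:
    static MySingleton& getInstance(){
        static MySingleton instance;   // initialised exactly once, even when many threads race here
        return instance;
    }
private:
    MySingleton() = default;
};

int main(){
    std::thread t1([]{ MySingleton::getInstance(); });
    std::thread t2([]{ MySingleton::getInstance(); });
    t1.join();
    t2.join();
}
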
This is fine but will not help in our case. We will get a data race if we invoke cached_computation simultaneously from many threads. The notion of a data race is quite important in multithreading in C++; therefore, let me say a few words about it.
A data race is a situation in which at least two threads access a shared variable at the same time, and at least one thread tries to modify the variable.
The rest is quite simple. If you have a data race in your program, your program has undefined behaviour. Undefined behaviour means you cannot reason about your program anymore because anything can happen. I mean anything. In my seminars, I often say: if your program has undefined behaviour, it has catch-fire semantics. Even your computer can catch fire.
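To make the definition concrete, here is a minimal data race; the variable names and numbers are only an illustration. Two threads modify the shared variable counter without any synchronisation; therefore, the program has undefined behaviour:

#include <iostream>
#include <thread>

int counter = 0;                               // shared, mutable, unprotected

int main(){
    std::thread t1([]{ for (int i = 0; i < 100000; ++i) ++counter; });   // writer 1
    std::thread t2([]{ for (int i = 0; i < 100000; ++i) ++counter; });   // writer 2
    t1.join();
    t2.join();
    std::cout << counter << '\n';              // no reliable value: data race => undefined behaviour
}
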
If you read the definition of a data race carefully, you will notice that shared mutable state is necessary for a data race. Here is a picture to make this observation obvious.
So, what can you do to get rid of the data race? Making the static variables cached_x (1) and cached_result (2) immutable (const) makes no sense. This means the unsynchronised sharing of both statics has to go. Here are a few ways to achieve this.
- Protect each of the two statics by its own lock.
- Use one lock to protect the entire critical region.
- Protect the call to the function cached_computation by a lock.
- Make both statics thread_local. thread_local guarantees that each thread gets its own copy of cached_x and cached_result. Just as a static variable is bound to the lifetime of the main thread, a thread_local variable is bound to the lifetime of its thread.
Here are the variations 1, 2, 3, and 4.
std::mutex m_x;
std::mutex m_result;

double cached_computation(double x){                     // (1)
    static double cached_x = 0.0;
    static double cached_result = COMPUTATION_OF_ZERO;
    double result;
    {
        std::scoped_lock lck(m_x, m_result);
        if (cached_x == x) return cached_result;
    }
    result = computation(x);
    {
        std::lock_guard<std::mutex> lck(m_x);
        cached_x = x;
    }
    {
        std::lock_guard<std::mutex> lck(m_result);
        cached_result = result;
    }
    return result;
}

std::mutex m;

double cached_computation(double x){                     // (2)
    static double cached_x = 0.0;
    static double cached_result = COMPUTATION_OF_ZERO;
    double result;
    {
        std::lock_guard<std::mutex> lck(m);
        if (cached_x == x) return cached_result;
        result = computation(x);
        cached_x = x;
        cached_result = result;
    }
    return result;
}

std::mutex cachedComputationMutex;                       // (3)

{
    std::lock_guard<std::mutex> lck(cachedComputationMutex);
    auto cached = cached_computation(3.33);
}

double cached_computation(double x){                     // (4)
    thread_local double cached_x = 0.0;
    thread_local double cached_result = COMPUTATION_OF_ZERO;
    double result;
    if (cached_x == x) return cached_result;
    result = computation(x);
    cached_x = x;
    cached_result = result;
    return result;
}
First, the C++11 standard guarantees that static variables are initialised in a thread-safe way; therefore, I don't have to protect their initialisation in any of the variations.
- This version is a little bit tricky because I have to acquire both locks in an atomic step. C++17 supports std::scoped_lock, which can lock an arbitrary number of mutexes in an atomic step. In C++11, you have to use a std::unique_lock in combination with the function std::lock instead (see the sketch after this list). My previous post Prefer Locks to Mutexes provides more details. This solution still has a race condition on cached_x and cached_result, because the pair has to be accessed atomically but is updated under two separate locks.
- Version 2 uses more coarse-grained locking. Usually, you should not use coarse-grained locking such as in version 2 but fine-grained locking instead; in this use case, however, it may be fine.
- Version 3 is the most coarse-grained solution because the entire function call is locked. Of course, the downside is that the user of the function is responsible for the synchronisation. In general, that is a bad idea.
- Just make both static variables thread_local, and you are done.
In the end, it's a question of performance and of your users. Therefore, try each variation, measure, and think about the people who will use and maintain your code.
What's next?
This post was just the starting point of a long journey through the rules for concurrency in C++. In the next post, I will write about threads and shared state.