C++ Core Guidelines: Rules to Performance

This is a cross-post from www.ModernesCpp.com.

Before I write about the rules of performance, I will do a very simple job: accessing the elements of a container one by one.

Here is the last rule to arithmetic.

ES.107: Don’t use unsigned for subscripts, prefer gsl::index

Did I say that this is a simple job? Honestly, this was a lie. See what can go wrong. Here is an example with a std::vector.

vector<int> vec = /*...*/;

for (int i = 0; i < vec.size(); i += 2)                    // may not be big enough (2)
    cout << vec[i] << '\n';
for (unsigned i = 0; i < vec.size(); i += 2)               // risk wraparound       (3)
    cout << vec[i] << '\n';
for (auto i = 0; i < vec.size(); i += 2)                   // may not be big enough (2)
    cout << vec[i] << '\n';
for (vector<int>::size_type i = 0; i < vec.size(); i += 2) // verbose               (1)
    cout << vec[i] << '\n';
for (auto i = vec.size()-1; i >= 0; i -= 2)                // bug                   (4)
    cout << vec[i] << '\n';
for (int i = vec.size()-1; i >= 0; i -= 2)                 // may not be big enough (2)
    cout << vec[i] << '\n';

Scary? Right! Only line (1) is correct. In the lines marked (2), the variable i may be too small to hold every index of a large vector. The result may be a signed overflow, which is undefined behaviour. This does not hold for line (3) because i is unsigned. Instead of an overflow, you will get a modulo operation (wraparound). I wrote about this nice effect in my last post: C++ Core Guidelines: Rules to Statements and Arithmetic. To be more specific, it was rule ES.106.

Line (4) is left. This is my favourite one. What is the problem? The problem is that vec.size() is of type std::size_t. std::size_t is an unsigned type and can, therefore, not represent negative numbers, so the condition i >= 0 is always true. Imagine what would happen if the vector is empty. Mathematically, vec.size() - 1 would be -1, but the result is that we get the maximum value of type std::size_t.

The program index.cpp shows this strange behaviour.

// index.cpp

#include <iostream>
#include <vector>

int main(){
    std::cout << std::endl;

    std::vector<int> vec{};

    auto ind1 = vec.size() - 1;   // unsigned: wraps around to the maximum value
    int ind2 = vec.size() - 1;    // converted to int afterwards

    std::cout << "ind1: " << ind1 << std::endl;
    std::cout << "ind2: " << ind2 << std::endl;

    std::cout << std::endl;
}

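And here is the output: on a typical 64-bit platform, ind1 is 18446744073709551615, the maximum value of std::size_t, and ind2 is -1.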

The guidelines suggest that the variable i should be of type gsl::index.

for (gsl::index i = 0; i < vec.size(); i += 2)             // ok
    cout << vec[i] << '\n';
for (gsl::index i = vec.size()-1; i >= 0; i -= 2)          // ok
    cout << vec[i] << '\n';

If this is not an option for you, use the type std::vector<int>::size_type for i.

Performance is the domain of C++! Right? Therefore I was quite curious to write about the rules to performance. But this is hardly possible because most of the rules lack the beef. They just consist of a title and a reason. Sometimes even the reason is missing.

Anyway. Here are the first rules:

Instead of writing general remarks about general rules, I will provide a few examples of these rules. Let's start with rules Per.4, Per.5, and Per.6.

Per.4: Don’t assume that complicated code is necessarily faster than simple code

Per.5: Don’t assume that low-level code is necessarily faster than high-level code

Per.6: Don’t make claims about performance without measurements

Before I continue, I have to make a disclaimer: I do not recommend using the singleton pattern. I only want to show that complicated and low-level code does not always pay off. To prove my point, I have to measure the performance.

Long, long ago, I wrote about the thread-safe initialisation of the singleton pattern in my post: Thread-safe initialization of a singleton. The key idea of the post was to invoke the singleton pattern 40,000,000 times from four threads and measure the execution time. The singleton is initialised lazily; therefore, the first call has to initialise it.

I implemented the singleton pattern in various ways. I did it with a std::lock_guard and the function std::call_once in combination with the std::once_flag. I did it with a static variable. I even used atomics and broke the sequential consistency for performance reasons.

To make my point clear, I want to show you the easiest implementation and the most challenging one.

The easiest implementation is the so-called Meyers singleton. It is thread-safe because the C++11 standard guarantees that a static variable with block scope is initialised in a thread-safe way.

// singletonMeyers.cpp

#include <chrono>
#include <iostream>
#include <future>

constexpr auto tenMill= 10000000;

class MySingleton{
public:
  static MySingleton& getInstance(){
    static MySingleton instance;                         // (1)
    // volatile int dummy{};
    return instance;
  }
private:
  MySingleton()= default;
  ~MySingleton()= default;
  MySingleton(const MySingleton&)= delete;
  MySingleton& operator=(const MySingleton&)= delete;
};

// Calls getInstance() ten million times and returns the elapsed time.
std::chrono::duration<double> getTime(){

  auto begin= std::chrono::system_clock::now();
  for (size_t i= 0; i < tenMill; ++i){
      MySingleton::getInstance();                        // (2)
  }
  return std::chrono::system_clock::now() - begin;
}

int main(){
    // Four threads, each measuring its own ten million calls.
    auto fut1= std::async(std::launch::async,getTime);
    auto fut2= std::async(std::launch::async,getTime);
    auto fut3= std::async(std::launch::async,getTime);
    auto fut4= std::async(std::launch::async,getTime);

    auto total= fut1.get() + fut2.get() + fut3.get() + fut4.get();

    std::cout << total.count() << std::endl;
}


Line (1) uses the guarantee of the C++11 runtime that the singleton will be initialised in a thread-safe way. Each of the four threads in the main function invokes the singleton 10 million times in line (2). In total, this makes 40 million calls.

But I can do better. This time I use atomics to make the singleton pattern thread-safe. My implementation is based on the infamous double-checked locking pattern. For the sake of simplicity, I will only show the implementation of the class MySingleton.

class MySingleton{
public:
  static MySingleton* getInstance(){
    MySingleton* sin= instance.load(std::memory_order_acquire);
    if ( !sin ){
      std::lock_guard<std::mutex> myLock(myMutex);
      sin= instance.load(std::memory_order_relaxed);
      if( !sin ){
        sin= new MySingleton();
        // publish the fully constructed object to the other threads
        instance.store(sin,std::memory_order_release);
      }
    }
    // volatile int dummy{};
    return sin;
  }
private:
  MySingleton()= default;
  ~MySingleton()= default;
  MySingleton(const MySingleton&)= delete;
  MySingleton& operator=(const MySingleton&)= delete;

  static std::atomic<MySingleton*> instance;
  static std::mutex myMutex;
};

std::atomic<MySingleton*> MySingleton::instance;
std::mutex MySingleton::myMutex;

Maybe you have heard that the double-checked locking pattern is broken. Of course, not my implementation! If you don't believe me, prove it to me. First, you have to study the memory model, think about the acquire-release semantics, and think about the synchronisation and ordering constraints that hold in this implementation. This is not an easy job. But you know, highly sophisticated code pays off.

Damn. I forgot rule Per.6. Here are the performance numbers for the Meyers singleton on Linux. I compiled the program with maximum optimisation. The numbers on Windows were in the same ballpark.

Now I'm curious. What are the numbers for my highly sophisticated code? Let's see which performance we will get with atomics.

50 percent slower! 50 percent slower, and we don't even know if the implementation is correct. Disclaimer: The implementation is correct.

Indeed, the Meyers singleton was the fastest and the easiest way to get a thread-safe implementation of the singleton pattern. If you are curious about the details, read my post: Thread-safe initialization of a singleton.

What's next?

There are more than 10 rules to performance left in the guidelines. Although it is quite challenging to write about such general rules, I have a few ideas in mind for my next post.

