C++ tidbit #5: const static members
struct S { static const int x = 0; };
int n = S::x;
const int& m = S::x; // <-- link error: undefined reference to `S::x`
This link error is clearly impossible. Not only is `S::x` defined right there in line 1, in line 2 it is used successfully. How could it turn into an undefined-reference one line later?!
This particular dive down the rabbit hole began with a comment in Andy Soffer's CppOnSea2022 talk, and its depth surprised me. Let's go in.
(Partially made up) C++ Standard History
The old, simple days
In the olden days you'd declare a static data member as part of the class declaration, typically in a header. It wasn't stored as part of any instance, and the class declaration was included in many translation units, so you had to designate a single storage location for it by defining it separately in exactly one translation unit:
// S.h:
struct S { static int x;};
// src1.cpp:
#include "S.h"
int S::x; // optionally: =1
<detour> The selection of translation unit (TU) to use is entirely inconsequential. So why not have the location for S::x be chosen automatically? Well, in the classical compiler/linker separation of responsibilities - neither could make this choice: the compiler processed only one TU at a time, and the linker couldn't 'create' data - just pulled it from TUs into a unified executable. More on that later. </detour>
The Middle Ages
In a second phase, static *const* data members started being advertised as better (type-safe, scoped) alternatives to macros.
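#define NUM_WIDGETS 10
struct Widget { };
Widget arrWidgets[NUM_WIDGETS];
// could be modernized into:
struct Fidget { static const int nFidgets; };
const int Fidget::nFidgets = 10; // the definition, in some cpp
Fidget arrFidgets[Fidget::nFidgets]; // OK only in a TU that sees the definition above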
Then a startling discovery was made: in this usage, the static const Fidget::nFidgets doesn't really need any storage! When the compiler has its value (say 10) it is perfectly happy to embed it directly into stack allocations or machine instructions, and not take it from any memory storage. (This was long before constexpr was born).
So, initialization as part of declaration was made legal for static const integers, and no definition in any cpp was required any longer:
struct Fidget { static const int nFidgets = 10;};
Fidget arrFidgets[Fidget::nFidgets];
As is often the case, these good intentions resulted in some unforeseen, hairy side effects. For one, 'A uses B' no longer had one clear meaning. Two forms of 'usage' had to be distinguished: usage that requires storage for B, and usage that doesn't.
Old-style usage, which requires storage, was baptized "ODR-use". This is a strong contender for the least informative C++ term (second only to "RAII"), as its connection to the actual One-Definition Rule is vague at best. If you wanted to write:
const int& r = Fidget::nFidgets;
std::vector<int> v;
v.push_back(Fidget::nFidgets); // <-- takes a reference
then you still had to create a definition (== storage) for Fidget::nFidgets in some cpp file. If you stuck with declaration+initialization only:
// Fidget.h
struct Fidget { static const int nFidgets = 10;};
You could use Fidget::nFidgets only in non-odr way, such as -
Fidget arr[Fidget::nFidgets];
int n = Fidget::nFidgets; // <-- think of the rhs as a literal.
                          // No storage is actually required == non-odr.
Some poor soul had to grep the entire C++ standard for 'use' and change each occurrence to odr-use or non-odr-use, sometimes splitting the wording by case. All for the sole benefit of allowing nFidgets=10 at the declaration site. (I think... are you aware of other entities that support only non-odr use?)
Solving the original mystery
Recall we started with -
struct S { static const int x = 0; };
int n = S::x;
const int& m = S::x; // <-- link error: undefined reference to `S::x`
And said that `S::x` is (1) defined in line 1, (2) used in line 2. Both turned out to be small lies: S::x is in fact (1) declared+initialized in line 1, but not defined, (2) used in line 2 only in a weak sense (non-odr). The link failure in line 3 is hopefully clear by now: it is an attempted odr-use for a variable with no definition.
Modern C++ Fix
Remember this detour paragraph above?
why not have the single location for S::x be chosen automatically? Well, in the classical compiler/linker separation of responsibilities - neither could make this choice: the compiler processed only one TU at a time, and the linker couldn't 'create' data - just pulled it from TUs into a unified executable.
That is... almost true. Linkers indeed cannot create data, but they can select one copy out of many. As a matter of fact they do it all the time: when you mark a function as inline, the compiler emits a copy of it in every TU that needs one, and the linker keeps just one of them, so the function ends up with a single address even if several TUs take it. Along the road towards C++17 another startling discovery was made: this entire inlining apparatus is already in place, and we can use it for variables, not just functions!
So, in C++17 and later you'd probably want to solve the original error with either:
struct S { static const inline int x = 0; };
Or even more expressively:
struct S { static constexpr int x = 0; };
(since C++17, a constexpr static data member is implicitly inline).
Side Note: Why Just Integers?
Making this legal:
struct S { static const float f = 0; };
could have made the following succeed without any definition for S::f:
float g = S::f;
So why not?
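The standard permits an in-class initializer only for a static const data member of integral or enumeration type, and that initializer must be an integral constant expression (plus a few more technical restrictions). The important bit to note is where such expressions are used. Integral constant expressions, and only them, are expected at: array bounds, case labels, bit-field widths, enumerator initializers and non-type template arguments.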
And moreover, these are all non-odr uses.
Integers are special indeed. Float (and other) const statics are just not useful enough to be considered in this context.