C interview questions. Bit-fields
Note: Almost all examples were compiled using the GCC 12.2 compiler for ARM
Intro
Bit fields provide convenient access to individual bits of data. They allow you to create objects whose size is not a multiple of a byte.
A bit field cannot exist by itself; it can only be a member of a structure. Bit fields have the following form: <type> <name> : <size>;
The <type> can be int (signed or unsigned), _Bool, or an implementation-defined type.
C99 standard (6.7.2.1): A bit-field shall have a type that is a qualified or unqualified version of _Bool, signed int, unsigned int, or some other implementation-defined type.
The <name> is an arbitrary identifier, and <size> is a positive integer constant that must not exceed the width of <type> in bits:
C99 standard (6.7.2.1): The expression that specifies the width of a bit-field shall be an integer constant expression with a nonnegative value that does not exceed the width of an object of the type that would be specified were the colon and expression omitted. If the value is zero, the declaration shall have no declarator.
The width can also be written as a constant expression:
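The original listing is not available here, so the following is only a minimal sketch of the idea. The names FLAG_BITS, MODE_BITS, struct status and status_store_rest are hypothetical and chosen purely for illustration. Using uint8_t as a bit-field type relies on the "implementation-defined type" clause quoted above, which GCC accepts.

```c
#include <stdint.h>

/* Hypothetical names, invented for this illustration. */
#define FLAG_BITS 3
enum { MODE_BITS = 2 };

struct status {
    uint8_t flags : FLAG_BITS;                  /* width from a macro constant      */
    uint8_t mode  : MODE_BITS;                  /* width from an enum constant      */
    uint8_t rest  : 8 - FLAG_BITS - MODE_BITS;  /* width computed at compile time: 3 */
};

/* Stores v into the 3-bit unsigned field and reads it back.
   For an unsigned field the stored value is v modulo 2^3. */
unsigned status_store_rest(unsigned v)
{
    struct status s = { 0 };
    s.rest = v;
    return s.rest;
}
```

Any integer constant expression works as a width, as long as the result is nonnegative and not larger than the width of the declared type.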
Packing of bit fields
The compiler tries to pack as many bit fields as possible into a storage unit of the size given by <type>. If a bit field does not fit into the space remaining in that unit, a new unit is allocated for it.
C99 standard (6.7.2.1): An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.
Example 1:
These bit fields will be packed into a single uint8_t unit, so the size of the structure will be 1 byte.
Example 2:
These bit fields will be packed into two uint8_t units, i.e. the size of the structure will be 2 bytes, because the b field does not fit into the space remaining after the a field.
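The listings for Example 1 and Example 2 did not survive, so the structures below are an assumed reconstruction that matches the description (one case that fits into 8 bits, one that does not). The struct and function names are hypothetical. The sizes in the comments are what GCC produces; the standard leaves the layout implementation-defined.

```c
#include <stddef.h>
#include <stdint.h>

/* 4 + 4 = 8 bits: both fields fit into one uint8_t unit. */
struct fits_one_unit {
    uint8_t a : 4;
    uint8_t b : 4;
};

/* 5 + 4 = 9 bits: b does not fit into the 3 bits left after a,
   so GCC places it in a second uint8_t unit. */
struct needs_two_units {
    uint8_t a : 5;
    uint8_t b : 4;
};

size_t fits_one_unit_size(void)   { return sizeof(struct fits_one_unit); }
size_t needs_two_units_size(void) { return sizeof(struct needs_two_units); }
```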
Choosing the field <type>
Let's look at a few more examples:
These examples clearly show the importance of choosing the right field <type> for each bit field. For example, if you change the type from uint16_t to uint8_t in the byte structure, this will allow the compiler to pack the entire structure into 1 byte, instead of 2:
However, with the byte1 structure this trick no longer works: even if you replace uint16_t with uint8_t, the compiler still cannot pack all the bit fields into a uint8_t, because 5 bits + 4 bits = 9 bits.
We also cannot reduce the size of the byte2 structure, because it is impossible to change the type of the bit field a from uint16_t to uint8_t: its width is larger than 8 bits. Changing the type of the bit field b will not help either, because there is enough free space left after the bit field a to pack both fields into one uint16_t.
Finally, consider the last example, the byte3 structure. Here, changing the type of the bit field b from uint16_t to uint8_t will also not change the size of the structure. Because of the width of the field a, the compiler cannot fit the entire structure into 16 bits, so the field a occupies 2 bytes as expected and the field b occupies 1 byte. The compiler then adds one additional padding byte to the end of the structure. Why it does this can be read in this article.
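The original definitions of byte, byte1, byte2 and byte3 are not available, so the widths below are assumptions picked to match the reasoning in the text. The suffixed struct names and print_type_choice_sizes are hypothetical. The sizes in the comments are what GCC typically produces.

```c
#include <stdint.h>
#include <stdio.h>

/* 4 + 4 bits declared as uint16_t: one 16-bit unit, 2 bytes. */
struct byte_u16  { uint16_t a : 4;  uint16_t b : 4; };
/* Same widths declared as uint8_t: one 8-bit unit, 1 byte. */
struct byte_u8   { uint8_t  a : 4;  uint8_t  b : 4; };

/* 5 + 4 = 9 bits: does not fit into one uint8_t, still 2 bytes. */
struct byte1_u8  { uint8_t  a : 5;  uint8_t  b : 4; };

/* a is wider than 8 bits and must stay uint16_t; b fits next to it: 2 bytes. */
struct byte2_mix { uint16_t a : 9;  uint8_t  b : 4; };

/* 12 + 5 = 17 bits: b starts a new byte; one padding byte rounds the
   structure up to the 2-byte alignment of a, giving 4 bytes. */
struct byte3_mix { uint16_t a : 12; uint8_t  b : 5; };

void print_type_choice_sizes(void)
{
    printf("byte_u16  = %zu\n", sizeof(struct byte_u16));
    printf("byte_u8   = %zu\n", sizeof(struct byte_u8));
    printf("byte1_u8  = %zu\n", sizeof(struct byte1_u8));
    printf("byte2_mix = %zu\n", sizeof(struct byte2_mix));
    printf("byte3_mix = %zu\n", sizeof(struct byte3_mix));
}
```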
Unnamed zero-size bit-fields
Unnamed zero-size bit fields have a special function: they disable packing.
C99 standard (6.7.2.1): A bit-field declaration with no declarator, but only a colon and a width, indicates an unnamed bit-field. As a special case, a bit-field structure member with a width of 0 indicates that no further bit-field is to be packed into the unit in which the previous bit-field, if any, was placed.
If you do not want the compiler to pack two adjacent bit fields together, put an unnamed zero-size bit field between them. This field forces the compiler to advance to the next boundary of the specified <type>.
Consider the following structure as an example:
As you can see, the compiler packed all the bit fields into 1 byte as expected.
Now we will add unnamed zero-size bit fields of various types to this structure and see what will change.
Unnamed field uint8_t
Let's add an unnamed zero-size field of type uint8_t:
As you can see, the size of the structure became 2 bytes, because the unnamed field does not allow the compiler to pack the bit fields into 1 byte. The compiler was therefore forced to allocate a separate uint8_t unit for the bit field b.
Unnamed field uint16_t
Let's add an unnamed zero-size field of type uint16_t:
The byte_3 structure contains a zero-length unnamed uint16_t field, which shifts the b field to the next 16-bit boundary, so the structure now occupies 3 bytes:
Unnamed field uint32_t
Let's add an unnamed zero-size field of type uint32_t:
The byte_4 structure contains a zero-length unnamed uint32_t field, which shifts the b field to the next 32-bit boundary, so the structure now occupies 5 bytes:
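The three variants can be put side by side in a short sketch. The structure names follow the text, while the 4-bit widths of a and b are an assumption, as is the helper print_zero_width_sizes. The text reports 1, 2, 3 and 5 bytes for GCC on ARM; other ABIs may add trailing padding to byte_3 and byte_4, so treat the exact figures as implementation-defined.

```c
#include <stdint.h>
#include <stdio.h>

struct byte_1 { uint8_t a : 4;               uint8_t b : 4; };  /* no break           */
struct byte_2 { uint8_t a : 4; uint8_t  : 0; uint8_t b : 4; };  /* b on next 8 bits   */
struct byte_3 { uint8_t a : 4; uint16_t : 0; uint8_t b : 4; };  /* b on next 16 bits  */
struct byte_4 { uint8_t a : 4; uint32_t : 0; uint8_t b : 4; };  /* b on next 32 bits  */

/* Prints the size of each variant for the current compiler and target. */
void print_zero_width_sizes(void)
{
    printf("byte_1 = %zu\n", sizeof(struct byte_1));
    printf("byte_2 = %zu\n", sizeof(struct byte_2));
    printf("byte_3 = %zu\n", sizeof(struct byte_3));
    printf("byte_4 = %zu\n", sizeof(struct byte_4));
}
```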
When the alignment of an anonymous field does not work
However, if the next field already starts on a boundary that is a multiple of the size of <type>, the unnamed zero-length field adds no shift.
There is no point in considering unnamed uint8_t fields here, because they are always aligned to the boundary of their type. Therefore, we will focus only on examples with unnamed fields of types uint16_t and uint32_t.
Take the byte_3 structure, but this time add a few more fields between the bit field a and the unnamed bit field, sized so that the unnamed bit field lands exactly on a <type> boundary.
The byte_3 structure has a zero-length unnamed uint16_t field, which should shift the b field to the next 16-bit boundary. However, because the b field is already aligned to a 16-bit boundary, the unnamed field does nothing and the size of the structure does not change.
The same holds for an unnamed uint32_t field:
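One way to observe this, sketched under assumptions: the structures plain16 and break16 and the filler fields fill and c are hypothetical, chosen so that exactly 16 bits precede b. The helpers set b and report which byte of the object changed. The 16-bit case is shown; the same pattern should apply to uint32_t with 32 bits of filler.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 4 + 4 + 8 = 16 bits precede b in both structures. */
struct plain16 { uint8_t a : 4; uint8_t fill : 4; uint8_t c : 8;               uint8_t b : 4; };
struct break16 { uint8_t a : 4; uint8_t fill : 4; uint8_t c : 8; uint16_t : 0; uint8_t b : 4; };

/* Index of the byte that becomes non-zero when only b is set. */
size_t plain16_b_byte(void)
{
    struct plain16 s;
    unsigned char raw[sizeof s];
    memset(&s, 0, sizeof s);
    s.b = 0xF;
    memcpy(raw, &s, sizeof s);
    for (size_t i = 0; i < sizeof s; i++)
        if (raw[i] != 0) return i;
    return sizeof s;  /* not reached when b is stored in the object */
}

size_t break16_b_byte(void)
{
    struct break16 s;
    unsigned char raw[sizeof s];
    memset(&s, 0, sizeof s);
    s.b = 0xF;
    memcpy(raw, &s, sizeof s);
    for (size_t i = 0; i < sizeof s; i++)
        if (raw[i] != 0) return i;
    return sizeof s;
}
```

If both helpers return the same index, the zero-length field did not move b.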
An unnamed field of non-zero size
Only unnamed fields of zero size disable the packing of bit fields. For an unnamed field of non-zero size, the ordinary packing rules apply.
The structures byte_2, byte_3 and byte_4 contain unnamed fields with a width of 1 bit, which are packed into the same unit as the a field, just like the b field. Fields of this kind can be used to reserve a certain number of bits. For example, hardware protocols often have such reserved bits.
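As an illustration of reserved bits, here is a sketch of a made-up 8-bit control register. The register layout, struct ctrl_reg and ctrl_reg_make are all hypothetical and do not describe any real device.

```c
#include <stdint.h>

/* Hypothetical 8-bit control register; layout invented for illustration. */
struct ctrl_reg {
    uint8_t enable : 1;
    uint8_t        : 3;   /* reserved: occupies bits, has no name     */
    uint8_t mode   : 2;
    uint8_t        : 1;   /* reserved                                 */
    uint8_t irq    : 1;
};

/* Builds a register value with the given 2-bit mode and enable set. */
struct ctrl_reg ctrl_reg_make(unsigned mode)
{
    struct ctrl_reg r = { 0 };
    r.enable = 1;
    r.mode   = mode & 0x3u;   /* keep only the two bits the field can hold */
    return r;
}
```

Because the unnamed fields have non-zero width, they are packed like any other field and the whole structure still fits into one byte with GCC.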
Alignment of structures with bit fields
If a structure contains regular fields alongside bit fields, the first bit field is shifted to the boundary of its <type>.
Let's look at the following structure:
According to the rule described above, in the example structure, the b field will be aligned to a 4-byte boundary:
However, if you specify a number of bits less than 32 for the b field, then the compiler may (or may not) allocate a type other than uint32_t for this field. For example, for the field b in the form: uint32_t b: 8, a field of the type uint8_t will be allocated, and accordingly there will be no alignment:
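A minimal sketch of the two cases, assuming a single uint8_t member in front of the bit field. The struct names are hypothetical. The size of the narrow variant is implementation-defined, so the comment only states what can be relied on with GCC.

```c
#include <stdint.h>
#include <stdio.h>

/* b needs a full 32-bit unit, so it starts at offset 4: 8 bytes in total. */
struct full_width { uint8_t a; uint32_t b : 32; };

/* b needs only 8 bits and can sit right after a; the result is
   smaller than full_width, the exact size depends on the ABI. */
struct narrow     { uint8_t a; uint32_t b : 8; };

void print_alignment_sizes(void)
{
    printf("full_width = %zu\n", sizeof(struct full_width));
    printf("narrow     = %zu\n", sizeof(struct narrow));
}
```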
The order of the individual bits in the bit fields
Look at an example:
The question arises: in what order will the b fields be located in the memory cell?
Like this:
or like this:
The problem is that the bit packing order is not defined (more precisely, it is implementation-defined).
Let's take the following structure and look at it on CPUs with different byte orders.
For ARM (LE):
For PowerPC(BE):
This is a little unexpected, because if we look at the assembly files for these two architectures, we see the following. For ARM (LE):
For PowerPC(BE):
As you can see, the bit sequence is different for different architectures.
For ARM (LE), bit 0 of a byte is in the rightmost position, i.e. offset 0, while for PowerPC (BE), bit 0 is in the leftmost position, i.e. offset 7.
At the same time, printing with printf("b1 = %p\n", byte.raw); produces the same output on both CPUs:
Why?
Because here the compiler comes to our aid. Let's set bit b0 to one in our code and look at the assembly again:
ARM (LE):
As you can see here, the compiler generates quite logical code and sets the 0th bit.
PowerPC(BE):
Here the compiler, knowing that the code is generated for a BE architecture, sets the 7th bit, because on this architecture that bit is considered bit zero. As a result, thanks to the compiler, we do not notice any difference between LE and BE when working with bit fields. However, firstly, as mentioned above, the order in which bits are packed is not defined by the standard, so you cannot rely on it. Secondly, if you need to work with data received from a processor with a different byte order, or with a hardware bit protocol that you communicate with over the network, the compiler will not help you, and you must take this into account when accessing that protocol through bit fields.
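The effect can be checked from C without reading the assembly. The sketch below is an assumption about what the lost listing looked like: a union overlaying eight 1-bit fields on a raw byte. The names bit_view, bits and raw_after_setting_b0 are hypothetical.

```c
#include <stdint.h>

/* Eight 1-bit fields overlaid on one raw byte. */
union bit_view {
    uint8_t raw;
    struct {
        uint8_t b0 : 1;
        uint8_t b1 : 1;
        uint8_t b2 : 1;
        uint8_t b3 : 1;
        uint8_t b4 : 1;
        uint8_t b5 : 1;
        uint8_t b6 : 1;
        uint8_t b7 : 1;
    } bits;
};

/* Sets only b0 and returns the raw byte.
   With GCC this is typically 0x01 on little-endian targets and
   0x80 on big-endian ones; the standard guarantees neither. */
uint8_t raw_after_setting_b0(void)
{
    union bit_view v = { .raw = 0 };
    v.bits.b0 = 1;
    return v.raw;
}
```

Code that accesses b0 by name behaves the same on both targets; only code that interprets the raw byte, such as a protocol parser, sees the difference.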
Signed and unsigned bit fields
Bit fields can be either signed or unsigned. Here lies a nuance that I could not understand for a very long time: why would bit fields need a signed data type? The reason was that I perceived bit fields as just a group of individual bits, and that is fundamentally wrong, because bit fields in the C language are a way of working with data types whose size is not a multiple of a byte.
That is, a declaration such as uint8_t a:4
does not mean just a set of 4 logically connected bits. It means a data type 4 bits in size, and that data type can be either signed or unsigned.
C99 standard (6.7.2.1): A bit-field is interpreted as a signed or unsigned integer type consisting of the specified number of bits.
That is, if the range of values for uint8_t is [0,255] and for int8_t is [-128,127], then for the bit field uint8_t a:4 the range is [0,15], and for int8_t a:4 it is [-8,7]. Let's show this by example:
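The original example is not available, so here is a minimal sketch of the same idea. The names struct nibbles, store_unsigned and store_signed are hypothetical. Each helper stores a value into a 4-bit field and reads it back.

```c
#include <stdint.h>

struct nibbles {
    uint8_t u : 4;   /* unsigned 4-bit integer: 0 .. 15 */
    int8_t  s : 4;   /* signed 4-bit integer:  -8 .. 7  */
};

/* Unsigned field: out-of-range values are reduced modulo 16. */
unsigned store_unsigned(unsigned v)
{
    struct nibbles n = { 0 };
    n.u = v;
    return n.u;
}

/* Signed field: values in -8 .. 7 are preserved exactly.
   Storing a value outside that range gives an
   implementation-defined result. */
int store_signed(int v)
{
    struct nibbles n = { 0 };
    n.s = v;
    return n.s;
}
```

Reading n.s back yields a negative number when the top bit of the 4-bit field is set, which is exactly what distinguishes a signed 4-bit type from four loose bits.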
I tend to avoid bitfields. The biggest problem with them is highlighted in your article - too much of their behaviour is implementation defined! Given they are most useful in situations where those fine details are important, they're not as useful a feature as they first seem.
Great article. Low level bit twiddling is a bit out of fashion these days but many embedded systems still use bitfields to set all sorts of configurations, so it's still very much relevant there.