
Manipulating Bits: Top Hacks

Rohan Verma

Software Engineering Consultant | C++, Scala, Go | LLVM | Web Assembly | Elastic Search | OLAP | OLTP | ReactiveX | Web Sockets | NoSQL Databases | Event-Driven Domain

Published: November 1, 2016

In this article I'm showing you guys a few cool hacks for playing with bits, which are particularly useful in hackathons and coding competitions.

Compute the sign of an integer

#include <limits.h> // CHAR_BIT is the number of bits per byte (normally 8).

int v;      // we want to find the sign of v
int sign;   // the result goes here

sign = -(v < 0);  // if v < 0 then -1, else 0.
// or, to avoid branching on CPUs with flag registers (IA32):
sign = -(int)((unsigned int)((int)v) >> (sizeof(int) * CHAR_BIT - 1));
// or, for one less instruction (but not portable):
sign = v >> (sizeof(int) * CHAR_BIT - 1);

The last expression above evaluates to sign = v >> 31 for 32-bit integers. This is one operation faster than the obvious way, sign = -(v < 0). The trick works because when signed integers are shifted right, the value of the far-left bit is copied into the other bits. The far-left bit is 1 when the value is negative and 0 otherwise, and all 1 bits give -1. Unfortunately, this behavior is architecture-specific: C leaves the result of right-shifting a negative signed value implementation-defined.

Alternatively, if you prefer the result be either -1 or +1, then use:

sign = +1 | (v >> (sizeof(int) * CHAR_BIT - 1));  // if v < 0 then -1, else +1

On the other hand, if you prefer the result be either -1, 0, or +1, then use:

sign = (v != 0) | -(int)((unsigned int)((int)v) >> (sizeof(int) * CHAR_BIT - 1));
// Or, for more speed but less portability:
sign = (v != 0) | (v >> (sizeof(int) * CHAR_BIT - 1));  // -1, 0, or +1
// Or, for portability, brevity, and (perhaps) speed:
sign = (v > 0) - (v < 0); // -1, 0, or +1

If instead you want to know if something is non-negative, resulting in +1 or else 0, then use:

sign = 1 ^ ((unsigned int)v >> (sizeof(int) * CHAR_BIT - 1)); // if v < 0 then 0, else 1
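This minimal sketch wraps the portable variants above in small functions so they can be compared directly. The function names are hypothetical. The sketch deliberately skips the non-portable `v >> 31` forms, because their result on negative values depends on the compiler.

```c
#include <limits.h>  /* CHAR_BIT, INT_MIN, INT_MAX */

/* -1 if v < 0, else 0 (portable). */
int sign_neg_or_zero(int v) {
    return -(v < 0);
}

/* -1, 0, or +1 using two comparisons (portable). */
int sign_tristate(int v) {
    return (v > 0) - (v < 0);
}

/* 1 if v >= 0, else 0: the top bit is read through an unsigned shift,
   which is well defined for every int value. */
int is_non_negative(int v) {
    return 1 ^ (int)((unsigned int)v >> (sizeof(int) * CHAR_BIT - 1));
}
```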

Detect if two integers have opposite signs

int x, y;  // input values to compare signs
bool f = ((x ^ y) < 0); //true iff x and y have opposite signs
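As a quick illustration, the one-liner can be wrapped in a hypothetical helper, `opposite_signs`. The sketch assumes two's complement ints, which are universal in practice and required since C23. Zero counts as non-negative, so `opposite_signs(0, -1)` is true.

```c
#include <stdbool.h>

/* True iff x and y have opposite signs: x ^ y has its sign bit set
   exactly when the sign bits of x and y differ. */
bool opposite_signs(int x, int y) {
    return (x ^ y) < 0;
}
```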

Compute the integer absolute value (abs) without branching

int v;           // we want to find the absolute value of v
unsigned int r;  // the result goes here
int const mask = v >> (sizeof(int) * CHAR_BIT - 1);

r = (v + mask) ^ mask;

Patented variation:

r = (v ^ mask) - mask;

Some CPUs don't have an integer absolute value instruction (or the compiler fails to use it). On machines where branching is expensive, the above expression can be faster than the obvious approach, r = (v < 0) ? -(unsigned)v : v, even though the number of operations is the same. Note that the mask relies on v >> 31 filling with copies of the sign bit, which is implementation-defined for negative values in C.
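Here is a minimal sketch of the branch-free absolute value as a function. It is a slight variation on the listing above, not an exact copy:

- The mask comes from an unsigned shift, so it does not rely on arithmetic right-shifting a negative int.
- The arithmetic is done in unsigned, so `INT_MIN` yields 2147483648 instead of overflowing.

The name `abs_branchless` is hypothetical, and the test values assume a 32-bit int.

```c
#include <limits.h>  /* CHAR_BIT, INT_MIN, INT_MAX */

/* Branch-free |v|, returned as unsigned so that |INT_MIN| is representable. */
unsigned int abs_branchless(int v) {
    unsigned int const uv = (unsigned int)v;
    /* mask: all ones when v < 0, else 0, taken from the sign bit
       through an unsigned shift (no arithmetic-shift assumption). */
    unsigned int const mask = 0u - (uv >> (sizeof(int) * CHAR_BIT - 1));
    return (uv + mask) ^ mask;
}
```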

Compute the minimum (min) or maximum (max) of two integers without branching

int x;  // we want to find the minimum of x and y
int y;
int r;  // the result goes here

r = y ^ ((x ^ y) & -(x < y)); // min(x, y)

On some rare machines where branching is very expensive and no conditional move instructions exist, the above expression might be faster than the obvious approach, r = (x < y) ? x : y, even though it involves two more instructions. (Typically, the obvious approach is best, though.) It works because if x < y, then -(x < y) will be all ones, so r = y ^ (x ^ y) & ~0 = y ^ x ^ y = x. Otherwise, if x >= y, then -(x < y) will be all zeros, so r = y ^ ((x ^ y) & 0) = y. On some machines, evaluating (x < y) as 0 or 1 requires a branch instruction, so there may be no advantage.

To find the maximum, use:

r = x ^ ((x ^ y) & -(x < y)); // max(x, y)
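The XOR-based min and max can be packaged as two small functions. This is a minimal sketch with the hypothetical names `min_xor` and `max_xor`. Because `-(x < y)` is either 0 or -1, neither function subtracts x and y, so both stay correct even for extreme inputs such as `INT_MIN` and `INT_MAX`.

```c
#include <limits.h>  /* INT_MIN, INT_MAX (used in the tests) */

/* -(x < y) is all ones when x < y and 0 otherwise. */
int min_xor(int x, int y) {
    return y ^ ((x ^ y) & -(x < y));
}

int max_xor(int x, int y) {
    return x ^ ((x ^ y) & -(x < y));
}
```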

Quick and dirty versions:

If you know that INT_MIN <= x - y <= INT_MAX, then you can use the following, which are faster because (x - y) only needs to be evaluated once.
r = y + ((x - y) & ((x - y) >> (sizeof(int) * CHAR_BIT - 1))); // min(x, y)
r = x - ((x - y) & ((x - y) >> (sizeof(int) * CHAR_BIT - 1))); // max(x, y)

Note that the 1989 ANSI C specification doesn't specify the result of a signed right-shift (later C standards leave it implementation-defined for negative values), so these aren't portable. If exceptions are thrown on overflow, then x and y should be unsigned, or cast to unsigned for the subtractions, to avoid unnecessarily throwing an exception. However, the right-shift needs a signed operand to produce all one bits when negative, so cast to signed there.
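The quick-and-dirty versions can be sketched as functions with the hypothetical names `min_qd` and `max_qd`, computing `x - y` once into a local. The sketch inherits both caveats from the text:

- The caller must guarantee that `x - y` does not overflow.
- It assumes `>>` on a negative int is an arithmetic shift, which is implementation-defined (GCC and Clang behave this way).

```c
#include <limits.h>  /* CHAR_BIT */

/* Precondition: INT_MIN <= x - y <= INT_MAX. d >> 31 is -1 when d < 0, else 0. */
int min_qd(int x, int y) {
    int const d = x - y;
    return y + (d & (d >> (sizeof(int) * CHAR_BIT - 1)));
}

int max_qd(int x, int y) {
    int const d = x - y;
    return x - (d & (d >> (sizeof(int) * CHAR_BIT - 1)));
}
```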

Determining if an integer is a power of 2

unsigned int v; // we want to see if v is a power of 2
bool f;         // the result goes here

f = (v & (v - 1)) == 0;

Note that 0 is incorrectly considered a power of 2 here. To remedy this, use:

f = v && !(v & (v - 1));
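Here is a minimal sketch of the corrected test as a reusable function; the name `is_power_of_two` is hypothetical. `v & (v - 1)` clears the lowest set bit, so it is zero exactly when v has at most one bit set. The leading `v &&` rules out zero.

```c
#include <stdbool.h>

bool is_power_of_two(unsigned int v) {
    return v && !(v & (v - 1));
}
```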

Sign extending from a constant bit-width

Sign extension is automatic for built-in types, such as chars and ints. But suppose you have a signed two's complement number, x, that is stored using only b bits. Moreover, suppose you want to convert x to an int, which has more than b bits. A simple copy will work if x is positive, but if it is negative, the sign must be extended. For example, if we have only 4 bits to store a number, then -3 is represented as 1101 in binary. If we have 8 bits, then -3 is 11111101. When we convert to a representation with more bits, the most-significant bit of the 4-bit representation is replicated to the left to fill in the destination; this is sign extending. In C, sign extension from a constant bit-width is trivial, since bit fields may be specified in structs or unions. For example, to convert from 5 bits to a full integer:
int x; // convert this from using 5 bits to a full int
int r; // resulting sign extended number goes here
struct {signed int x:5;} s;
r = s.x = x;

The following is a C++ template function that uses the same language feature to convert from B bits in one operation (though the compiler is generating more, of course).

template <typename T, unsigned B>
inline T signextend(const T x)
{
  struct {T x:B;} s;
  return s.x = x;
}

int r = signextend<signed int,5>(x);  // sign extend 5 bit number x to r
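In plain C, the same bit-field trick can be sketched as a function. This hypothetical `sign_extend_4` uses a 4-bit field to match the 1101 (= -3) example above. Storing an out-of-range value into a signed bit-field is implementation-defined in C. The sketch assumes GCC or Clang, which reduce the value modulo 2^4.

```c
/* Sign-extend the low 4 bits of x through a 4-bit signed bit-field.
   Assumes the compiler reduces out-of-range values modulo 2^4 (GCC, Clang). */
int sign_extend_4(int x) {
    struct { signed int x : 4; } s;
    return s.x = x;   /* value of the assignment is the stored (extended) field */
}
```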

Sign extending from a variable bit-width

Sometimes we need to extend the sign of a number but we don't know a priori the number of bits, b, in which it is represented. (Or we could be programming in a language like Java, which lacks bitfields.)
unsigned b; // number of bits representing the number in x
int x;      // sign extend this b-bit number to r
int r;      // resulting sign-extended number
int const m = 1U << (b - 1); // mask can be pre-computed if b is fixed

x = x & ((1U << b) - 1);  // (Skip this if bits in x above position b are already zero.)
r = (x ^ m) - m;

The code above requires four operations, but when the bit-width is a constant rather than variable, it requires only two fast operations, assuming the upper bits are already zeroes.

A slightly faster but less portable method that doesn't depend on the bits in x above position b being zero is:

int const m = CHAR_BIT * sizeof(x) - b;
r = (x << m) >> m;
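Here is a minimal sketch of the portable mask-and-XOR method as a function; the name `sign_extend_var` is hypothetical. It assumes 1 <= b < sizeof(int) * CHAR_BIT, so that `1U << b` stays defined.

```c
/* Sign-extend the low b bits of x; assumes 1 <= b < number of bits in int. */
int sign_extend_var(int x, unsigned int b) {
    int const m = (int)(1U << (b - 1));   /* the sign bit of the b-bit field */
    x = x & (int)((1U << b) - 1);         /* clear bits above position b */
    return (x ^ m) - m;                   /* flip the sign bit, then subtract its weight */
}
```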

Sign extending from a variable bit-width in 3 operations

The following may be slow on some machines, due to the effort required for multiplication and division. This version is 4 operations. If you know that your initial bit-width, b, is greater than 1, you might do this type of sign extension in 3 operations by using r = (x * multipliers[b]) / multipliers[b], which requires only one array lookup. (Both forms rely on the signed multiplication wrapping on overflow, which C does not guarantee.)
unsigned b; // number of bits representing the number in x
int x;      // sign extend this b-bit number to r
int r;      // resulting sign-extended number
#define M(B) (1U << ((sizeof(x) * CHAR_BIT) - B)) // CHAR_BIT = bits/byte
static int const multipliers[] = 
{
  0,     M(1),  M(2),  M(3),  M(4),  M(5),  M(6),  M(7),
  M(8),  M(9),  M(10), M(11), M(12), M(13), M(14), M(15),
  M(16), M(17), M(18), M(19), M(20), M(21), M(22), M(23),
  M(24), M(25), M(26), M(27), M(28), M(29), M(30), M(31),
  M(32)
}; // (add more if using more than 32 bits)
static int const divisors[] = 
{
  1,    ~M(1),  M(2),  M(3),  M(4),  M(5),  M(6),  M(7),
  M(8),  M(9),  M(10), M(11), M(12), M(13), M(14), M(15),
  M(16), M(17), M(18), M(19), M(20), M(21), M(22), M(23),
  M(24), M(25), M(26), M(27), M(28), M(29), M(30), M(31),
  M(32)
}; // (add more for 64 bits)
#undef M
r = (x * multipliers[b]) / divisors[b];
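The following variation is not portable, but on architectures that employ an arithmetic right-shift, maintaining the sign, it should be fast. A negative shift count such as -b is undefined behavior in C; it only works where the hardware masks the shift count (as x86 does), so the alternative sizeof(x) * CHAR_BIT - b is the safer choice.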
const int s = -b; //OR:  sizeof(x) * CHAR_BIT - b;
r = (x << s) >> s;

Conditionally set or clear bits without branching

bool f;         // conditional flag
unsigned int m; // the bit mask
unsigned int w; // the word to modify:
// equivalent to: if (f) w |= m; else w &= ~m;

w ^= (-f ^ w) & m;

// OR, for superscalar CPUs:
w = (w & ~m) | (-f & m);

On some architectures, the lack of branching can more than make up for what appears to be twice as many operations. For instance, informal speed tests on an AMD Athlon XP 2100+ indicated it was 5-10% faster. An Intel Core 2 Duo ran the superscalar version about 16% faster than the first.
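Both branch-free forms can be checked against the `if` version with a minimal sketch like this one; the function names are hypothetical. Negating the flag as `-(unsigned int)f` produces all ones for true and zero for false, without relying on signed conversions.

```c
#include <stdbool.h>

/* Set the bits of m in w when f is true; clear them when f is false. */
unsigned int set_or_clear(unsigned int w, unsigned int m, bool f) {
    return w ^ ((-(unsigned int)f ^ w) & m);
}

/* Same result, split into two independent halves for superscalar CPUs. */
unsigned int set_or_clear_superscalar(unsigned int w, unsigned int m, bool f) {
    return (w & ~m) | (-(unsigned int)f & m);
}
```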

Conditionally negate a value without branching

If you need to negate only when a flag is false, then use the following to avoid branching:
bool fDontNegate;  // Flag indicating we should not negate v.
int v;             // Input value to negate if fDontNegate is false.
int r;             // result = fDontNegate ? v : -v;

r = (fDontNegate ^ (fDontNegate - 1)) * v;
If you need to negate only when a flag is true, then use this:
bool fNegate;  // Flag indicating if we should negate v.
int v;         // Input value to negate if fNegate is true.
int r;         // result = fNegate ? -v : v;

r = (v ^ -fNegate) + fNegate;
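Here is a sketch of both conditional negations as functions; the names are hypothetical. As with ordinary negation, the result overflows when v is `INT_MIN`, so that input is excluded.

```c
#include <stdbool.h>

/* fDontNegate ? v : -v. The multiplier is 1 ^ 0 = 1 or 0 ^ -1 = -1. */
int negate_unless(bool fDontNegate, int v) {
    return (fDontNegate ^ (fDontNegate - 1)) * v;
}

/* fNegate ? -v : v. For fNegate == 1 this is ~v + 1, two's complement negation. */
int negate_if(bool fNegate, int v) {
    return (v ^ -fNegate) + fNegate;
}
```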

Merge bits from two values according to a mask

unsigned int a;    // value to merge in non-masked bits
unsigned int b;    // value to merge in masked bits
unsigned int mask; // 1 where bits from b should be selected; 0 where from a.
unsigned int r;    // result of (a & ~mask) | (b & mask) goes here

r = a ^ ((a ^ b) & mask);

This shaves one operation from the obvious way of combining two sets of bits according to a bit mask. If the mask is a constant, then there may be no advantage.
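Here is a minimal sketch of the merge as a function; the name `merge_bits` is hypothetical. With the mask 0xFFFF0000, the result takes its high half from b and its low half from a.

```c
/* Bits where mask is 1 come from b; bits where mask is 0 come from a. */
unsigned int merge_bits(unsigned int a, unsigned int b, unsigned int mask) {
    return a ^ ((a ^ b) & mask);   /* same as (a & ~mask) | (b & mask) */
}
```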

Counting bits set (naive way)

unsigned int v; // count the number of bits set in v
unsigned int c; // c accumulates the total bits set in v
for (c = 0; v; v >>= 1)
{
  c += v & 1;
}

The naive approach requires one iteration per bit, until no more bits are set. So on a 32-bit word with only the high bit set, it will go through 32 iterations.
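Here is the naive loop as a self-contained function; the name is hypothetical. It is handy as a reference when testing the faster counters below.

```c
/* One iteration per bit position up to the highest set bit. */
unsigned int popcount_naive(unsigned int v) {
    unsigned int c;
    for (c = 0; v; v >>= 1)
        c += v & 1;   /* add the current low bit */
    return c;
}
```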

Counting bits set by lookup table

static const unsigned char BitsSetTable256[256] = 
{
#   define B2(n) n,     n+1,     n+1,     n+2
#   define B4(n) B2(n), B2(n+1), B2(n+1), B2(n+2)
#   define B6(n) B4(n), B4(n+1), B4(n+1), B4(n+2)
    B6(0), B6(1), B6(1), B6(2)
};

unsigned int v; // count the number of bits set in 32-bit value v
unsigned int c; // c is the total bits set in v

// Option 1:
c = BitsSetTable256[v & 0xff] + 
    BitsSetTable256[(v >> 8) & 0xff] + 
    BitsSetTable256[(v >> 16) & 0xff] + 
    BitsSetTable256[v >> 24]; 

// Option 2:
unsigned char * p = (unsigned char *) &v;
c = BitsSetTable256[p[0]] + 
    BitsSetTable256[p[1]] + 
    BitsSetTable256[p[2]] +	
    BitsSetTable256[p[3]];


// To initially generate the table algorithmically (declare BitsSetTable256 without const in that case):
BitsSetTable256[0] = 0;
for (int i = 0; i < 256; i++)
{
  BitsSetTable256[i] = (i & 1) + BitsSetTable256[i / 2];
}
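Here is a minimal sketch of the table approach with the table generated at startup, which sidesteps the problem of filling a `const` array. The names `bits_set_table`, `init_bits_set_table`, and `popcount32_table` are hypothetical. `init_bits_set_table` must run once before the first lookup.

```c
#include <stdint.h>

static unsigned char bits_set_table[256]; /* zero-initialized, so entry 0 is already 0 */

/* Each entry is the low bit plus the count for the value shifted right by one. */
void init_bits_set_table(void) {
    for (int i = 1; i < 256; i++)
        bits_set_table[i] = (unsigned char)((i & 1) + bits_set_table[i / 2]);
}

/* Option 1 above: one lookup per byte of a 32-bit value. */
unsigned int popcount32_table(uint32_t v) {
    return bits_set_table[v & 0xff] + bits_set_table[(v >> 8) & 0xff] +
           bits_set_table[(v >> 16) & 0xff] + bits_set_table[v >> 24];
}
```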

Counting bits set, Brian Kernighan's way

unsigned int v; // count the number of bits set in v
unsigned int c; // c accumulates the total bits set in v
for (c = 0; v; c++)
{
  v &= v - 1; // clear the least significant bit set
}

Brian Kernighan's method goes through as many iterations as there are set bits, because v &= v - 1 clears exactly one set bit (the lowest) on each pass. So if we have a 32-bit word with only the high bit set, then it will only go once through the loop.

Published in 1988, The C Programming Language, 2nd Ed. (by Brian W. Kernighan and Dennis M. Ritchie) mentions this in exercise 2-9.
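Here is Kernighan's loop as a function, a minimal sketch with a hypothetical name. The returned count equals the number of loop iterations.

```c
/* Each pass clears the lowest set bit, so the loop runs popcount(v) times. */
unsigned int popcount_kernighan(unsigned int v) {
    unsigned int c;
    for (c = 0; v; c++)
        v &= v - 1;
    return c;
}
```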

Counting bits set in 14, 24, or 32-bit words using 64-bit instructions

unsigned int v; // count the number of bits set in v
unsigned int c; // c accumulates the total bits set in v

// option 1, for at most 14-bit values in v:
c = (v * 0x200040008001ULL & 0x111111111111111ULL) % 0xf;

// option 2, for at most 24-bit values in v:
c =  ((v & 0xfff) * 0x1001001001001ULL & 0x84210842108421ULL) % 0x1f;
c += (((v & 0xfff000) >> 12) * 0x1001001001001ULL & 0x84210842108421ULL) 
     % 0x1f;

// option 3, for at most 32-bit values in v:
c =  ((v & 0xfff) * 0x1001001001001ULL & 0x84210842108421ULL) % 0x1f;
c += (((v & 0xfff000) >> 12) * 0x1001001001001ULL & 0x84210842108421ULL) % 
     0x1f;
c += ((v >> 24) * 0x1001001001001ULL & 0x84210842108421ULL) % 0x1f;

This method requires a 64-bit CPU with fast modulus division to be efficient.
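Here is a sketch of option 3 as a function for 32-bit inputs; the name is hypothetical, and it assumes `unsigned long long` is 64 bits. The multiply fans out each 12-bit chunk so that every input bit lands on a distinct multiple of 5. The AND keeps those bits, and `% 0x1f` (that is, mod 2^5 - 1) adds the 5-bit fields together.

```c
#include <stdint.h>

/* Three chunks (12 + 12 + 8 bits), each counted with one multiply, AND, and mod. */
unsigned int popcount32_mul64(uint32_t v) {
    unsigned long long c;
    c  = ((v & 0xfff) * 0x1001001001001ULL & 0x84210842108421ULL) % 0x1f;
    c += (((v & 0xfff000) >> 12) * 0x1001001001001ULL & 0x84210842108421ULL) % 0x1f;
    c += ((v >> 24) * 0x1001001001001ULL & 0x84210842108421ULL) % 0x1f;
    return (unsigned int)c;
}
```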

Counting bits set, in parallel

unsigned int v; // count bits set in this (32-bit value)
unsigned int c; // store the total here
static const int S[] = {1, 2, 4, 8, 16}; // Magic Binary Numbers
static const int B[] = {0x55555555, 0x33333333, 0x0F0F0F0F, 0x00FF00FF, 0x0000FFFF};

c = v - ((v >> 1) & B[0]);
c = ((c >> S[1]) & B[1]) + (c & B[1]);
c = ((c >> S[2]) + c) & B[2];
c = ((c >> S[3]) + c) & B[3];
c = ((c >> S[4]) + c) & B[4];
The B array, expressed as binary, is:

B[0] = 0x55555555 = 01010101010101010101010101010101
B[1] = 0x33333333 = 00110011001100110011001100110011
B[2] = 0x0F0F0F0F = 00001111000011110000111100001111
B[3] = 0x00FF00FF = 00000000111111110000000011111111
B[4] = 0x0000FFFF = 00000000000000001111111111111111

We can adjust the method for larger integer sizes by continuing with the patterns for the Binary Magic Numbers, B and S. If there are k bits, then we need the arrays S and B to be ceil(lg(k)) elements long, and we must compute the same number of expressions for c as S or B are long. For a 32-bit v, 16 operations are used.

The best method for counting bits in a 32-bit integer v is the following:

v = v - ((v >> 1) & 0x55555555);                    // reuse input as temporary
v = (v & 0x33333333) + ((v >> 2) & 0x33333333);     // temp
c = ((v + (v >> 4) & 0xF0F0F0F) * 0x1010101) >> 24; // count

The best bit counting method takes only 12 operations, which is the same as the lookup-table method, but avoids the memory and potential cache misses of a table. It is a hybrid between the purely parallel method above and the earlier methods using multiplies (in the section on counting bits with 64-bit instructions), though it doesn't use 64-bit instructions. The counts of bits set in the bytes is done in parallel, and the sum total of the bits set in the bytes is computed by multiplying by 0x1010101 and shifting right 24 bits.

A generalization of the best bit counting method to integers of bit-widths up to 128 (parameterized by type T) is this:

v = v - ((v >> 1) & (T)~(T)0/3);                           // temp
v = (v & (T)~(T)0/15*3) + ((v >> 2) & (T)~(T)0/15*3);      // temp
v = (v + (v >> 4)) & (T)~(T)0/255*15;                      // temp
c = (T)(v * ((T)~(T)0/255)) >> (sizeof(T) - 1) * CHAR_BIT; // count
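Here is a minimal sketch of the best method for 32 bits, and of the generalized form instantiated with T = uint64_t; the function names are hypothetical. The generalized constants reduce to the familiar masks:

- `~0/3` is 0x5555...
- `~0/15*3` is 0x3333...
- `~0/255*15` is 0x0F0F...
- `~0/255` is 0x0101...

```c
#include <limits.h>  /* CHAR_BIT */
#include <stdint.h>

/* 2-bit sums, then 4-bit sums, then byte sums; the multiply adds
   all byte counts into the top byte. */
uint32_t popcount32_swar(uint32_t v) {
    v = v - ((v >> 1) & 0x55555555u);
    v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u);
    return (((v + (v >> 4)) & 0x0F0F0F0Fu) * 0x01010101u) >> 24;
}

/* The generalized form with T = uint64_t. */
uint64_t popcount64_swar(uint64_t v) {
    typedef uint64_t T;
    v = v - ((v >> 1) & (T)~(T)0 / 3);
    v = (v & (T)~(T)0 / 15 * 3) + ((v >> 2) & (T)~(T)0 / 15 * 3);
    v = (v + (v >> 4)) & (T)~(T)0 / 255 * 15;
    return (T)(v * ((T)~(T)0 / 255)) >> (sizeof(T) - 1) * CHAR_BIT;
}
```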

Count bits set (rank) from the most-significant bit up to a given position

The following finds the rank of a bit, meaning it returns the sum of bits that are set to 1 from the most-significant bit down to the bit at the given position.

#include <stdint.h> // uint64_t
uint64_t v;       // Compute the rank (bits set) in v from the MSB to pos.
unsigned int pos; // Bit position to count bits up to (1 = MSB ... 64 = LSB).
uint64_t r;       // Resulting rank of bit at pos goes here.

  // Shift out bits after given position (pos == 0 would shift by 64, which is undefined).
  r = v >> (sizeof(v) * CHAR_BIT - pos);
  // Count set bits in parallel.
  // r = (r & 0x5555...) + ((r >> 1) & 0x5555...);
  r = r - ((r >> 1) & ~0ULL/3);
  // r = (r & 0x3333...) + ((r >> 2) & 0x3333...);
  r = (r & ~0ULL/5) + ((r >> 2) & ~0ULL/5);
  // r = (r & 0x0f0f...) + ((r >> 4) & 0x0f0f...);
  r = (r + (r >> 4)) & ~0ULL/17;
  // r = r % 255;
  r = (r * (~0ULL/255)) >> ((sizeof(v) - 1) * CHAR_BIT);
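Here is the rank computation wrapped in a function, as a minimal sketch; the name `rank_from_msb` is hypothetical. It assumes 1 <= pos <= 64, with pos counted from the MSB (1 = MSB). It also assumes a 64-bit `unsigned long long` for the `~0ULL` masks.

```c
#include <limits.h>  /* CHAR_BIT */
#include <stdint.h>

/* Number of set bits among the top pos bits of v (1 <= pos <= 64). */
unsigned int rank_from_msb(uint64_t v, unsigned int pos) {
    uint64_t r = v >> (sizeof(v) * CHAR_BIT - pos);          /* keep the top pos bits */
    r = r - ((r >> 1) & ~0ULL / 3);                          /* 2-bit counts */
    r = (r & ~0ULL / 5) + ((r >> 2) & ~0ULL / 5);            /* 4-bit counts */
    r = (r + (r >> 4)) & ~0ULL / 17;                         /* byte counts */
    r = (r * (~0ULL / 255)) >> ((sizeof(v) - 1) * CHAR_BIT); /* sum of bytes */
    return (unsigned int)r;
}
```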

Select the bit position (from the most-significant bit) with the given count (rank)

The following 64-bit code selects the position of the rth 1 bit when counting from the left. In other words, if we start at the most significant bit and proceed to the right, counting the number of bits set to 1 until we reach the desired rank, r, then the position where we stop is returned. If the rank requested exceeds the count of bits set, then 64 is returned. The code may be modified for 32-bit or counting from the right.
  uint64_t v;          // Input value to find position with rank r.
  unsigned int r;      // Input: bit's desired rank [1-64].
  unsigned int s;      // Output: Resulting position of bit with rank r [1-64]
  uint64_t a, b, c, d; // Intermediate temporaries for bit count.
  unsigned int t;      // Bit count temporary.

  // Do a normal parallel bit count for a 64-bit integer,
  // but store all intermediate steps.
  // a = (v & 0x5555...) + ((v >> 1) & 0x5555...);
  a =  v - ((v >> 1) & ~0ULL/3);
  // b = (a & 0x3333...) + ((a >> 2) & 0x3333...);
  b = (a & ~0ULL/5) + ((a >> 2) & ~0ULL/5);
  // c = (b & 0x0f0f...) + ((b >> 4) & 0x0f0f...);
  c = (b + (b >> 4)) & ~0ULL/0x11;
  // d = (c & 0x00ff...) + ((c >> 8) & 0x00ff...);
  d = (c + (c >> 8)) & ~0ULL/0x101;
  t = (d >> 32) + (d >> 48);
  // Now do branchless select!                                                
  s  = 64;
  // if (r > t) {s -= 32; r -= t;}
  s -= ((t - r) & 256) >> 3; r -= (t & ((t - r) >> 8));
  t  = (d >> (s - 16)) & 0xff;
  // if (r > t) {s -= 16; r -= t;}
  s -= ((t - r) & 256) >> 4; r -= (t & ((t - r) >> 8));
  t  = (c >> (s - 8)) & 0xf;
  // if (r > t) {s -= 8; r -= t;}
  s -= ((t - r) & 256) >> 5; r -= (t & ((t - r) >> 8));
  t  = (b >> (s - 4)) & 0x7;
  // if (r > t) {s -= 4; r -= t;}
  s -= ((t - r) & 256) >> 6; r -= (t & ((t - r) >> 8));
  t  = (a >> (s - 2)) & 0x3;
  // if (r > t) {s -= 2; r -= t;}
  s -= ((t - r) & 256) >> 7; r -= (t & ((t - r) >> 8));
  t  = (v >> (s - 1)) & 0x1;
  // if (r > t) s--;
  s -= ((t - r) & 256) >> 8;
  s = 65 - s;

If branching is fast on your target CPU, consider uncommenting the if-statements and commenting the lines that follow them.
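The select code can likewise be wrapped in a function. This sketch (hypothetical name `select_from_msb`) keeps the branchless steps unchanged and expects 1 <= r <= 64. It uses `~0ULL` masks, assuming a 64-bit `unsigned long long`. Positions are numbered 1 (MSB) to 64 (LSB).

```c
#include <stdint.h>

/* Position (1 = MSB ... 64 = LSB) of the r-th set bit counted from the MSB;
   returns 64 when v has fewer than r set bits. */
unsigned int select_from_msb(uint64_t v, unsigned int r) {
    uint64_t a, b, c, d;
    unsigned int s, t;

    /* Parallel bit count, keeping every intermediate level. */
    a = v - ((v >> 1) & ~0ULL / 3);                 /* 2-bit counts */
    b = (a & ~0ULL / 5) + ((a >> 2) & ~0ULL / 5);   /* 4-bit counts */
    c = (b + (b >> 4)) & ~0ULL / 0x11;              /* 8-bit counts */
    d = (c + (c >> 8)) & ~0ULL / 0x101;             /* 16-bit counts */
    t = (unsigned int)((d >> 32) + (d >> 48));      /* low bits: count of the top 32 bits */

    /* Branchless binary search; each pair is "if (r > t) {s -= w; r -= t;}". */
    s = 64;
    s -= ((t - r) & 256) >> 3; r -= (t & ((t - r) >> 8));
    t = (d >> (s - 16)) & 0xff;
    s -= ((t - r) & 256) >> 4; r -= (t & ((t - r) >> 8));
    t = (c >> (s - 8)) & 0xf;
    s -= ((t - r) & 256) >> 5; r -= (t & ((t - r) >> 8));
    t = (b >> (s - 4)) & 0x7;
    s -= ((t - r) & 256) >> 6; r -= (t & ((t - r) >> 8));
    t = (a >> (s - 2)) & 0x3;
    s -= ((t - r) & 256) >> 7; r -= (t & ((t - r) >> 8));
    t = (v >> (s - 1)) & 0x1;
    s -= ((t - r) & 256) >> 8;
    return 65 - s;
}
```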

Computing parity the naive way

unsigned int v;       // word value to compute the parity of
bool parity = false;  // parity will be the parity of v
?while (v)
{
  parity = !parity;
  v = v & (v - 1);
}

The above code uses an approach like Brian Kernighan's bit counting, above. The time it takes is proportional to the number of bits set.
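Here is a minimal sketch of the naive parity loop as a function; the name is hypothetical. It returns true when v has an odd number of set bits.

```c
#include <stdbool.h>

bool parity_naive(unsigned int v) {
    bool parity = false;
    while (v) {
        parity = !parity;
        v &= v - 1;   /* clear the lowest set bit */
    }
    return parity;
}
```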

Compute parity by lookup table

static const bool ParityTable256[256] = 
{
#   define P2(n) n, n^1, n^1, n
#   define P4(n) P2(n), P2(n^1), P2(n^1), P2(n)
#   define P6(n) P4(n), P4(n^1), P4(n^1), P4(n)
    P6(0), P6(1), P6(1), P6(0)
};

unsigned char b;  // byte value to compute the parity of
bool parity = ParityTable256[b];

// OR, for 32-bit words:
unsigned int v;
v ^= v >> 16;
v ^= v >> 8;
bool parity = ParityTable256[v & 0xff];

// Variation:
unsigned char * p = (unsigned char *) &v;
parity = ParityTable256[p[0] ^ p[1] ^ p[2] ^ p[3]];
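Here is a sketch of the table method for 32-bit words, with the table built at run time instead of by the P2/P4/P6 macros. The names are hypothetical, and `init_parity_table` must run before the first lookup. Each entry follows from parity(i) = (i & 1) ^ parity(i / 2).

```c
#include <stdbool.h>
#include <stdint.h>

static bool parity_table[256]; /* zero-initialized, so parity_table[0] is false */

void init_parity_table(void) {
    for (int i = 1; i < 256; i++)
        parity_table[i] = (bool)((i & 1) ^ parity_table[i / 2]);
}

bool parity32_table(uint32_t v) {
    v ^= v >> 16;   /* fold the high half into the low half */
    v ^= v >> 8;    /* the low byte now holds the XOR of all four bytes */
    return parity_table[v & 0xff];
}
```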

Compute parity of a byte using 64-bit multiply and modulus division

unsigned char b;  // byte value to compute the parity of
bool parity = (((b * 0x0101010101010101ULL) & 0x8040201008040201ULL) % 0x1FF) & 1;

The method above takes around 4 operations, but only works on bytes.
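Here is the byte-parity expression as a function, a minimal sketch with a hypothetical name. The steps are:

1. The multiply makes eight copies of b.
2. The AND keeps bit k of the k-th copy, at bit position 9k.
3. Because 0x1FF = 2^9 - 1, the remainder adds up the 9-bit fields. That sum is the popcount of b, whose low bit is the parity.

```c
#include <stdbool.h>

bool parity_byte_mul(unsigned char b) {
    return (((b * 0x0101010101010101ULL) & 0x8040201008040201ULL) % 0x1FF) & 1;
}
```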

Compute parity of word with a multiply

The following method computes the parity of a 32-bit value in only 8 operations using a multiply.
unsigned int v; // 32-bit word
    v ^= v >> 1;
    v ^= v >> 2;
    v = (v & 0x11111111U) * 0x11111111U;
    return (v >> 28) & 1;
For 64-bit words, 8 operations are still enough.
unsigned long long v; // 64-bit word
    v ^= v >> 1;
    v ^= v >> 2;
    v = (v & 0x1111111111111111UL) * 0x1111111111111111UL;
    return (v >> 60) & 1;
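Here are both multiply-based parity routines as functions; the names are hypothetical. The steps are:

1. After the two XOR-shifts, bit 4k holds the parity of nibble k.
2. Masking with 0x11... keeps only those bits.
3. The multiply sums them into the top nibble, whose low bit is the overall parity.

For 64 bits the sum can reach 16 and carry out of the top nibble, but its low bit is unaffected.

```c
#include <stdint.h>

unsigned int parity32_mul(uint32_t v) {
    v ^= v >> 1;
    v ^= v >> 2;                                  /* bit 4k = parity of nibble k */
    v = (v & 0x11111111U) * 0x11111111U;          /* sum nibble parities into bits 28-31 */
    return (v >> 28) & 1;
}

unsigned int parity64_mul(uint64_t v) {
    v ^= v >> 1;
    v ^= v >> 2;
    v = (v & 0x1111111111111111ULL) * 0x1111111111111111ULL;
    return (unsigned int)((v >> 60) & 1);
}
```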

Compute parity in parallel

unsigned int v;  // word value to compute the parity of
v ^= v >> 16;
v ^= v >> 8;
v ^= v >> 4;
v &= 0xf;
return (0x6996 >> v) & 1;

The method above takes around 9 operations, and works for 32-bit words. It may be optimized to work just on bytes in 5 operations by removing the two lines immediately following "unsigned int v;". The method first shifts and XORs the eight nibbles of the 32-bit value together, leaving the result in the lowest nibble of v. Next, the binary number 0110 1001 1001 0110 (0x6996 in hex) is shifted to the right by the value represented in the lowest nibble of v. This number is like a miniature 16-bit parity table indexed by the low four bits of v. The result has the parity of v in its lowest bit (bit 0), which is masked and returned.
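Here is the parallel method as a function for 32-bit words, a minimal sketch with a hypothetical name. The constant 0x6996 serves as the 16-entry parity table described above.

```c
#include <stdint.h>

unsigned int parity32_parallel(uint32_t v) {
    v ^= v >> 16;
    v ^= v >> 8;
    v ^= v >> 4;
    v &= 0xf;                    /* XOR of all eight nibbles */
    return (0x6996u >> v) & 1;   /* bit n of 0x6996 is the parity of n */
}
```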

Swapping values with subtraction and addition

#define SWAP(a, b) ((&(a) == &(b)) || \
                    (((a) -= (b)), ((b) += (a)), ((a) = (b) - (a))))

This swaps the values of a and b without using a temporary variable. The initial check for a and b being the same location in memory may be omitted when you know this can't happen. (The compiler may omit it anyway as an optimization.) If you enable overflow exceptions, then pass unsigned values so an exception isn't thrown. The XOR method that follows may be slightly faster on some machines. Don't use this with floating-point numbers (unless you operate on their raw integer representations).
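Here is a function-based sketch of the same add/subtract swap; the name `swap_add_sub` is hypothetical. It operates on `unsigned int`, where wrap-around is well defined. It compares addresses first, because swapping an object with itself this way would zero it.

```c
/* a' = a - b; b' = b + a' = a; a'' = b' - a' = b (all modulo 2^N). */
void swap_add_sub(unsigned int *a, unsigned int *b) {
    if (a == b) return;   /* same object: the trick would zero it */
    *a -= *b;
    *b += *a;
    *a = *b - *a;
}
```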

Swapping values with XOR

#define SWAP(a, b) (((a) ^= (b)), ((b) ^= (a)), ((a) ^= (b)))

This is an old trick to exchange the values of the variables a and b without using extra space for a temporary variable.

On January 20, 2005, Iain A. Fleming pointed out that the macro above doesn't work when you swap with the same memory location, such as SWAP(a[i], a[j]) with i == j: the first XOR zeroes the shared value. So if that may occur, consider defining the macro as (((a) == (b)) || (((a) ^= (b)), ((b) ^= (a)), ((a) ^= (b)))), which skips the swap whenever the two values are already equal.
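Here is a minimal sketch of the XOR swap as a function; the name is hypothetical. It guards against aliasing by comparing the pointers, which avoids the same-location problem described above.

```c
void swap_xor(unsigned int *a, unsigned int *b) {
    if (a != b) {   /* XOR-swapping an object with itself would zero it */
        *a ^= *b;
        *b ^= *a;
        *a ^= *b;
    }
}
```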

Swapping individual bits with XOR

unsigned int i, j; // positions of bit sequences to swap
unsigned int n;    // number of consecutive bits in each sequence
unsigned int b;    // bits to swap reside in b
unsigned int r;    // bit-swapped result goes here

unsigned int x = ((b >> i) ^ (b >> j)) & ((1U << n) - 1); // XOR temporary
r = b ^ ((x << i) | (x << j));

As an example of swapping ranges of bits, suppose we have b = 00101111 (expressed in binary) and we want to swap the n = 3 consecutive bits starting at i = 1 (the second bit from the right) with the 3 consecutive bits starting at j = 5; the result would be r = 11100011 (binary).

This method of swapping is similar to the general-purpose XOR swap trick, but intended for operating on individual bits. The variable x stores the result of XORing the pairs of bit values we want to swap, and then the bits are set to the result of themselves XORed with x. Of course, the result is undefined if the sequences overlap.
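Here is the range swap as a function, a minimal sketch (hypothetical name `swap_bit_ranges`). It assumes non-overlapping fields and n < 32, so that `1U << n` is defined. The first test reproduces the worked example, where 00101111 becomes 11100011.

```c
/* Swap the n-bit field at bit i with the n-bit field at bit j. */
unsigned int swap_bit_ranges(unsigned int b, unsigned int i, unsigned int j, unsigned int n) {
    unsigned int x = ((b >> i) ^ (b >> j)) & ((1U << n) - 1); /* XOR of the two fields */
    return b ^ ((x << i) | (x << j));                         /* flip both fields by x */
}
```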

Reverse bits the obvious way

unsigned int v;     // input bits to be reversed
unsigned int r = v; // r will be reversed bits of v; first get LSB of v
int s = sizeof(v) * CHAR_BIT - 1; // extra shift needed at end

for (v >>= 1; v; v >>= 1)
{   
  r <<= 1;
  r |= v & 1;
  s--;
}
r <<= s; // shift when v's highest bits are zero
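Here is the obvious loop as a function; the name is hypothetical, and the test values assume a 32-bit `unsigned int`. The initial copy of v in r is shifted left 31 places in total, so only its LSB survives, landing in bit 31.

```c
#include <limits.h>  /* CHAR_BIT */

unsigned int reverse_bits_obvious(unsigned int v) {
    unsigned int r = v;                          /* r starts with the LSB of v */
    int s = (int)(sizeof(v) * CHAR_BIT) - 1;     /* extra shift needed at the end */
    for (v >>= 1; v; v >>= 1) {
        r <<= 1;
        r |= v & 1;
        s--;
    }
    r <<= s;                                     /* account for v's high zero bits */
    return r;
}
```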

Reverse bits in word by lookup table

static const unsigned char BitReverseTable256[256] = 
{
#   define R2(n)     n,     n + 2*64,     n + 1*64,     n + 3*64
#   define R4(n) R2(n), R2(n + 2*16), R2(n + 1*16), R2(n + 3*16)
#   define R6(n) R4(n), R4(n + 2*4 ), R4(n + 1*4 ), R4(n + 3*4 )
    R6(0), R6(2), R6(1), R6(3)
};

unsigned int v; // reverse 32-bit value, 8 bits at time
unsigned int c; // c will get v reversed

// Option 1 (the cast keeps << 24 from overflowing int for entries >= 0x80):
c = ((unsigned int)BitReverseTable256[v & 0xff] << 24) | 
    (BitReverseTable256[(v >> 8) & 0xff] << 16) | 
    (BitReverseTable256[(v >> 16) & 0xff] << 8) |
    (BitReverseTable256[(v >> 24) & 0xff]);

// Option 2:
unsigned char * p = (unsigned char *) &v;
unsigned char * q = (unsigned char *) &c;
q[3] = BitReverseTable256[p[0]]; 
q[2] = BitReverseTable256[p[1]]; 
q[1] = BitReverseTable256[p[2]]; 
q[0] = BitReverseTable256[p[3]];

The first method takes about 17 operations, and the second takes about 12, assuming your CPU can load and store bytes easily.
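Here is a sketch of Option 1 with the table generated at run time rather than by the R2/R4/R6 macros. The names are hypothetical; call `init_bit_reverse_table` once first. Each entry is built from a smaller one: bit 0 of i moves to bit 7, and the rest is the reversal of `i >> 1`, shifted right by one. The casts to `uint32_t` keep the `<< 24` from overflowing `int`.

```c
#include <stdint.h>

static unsigned char bit_reverse_table[256]; /* zero-initialized; entry 0 is 0 */

void init_bit_reverse_table(void) {
    for (int i = 1; i < 256; i++)
        bit_reverse_table[i] =
            (unsigned char)(((i & 1) << 7) | (bit_reverse_table[i >> 1] >> 1));
}

uint32_t reverse32_table(uint32_t v) {
    return ((uint32_t)bit_reverse_table[v & 0xff] << 24) |
           ((uint32_t)bit_reverse_table[(v >> 8) & 0xff] << 16) |
           ((uint32_t)bit_reverse_table[(v >> 16) & 0xff] << 8) |
           (uint32_t)bit_reverse_table[(v >> 24) & 0xff];
}
```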

Reverse the bits in a byte with 3 operations (64-bit multiply and modulus division):

unsigned char b; // reverse this (8-bit) byte
 
b = (b * 0x0202020202ULL & 0x010884422010ULL) % 1023;

The multiply operation creates five separate copies of the 8-bit byte pattern to fan out into a 64-bit value. The AND operation selects the bits that are in the correct (reversed) positions, relative to each 10-bit group of bits. The multiply and the AND operations copy the bits from the original byte so they each appear in only one of the 10-bit sets. The reversed positions of the bits from the original byte coincide with their relative positions within any 10-bit set. The last step, which involves modulus division by 2^10 - 1, has the effect of merging together each set of 10 bits (from positions 0-9, 10-19, 20-29, ...) in the 64-bit value. They do not overlap, so the addition steps underlying the modulus division behave like OR operations.
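Here is the 3-operation byte reversal as a function, a minimal sketch with a hypothetical name. It assumes a 64-bit `unsigned long long`. For example, 0xB1 (10110001) reverses to 0x8D (10001101).

```c
unsigned char reverse_byte_mul(unsigned char b) {
    /* fan out five copies, keep one correctly placed bit per input bit,
       then fold the 10-bit groups together with % (2^10 - 1) */
    return (unsigned char)((b * 0x0202020202ULL & 0x010884422010ULL) % 1023);
}
```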


Please note: by hackathons I meant open-source competitions and maker competitions involving embedded systems.
