Counting the digits of 64-bit integers
Given an integer in software, you may want to know how many decimal digits it needs. For example, the integer 100 requires 3 digits, the integer 9999 requires 4 digits.
It would be an easy problem if we could compute the logarithm in base 10 of an integer quickly. Unfortunately, our computers work in base 2 and it is faster to use a base 2 logarithm.
There are fast techniques to solve the digit-counting problem using few instructions. For 32-bit integers, a classical trick found in references such as Hacker’s Delight is as follows:
int digit_count(uint32_t x) {
static uint32_t table[] = {9, 99, 999, 9999, 99999,
999999, 9999999, 99999999, 999999999};
int y = (9 * int_log2(x)) >> 5;
y += x > table[y];
return y + 1;
}
where the int_log2 function simply computes the integer logarithm in base 2. In base 2, there are fast functions to compute the logarithm… if you use GCC on Clang in C or C++, the following might do:
int int_log2(uint64_t x) { return 63 - __builtin_clzll(x | 1); }
Observe that I compute the bitwise OR of the input with the integer 1: I want to avoid computing the logarithm of zero. If you have a C++20 capable compiler, you can implement int_log2 with std::bit_width(x|1).
There is a faster function which I credit to Willets:
int alternative_digit_count(uint32_t x) {
static uint64_t table[] = {
4294967296, 8589934582, 8589934582, 8589934582, 12884901788,
12884901788, 12884901788, 17179868184, 17179868184, 17179868184,
21474826480, 21474826480, 21474826480, 21474826480, 25769703776,
25769703776, 25769703776, 30063771072, 30063771072, 30063771072,
34349738368, 34349738368, 34349738368, 34349738368, 38554705664,
38554705664, 38554705664, 41949672960, 41949672960, 41949672960,
42949672960, 42949672960};
return (x + table[int_log2(x)]) >> 32;
}
What if you have 64-bit integers instead of 32-bit integers?
You can generalize the same function. First of all, we have the conventional fast function:
int digit_count(uint64_t x) {
static uint64_t table[] = {9,
99,
999,
9999,
99999,
999999,
9999999,
99999999,
999999999,
9999999999,
99999999999,
999999999999,
9999999999999,
99999999999999,
999999999999999ULL,
9999999999999999ULL,
99999999999999999ULL,
999999999999999999ULL,
9999999999999999999ULL};
int y = (19 * int_log2(x) >> 6);
y += x > table[y];
return y + 1;
}
And then we can port the faster alternative to 64-bit integers, replacing a 64-bit lookup table with a 128-bit lookup table:
int alternative_digit_count(uint64_t x) {
static uint64_t table[64][2] = {
{ 0x01, 0xfffffffffffffff6ULL },
{ 0x01, 0xfffffffffffffff6ULL },
{ 0x01, 0xfffffffffffffff6ULL },
{ 0x01, 0xfffffffffffffff6ULL },
{ 0x02, 0xffffffffffffff9cULL },
{ 0x02, 0xffffffffffffff9cULL },
{ 0x02, 0xffffffffffffff9cULL },
{ 0x03, 0xfffffffffffffc18ULL },
{ 0x03, 0xfffffffffffffc18ULL },
{ 0x03, 0xfffffffffffffc18ULL },
{ 0x04, 0xffffffffffffd8f0ULL },
{ 0x04, 0xffffffffffffd8f0ULL },
{ 0x04, 0xffffffffffffd8f0ULL },
{ 0x04, 0xffffffffffffd8f0ULL },
{ 0x05, 0xfffffffffffe7960ULL },
{ 0x05, 0xfffffffffffe7960ULL },
{ 0x05, 0xfffffffffffe7960ULL },
{ 0x06, 0xfffffffffff0bdc0ULL },
{ 0x06, 0xfffffffffff0bdc0ULL },
{ 0x06, 0xfffffffffff0bdc0ULL },
{ 0x07, 0xffffffffff676980ULL },
{ 0x07, 0xffffffffff676980ULL },
{ 0x07, 0xffffffffff676980ULL },
{ 0x07, 0xffffffffff676980ULL },
{ 0x08, 0xfffffffffa0a1f00ULL },
{ 0x08, 0xfffffffffa0a1f00ULL },
{ 0x08, 0xfffffffffa0a1f00ULL },
{ 0x09, 0xffffffffc4653600ULL },
{ 0x09, 0xffffffffc4653600ULL },
{ 0x09, 0xffffffffc4653600ULL },
{ 0x0a, 0xfffffffdabf41c00ULL },
{ 0x0a, 0xfffffffdabf41c00ULL },
{ 0x0a, 0xfffffffdabf41c00ULL },
{ 0x0a, 0xfffffffdabf41c00ULL },
{ 0x0b, 0xffffffe8b7891800ULL },
{ 0x0b, 0xffffffe8b7891800ULL },
{ 0x0b, 0xffffffe8b7891800ULL },
{ 0x0c, 0xffffff172b5af000ULL },
{ 0x0c, 0xffffff172b5af000ULL },
{ 0x0c, 0xffffff172b5af000ULL },
{ 0x0d, 0xfffff6e7b18d6000ULL },
{ 0x0d, 0xfffff6e7b18d6000ULL },
{ 0x0d, 0xfffff6e7b18d6000ULL },
{ 0x0d, 0xfffff6e7b18d6000ULL },
{ 0x0e, 0xffffa50cef85c000ULL },
{ 0x0e, 0xffffa50cef85c000ULL },
{ 0x0e, 0xffffa50cef85c000ULL },
{ 0x0f, 0xfffc72815b398000ULL },
{ 0x0f, 0xfffc72815b398000ULL },
{ 0x0f, 0xfffc72815b398000ULL },
{ 0x10, 0xffdc790d903f0000ULL },
{ 0x10, 0xffdc790d903f0000ULL },
{ 0x10, 0xffdc790d903f0000ULL },
{ 0x10, 0xffdc790d903f0000ULL },
{ 0x11, 0xfe9cba87a2760000ULL },
{ 0x11, 0xfe9cba87a2760000ULL },
{ 0x11, 0xfe9cba87a2760000ULL },
{ 0x12, 0xf21f494c589c0000ULL },
{ 0x12, 0xf21f494c589c0000ULL },
{ 0x12, 0xf21f494c589c0000ULL },
{ 0x13, 0x7538dcfb76180000ULL },
{ 0x13, 0x7538dcfb76180000ULL },
{ 0x13, 0x7538dcfb76180000ULL },
{ 0x13, 0x7538dcfb76180000ULL },
};
int log = int_log2(x);
uint64_t low = table[log][1];
uint64_t high = table[log][0];
return (x + low < x ) + high;
}
Though my 64-bit code may look mysterious, it is doing the same thing as the original 32-bit function: it adds the input to looked up value, and keep the most significant bits.
I have included a new simpler 64-bit version based on work by Kendall Willets, this time assuming that the compiler supports C++20:
int alternative_digit_count_two_tables(uint64_t x) {
static int digits[65] = {19, 19, 19, 19, 18, 18, 18,
17, 17, 17, 16, 16, 16,
16, 15, 15, 15, 14, 14, 14,
13, 13, 13, 13, 12, 12,
12, 11, 11, 11, 10, 10, 10,
10, 9, 9, 9, 8, 8,
8, 7, 7, 7, 7, 6, 6,
6, 5, 5, 5, 4, 4,
4, 4, 3, 3, 3, 2, 2,
2, 1, 1, 1, 1, 1};
static uint64_t table[65] = {18446744073709551615ULL,
18446744073709551615ULL,
18446744073709551615ULL,
18446744073709551615ULL,
999999999999999999ULL,
999999999999999999ULL,
999999999999999999ULL,
99999999999999999ULL,
99999999999999999ULL,
99999999999999999ULL,
9999999999999999ULL,
9999999999999999ULL,
9999999999999999ULL,
9999999999999999ULL,
999999999999999ULL,
999999999999999ULL,
999999999999999ULL,
99999999999999ULL,
99999999999999ULL,
99999999999999ULL,
9999999999999ULL,
9999999999999ULL,
9999999999999ULL,
9999999999999ULL,
999999999999ULL,
999999999999ULL,
999999999999ULL,
99999999999ULL,
99999999999ULL,
99999999999ULL,
9999999999ULL,
9999999999ULL,
9999999999ULL,
9999999999ULL,
999999999ULL,
999999999ULL,
999999999ULL,
99999999ULL,
99999999ULL,
99999999ULL,
9999999ULL,
9999999ULL,
9999999ULL,
9999999ULL,
999999ULL,
999999ULL,
999999ULL,
99999ULL,
99999ULL,
99999ULL,
9999ULL,
9999ULL,
9999ULL,
9999ULL,
999ULL,
999ULL,
999ULL,
99ULL,
99ULL,
99ULL,
9ULL,
9ULL,
9ULL,
9ULL,
0ULL};
int log = std::countl_zero(x);
uint64_t low = table[log];
uint64_t high = digits[log];
return (x > low) + high;
}
What we want to know is which functions are fast?
To test it out, I generate thousands of random integers and I just sum up their digit counts. I record the number of instructions per integer:
In the 32-bit case, the alternative approach saves you 3 or 4 instructions per integer. But in the 64-bit case, the saving appears to be just one instruction which is not necessarily worth it. However, the simpler alternative in the 64-bit case uses significantly fewer instructions.
Next, let us report the number of CPU cycles per integer. I run the tests four times, and I get some variation on one of the tests (Apple platform, 32-bit alternative).
My timings suggest that the conventional (and simpler approach) may use more instructions but could still be preferable in practice. It is especially true in the 64-bit case where the alternative approach is significantly more complicated, and much slower in my tests. The simpler alternative in the 64-bit case is the fastest ?on the x64 system (Ice Lake) but slower on the ARM system.