Counting the digits of 64-bit integers

Counting the digits of 64-bit integers

Given an integer in software, you may want to know how many decimal digits it needs. For example, the integer 100 requires 3 digits, the integer 9999 requires 4 digits.

It would be an easy problem if we could compute the logarithm in base 10 of an integer quickly. Unfortunately, our computers work in base 2 and it is faster to use a base 2 logarithm.

There are fast techniques to solve the digit-counting problem using few instructions. For 32-bit integers, a classical trick found in references such as Hacker’s Delight is as follows:

int digit_count(uint32_t x) {
   static uint32_t table[] = {9, 99, 999, 9999, 99999,
    999999, 9999999, 99999999, 999999999};
    int y = (9 * int_log2(x)) >> 5;
    y += x > table[y];
    return y + 1;
}        

where the int_log2 function simply computes the integer logarithm in base 2. In base 2, there are fast functions to compute the logarithm… if you use GCC on Clang in C or C++, the following might do:

int int_log2(uint64_t x) { return 63 - __builtin_clzll(x | 1); }
        

Observe that I compute the bitwise OR of the input with the integer 1: I want to avoid computing the logarithm of zero. If you have a C++20 capable compiler, you can implement int_log2 with std::bit_width(x|1).

There is a faster function which I credit to Willets:

int alternative_digit_count(uint32_t x) {
  static uint64_t table[] = {
      4294967296,  8589934582,  8589934582,  8589934582,  12884901788,
      12884901788, 12884901788, 17179868184, 17179868184, 17179868184,
      21474826480, 21474826480, 21474826480, 21474826480, 25769703776,
      25769703776, 25769703776, 30063771072, 30063771072, 30063771072,
      34349738368, 34349738368, 34349738368, 34349738368, 38554705664,
      38554705664, 38554705664, 41949672960, 41949672960, 41949672960,
      42949672960, 42949672960};
  return (x + table[int_log2(x)]) >> 32;
}        

What if you have 64-bit integers instead of 32-bit integers?

You can generalize the same function. First of all, we have the conventional fast function:

int digit_count(uint64_t x) {
  static uint64_t table[] = {9,
                             99,
                             999,
                             9999,
                             99999,
                             999999,
                             9999999,
                             99999999,
                             999999999,
                             9999999999,
                             99999999999,
                             999999999999,
                             9999999999999,
                             99999999999999,
                             999999999999999ULL,
                             9999999999999999ULL,
                             99999999999999999ULL,
                             999999999999999999ULL,
                             9999999999999999999ULL};
  int y = (19 * int_log2(x) >> 6);
  y += x > table[y];
  return y + 1;
}        

And then we can port the faster alternative to 64-bit integers, replacing a 64-bit lookup table with a 128-bit lookup table:

int alternative_digit_count(uint64_t x) {
static uint64_t table[64][2] = {
    { 0x01, 0xfffffffffffffff6ULL },
    { 0x01, 0xfffffffffffffff6ULL },
    { 0x01, 0xfffffffffffffff6ULL },
    { 0x01, 0xfffffffffffffff6ULL },
    { 0x02, 0xffffffffffffff9cULL },
    { 0x02, 0xffffffffffffff9cULL },
    { 0x02, 0xffffffffffffff9cULL },
    { 0x03, 0xfffffffffffffc18ULL },
    { 0x03, 0xfffffffffffffc18ULL },
    { 0x03, 0xfffffffffffffc18ULL },
    { 0x04, 0xffffffffffffd8f0ULL },
    { 0x04, 0xffffffffffffd8f0ULL },
    { 0x04, 0xffffffffffffd8f0ULL },
    { 0x04, 0xffffffffffffd8f0ULL },
    { 0x05, 0xfffffffffffe7960ULL },
    { 0x05, 0xfffffffffffe7960ULL },
    { 0x05, 0xfffffffffffe7960ULL },
    { 0x06, 0xfffffffffff0bdc0ULL },
    { 0x06, 0xfffffffffff0bdc0ULL },
    { 0x06, 0xfffffffffff0bdc0ULL },
    { 0x07, 0xffffffffff676980ULL },
    { 0x07, 0xffffffffff676980ULL },
    { 0x07, 0xffffffffff676980ULL },
    { 0x07, 0xffffffffff676980ULL },
    { 0x08, 0xfffffffffa0a1f00ULL },
    { 0x08, 0xfffffffffa0a1f00ULL },
    { 0x08, 0xfffffffffa0a1f00ULL },
    { 0x09, 0xffffffffc4653600ULL },
    { 0x09, 0xffffffffc4653600ULL },
    { 0x09, 0xffffffffc4653600ULL },
    { 0x0a, 0xfffffffdabf41c00ULL },
    { 0x0a, 0xfffffffdabf41c00ULL },
    { 0x0a, 0xfffffffdabf41c00ULL },
    { 0x0a, 0xfffffffdabf41c00ULL },
    { 0x0b, 0xffffffe8b7891800ULL },
    { 0x0b, 0xffffffe8b7891800ULL },
    { 0x0b, 0xffffffe8b7891800ULL },
    { 0x0c, 0xffffff172b5af000ULL },
    { 0x0c, 0xffffff172b5af000ULL },
    { 0x0c, 0xffffff172b5af000ULL },
    { 0x0d, 0xfffff6e7b18d6000ULL },
    { 0x0d, 0xfffff6e7b18d6000ULL },
    { 0x0d, 0xfffff6e7b18d6000ULL },
    { 0x0d, 0xfffff6e7b18d6000ULL },
    { 0x0e, 0xffffa50cef85c000ULL },
    { 0x0e, 0xffffa50cef85c000ULL },
    { 0x0e, 0xffffa50cef85c000ULL },
    { 0x0f, 0xfffc72815b398000ULL },
    { 0x0f, 0xfffc72815b398000ULL },
    { 0x0f, 0xfffc72815b398000ULL },
    { 0x10, 0xffdc790d903f0000ULL },
    { 0x10, 0xffdc790d903f0000ULL },
    { 0x10, 0xffdc790d903f0000ULL },
    { 0x10, 0xffdc790d903f0000ULL },
    { 0x11, 0xfe9cba87a2760000ULL },
    { 0x11, 0xfe9cba87a2760000ULL },
    { 0x11, 0xfe9cba87a2760000ULL },
    { 0x12, 0xf21f494c589c0000ULL },
    { 0x12, 0xf21f494c589c0000ULL },
    { 0x12, 0xf21f494c589c0000ULL },
    { 0x13, 0x7538dcfb76180000ULL },
    { 0x13, 0x7538dcfb76180000ULL },
    { 0x13, 0x7538dcfb76180000ULL },
    { 0x13, 0x7538dcfb76180000ULL },
};
  int log = int_log2(x);
  uint64_t low = table[log][1];
  uint64_t high = table[log][0];
  return (x + low < x ) + high;
}
        

Though my 64-bit code may look mysterious, it is doing the same thing as the original 32-bit function: it adds the input to looked up value, and keep the most significant bits.

I have included a new simpler 64-bit version based on work by Kendall Willets, this time assuming that the compiler supports C++20:

int alternative_digit_count_two_tables(uint64_t x) {
  static int digits[65] = {19, 19, 19, 19, 18, 18, 18, 
                           17, 17, 17, 16, 16, 16,
                           16, 15, 15, 15, 14, 14, 14, 
                           13, 13, 13, 13, 12, 12,
                           12, 11, 11, 11, 10, 10, 10, 
                           10, 9,  9,  9,  8,  8,
                           8,  7,  7,  7,  7,  6,  6,
                           6,  5,  5,  5,  4,  4,
                           4,  4,  3,  3,  3,  2,  2,
                           2,  1,  1,  1,  1,  1};
  static uint64_t table[65] = {18446744073709551615ULL,
                               18446744073709551615ULL,
                               18446744073709551615ULL,
                               18446744073709551615ULL,
                               999999999999999999ULL,
                               999999999999999999ULL,
                               999999999999999999ULL,
                               99999999999999999ULL,
                               99999999999999999ULL,
                               99999999999999999ULL,
                               9999999999999999ULL,
                               9999999999999999ULL,
                               9999999999999999ULL,
                               9999999999999999ULL,
                               999999999999999ULL,
                               999999999999999ULL,
                               999999999999999ULL,
                               99999999999999ULL,
                               99999999999999ULL,
                               99999999999999ULL,
                               9999999999999ULL,
                               9999999999999ULL,
                               9999999999999ULL,
                               9999999999999ULL,
                               999999999999ULL,
                               999999999999ULL,
                               999999999999ULL,
                               99999999999ULL,
                               99999999999ULL,
                               99999999999ULL,
                               9999999999ULL,
                               9999999999ULL,
                               9999999999ULL,
                               9999999999ULL,
                               999999999ULL,
                               999999999ULL,
                               999999999ULL,
                               99999999ULL,
                               99999999ULL,
                               99999999ULL,
                               9999999ULL,
                               9999999ULL,
                               9999999ULL,
                               9999999ULL,
                               999999ULL,
                               999999ULL,
                               999999ULL,
                               99999ULL,
                               99999ULL,
                               99999ULL,
                               9999ULL,
                               9999ULL,
                               9999ULL,
                               9999ULL,
                               999ULL,
                               999ULL,
                               999ULL,
                               99ULL,
                               99ULL,
                               99ULL,
                               9ULL,
                               9ULL,
                               9ULL,
                               9ULL,
                               0ULL};
  int log = std::countl_zero(x);
  uint64_t low = table[log];
  uint64_t high = digits[log];
  return (x > low) + high;
}        

What we want to know is which functions are fast?

To test it out, I generate thousands of random integers and I just sum up their digit counts. I record the number of instructions per integer:


In the 32-bit case, the alternative approach saves you 3 or 4 instructions per integer. But in the 64-bit case, the saving appears to be just one instruction which is not necessarily worth it. However, the simpler alternative in the 64-bit case uses significantly fewer instructions.

Next, let us report the number of CPU cycles per integer. I run the tests four times, and I get some variation on one of the tests (Apple platform, 32-bit alternative).


My timings suggest that the conventional (and simpler approach) may use more instructions but could still be preferable in practice. It is especially true in the 64-bit case where the alternative approach is significantly more complicated, and much slower in my tests. The simpler alternative in the 64-bit case is the fastest ?on the x64 system (Ice Lake) but slower on the ARM system.

My code is available on GitHub.


要查看或添加评论,请登录

Daniel Lemire的更多文章