登录查看更多内容

Counting the digits of 64-bit integers

Daniel Lemire

Computer Scientist

发布日期: 2025年1月7日

Given an integer in software, you may want to know how many decimal digits it needs. For example, the integer 100 requires 3 digits, the integer 9999 requires 4 digits.

It would be an easy problem if we could compute the logarithm in base 10 of an integer quickly. Unfortunately, our computers work in base 2 and it is faster to use a base 2 logarithm.

There are fast techniques to solve the digit-counting problem using few instructions. For 32-bit integers, a classical trick found in references such as Hacker’s Delight is as follows:

int digit_count(uint32_t x) {
   static uint32_t table[] = {9, 99, 999, 9999, 99999,
    999999, 9999999, 99999999, 999999999};
    int y = (9 * int_log2(x)) >> 5;
    y += x > table[y];
    return y + 1;
}

where the int_log2 function simply computes the integer logarithm in base 2. In base 2, there are fast functions to compute the logarithm… if you use GCC on Clang in C or C++, the following might do:

int int_log2(uint64_t x) { return 63 - __builtin_clzll(x | 1); }

Observe that I compute the bitwise OR of the input with the integer 1: I want to avoid computing the logarithm of zero. If you have a C++20 capable compiler, you can implement int_log2 with std::bit_width(x|1).

There is a faster function which I credit to Willets:

int alternative_digit_count(uint32_t x) {
  static uint64_t table[] = {
      4294967296,  8589934582,  8589934582,  8589934582,  12884901788,
      12884901788, 12884901788, 17179868184, 17179868184, 17179868184,
      21474826480, 21474826480, 21474826480, 21474826480, 25769703776,
      25769703776, 25769703776, 30063771072, 30063771072, 30063771072,
      34349738368, 34349738368, 34349738368, 34349738368, 38554705664,
      38554705664, 38554705664, 41949672960, 41949672960, 41949672960,
      42949672960, 42949672960};
  return (x + table[int_log2(x)]) >> 32;
}

What if you have 64-bit integers instead of 32-bit integers?

You can generalize the same function. First of all, we have the conventional fast function:

int digit_count(uint64_t x) {
  static uint64_t table[] = {9,
                             99,
                             999,
                             9999,
                             99999,
                             999999,
                             9999999,
                             99999999,
                             999999999,
                             9999999999,
                             99999999999,
                             999999999999,
                             9999999999999,
                             99999999999999,
                             999999999999999ULL,
                             9999999999999999ULL,
                             99999999999999999ULL,
                             999999999999999999ULL,
                             9999999999999999999ULL};
  int y = (19 * int_log2(x) >> 6);
  y += x > table[y];
  return y + 1;
}

And then we can port the faster alternative to 64-bit integers, replacing a 64-bit lookup table with a 128-bit lookup table:

int alternative_digit_count(uint64_t x) {
static uint64_t table[64][2] = {
    { 0x01, 0xfffffffffffffff6ULL },
    { 0x01, 0xfffffffffffffff6ULL },
    { 0x01, 0xfffffffffffffff6ULL },
    { 0x01, 0xfffffffffffffff6ULL },
    { 0x02, 0xffffffffffffff9cULL },
    { 0x02, 0xffffffffffffff9cULL },
    { 0x02, 0xffffffffffffff9cULL },
    { 0x03, 0xfffffffffffffc18ULL },
    { 0x03, 0xfffffffffffffc18ULL },
    { 0x03, 0xfffffffffffffc18ULL },
    { 0x04, 0xffffffffffffd8f0ULL },
    { 0x04, 0xffffffffffffd8f0ULL },
    { 0x04, 0xffffffffffffd8f0ULL },
    { 0x04, 0xffffffffffffd8f0ULL },
    { 0x05, 0xfffffffffffe7960ULL },
    { 0x05, 0xfffffffffffe7960ULL },
    { 0x05, 0xfffffffffffe7960ULL },
    { 0x06, 0xfffffffffff0bdc0ULL },
    { 0x06, 0xfffffffffff0bdc0ULL },
    { 0x06, 0xfffffffffff0bdc0ULL },
    { 0x07, 0xffffffffff676980ULL },
    { 0x07, 0xffffffffff676980ULL },
    { 0x07, 0xffffffffff676980ULL },
    { 0x07, 0xffffffffff676980ULL },
    { 0x08, 0xfffffffffa0a1f00ULL },
    { 0x08, 0xfffffffffa0a1f00ULL },
    { 0x08, 0xfffffffffa0a1f00ULL },
    { 0x09, 0xffffffffc4653600ULL },
    { 0x09, 0xffffffffc4653600ULL },
    { 0x09, 0xffffffffc4653600ULL },
    { 0x0a, 0xfffffffdabf41c00ULL },
    { 0x0a, 0xfffffffdabf41c00ULL },
    { 0x0a, 0xfffffffdabf41c00ULL },
    { 0x0a, 0xfffffffdabf41c00ULL },
    { 0x0b, 0xffffffe8b7891800ULL },
    { 0x0b, 0xffffffe8b7891800ULL },
    { 0x0b, 0xffffffe8b7891800ULL },
    { 0x0c, 0xffffff172b5af000ULL },
    { 0x0c, 0xffffff172b5af000ULL },
    { 0x0c, 0xffffff172b5af000ULL },
    { 0x0d, 0xfffff6e7b18d6000ULL },
    { 0x0d, 0xfffff6e7b18d6000ULL },
    { 0x0d, 0xfffff6e7b18d6000ULL },
    { 0x0d, 0xfffff6e7b18d6000ULL },
    { 0x0e, 0xffffa50cef85c000ULL },
    { 0x0e, 0xffffa50cef85c000ULL },
    { 0x0e, 0xffffa50cef85c000ULL },
    { 0x0f, 0xfffc72815b398000ULL },
    { 0x0f, 0xfffc72815b398000ULL },
    { 0x0f, 0xfffc72815b398000ULL },
    { 0x10, 0xffdc790d903f0000ULL },
    { 0x10, 0xffdc790d903f0000ULL },
    { 0x10, 0xffdc790d903f0000ULL },
    { 0x10, 0xffdc790d903f0000ULL },
    { 0x11, 0xfe9cba87a2760000ULL },
    { 0x11, 0xfe9cba87a2760000ULL },
    { 0x11, 0xfe9cba87a2760000ULL },
    { 0x12, 0xf21f494c589c0000ULL },
    { 0x12, 0xf21f494c589c0000ULL },
    { 0x12, 0xf21f494c589c0000ULL },
    { 0x13, 0x7538dcfb76180000ULL },
    { 0x13, 0x7538dcfb76180000ULL },
    { 0x13, 0x7538dcfb76180000ULL },
    { 0x13, 0x7538dcfb76180000ULL },
};
  int log = int_log2(x);
  uint64_t low = table[log][1];
  uint64_t high = table[log][0];
  return (x + low < x ) + high;
}

Though my 64-bit code may look mysterious, it is doing the same thing as the original 32-bit function: it adds the input to looked up value, and keep the most significant bits.

I have included a new simpler 64-bit version based on work by Kendall Willets, this time assuming that the compiler supports C++20:

int alternative_digit_count_two_tables(uint64_t x) {
  static int digits[65] = {19, 19, 19, 19, 18, 18, 18, 
                           17, 17, 17, 16, 16, 16,
                           16, 15, 15, 15, 14, 14, 14, 
                           13, 13, 13, 13, 12, 12,
                           12, 11, 11, 11, 10, 10, 10, 
                           10, 9,  9,  9,  8,  8,
                           8,  7,  7,  7,  7,  6,  6,
                           6,  5,  5,  5,  4,  4,
                           4,  4,  3,  3,  3,  2,  2,
                           2,  1,  1,  1,  1,  1};
  static uint64_t table[65] = {18446744073709551615ULL,
                               18446744073709551615ULL,
                               18446744073709551615ULL,
                               18446744073709551615ULL,
                               999999999999999999ULL,
                               999999999999999999ULL,
                               999999999999999999ULL,
                               99999999999999999ULL,
                               99999999999999999ULL,
                               99999999999999999ULL,
                               9999999999999999ULL,
                               9999999999999999ULL,
                               9999999999999999ULL,
                               9999999999999999ULL,
                               999999999999999ULL,
                               999999999999999ULL,
                               999999999999999ULL,
                               99999999999999ULL,
                               99999999999999ULL,
                               99999999999999ULL,
                               9999999999999ULL,
                               9999999999999ULL,
                               9999999999999ULL,
                               9999999999999ULL,
                               999999999999ULL,
                               999999999999ULL,
                               999999999999ULL,
                               99999999999ULL,
                               99999999999ULL,
                               99999999999ULL,
                               9999999999ULL,
                               9999999999ULL,
                               9999999999ULL,
                               9999999999ULL,
                               999999999ULL,
                               999999999ULL,
                               999999999ULL,
                               99999999ULL,
                               99999999ULL,
                               99999999ULL,
                               9999999ULL,
                               9999999ULL,
                               9999999ULL,
                               9999999ULL,
                               999999ULL,
                               999999ULL,
                               999999ULL,
                               99999ULL,
                               99999ULL,
                               99999ULL,
                               9999ULL,
                               9999ULL,
                               9999ULL,
                               9999ULL,
                               999ULL,
                               999ULL,
                               999ULL,
                               99ULL,
                               99ULL,
                               99ULL,
                               9ULL,
                               9ULL,
                               9ULL,
                               9ULL,
                               0ULL};
  int log = std::countl_zero(x);
  uint64_t low = table[log];
  uint64_t high = digits[log];
  return (x > low) + high;
}

What we want to know is which functions are fast?

To test it out, I generate thousands of random integers and I just sum up their digit counts. I record the number of instructions per integer:

In the 32-bit case, the alternative approach saves you 3 or 4 instructions per integer. But in the 64-bit case, the saving appears to be just one instruction which is not necessarily worth it. However, the simpler alternative in the 64-bit case uses significantly fewer instructions.

Next, let us report the number of CPU cycles per integer. I run the tests four times, and I get some variation on one of the tests (Apple platform, 32-bit alternative).

My timings suggest that the conventional (and simpler approach) may use more instructions but could still be preferable in practice. It is especially true in the 64-bit case where the alternative approach is significantly more complicated, and much slower in my tests. The simpler alternative in the 64-bit case is the fastest ?on the x64 system (Ice Lake) but slower on the ARM system.

My code is available on GitHub.

查看更多评论

要查看或添加评论，请登录

Daniel Lemire的更多文章

Multiplying by the inverse is not the same as the division

2025年3月21日

Multiplying by the inverse is not the same as the division

In school, we learn that the division is the same as multiplying the inverse (or reciprocal), that is x / y = x (1/y)…

19 条评论
Speeding up C++ code with template lambdas

2025年3月15日

Speeding up C++ code with template lambdas

Let us consider a simple C++ function which divides all values in a range of integers: A division between two integers…
An overview of parallel programming (Go edition)

2025年3月9日

An overview of parallel programming (Go edition)

In practice, the software we write runs on several processors. Unfortunately, much of what we take for granted on a…
How fast can you open 1000 files?

2025年3月1日

How fast can you open 1000 files?

Jarred Sumner, the main author of the Bun JavaScript engine, commented a few days ago on X that opening many files on…

1 条评论
AVX-512 gotcha: avoid compressing words to memory with AMD Zen 4 processors

2025年2月15日

AVX-512 gotcha: avoid compressing words to memory with AMD Zen 4 processors

Convention computer instructions operate on a single piece of data at once (e.g.

4 条评论
Thread-safe memory copy

2025年2月7日

Thread-safe memory copy

A common operation in software is the copy of a block of memory. In C/C++, we often call the function memcpy for this…

2 条评论
Programmer time and the pitfalls of wasteful work

2025年1月30日

Programmer time and the pitfalls of wasteful work

Programmer time is precious. This realization should shape our approach to software development, focusing our efforts…
Regular expressions can blow up!

2025年1月25日

Regular expressions can blow up!

Regular expressions, often abbreviated as regex, are a powerful tool for pattern matching within text. For example, the…

6 条评论
Checking whether an ARM NEON register is zero

2025年1月20日

Checking whether an ARM NEON register is zero

Your phone probably runs on 64-bit ARM processors. These processors are ubiquitous: they power the Nintendo Switch…
JavaScript hashing speed comparison: MD5 versus SHA-256

2025年1月11日

JavaScript hashing speed comparison: MD5 versus SHA-256

Hashing algorithms convert input data into a fixed-size string of characters, known as a hash value or digest. These…

See all articles

Daniel Lemire的更多文章

Multiplying by the inverse is not the same as the division

Speeding up C++ code with template lambdas

An overview of parallel programming (Go edition)

How fast can you open 1000 files?

AVX-512 gotcha: avoid compressing words to memory with AMD Zen 4 processors

Thread-safe memory copy

Programmer time and the pitfalls of wasteful work

Regular expressions can blow up!

Checking whether an ARM NEON register is zero

JavaScript hashing speed comparison: MD5 versus SHA-256