Parsing tiny and very large floating-point values: a programming-language comparison

Daniel Lemire

Computer Scientist

Published: August 26, 2024

Most programming languages support floating-point numbers. You typically have the ability to turn a string into a floating-point number. E.g., “3.1416” could be parsed as a number close to pi. However, such strings often cannot be represented exactly, and sometimes cannot be represented at all. For example, “1e-1000” is too small and “1e1000” is too large for even 64-bit floating-point types.
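To make "too small" and "too large" concrete, here is a minimal Python sketch that prints the limits of the 64-bit floating-point type. It assumes a platform where Python's float is an IEEE 754 binary64 value, which is the usual case; the variable name smallest_subnormal is only for illustration.

```python
import math
import sys

# Largest finite double, about 1.8e308. A string such as "1e1000" is far above it.
print(sys.float_info.max)

# Smallest positive normal double, about 2.2e-308.
print(sys.float_info.min)

# Smallest positive subnormal double, 2**-1074 (about 5e-324).
smallest_subnormal = math.ldexp(1.0, -1074)
print(smallest_subnormal)

# Halving it rounds to zero: no smaller positive double exists,
# so "1e-1000" has nowhere to land except 0.
print(smallest_subnormal / 2 == 0.0)
```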

Most languages parse these strings as the number zero and the value ‘infinity’, respectively. Let us consider Python as an example:

>>> float("1e-1000")
0.0
>>> float("1e1000")
inf
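Python reports no error in either case. If you need to know whether a zero or an infinity came from an out-of-range input, one possible approach is to compare the parsed value with an arbitrary-precision reading of the same string. The following is only a sketch: parse_float_checked is a hypothetical helper, not a standard-library function, and it treats "underflow" as "a nonzero input rounded to zero" (inputs that land in the subnormal range are reported as ok).

```python
from decimal import Decimal


def parse_float_checked(text):
    """Return (value, status) with status 'ok', 'underflow' or 'overflow'.

    Hypothetical helper for illustration, not part of the standard library.
    """
    value = float(text)
    exact = Decimal(text)  # arbitrary-precision view of the same string
    if exact.is_finite():
        # The input was a finite number, so an infinite result means overflow.
        if value in (float("inf"), float("-inf")):
            return value, "overflow"
        # The input was nonzero, so a zero result means it was rounded away.
        if value == 0.0 and exact != 0:
            return value, "underflow"
    return value, "ok"


for s in ["3.1416", "1e-1000", "1e1000", "inf"]:
    print(s, parse_float_checked(s))
```

With this helper, the literal string "inf" is reported as ok, while "1e1000" is reported as an overflow even though both parse to the same value.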

The Go language gives the same result with the caveat that 1e1000 triggers an error (which you can ignore). Consider the following Go code:

package main

import (
    "fmt"
    "strconv"
)

func main() {
    f, err := strconv.ParseFloat("1e-1000", 64)
    fmt.Println(f, err)
    f, err = strconv.ParseFloat("1e1000", 64)
    fmt.Println(f, err)
}

It prints out:

0 <nil>
+Inf strconv.ParseFloat: parsing "1e1000": value out of range

The C language also gives you 0 and infinity, but both are considered out of range. Let us consider the following C code…

#include <stdio.h>
#include <errno.h>
#include <stdlib.h>
int main(void) {
  const char *p = "1e-1000 1e1000";
  printf("Parsing '%s':\n", p);
  char *end;
  for (double f = strtod(p, &end); p != end; f = strtod(p, &end)) {
    printf("'%.*s' -> ", (int)(end-p), p);
    p = end;
    if (errno == ERANGE){
      printf("range error, got ");
      errno = 0;
    }
    printf("%f\n", f);
  }
}

It prints out:

Parsing '1e-1000 1e1000':
'1e-1000' -> range error, got 0.000000
' 1e1000' -> range error, got inf

What about C++? Let us consider the following code.

#include <cstdio>
#include <cstdlib>   // EXIT_SUCCESS
#include <charconv>
#include <string>
int main() {
  for(std::string str : {"1e-1000", "1e1000"}) {
    double value = -1;  // sentinel: stays -1 if from_chars writes nothing
    printf("parsing %s\n", str.c_str());
    auto r = std::from_chars(str.data(), str.data() + str.size(), value);
    if(r.ec == std::errc::result_out_of_range) { printf("out of range "); }
    printf("%f\n", value);
  }
  return EXIT_SUCCESS;
}

What does this output?

The answer is: “it depends”.

Under LLVM/libc++, the code does not build because the library still lacks support for floating-point parsing in std::from_chars. (You are expected to use the C function strtod.)

Under Visual Studio, you get the same result as C:

parsing 1e-1000
out of range 0.000000
parsing 1e1000
out of range inf

GCC/libstdc++ relies on the fast_float library, which follows the same behavior. However, the GCC developers added special-case handling that discards the parsed value before returning. Thus you cannot distinguish 1e1000 from 1e-1000 when parsing strings with GCC/libstdc++: both are unknown ‘out of range’ values. You get the following with GCC:

parsing 1e-1000
out of range -1.000000
parsing 1e1000
out of range -1.000000

The -1 value is just the bogus value that I had used to initialize the variable. The C++ language architects are aware of this issue: search for “Floating point from_chars API does not distinguish between overflow and underflow”.

Interestingly, both GCC/libstdc++ and Microsoft Visual Studio will happily return infinity when parsing the string "inf" and will not trigger an out-of-range error.

Jeremie Desgagne-Bouchard

Science Advisor at Evovest | Actuary

6 months ago

Thanks, I wasn't aware of that parsing behavior in Julia. It turns out to raise an error on underflow and overflow, but it is happy with inf/Inf/-inf/-Inf (I'd guess for the same reason as LLVM/libc++?):

julia> parse(Float64, "1e1000")
ERROR: ArgumentError: cannot parse "1e1000" as Float64
julia> parse(Float64, "-Inf")
-Inf
