Half-Precision floating point in C#

Recently I encountered a problem in a system where we needed to use floating point, but we had only two bytes of memory for it.

The normal float type in C# is 4 bytes, but the Half type provides a 2-byte floating point. While using this type I ran into a problem, so I started to investigate it a little, found it very interesting, and decided to share the results with you.

Half-Precision floating point in C#

In C#, there is a type called Half (available since .NET 5) which represents a half-precision floating point number. Look at this example:

Half h1 = (Half) 1.0;
Half h2 = (Half) 1.5;
Half h3 = h1 + h2;   // Half arithmetic operators require .NET 7+; on older versions, cast to float

Console.WriteLine($"{h1} + {h2} = {h3}");

The output will be:

1 + 1.5 = 2.5        

It looks simple, doesn't it? OK, let's look at another example:

Half h1 = (Half) 1000.0;
Half h2 = (Half) 0.123;
Half h3 = h1 + h2;

Console.WriteLine($"{h1} + {h2} = {h3}");

And the output is not what we really expected:

1000 + 0.123 = 1000        

Why? Well, the answer is not simple; we need to go deep into the structure of the half-precision format.

What is Half-Precision floating point?

Based on Wikipedia:

In computing, half precision (sometimes called FP16 or float16) is a binary floating-point computer number format that occupies 16 bits (two bytes in modern computers) in computer memory. It is intended for storage of floating-point values in applications where higher precision is not essential, in particular image processing and neural networks.

Simply put, half-precision floating point is a way to represent floating point numbers in only two bytes.

Binary representation of Floating Points

Before going into the detailed structure of half-precision floating point, I would like to talk about how a floating point number is represented in binary, although I guess most of you are familiar with it.

As you know, in binary, numbers are represented using only two digits: 0 and 1, and the place values are powers of 2 (just like in decimal, place values are powers of 10).

A floating point number can be divided into two parts: an integer part and a fractional part. For example, in 3.5, the integer part is 3 and the fractional part is 0.5.

Let’s look at each part individually.

Integer part:

In our base-10 numbers, each digit position has a value that is a power of 10:

… 10^8 10^7 10^6 10^5 10^4 10^3 10^2 10^1 10^0        

For example, 1357 is equal to 1 x 10^3 + 3 x 10^2 + 5 x 10^1 + 7 x 10^0

Binary numbers work the same way, except that each position has a value that is a power of 2:

2^8 2^7 2^6 2^5 2^4 2^3 2^2 2^1 2^0        

So 1001 is equal to 1 x 2^3 + 0 x 2^2 + 0 x 2^1 + 1 x 2^0 = 8 + 0 + 0 + 1 = 9

So 3 is equal to 11 in binary

Fractional part:

The fractional part is the same except that the exponent is negative:

0.346 = 3 x 10^-1 + 4 x 10^-2 + 6 x 10^-3        

It is the same for binary:

0.101 = 1 x 2^-1 + 0 x 2^-2 + 1 x 2^-3 = 0.5 + 0 + 0.125 = 0.625        

So 0.5 in decimal is equivalent to 0.1 in binary

Ok so 3.5 in decimal is 11.1 in binary.

So far so good.
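By the way, if you want to play with this conversion in C#, here is a minimal sketch; the ToBinary helper and its parameters are my own illustration and assume a non-negative value and a fixed number of fractional bits:

static string ToBinary(double value, int fractionBits = 10)
{
    long intPart = (long)value;
    double frac = value - intPart;

    // Integer part: standard base-2 conversion.
    string result = Convert.ToString(intPart, 2) + ".";

    // Fractional part: multiply by 2 and peel off the digit that
    // crosses the binary point, one bit at a time.
    for (int i = 0; i < fractionBits; i++)
    {
        frac *= 2;
        int bit = (int)frac;
        result += bit;
        frac -= bit;
    }
    return result;
}

Console.WriteLine(ToBinary(3.5));        // 11.1000000000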

Structure of a Half-Precision floating point

Suppose we have 12345. There are many different ways to write this number:

12345 = 123.45 * 10^2

123.45 is called the significand and 10^2 is called the power term.

We can also write this number as:

12345 = 1.2345 * 10^4

This is called scientific notation. In this notation, the significand is always between 1.0 and 10, and it is called a normalized significand.

There is another notation for the same number:

12345 = 0.12345 * 10^5

If the significand is always between 0 and 1, it is called a true normalized significand.

There are several ways to fit floating point numbers into two bytes, but the most common standard is IEEE 754-2008.

Based on this standard, the 16 bits are divided into 3 parts:

  • 1 sign bit: indicating whether the number is positive or negative.
  • 5 exponent bits: representing the magnitude of the number.
  • 10 mantissa (fraction) bits: representing the fractional part of the number.
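As a quick illustration, here is a minimal sketch that extracts these three fields from a Half; it assumes .NET 6 or later, where BitConverter.HalfToInt16Bits is available:

Half h = (Half)3.5;
short bits = BitConverter.HalfToInt16Bits(h);

int sign     = (bits >> 15) & 0x1;     // 1 sign bit
int exponent = (bits >> 10) & 0x1F;    // 5 exponent bits
int mantissa = bits & 0x3FF;           // 10 mantissa bits

Console.WriteLine($"sign={sign} exponent={exponent} mantissa={mantissa}");
// For 3.5 (= 1.11 x 2^1): sign=0, exponent=16, mantissa=768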


The Half format, also called binary16 (IEEE 754), has various valid ranges depending on whether it represents:

  1. Normalized numbers
  2. Subnormal (denormalized) numbers
  3. Zero
  4. Infinity
  5. NaN (Not a Number)

The table I took from Wikipedia explains these categories very well; it boils down to the following:

  • Exponent 00000, significand 0: zero
  • Exponent 00000, significand ≠ 0: subnormal numbers
  • Exponent 00001 to 11110: normal numbers
  • Exponent 11111, significand 0: ±infinity
  • Exponent 11111, significand ≠ 0: NaN

Let's take a closer look at each category:

Zero

If the exponent part is 00000 (binary) and the significand part is also zero, then the number is zero.
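As a small demonstration (again assuming .NET 6+ for BitConverter.Int16BitsToHalf), both all-zero bit patterns behave as zero; only the sign bit differs:

Half posZero = BitConverter.Int16BitsToHalf(0x0000);                      // +0
Half negZero = BitConverter.Int16BitsToHalf(unchecked((short)0x8000));    // -0

Console.WriteLine(posZero == negZero);   // True: +0 and -0 compare equal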

Subnormal numbers

If the exponent part is 00000 (binary) and the significand is non-zero, we have a subnormal number, whose value is calculated with this formula:

(-1)^signbit × 2^-14 × 0.significandbits

The first part ((-1)^signbit) generates positive or negative numbers, and the second part (2^-14) is fixed. The last part (0.significandbits) is the significand.

In this category (subnormal numbers), the smallest positive number is:

(-1)^0 × 2^-14 × 0.0000000001 = 1 x 2^-14 x 1/1024 = 0.000000059604645

And the biggest positive number is:

(-1)^0 × 2^-14 × 0.1111111111 = 1 x 2^-14 x 1023/1024 = 0.000060975552

Note that because of the bitwise nature of this representation, we cannot move from 0.000000059604645 to, say, 0.000000059604646: the next significand is 0.0000000010, which gives us 0.000000119209289550781. In other words, the subnormal numbers are spaced 2^-24 apart, and no value between two adjacent representable numbers can be stored.
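You can verify these bounds in C#; a minimal sketch, assuming .NET 6+:

// Half.Epsilon is the smallest positive subnormal, 2^-24:
Console.WriteLine((float)Half.Epsilon);        // ~5.9604645E-08

// Largest subnormal: exponent 00000, all ten significand bits set.
Half maxSubnormal = BitConverter.Int16BitsToHalf(0x03FF);
Console.WriteLine((float)maxSubnormal);        // ~6.0975552E-05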

Normal numbers

If the exponent is between 00001 and 11110 (binary), then we have a normal number, which is calculated with this formula:

(-1)^signbit × 2^(exponent-15) × 1.significandbits

For simplicity, I will only talk about positive numbers.

The smallest positive normal number is 2^-14 * (1 + 0/1024) = 0.00006103515625

The biggest positive normal number is 2^15 * (1 + 1023/1024) = 65504
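These bounds are also exposed on the type itself, and the smallest normal number can be built from its bit pattern (exponent 00001, significand 0):

Console.WriteLine(Half.MaxValue);        // 65504, the biggest normal number
Console.WriteLine(Half.MinValue);        // -65504

Half minNormal = BitConverter.Int16BitsToHalf(0x0400);
Console.WriteLine((float)minNormal);     // 6.1035156E-05, i.e. 2^-14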

Infinity

If the exponent is 11111 (binary) and the significand is 0, it is interpreted as infinity (the sign bit decides between positive and negative infinity).

NaN

And finally, if the exponent is 11111 (binary) and the significand is anything except 0, it is interpreted as an invalid number, or Not a Number (NaN).
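Both special values can be constructed from their bit patterns; a minimal sketch, assuming .NET 6+:

// Exponent 11111, significand 0 -> infinity (the sign bit picks +/-).
Half posInf = BitConverter.Int16BitsToHalf(0x7C00);
Console.WriteLine(Half.IsInfinity(posInf));    // True

// Exponent 11111, any non-zero significand -> NaN.
Half nan = BitConverter.Int16BitsToHalf(0x7C01);
Console.WriteLine(Half.IsNaN(nan));            // True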

Back to the problem

Now that we have a pretty good understanding of the half-precision floating point structure, let's get back to our problem and see why 1000 + 0.123 is still 1000.

1000 belongs to the normal number category and is stored like this:

1000 (base 10) = 0110001111010000 (base 2). Here the exponent bits 11000 equal 24 and the significand bits 1111010000 equal 976, so the value is (-1)^0 × 2^(24-15) × (1 + 976/1024) = 2^9 × 1.953125 = 1000.

0.123 (base 10) = 0010111111011111 (base 2)

The next representable number after 1000 (base 10) is 0110001111010001 (base 2), because we just changed the least significant bit, and its value is 1000.5.

So we can see that 1000.123 is not representable, and at this magnitude our precision is 0.5, so adding 0.123 to 1000 rounds back to 1000.
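We can reproduce all of this with a few lines of C#. A minimal sketch (note that the + operator on Half requires .NET 7 or later; on older versions, cast to float first):

Half a = (Half)1000.0;
Half b = (Half)0.123;

// The exact sum 1000.123 falls between 1000 and 1000.5 and is
// rounded to the nearer representable value: 1000.
Console.WriteLine(a + b);        // 1000

// Incrementing the raw bits gives the next representable Half:
short bits = BitConverter.HalfToInt16Bits(a);
Half next = BitConverter.Int16BitsToHalf((short)(bits + 1));
Console.WriteLine(next);         // 1000.5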

Example

To get a better view of half-precision floating point, I wrote a console app that supports several commands.

range: Using this command you can see all available Half numbers between two given numbers.

minmax: This command calculates the minimum and maximum Half values, as well as the minimum and maximum of the positive subnormal and positive normal categories.

binary: This command shows the binary representation of a Half number.

convert: Using this command you can convert any number to the nearest valid Half number.

add: This command can be used to add two Half numbers and show the result.
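To give a taste of how such a command might work, here is a minimal sketch of my own for printing the binary representation (an illustration, not the code from the repository):

static string HalfToBinaryString(Half h)
{
    short bits = BitConverter.HalfToInt16Bits(h);
    // Mask to 16 bits so negative values don't sign-extend,
    // then pad to the full width.
    return Convert.ToString(bits & 0xFFFF, 2).PadLeft(16, '0');
}

Console.WriteLine(HalfToBinaryString((Half)1000.0));   // 0110001111010000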

You can find the source code of the console app on my GitHub at the following link:

https://github.com/amirdoosti6060/HalfPrecisionFloat/tree/master

Why Half-Precision Floating Point?

So if we cannot represent every number, why should we use half-precision floating point? Is it useful?

The short answer is that it is primarily used in scenarios where reduced precision is acceptable but storage and computational efficiency are critical.

Here is the detailed answer:

Reduced Memory Usage:

  • Smaller storage: A Half occupies 2 bytes (16 bits) of memory, while a float takes 4 bytes (32 bits), and a double takes 8 bytes (64 bits). This makes Half useful in memory-constrained environments.
  • Efficient data transmission: In systems that need to transmit large datasets, such as image or 3D model data, using Half can reduce bandwidth consumption.

Increased Computational Speed:

  • Many modern hardware architectures (especially GPUs) have dedicated support for FP16 arithmetic. Processing data in FP16 can be significantly faster than using higher precision floating points, particularly in vectorized operations.
  • Lower power consumption: FP16 operations can be more power-efficient, especially in embedded systems and mobile devices.

Applications Where Full Precision Isn't Necessary:

  • In some domains, the additional precision provided by 32-bit or 64-bit floating point numbers is unnecessary. FP16 can provide adequate precision in these cases while benefiting from performance and memory savings.

Practical Usages of Half-Precision Floating Point

  1. Machine Learning & Neural Networks
  2. Graphics and Game Development
  3. Embedded Systems and Mobile Devices
  4. Video and Image Processing
  5. Scientific Simulations

Conclusion

Half-precision floating-point numbers (FP16) offer a trade-off between precision and efficiency, making them suitable for applications where high performance and low memory usage are more important than numerical accuracy. They are most useful in areas like machine learning, graphics, embedded systems, and signal processing, where lower precision is acceptable for the task at hand. Using them in precision-sensitive applications, however, requires extra care.


#floatingpoint #halfprecisionfloatingpoint #csharp #dotnet

