FLOATING POINT ARITHMETIC
NJR Muniraj
Professor and Dean, Department of Electronics and Communication Engineering, SNS College of Technology
Floating Point Arithmetic
Signal processing is carried out with one of two number representations: fixed point arithmetic or floating point arithmetic. Because most signal processing techniques need a large dynamic range to achieve accuracy, fixed point representation, whose range is limited by its word length, gives unsatisfactory results.
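The IEEE 754 single precision format, which uses 32 bits, has been used for the proposed ICA algorithm. A 32-bit floating point number $F$ is represented by equation (1):

$$F = (-1)^{S} \times 2^{E-127} \times (1.M) \qquad (1)$$

The sign bit $S$ specifies the sign of the real number through the factor $(-1)^{S}$: $S = 0$ gives a positive number and $S = 1$ a negative one. The exponent $E$ is an 8-bit quantity stored with a bias of 127, so the scale factor is $2^{E-127}$. The third field, $1.M$, is the normalized binary significand, where $M$ is the 23-bit mantissa field and the integer bit 1 is hidden: since the leading one of a normalized significand is always present, it is implicit and does not appear in the representation. Together the 1 + 8 + 23 bits make up the 32-bit word. Equation (1) holds for normalized numbers ($1 \le E \le 254$); the reserved exponent values $E = 0$ and $E = 255$ encode zero and subnormal numbers, and infinities and NaNs, respectively. Addition, subtraction, multiplication, division and square root are carried out following the corresponding algorithms of the IEEE 754 single precision standard.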
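As an illustration of equation (1) and of one of the arithmetic operations, here is a minimal Python sketch. The helper names `fields`, `value` and `fmul` are hypothetical and are not taken from the proposed ICA implementation; the standard `struct` module is used only to obtain and rebuild the float32 bit pattern. `fmul` follows the usual single precision multiplication steps (XOR the signs, add the exponents and remove one bias, multiply the 24-bit significands, renormalize, round to nearest with ties to even). As a simplifying assumption, it handles only normalized operands and results.

```python
import struct

BIAS = 127       # exponent bias of IEEE 754 single precision
FRAC_BITS = 23   # width of the stored mantissa field M


def fields(x: float) -> tuple[int, int, int]:
    """Return the (S, E, M) fields of the float32 encoding of x."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s = bits >> 31                              # 1-bit sign
    e = (bits >> FRAC_BITS) & 0xFF              # 8-bit biased exponent
    m = bits & ((1 << FRAC_BITS) - 1)           # 23-bit mantissa (hidden 1 not stored)
    return s, e, m


def value(s: int, e: int, m: int) -> float:
    """Evaluate F = (-1)^S * 2^(E-127) * (1.M) for a normalized number."""
    if not 1 <= e <= 254:
        raise ValueError("E = 0 and E = 255 are reserved encodings")
    significand = 1 + m / (1 << FRAC_BITS)      # restore the hidden leading 1
    return (-1) ** s * 2.0 ** (e - BIAS) * significand


def fmul(a: float, b: float) -> float:
    """Multiply two normalized float32 values field by field (round to nearest even)."""
    sa, ea, ma = fields(a)
    sb, eb, mb = fields(b)
    if not (1 <= ea <= 254 and 1 <= eb <= 254):
        raise ValueError("sketch handles normalized operands only")
    s = sa ^ sb                                  # sign of product: XOR of signs
    e = ea + eb - BIAS                           # add exponents, remove one bias
    prod = ((1 << FRAC_BITS) | ma) * ((1 << FRAC_BITS) | mb)  # 1.Ma * 1.Mb, 47 or 48 bits
    shift = FRAC_BITS
    if prod >> 47:                               # significand product in [2, 4): renormalize
        shift += 1
        e += 1
    mant = prod >> shift                         # keep 24 bits, i.e. 1.M of the result
    rem = prod & ((1 << shift) - 1)              # discarded low bits
    half = 1 << (shift - 1)
    if rem > half or (rem == half and mant & 1): # round to nearest, ties to even
        mant += 1
        if mant >> 24:                           # rounding carried up to 2.0
            mant >>= 1
            e += 1
    if not 1 <= e <= 254:
        raise ValueError("result overflows or underflows the normalized range")
    bits = (s << 31) | (e << FRAC_BITS) | (mant & ((1 << FRAC_BITS) - 1))
    return struct.unpack(">f", struct.pack(">I", bits))[0]


print(fields(1.5))          # (0, 127, 4194304)
print(value(*fields(1.5)))  # 1.5
print(fmul(1.5, -2.75))     # -4.125
```

For example, 1.5 is 1.1 in binary times $2^0$, so $S = 0$, $E = 0 + 127 = 127$ and $M = 0.5 \times 2^{23} = 4194304$. Because the exact significand product is rounded only once, the sketch is expected to agree with the float32 product for normalized inputs; zero, subnormals, infinities and NaNs would need the extra cases defined by the standard.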