Floating point problem in computer

May 16, 2021

What is a float?

Floating point errors are not a quirk of any one programming language. The problem lies deeper, in how computers store data, particularly numbers. Let’s start with a little theory of how numbers are represented in general.

Wikipedia says that floating-point arithmetic uses a “formulaic representation of real numbers as an approximation,” and goes on to explain the formula:

A number is, in general, represented approximately to a fixed number of significant digits (the significand) and scaled using an exponent in some fixed base. A number that can be represented exactly has the form significand x base^exponent, for example:
1.2345 = 12345 x 10^-4
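Java's BigDecimal happens to store a decimal number in this same significand-and-exponent form, so it can serve as a quick illustration. This is only a sketch: the class name SignificandDemo and the helper fromParts are made up for this example.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

class SignificandDemo {
    // Hypothetical helper: rebuilds a value from a significand and a base-10 exponent.
    static BigDecimal fromParts(long significand, int exponent) {
        // BigDecimal stores unscaledValue x 10^(-scale), so scale = -exponent.
        return new BigDecimal(BigInteger.valueOf(significand), -exponent);
    }

    public static void main(String[] args) {
        BigDecimal x = new BigDecimal("1.2345");
        System.out.println(x.unscaledValue());    // 12345 (the significand)
        System.out.println(-x.scale());           // -4    (the exponent)
        System.out.println(fromParts(12345, -4)); // 1.2345
    }
}
```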

Float to computer…

Let’s see how those numbers are represented in computers. That is dictated by IEEE 754, a standard established in 1985. IEEE 754 defines several formats. The one most relevant to us is the double precision format.

Double precision is a binary format (meaning that the base is 2) that occupies 64 bits: 1 bit for the sign, 11 bits for the exponent, and 52 bits for the fraction part of the significand.
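To make the layout concrete, the three fields of a double can be pulled apart with Double.doubleToLongBits. The sketch below is illustrative only, and the class and method names (DoubleBits, sign, biasedExponent, fraction) are invented for this example.

```java
class DoubleBits {
    // Sign: the top bit (bit 63). 0 means positive, 1 means negative.
    static int sign(double d) {
        return (int) (Double.doubleToLongBits(d) >>> 63);
    }

    // Exponent: the next 11 bits, stored with a bias of 1023.
    static int biasedExponent(double d) {
        return (int) ((Double.doubleToLongBits(d) >>> 52) & 0x7FF);
    }

    // Fraction: the low 52 bits of the significand.
    static long fraction(double d) {
        return Double.doubleToLongBits(d) & 0xFFFFFFFFFFFFFL;
    }

    public static void main(String[] args) {
        double d = 0.3;
        System.out.println(sign(d));                        // 0
        System.out.println(biasedExponent(d) - 1023);       // -2
        System.out.println(Long.toHexString(fraction(d)));  // 3333333333333
    }
}
```

The repeating 3s in the fraction field are the binary pattern 0011 repeated, cut off after 52 bits.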

Take the decimal 0.3 as an example. Decimals are converted to binary in two steps: first the integral part, then the fractional part. The integral part is zero, so that’s easy. The fraction not so much: we’ll have to multiply it by two, write down the integral part of the result as the next binary digit, subtract it, and keep doing this until we reach zero or the fractions start repeating:

0.3 x 2 = 0.6 → 0
0.6 x 2 = 1.2 → 1
0.2 x 2 = 0.4 → 0
0.4 x 2 = 0.8 → 0
0.8 x 2 = 1.6 → 1
0.6 x 2 = 1.2 → 1
0.2 x 2 = 0.4 → 0 (we’re back at step three)

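It turns out that decimal 0.3 converts to binary 0.01(0011), where the digits in parentheses repeat forever. A double only has 52 bits for the fraction, so the stored value has to be rounded and is only an approximation of 0.3.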
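The multiply-by-two procedure is easy to try in code. The sketch below works on an exact fraction (numerator and denominator as integers) so that the demonstration itself is free of rounding. BinaryFraction and binaryDigits are names invented for this example.

```java
import java.math.BigDecimal;

class BinaryFraction {
    // Returns the first `digits` binary digits of numerator/denominator,
    // assuming 0 <= numerator < denominator.
    static String binaryDigits(long numerator, long denominator, int digits) {
        StringBuilder sb = new StringBuilder("0.");
        long remainder = numerator;
        for (int i = 0; i < digits; i++) {
            remainder *= 2;                     // multiply the fraction by two
            sb.append(remainder / denominator); // the integral part is the next bit
            remainder %= denominator;           // subtract the integral part
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(binaryDigits(3, 10, 12)); // 0.010011001100

        // new BigDecimal(double) shows the exact value the double really holds,
        // which is slightly below 0.3.
        System.out.println(new BigDecimal(0.3));
    }
}
```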

Error

To better understand the problem of binary floating point rounding errors, we can use examples from our well-known decimal system. The fraction 1/3 looks very simple. Its decimal form is a little more complicated: 0.333333333… with an infinitely repeating sequence of 3s. Even in the decimal system, we reach limitations where we have too many digits. We often shorten (round) numbers to a size that is convenient for us and fits our needs. For example, 1/3 could be written as 0.333.

What happens if we want to calculate (1/3) + (1/3)? If we add the rounded results 0.333 + 0.333, we get 0.666. However, if we add the fractions (1/3) + (1/3) directly, we get 2/3 = 0.6666666…, again with an infinite number of 6s, which we would most likely round to 0.667.

This example shows that if we are limited to a certain number of digits, we quickly lose accuracy. After only one addition, we have already lost a part that may or may not be important (depending on our situation). If we imagine a computer system that can only represent three fractional digits, the example above shows that errors from rounded intermediate results can propagate and cause wrong end results.
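The imaginary three-digit system can be simulated in Java by rounding every division to three fractional digits. This is only a sketch of the decimal example above. The class ThreeDigitRounding and the helper divide3 are invented for it, and HALF_UP is an assumed rounding rule.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

class ThreeDigitRounding {
    // Hypothetical helper: divides and keeps only three fractional digits.
    static BigDecimal divide3(int numerator, int denominator) {
        return BigDecimal.valueOf(numerator)
                .divide(BigDecimal.valueOf(denominator), 3, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        BigDecimal third = divide3(1, 3);
        System.out.println(third);            // 0.333

        // Adding two already-rounded values carries their error along.
        System.out.println(third.add(third)); // 0.666

        // Rounding only once, at the end, gives the closer answer.
        System.out.println(divide3(2, 3));    // 0.667
    }
}
```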

Solution for floating point error

In Java, the BigDecimal class represents arbitrary-precision decimal numbers and provides operations for arithmetic, scale handling, rounding, comparison, format conversion and hashing. It can handle very large and very small decimal numbers exactly, at the cost of being somewhat slower than double.

Input : double a = 0.03;
        double b = 0.04;
        double c = b - a;
        System.out.println(c);
Output : 0.010000000000000002

After using BigDecimal:

Input : BigDecimal a = new BigDecimal("0.03");
        BigDecimal b = new BigDecimal("0.04");
        BigDecimal c = b.subtract(a);
        System.out.println(c);
Output :0.01
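One detail worth noting is that the example builds its BigDecimal values from strings. The sketch below compares the three common ways of creating a BigDecimal. The class BigDecimalPitfall and its method names are made up for this illustration.

```java
import java.math.BigDecimal;

class BigDecimalPitfall {
    // Built from strings: the decimal digits are taken exactly as written.
    static BigDecimal fromString() {
        return new BigDecimal("0.04").subtract(new BigDecimal("0.03"));
    }

    // Built from doubles: each value already carries its binary rounding error.
    static BigDecimal fromDouble() {
        return new BigDecimal(0.04).subtract(new BigDecimal(0.03));
    }

    // BigDecimal.valueOf(double) goes through Double.toString, so it sees "0.04" and "0.03".
    static BigDecimal fromValueOf() {
        return BigDecimal.valueOf(0.04).subtract(BigDecimal.valueOf(0.03));
    }

    public static void main(String[] args) {
        System.out.println(fromString());  // 0.01
        System.out.println(fromDouble());  // a long decimal close to, but not equal to, 0.01
        System.out.println(fromValueOf()); // 0.01
    }
}
```

Passing a double to the constructor keeps the error that BigDecimal was meant to avoid, so the string constructor or BigDecimal.valueOf is usually the safer choice.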

Written by Anuradha Gunasinghe