C: Floating point approximations

How is it that even exact mathematical solutions involving real numbers (which are not really real; they are assumed to be real) turn out to be only approximations when computed?

Note: We are here ignoring discrete mathematics that involves only integers.

Slide rules were once used to make computations. I always called them slide rulers, but the name appears to be slide rule, not slide ruler.

When using a slide rule, it was very evident that any computation involving real numbers was an approximation.

Computers approximate results too, but this may not be as obvious. Any floating point number (that is, a value used to represent a real number) can carry an inherent rounding error.

What are the values of the following (using a computer program)?

1/10 = 1/10 2/10 = 1/10 + 1/10 3/10 = 1/10 + 1/10 + 1/10 4/10 = 1/10 + 1/10 + 1/10 + 1/10 5/10 = 1/10 + 1/10 + 1/10 + 1/10 + 1/10 6/10 = 1/10 + 1/10 + 1/10 + 1/10 + 1/10 + 1/10 7/10 = 1/10 + 1/10 + 1/10 + 1/10 + 1/10 + 1/10 + 1/10 8/10 = 1/10 + 1/10 + 1/10 + 1/10 + 1/10 + 1/10 + 1/10 + 1/10 9/10 = 1/10 + 1/10 + 1/10 + 1/10 + 1/10 + 1/10 + 1/10 + 1/10 + 1/10 10/10 = 1/10 + 1/10 + 1/10 + 1/10 + 1/10 + 1/10 + 1/10 + 1/10 + 1/10 + 1/10

Note: These are rational numbers, not irrational or transcendental numbers.

Here is the result from a mathematics point of view.

1/10 = 0.1
2/10 = 0.2
3/10 = 0.3
4/10 = 0.4
5/10 = 0.5
6/10 = 0.6
7/10 = 0.7
8/10 = 0.8
9/10 = 0.9
10/10 = 1.0

Here is the result from a computation point of view.

1/10 = 0.10000000000000001
2/10 = 0.20000000000000001
3/10 = 0.30000000000000004
4/10 = 0.40000000000000002
5/10 = 0.50000000000000000
6/10 = 0.59999999999999998
7/10 = 0.69999999999999996
8/10 = 0.79999999999999993
9/10 = 0.89999999999999991
10/10 = 0.99999999999999989

Note: Even when symbolic math can solve a problem, any attempt to compute real answers involves the same approximations.

Here is the Lua program code that computes the above results.

sum1 = 0.0
for count1=1,10 do
	sum1 = sum1 + 0.1
	print("\n\t".. count1 .. "/10 = " .. string.format("%1.17f", sum1))
end

The difference is small at each step, but it can add up to a significant amount over many iterations.
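Here is a minimal Python sketch of that accumulation. It repeats the same addition one million times and compares the result with math.fsum from the standard library, which compensates for the lost low-order bits. The names naive_sum, n, naive, and careful are made up for this illustration, and the iteration count is an arbitrary assumption.

```python
import math

def naive_sum(step, count):
    """Add `step` to a running total `count` times, as the loop above does."""
    total = 0.0
    for _ in range(count):
        total += step
    return total

n = 1_000_000
naive = naive_sum(0.1, n)
# math.fsum keeps track of the rounding lost at each step
# and returns a correctly rounded sum.
careful = math.fsum([0.1] * n)

print(f"naive   = {naive:.10f}")
print(f"careful = {careful:.10f}")
print(f"drift   = {abs(naive - careful):.3e}")
```

The drift is still small here, but it is many times larger than the error of a single addition.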

Note: One can fix this specific instance by using a fixed decimal notation, but that only works, in base 10, for fractions whose denominators have no prime factors other than 2 and 5, the prime factors of 10. A value such as 1/3 still has to be rounded.

These rounding errors, or finite approximations, are closely tied to a field called chaos theory, where small roundoff errors in approximations can, over time, grow into real and apparently random fluctuations in results.
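The decimal note above can be sketched in Python with the standard decimal module. This is only an illustration of the idea, not a recommendation for any particular program. The variable names total and third are made up for the example.

```python
from decimal import Decimal

# Decimal("0.1") is built from a string, so it holds exactly one tenth.
total = Decimal("0.0")
for _ in range(10):
    total += Decimal("0.1")
print(total, total == Decimal("1.0"))

# One third has a denominator with a prime factor other than 2 or 5,
# so even a decimal representation must round it.
third = Decimal(1) / Decimal(3)
print(third, third * 3 == Decimal(1))
```

The tenths add up exactly, but three times the rounded third does not get back to one.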

Rounding errors need to be addressed in fields of computer science such as numerical analysis.

Newton's theory of gravity was a triumph of human thought and of mathematics.

Newton's theory allows the exact mathematical solution of the interrelated motions of two heavenly bodies.

In classical mechanics, the two-body problem is to determine the motion of two point particles that interact only with each other. (Wikipedia)

If the sun and earth are considered point masses, then Newton's gravitational equation can be used to determine exactly the attraction and orbit, etc., of the earth around the sun. More precisely, they both go around each other.

Assume that the bodies are collapsed to points.

The theory works well for sun and earth.

Likewise, if the earth and moon are considered point masses, then Newton's gravitational equation can be used to determine exactly the attraction and orbit, etc., of the moon around the earth. More precisely, they both go around each other.

The theory works well for earth and moon.

However ...

The above are called two body problems.

Sun, Earth and Moon

Unfortunately, no one has ever found a general exact (closed-form) mathematical solution to the three body problem.

For example, the sun, earth, and moon.

So how can a problem such as weather prediction be solved exactly? It can't.

Weather is not really "predicted" (exactly)
Weather is "forecast" (approximately)

Why do we not just approximate it over time to get the solution? We can't.

Any attempt to approximate solutions results in round-off errors that accumulate and can cause the same model to vary widely based on the initial conditions.
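This sensitivity can be sketched with a toy model. The logistic map used below is not a weather model; it is a swapped-in standard example of a chaotic system. The function name logistic_orbit, the starting value 0.2, and the error of 1e-12 are all assumptions chosen for illustration.

```python
def logistic_orbit(x0, steps, r=4.0):
    """Iterate the logistic map x -> r*x*(1-x), a standard toy chaotic model."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

a = logistic_orbit(0.2, 100)
b = logistic_orbit(0.2 + 1e-12, 100)   # hypothetical tiny error in the start value
gaps = [abs(p - q) for p, q in zip(a, b)]

# Print how far apart the two runs are at selected steps.
for step in (0, 10, 20, 30, 40, 50, 60):
    print(f"step {step:3d}: gap = {gaps[step]:.3e}")
```

The two runs start almost identical, and the gap between them grows until the runs no longer resemble each other.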

A floating point representation approximates the mathematical real numbers.

In C, a floating point approximation is typically declared as a double. There are two primary ways to represent real number literal approximations.

A decimal notation, with an integer and fractional part separated by a decimal point, such as 1234.567
An exponential notation, also with an integer and fractional part, but followed by a scale factor (representing the power of 10). For the scientific (mathematical) notation 1.234567 x 10², the exponential notation (real approximation) is 1.234567E+02.
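Python accepts the same two literal notations, so the equivalence can be checked with a short sketch. The variable names decimal_form and exponent_form are made up for this example.

```python
decimal_form = 123.4567          # decimal notation
exponent_form = 1.234567E+02     # exponential notation: 1.234567 x 10^2

# Both literals denote the same value, so they are stored identically.
print(decimal_form == exponent_form)

# The E format specifier prints a value in exponential notation.
print(f"{decimal_form:E}")
```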

It is good practice to always include a decimal point when expressing a real number approximation and to write nonempty integer and fractional parts. So, write 1.0 instead of 1. or 1, and write 0.1 instead of .1.

Floating point numbers are only an approximation to the mathematical real numbers. Consider the real (rational) number 2/3. In base ten, this is represented as

0.666666...

Note: The mathematical repetend notation is a finite representation of an infinite object.

At some point, a computer (unless it is directly representing rational numbers as numerator and denominator) must round the stored value. This introduces a small roundoff, or truncation, error. Someone once said that real numbers are a lot like sand piles. Every time you move one, you lose a little sand and you pick up a little dirt.

So on most computers, when you write 0.1, you will not get an exact representation of the mathematical 1/10, but something like

0.1000000000000000055511151231...

This is because computers, for efficiency, usually represent numbers in base two, and 1/10 has no finite expansion in base two, so it cannot be exactly represented without roundoff error.
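One way to see what is actually stored is to ask Python for the exact value of the double nearest to 0.1. The sketch below assumes the usual IEEE 754 double precision format. The names stored and ratio are made up for the example.

```python
from decimal import Decimal
from fractions import Fraction

stored = Decimal(0.1)    # exact decimal expansion of the stored double
ratio = Fraction(0.1)    # the same stored value as an exact ratio of integers

print(stored)
print(ratio)

# The denominator is a power of two, which is why 1/10 cannot be hit exactly.
print(ratio.denominator == 2**55)
print(ratio == Fraction(1, 10))
```

Fraction(1, 10), built from two integers, is the direct numerator and denominator representation mentioned above, and it is exact.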

It is good programming practice not to mix integer and real arithmetic when writing arithmetic expressions. In all such expressions, make explicit conversions.

When should you use integers and when should you use reals?

When something can be counted, you should represent that value with an integer. People can be counted; we do not speak of 0.5 of a person.

When something cannot be counted, but can be measured, you should represent that value with a real number approximation. It is not reasonable to count grains of sand or molecules of water, so sand piles and water should be measured (approximated as a real number). But as soon as the water is put into gallon containers to be sold, the containers containing the measured water can be counted.

Statistical results are measures that are used for approximation and making decisions, and should be approximated by real numbers.

How would you represent the following - count or measure?

amount of dirt in a dump-truck
number of loads of dirt removed each day
dollars and cents of the federal budget
size of the average family

The following are standard arithmetic operators for real number approximations.

Addition using binary infix arithmetic operator +
Subtraction using binary infix arithmetic operator -
Multiplication using binary infix arithmetic operator *
Division using binary infix arithmetic operator /

These operations work in the same manner as integer arithmetic, except that real division is not the same as integer quotient and remainder (real division includes the decimal point and fractional part; the remainder is not defined).
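The difference between real division and integer quotient and remainder can be sketched in Python. Note that the operators are not the same in every language: in C, 7 / 2 with two int operands gives the integer quotient 3, while Python uses // for the integer quotient. The names count and parts are made up for the example.

```python
count = 7
parts = 2

quotient = count // parts      # integer quotient
remainder = count % parts      # integer remainder

# Make the conversions explicit, then do real division.
real_result = float(count) / float(parts)

print(quotient, remainder, real_result)
```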

When many languages added a floating point approximation data type, it was, as in C, called float (for floating point approximation).

Later, additional precision was added. Since it was a double precision floating point approximation, the data type was, as in C, called double.

Never use a float data type unless you have a compelling reason.
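To get a feel for how much precision a float gives up, here is a rough Python sketch. Python has no single precision type of its own, so the hypothetical helper to_single uses the standard struct module to round a value to a 32-bit float and back. It assumes the usual IEEE 754 formats.

```python
import struct

def to_single(x):
    """Round a Python float (a C double) to the nearest 32-bit float and back."""
    return struct.unpack("f", struct.pack("f", x))[0]

single = to_single(0.1)

print(f"double: {0.1:.20f}")
print(f"single: {single:.20f}")
print(f"difference: {abs(single - 0.1):.1e}")
```

The single precision value departs from 0.1 after about 8 significant digits, while the double holds on for about 17.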

When working with dollars and cents, never use a double. When forced to use a double (as in JavaScript) be very careful.

In general, the following approach can be used.

Input: Get dollars and cents and convert everything to cents.
Process: Do all processing in terms of cents, not dollars and cents.
Output: Convert cents to dollars and cents.
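The three steps above can be sketched as follows. The helpers to_cents and format_cents are hypothetical names for this illustration, and to_cents assumes a non-negative amount written with at most two decimal places.

```python
def to_cents(text):
    """Input: parse a string such as "19.99" into an integer number of cents.
    Assumes a non-negative amount with at most two decimal places."""
    dollars, _, cents = text.partition(".")
    return int(dollars) * 100 + int((cents + "00")[:2])

def format_cents(total):
    """Output: convert integer cents back to dollars and cents."""
    return f"{total // 100}.{total % 100:02d}"

prices = ["0.10", "0.20", "19.99"]

# Process: all arithmetic is done on integer cents, so nothing is rounded.
total = sum(to_cents(p) for p in prices)
print(format_cents(total))

# For comparison, the same first two amounts as doubles do not add up exactly.
print(0.10 + 0.20 == 0.30)
```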

Never compare two double floating point approximations for equality or inequality.

If you are using a double as an integer (that is, it only ever holds whole-number values), this can work. But in general, such comparisons can cause undesired effects.
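Here is a short sketch of why whole-number doubles compare safely, and where that stops. The limit of 2 to the power 53 assumes the usual IEEE 754 double precision format. The names d and limit are made up for the example.

```python
d = 0.0
for _ in range(10):
    d += 1.0           # whole-number steps are stored exactly
print(d == 10.0)

# Above 2**53, not every integer has its own double,
# so adding 1.0 can leave the value unchanged.
limit = 2.0 ** 53
print(limit + 1.0 == limit)
```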

Example: Using assert for equality of double values in, say, a CS 101 programming class using C.

Here is the C code.

#include <stdio.h>
#include <stdbool.h>
#include <assert.h>

int main() {
	int i1 = 0;
	int n1 = 10;
	double d1 = 0.0;
	while (i1 != n1) {
		d1 += 0.1;
		i1++;
		printf("i1=%d d1=[%30.28lf]\n",i1,d1);
		}
	/// the following assertion will fail
	/// assert(d1 == 1.0);
	return 0;
	}

Here is the output of the C code.

i1=1 d1=[0.1000000000000000055511151231]
i1=2 d1=[0.2000000000000000111022302463]
i1=3 d1=[0.3000000000000000444089209850]
i1=4 d1=[0.4000000000000000222044604925]
i1=5 d1=[0.5000000000000000000000000000]
i1=6 d1=[0.5999999999999999777955395075]
i1=7 d1=[0.6999999999999999555910790150]
i1=8 d1=[0.7999999999999999333866185225]
i1=9 d1=[0.8999999999999999111821580300]
i1=10 d1=[0.9999999999999998889776975375]

Here is the Java code.

public class _01 {

	public _01() {
		int i1 = 0;
		int n1 = 10;
		double d1 = 0.0;
		while (i1 != n1) {
			d1 += 0.1;
			i1++;
			System.out.print(String.format("i1=%d d1=[%30.28f]\n",i1,d1));
			}
		}

	public static void main(String [] args) {
		new _01();
		}
	}

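Here is the output of the Java code. The stored values are the same as in C, but Java's formatter produces only as many significant digits as Double.toString would and pads the rest with zeros, so the trailing digits look different.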

i1=1 d1=[0.1000000000000000000000000000]
i1=2 d1=[0.2000000000000000000000000000]
i1=3 d1=[0.3000000000000000400000000000]
i1=4 d1=[0.4000000000000000000000000000]
i1=5 d1=[0.5000000000000000000000000000]
i1=6 d1=[0.6000000000000000000000000000]
i1=7 d1=[0.7000000000000000000000000000]
i1=8 d1=[0.7999999999999999000000000000]
i1=9 d1=[0.8999999999999999000000000000]
i1=10 d1=[0.9999999999999999000000000000]

Here is the Python code.

i1 = 0
n1 = 10
d1 = 0.0
while i1 != n1:
	d1 += 0.1
	i1 += 1
	print("i1={0:d} d1=[{1:30.28f}]".format(i1,d1))

Here is the output of the Python code.

i1=1 d1=[0.1000000000000000055511151231]
i1=2 d1=[0.2000000000000000111022302463]
i1=3 d1=[0.3000000000000000444089209850]
i1=4 d1=[0.4000000000000000222044604925]
i1=5 d1=[0.5000000000000000000000000000]
i1=6 d1=[0.5999999999999999777955395075]
i1=7 d1=[0.6999999999999999555910790150]
i1=8 d1=[0.7999999999999999333866185225]
i1=9 d1=[0.8999999999999999111821580300]
i1=10 d1=[0.9999999999999998889776975375]

In practice, to compare double values for equality or inequality, one picks a very small tolerance. If the absolute difference between the two values is within this tolerance, the values are considered equal (or unequal if the difference is outside it).
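A minimal sketch of such a comparison in Python follows. The helper nearly_equal and its default tolerance of 1e-9 are assumptions for this illustration; a suitable tolerance depends on the size of the values being compared. The standard library function math.isclose does a similar job using a relative tolerance.

```python
import math

def nearly_equal(a, b, tolerance=1e-9):
    """Treat a and b as equal when they differ by no more than `tolerance`."""
    return abs(a - b) <= tolerance

# Same loop as in the examples above: add 0.1 ten times.
d1 = 0.0
for _ in range(10):
    d1 += 0.1

print(d1 == 1.0)               # exact comparison
print(nearly_equal(d1, 1.0))   # comparison within a tolerance
print(math.isclose(d1, 1.0))   # standard library, relative tolerance
```

The exact comparison fails for the same reason the assertion in the C code fails, while both tolerance-based comparisons succeed.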