As promised, the post that will reveal the details of the support of IEEE 754 in H++.

If after reading this post, you think you will need a more in depth discusion of IEEE 754 and its relation with the programming languages, please read Michael L. Overton’s excellent book called: Numerical Computing with IEEE Floating Point Arithmetic. A revision of the current IEEE standard can be located in IEEE 754: Revision of 2008.

As Overton explains in his book, the support for IEEE 754 and floating point in general has been inconsistent in both the hardware (the cpus), and finally the software (the programming languages). Programming Languages like C/C++ and Fortran are among others the ones with full support for the IEEE 754 specification. But that was not this way from the beginning.

The Floating Point support in H++ is very simple: I just have float and double types both with the same size and precision: 64 bits (1 bit for sign, 11 bits for the exponent, 52 bits for the significant or the mantissa). This was made this way just for simplicity when implementing the H++ Code Generator for x86 32-bits Assembly.

So, you can declare local or static variables with the keywords **float** and **double, **and will have the same double precision known from the double data type. In the future, that will change to reflect the standard: float will have 32-bits and double 64-bits of size.

Some of the floating point support tests I was writing in sample H++ programs, are provided as C programs in the Michael L. Overton’s book; if you want to do the same testing for C/C++ programs, please visit the links : Programs in this book.

Another lack of feature in H++ is in the print format; H++ supports only Fixed Format, which is very restrictive, but it works. The Exponential Format is the missing feature actually in my implementation. Here are examples of what I’m saying:

Fixed Format: 0.000542368921 (the only print format supported in H++ actually)

Exponential Format: 5.42368921E-4 (a more easy to handle print format, also known as Scientific Notation)

I’m thinking in dedicating a couple of days to implement that format, but I really don’t know when I’ll start to make it.

**Testing the H++ Compiler**

One of the tests you can find in this book, is the reciprocal power of 2, like in this H++ program:

This little program outputs the reciprocal powers of 2; so one detail I have to explain that shows the differences of this implementation with the original found in Overton, is that the condition was made against this tiny number: 0.00000000000000001 and not against 0.0 as the original one; the reason is simple: because I just implemented the Fixed Format for output, all the number below this 1.0E-17 are all printed as 0.0 until x really underflows.

The output:

The final results are:

After the power 2^-55 and down, the number printed is zero; if the number would be 0.0 in the while condition expression, the results are all zero from here until it really underflows to 0.0; the reason is the precision I’m using: 64-bits of precision.

A test that passed Ok, was the one involving 1.0 added to the reciprocal powers of 2:

The results were as expected:

This phenomena is very interesting because when you have isolated the reciprocal power in H++, it does not behave very well; but when you add it to 1.0, the results are as expected.

One interesting test, is the one related to the approximations as explained in Overton; here is the test H++ program:

As you can see from the formula, 1/ (1/x), the result should be X in all cases:

This test was successfull in H++. Here is why this test is very important: The binary representation for 1.0 and 0.99999999 are totally different besides they are the same value; let’s see these numbers as 32-bits float values in their binary format:

**Short Real sign exponent mantissa hex-value**

1.000 = 0 01111110 11111111111111111111110 0x3F7FFFFE <—- 0.9999999

As you could see, the number and its approximation have different binary representations; but if you continue adding more digits to the approximation, the process of rounding will take place here, and the number finally will change from .9999999…. to 1.0; you just have to add another digit 9, to convert the 0.99999999 in 1.0:

**Short Real sign exponent mantissa hex-value**

1.000 = 0 01111111 00000000000000000000000 0x3F800000 <—- was once 0.99999999

If you want to have an in depth explanation of this phenomena, please read Overton’s book: Numerical Computing with IEEE Floating Point Arithmetic.

**Fractions with Expressions of Zero Result, as the Divisor**

Another interesting subject, is the one that relates to fractions with zero as divisor, one operation which is invalid in mathematics.

It’s well known that in an expression with fractions or divisions, where the denominator expression would result in zero, cannot be resolved inmediatelly; so, from Calculus, you know that some limits must be approximated, when they cannot be solved by simple substitution of x in the polynomiuns.

They are a lot of tools in calculus to solve that problem, and one is approximation; if you cannot use 3 as value of x, you could use values from the left like these: 2.4, 2.5, 2.7, 2.9, 2.95, 2.98, 2.995, 2.9995; also you can use values from the right, like these: 3.0000099, 3.00009, 3.0001, 3.001, 3.01, 3.1. Also, tools like L’Hopital’s rule, is well known to be applied in this cases, to avoid the approximation or numerical analysis.

But, one of the features of the IEEE 754 is that you can have two different approaches: allow this result to become infinite, or throw an exception, following the rules of Division by Zero.

In the following example, what you can have as result, is zero; that will help you understand the problem in your formula, or in your program:

y = 1/x; //this will result in y = +oo|-oo (plus|minus infinite)

Based on the first approach, that expression would result in +oo (plus infinite) or -oo (minus infinite); so, if you continue operating with this value, the result could be infinite. If you then do this to the previous result:

z = 1/y; //this will result in z == 0.0

The new result would be zero.

The second approach, is to throw an IEEE 754 Floating Point exception called: Division by Zero; so you’ll need to catch the exception to understand the problem, and avoid the program to terminate unexpectedly:

**try**{

y = 1/x; //this expression causes the exception when x == 0.0;

z = 1/y;

}**catch**()

{

printf(“Division by Zero has ocurred!”);

}

Here is an H++ program that shows the results of the Total Resistance, for different values of R1 and R2 (the two resistances in the circuit):

To get the programs in C, please visit Michael L. Overton’s site.

The output of this little program is:

As you can see, when R2 == 0.0, the total resistance in the circuit is 0.0. That means that:

More advanced tests like Approximating a Derivative by a Difference Quotient, can be found in the Overton’s book. Also, a phenomena called Cancellation is well explained in that book, so I recommend you to read it to have a better understanding of the issues involved with Compilers and the expected results.

Here is the output:

This is the H++ program that generates that output:

**Approximating the Euler’s e Constant**

Let’s do another test: let’s find Euler’s e constant in an H++ program:

As you observe, the result depends on the size of n; when n–>oo, e will have a more accurate value; here is the result:

You can see that, when I applied the Natural Logarithm to the value of e I found with n = 10E+8, the result was close to 1.0; the approximation was very good for that value of n.

Now, let’s do the same thing, but this time, using the Taylor series formula:

Now, making a few changes to return the e value from each function, and compare the cancellation digits, to determine how many digits are correct in the two aproximations, we get these results:

Now you can notice that the two aproximations are valid for the first 6 digits after the decimal point. In this example, the more accurate was the Taylor Series formula.

**H++ Floating Point Math Library**

As a explained in older posts, the Math library supports almost all the fundamental functions, primitives and intrinsic functions; most of them were implemented at the x86 assembly level, to leaverage the existing implementation in the Floating Point Unit (FPU) in the x86 microprocessors.

**Fundamental Math Functions**

Here are some of the implemented functions:

Most of these functions are already implemented in the FPU.

You can also note the constants that are already computed in the FPU, and the most common Trigonometric Functions.

I also implemented the y^x function, and e^x:

**Inverse Functions**

Here you can see, how I implemented the Inverse ArcSin in x86 Assembly Language, for the H++ Library:

**Hyperbolic Functions**

Finally, the other more advanced functions, I decided to implement them directly in H++ because of matter of time:

This code is compiled and is available in the H++ library for any math operations that require them.

**Inverse Hyperbolic Functions**

Some aliases you can use, without referring to the full name of a static function for these hyperbolic and inverse hyperbolic functions, was defined:

**Having implemented these later functions in H++ saved me a lot time, which I dedicated to finish some important issues including bug fixes.**

**Conclusion**

The full support of IEEE 754 is still very distant in H++, but as you could see, there is a lot of work already done in the H++ compiler, and fewer things are missing. I highly recommend you to read Overton‘s book: Numerical Computing with IEEE Floating Point Arithmetic, if you are serious about Floating Point and the IEEE 754 Standard.

I hope this post can help you understand the basics of Floating Point, and the support in H++.

Bye, bye.

## Leave a Reply