The instruction set shall support the following data types, all in 2's complement representation: 16-bit single precision fixed point, 32-bit double precision fixed point, 32-bit floating point, and 48-bit extended precision floating point.
Single precision 16-bit fixed point data shall be represented as a 16-bit 2's complement integer, with the most significant bit (MSB), bit 0, as the sign bit and bit 15 as the least significant bit (LSB).
Examples of single precision fixed point numbers are shown in Table I.
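As an illustration of the single precision format (not a reproduction of Table I), the Python sketch below maps a signed integer to its 16-bit 2's complement pattern and back. The helper names `encode_single` and `decode_single` are hypothetical. Python numbers bits from the LSB, so the document's bit 0 (the sign) corresponds to mask `0x8000` here.

```python
def encode_single(value: int) -> int:
    """Hypothetical helper: 16-bit 2's complement pattern of a signed integer."""
    if not -0x8000 <= value <= 0x7FFF:
        raise ValueError("value does not fit in 16-bit single precision")
    return value & 0xFFFF  # masking a negative int yields its 2's complement bits


def decode_single(word: int) -> int:
    """Hypothetical helper: interpret a 16-bit word as a signed integer."""
    word &= 0xFFFF
    # Document bit 0 (the MSB, mask 0x8000) is the sign bit.
    return word - 0x1_0000 if word & 0x8000 else word


for n in (32767, 1, 0, -1, -32768):
    print(f"{n:>6} -> {encode_single(n):04X}")
```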
Double precision 32-bit fixed point data shall be represented as a 32-bit 2's complement integer number with the most significant bit (MSB) of the first word as the sign bit.
Bit | 0 | 1-15 | 16-31 |
---|---|---|---|
Field | S (sign, MSB) | MSH (most significant half) | LSH (least significant half; LSB is bit 31) |
Examples of machine representation for double precision fixed point numbers are shown in Table II.
Table II. Double Precision Fixed Point Numbers
Integer | 32-Bit Hexadecimal Word |
---|---|
2,147,483,647 | 7FFF FFFF |
1,073,741,824 | 4000 0000 |
2 | 0000 0002 |
1 | 0000 0001 |
0 | 0000 0000 |
-1 | FFFF FFFF |
-2 | FFFF FFFE |
-1,073,741,824 | C000 0000 |
-2,147,483,647 | 8000 0001 |
-2,147,483,648 | 8000 0000 |
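A minimal Python sketch of the double precision layout: a 32-bit value is split into a first word (sign plus most significant half) and a second word (least significant half), and each hexadecimal pattern from Table II is checked. Note that C000 0000 hex decodes to -1,073,741,824 (that is, -2^30). The helpers `encode_double` and `decode_double` are hypothetical names used only for illustration.

```python
def encode_double(value: int) -> tuple[int, int]:
    """Hypothetical helper: (first_word, second_word) of a 32-bit 2's complement integer.

    The first word carries the sign bit and the MSH; the second word carries the LSH.
    """
    if not -0x8000_0000 <= value <= 0x7FFF_FFFF:
        raise ValueError("value does not fit in 32-bit double precision")
    bits = value & 0xFFFF_FFFF
    return bits >> 16, bits & 0xFFFF


def decode_double(first_word: int, second_word: int) -> int:
    """Hypothetical helper: rebuild the signed integer from its two 16-bit words."""
    bits = ((first_word & 0xFFFF) << 16) | (second_word & 0xFFFF)
    return bits - 0x1_0000_0000 if bits & 0x8000_0000 else bits


# (integer, 32-bit hexadecimal word) pairs as listed in Table II
TABLE_II = [
    (2_147_483_647, 0x7FFF_FFFF), (1_073_741_824, 0x4000_0000),
    (2, 0x0000_0002), (1, 0x0000_0001), (0, 0x0000_0000),
    (-1, 0xFFFF_FFFF), (-2, 0xFFFF_FFFE), (-1_073_741_824, 0xC000_0000),
    (-2_147_483_647, 0x8000_0001), (-2_147_483_648, 0x8000_0000),
]

for value, word in TABLE_II:
    msh, lsh = encode_double(value)
    assert (msh << 16) | lsh == word      # pattern matches the table
    assert decode_double(msh, lsh) == value  # and decodes back
```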
All operands for fixed point adds, subtracts, multiplies, and divides are integers. A fixed point overflow shall be defined as arithmetic overflow if the result is greater than 7FFF₁₆ (32,767) or less than 8000₁₆ (-32,768) for single precision, and greater than 7FFF FFFF₁₆ (2,147,483,647) or less than 8000 0000₁₆ (-2,147,483,648) for double precision.
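The overflow bounds above are simply the signed ranges of 16-bit and 32-bit 2's complement integers. A small Python sketch of the test, assuming the exact (unbounded) integer result is available; `is_fixed_overflow` is a hypothetical name.

```python
# Signed limits implied by the hexadecimal bounds in the text.
SINGLE_MIN, SINGLE_MAX = -0x8000, 0x7FFF            # 8000 .. 7FFF (hex)
DOUBLE_MIN, DOUBLE_MAX = -0x8000_0000, 0x7FFF_FFFF  # 8000 0000 .. 7FFF FFFF (hex)


def is_fixed_overflow(exact_result: int, double_precision: bool = False) -> bool:
    """Hypothetical helper: True if an exact result lies outside the target range."""
    if double_precision:
        return not DOUBLE_MIN <= exact_result <= DOUBLE_MAX
    return not SINGLE_MIN <= exact_result <= SINGLE_MAX


print(is_fixed_overflow(0x7FFF + 1))                         # single precision
print(is_fixed_overflow(0x7FFF + 1, double_precision=True))  # double precision
```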
On fixed point operations that cause overflow, the operation shall be performed to completion as if the MSBs were present, and the 16 LSBs for single precision or the 32 LSBs for double precision shall be retained in the proper register(s). Division by zero shall produce a fixed point overflow and return a result of all zeros.
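The retention rule amounts to computing the exact result and keeping its low 16 bits. The Python sketch below models single precision add, subtract, and divide as returning a (retained result, overflow flag) pair. The function names are hypothetical, and the divide's truncation toward zero (with the remainder ignored) is an assumption made for illustration, not something the text specifies.

```python
def wrap16(value: int) -> int:
    """Keep only the 16 LSBs of an exact result and reinterpret them as signed."""
    bits = value & 0xFFFF
    return bits - 0x1_0000 if bits & 0x8000 else bits


def in_single_range(value: int) -> bool:
    return -0x8000 <= value <= 0x7FFF


def add16(a: int, b: int) -> tuple[int, bool]:
    """Hypothetical single precision add: (retained result, overflow flag)."""
    exact = a + b  # computed as if the MSBs were present
    return wrap16(exact), not in_single_range(exact)


def sub16(a: int, b: int) -> tuple[int, bool]:
    """Hypothetical single precision subtract."""
    exact = a - b
    return wrap16(exact), not in_single_range(exact)


def div16(a: int, b: int) -> tuple[int, bool]:
    """Hypothetical single precision divide.

    ASSUMPTION: the quotient is truncated toward zero; the remainder is ignored.
    """
    if b == 0:
        return 0, True  # division by zero: overflow, result of all zeros
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q
    return wrap16(q), not in_single_range(q)
```

The same pattern covers double precision by masking with `0xFFFF_FFFF` and testing the 32-bit range instead.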
Floating point data shall be represented as a 32-bit quantity consisting of a 24-bit 2's complement mantissa and an 8-bit 2's complement exponent.
Bit | 0 | 1-23 | 24-31 |
---|---|---|---|
Field | S (mantissa sign, MSB) | Mantissa (LSB is bit 23) | Exponent (MSB is bit 24, LSB is bit 31) |
Floating point numbers are represented as a fractional mantissa times 2 raised to the power of the exponent. All floating point numbers are assumed to be normalized or floating point zero at the beginning of a floating point operation, and the results of all floating point operations are normalized or floating point zero. (In a normalized floating point number, the bit following the mantissa sign bit has the opposite value of the sign bit.) A floating point zero is defined as 0000 0000₁₆, that is, a zero mantissa and a zero exponent (00₁₆). An extended floating point zero is defined as 0000 0000 0000₁₆, that is, a zero mantissa and a zero exponent. Some examples of the machine representation for 32-bit floating point numbers are shown in Table III.
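The normalization rule can be checked mechanically: a 32-bit word is an acceptable operand if it is all zeros, or if bits 0 and 1 of the mantissa differ. A minimal Python sketch; `is_valid_float_operand` is a hypothetical name.

```python
def is_valid_float_operand(word: int) -> bool:
    """Hypothetical helper: True if a 32-bit word is normalized or floating point zero."""
    word &= 0xFFFF_FFFF
    if word == 0:
        return True  # floating point zero: zero mantissa and zero exponent
    mantissa = word >> 8             # document bits 0-23; bit 0 (the MSB) is the sign
    sign_bit = (mantissa >> 23) & 1  # document bit 0
    next_bit = (mantissa >> 22) & 1  # document bit 1
    return sign_bit != next_bit      # normalized: bit 1 differs from the sign


for word in (0x4000_007F, 0x8000_0000, 0x2000_0001, 0xC000_0000, 0x0000_0001):
    print(f"{word:08X} valid operand: {is_valid_float_operand(word)}")
```

For example, C000 0000 (-0.5 × 2^0) is rejected because its normalized form is 8000 00FF (-1.0 × 2^-1), and 0000 0001 has a zero mantissa but a nonzero exponent, so it is neither normalized nor the floating point zero defined above.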
Table III. 32-Bit Floating Point Numbers
Decimal Number | Mantissa (hex) | Exponent (hex) |
---|---|---|
0.9999998 × 2^127 | 7FFFFF | 7F |
0.5 × 2^127 | 400000 | 7F |
0.625 × 2^4 | 500000 | 04 |
0.5 × 2^1 | 400000 | 01 |
0.5 × 2^0 | 400000 | 00 |
0.5 × 2^-1 | 400000 | FF |
0.5 × 2^-128 | 400000 | 80 |
0.0 × 2^0 | 000000 | 00 |
-1.0 × 2^0 | 800000 | 00 |
-0.5000001 × 2^-128 | BFFFFF | 80 |
-0.7500001 × 2^4 | 9FFFFF | 04 |
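To see how the rows of Table III decode, the Python sketch below splits a 32-bit word into its signed 24-bit mantissa (read as a fraction with the binary point just right of the sign bit) and its signed 8-bit exponent. It uses `fractions.Fraction` so that values such as 2^-128 stay exact. The helper names are hypothetical.

```python
from fractions import Fraction


def float32_fields(word: int) -> tuple[Fraction, int]:
    """Hypothetical helper: split a 32-bit word into (mantissa fraction, exponent)."""
    word &= 0xFFFF_FFFF
    mantissa = word >> 8    # bits 0-23, 24-bit 2's complement
    exponent = word & 0xFF  # bits 24-31, 8-bit 2's complement
    if mantissa & 0x80_0000:
        mantissa -= 0x100_0000
    if exponent & 0x80:
        exponent -= 0x100
    # The binary point sits just right of the sign bit, so scale by 2**-23.
    return Fraction(mantissa, 1 << 23), exponent


def float32_value(word: int) -> Fraction:
    """Hypothetical helper: exact value mantissa * 2**exponent."""
    mantissa, exponent = float32_fields(word)
    return mantissa * Fraction(2) ** exponent


for word in (0x7FFF_FF7F, 0x5000_0004, 0x4000_00FF, 0xBFFF_FF80, 0x9FFF_FF04):
    m, e = float32_fields(word)
    print(f"{word:08X}: {float(m):.7f} x 2^{e}")
```

The printed mantissas are rounded, so the last digit can differ from Table III (for example, 1 - 2^-23 = 0.99999988...).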
Extended floating point data shall be represented as a 48-bit quantity consisting of a 40-bit 2's complement mantissa and an 8-bit 2's complement exponent. The exponent (bits 24 to 31) lies between the two parts of the split mantissa, bits 0 to 23 and bits 32 to 47. The most significant bit of the mantissa, bit 0, is the sign bit, and the least significant bit of the mantissa is bit 47.
Bit | 0 | 1-23 | 24-31 | 32-47 |
---|---|---|---|---|
Field | S (sign, MSB) | Mantissa MS | Exponent | Mantissa LS (LSB is bit 47) |
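Because the exponent sits between the two mantissa parts, software must reassemble the 40-bit mantissa before using it. A minimal Python sketch, treating the 48-bit quantity as one integer whose document bit 0 is its most significant bit; `split_extended` and `join_extended` are hypothetical names.

```python
def split_extended(word48: int) -> tuple[int, int]:
    """Hypothetical helper: (mantissa, exponent) as signed ints from a 48-bit pattern."""
    word48 &= 0xFFFF_FFFF_FFFF
    mantissa_ms = word48 >> 24          # document bits 0-23 (includes the sign)
    exponent = (word48 >> 16) & 0xFF    # document bits 24-31
    mantissa_ls = word48 & 0xFFFF       # document bits 32-47
    mantissa = (mantissa_ms << 16) | mantissa_ls  # contiguous 40-bit field
    if mantissa & (1 << 39):
        mantissa -= 1 << 40
    if exponent & 0x80:
        exponent -= 0x100
    return mantissa, exponent


def join_extended(mantissa: int, exponent: int) -> int:
    """Hypothetical inverse: place the exponent between the two mantissa parts."""
    m = mantissa & ((1 << 40) - 1)
    return ((m >> 16) << 24) | ((exponent & 0xFF) << 16) | (m & 0xFFFF)
```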
Some examples of the machine representation of 48-bit extended floating point numbers are shown in Table IV.
Table IV. 48-Bit Extended Floating Point Numbers
Decimal Number | Mantissa (MS) | Exp | Mantissa (LS) |
---|---|---|---|
0.5 × 2^127 | 400000 | 7F | 0000 |
0.5 × 2^0 | 400000 | 00 | 0000 |
0.5 × 2^-1 | 400000 | FF | 0000 |
0.5 × 2^-128 | 400000 | 80 | 0000 |
-1.0 × 2^127 | 800000 | 7F | 0000 |
-1.0 × 2^0 | 800000 | 00 | 0000 |
-1.0 × 2^-1 | 800000 | FF | 0000 |
-1.0 × 2^-128 | 800000 | 80 | 0000 |
0.0 × 2^0 | 000000 | 00 | 0000 |
-0.75 × 2^-1 | A00000 | FF | 0000 |
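The rows of Table IV can be checked by joining the two mantissa fields into a 40-bit fraction (binary point just right of the sign bit, so scaled by 2^-39) and applying the exponent. The Python sketch below does this with exact `Fraction` arithmetic; `extended_value` is a hypothetical name.

```python
from fractions import Fraction


def extended_value(ms: int, exp: int, ls: int) -> Fraction:
    """Hypothetical helper: exact value of an extended float from its three fields.

    ms:  mantissa bits 0-23 (24 bits, includes the sign)
    exp: exponent bits 24-31 (8-bit 2's complement)
    ls:  mantissa bits 32-47 (16 bits)
    """
    mantissa = ((ms & 0xFF_FFFF) << 16) | (ls & 0xFFFF)  # 40-bit field
    if mantissa & (1 << 39):  # sign bit set
        mantissa -= 1 << 40
    exp &= 0xFF
    exponent = exp - 0x100 if exp & 0x80 else exp
    return Fraction(mantissa, 1 << 39) * Fraction(2) ** exponent


# (expected value, Mantissa (MS), Exp, Mantissa (LS)) for each row of Table IV
TABLE_IV = [
    (Fraction(1, 2) * Fraction(2) ** 127, 0x400000, 0x7F, 0x0000),
    (Fraction(1, 2), 0x400000, 0x00, 0x0000),
    (Fraction(1, 4), 0x400000, 0xFF, 0x0000),
    (Fraction(1, 2) * Fraction(2) ** -128, 0x400000, 0x80, 0x0000),
    (-Fraction(2) ** 127, 0x800000, 0x7F, 0x0000),
    (Fraction(-1), 0x800000, 0x00, 0x0000),
    (Fraction(-1, 2), 0x800000, 0xFF, 0x0000),
    (-Fraction(2) ** -128, 0x800000, 0x80, 0x0000),
    (Fraction(0), 0x000000, 0x00, 0x0000),
    (Fraction(-3, 8), 0xA00000, 0xFF, 0x0000),
]

for expected, ms, exp, ls in TABLE_IV:
    assert extended_value(ms, exp, ls) == expected
```

A nonzero low field shows the extra precision: with fields 400000, 00, 0001 the value is 0.5 + 2^-39, which the 24-bit mantissa of the 32-bit format cannot represent.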
For both floating point and extended floating point numbers, an overflow is defined as an exponent overflow and an underflow is defined as an exponent underflow.
All operands for floating point instructions must be normalized or floating point zero. A floating point overflow shall be defined as exponent overflow, that is, a result exponent greater than 7F₁₆ (127). The result of an operation that causes a floating point overflow shall be the largest positive number if the sign of the resulting mantissa was positive, or the most negative number (the negative number of largest magnitude) if the sign of the resulting mantissa was negative. Underflow shall be defined as exponent underflow, that is, a result exponent less than 80₁₆ (-128). The result of an operation that causes a floating point underflow shall be floating point zero. Separate interrupts are set for overflow and underflow. Only the floating point instructions shall set the underflow interrupt.
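The overflow and underflow rules can be sketched as the final packing step of a 32-bit floating point operation. The Python sketch below normalizes an exact value, then saturates on exponent overflow or flushes to floating point zero on exponent underflow. The function name, the status strings (stand-ins for the separate interrupts), and the use of flooring (2's complement truncation) to drop extra mantissa bits are all assumptions for illustration; whatever rounding the hardware actually performs is not modeled.

```python
import math
from fractions import Fraction

LARGEST_POSITIVE = 0x7FFF_FF7F  # 0.9999998 x 2^127
MOST_NEGATIVE = 0x8000_007F     # -1.0 x 2^127
FLOAT_ZERO = 0x0000_0000


def encode_float32(value) -> tuple[int, str]:
    """Hypothetical helper: pack an exact value as a normalized 32-bit float.

    Returns (word, status) with status "ok", "overflow" or "underflow".
    ASSUMPTION: extra mantissa bits are dropped by flooring.
    """
    v = Fraction(value)
    if v == 0:
        return FLOAT_ZERO, "ok"
    # Start from a bit-length estimate, then adjust until the mantissa is
    # normalized: [0.5, 1) for positive values, [-1, -0.5) for negative ones.
    e = v.numerator.bit_length() - v.denominator.bit_length()
    while True:
        m = v / Fraction(2) ** e
        if v > 0:
            if m >= 1:
                e += 1
            elif m < Fraction(1, 2):
                e -= 1
            else:
                break
        else:
            if m < -1:
                e += 1
            elif m >= Fraction(-1, 2):
                e -= 1
            else:
                break
    if e > 0x7F:   # exponent overflow: saturate according to the mantissa sign
        return (LARGEST_POSITIVE if v > 0 else MOST_NEGATIVE), "overflow"
    if e < -0x80:  # exponent underflow: result is floating point zero
        return FLOAT_ZERO, "underflow"
    mantissa = math.floor(m * (1 << 23))  # flooring keeps the mantissa normalized
    return ((mantissa & 0xFF_FFFF) << 8) | (e & 0xFF), "ok"
```

Flooring is used rather than truncation toward zero because, for negative values, truncating toward zero could produce the unnormalized mantissa C00000.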