Follow Techotopia on Twitter

On-line Guides
All Guides
eBook Store
iOS / Android
Linux for Beginners
Office Productivity
Linux Installation
Linux Security
Linux Utilities
Linux Virtualization
Linux Kernel
System/Network Admin
Programming
Scripting Languages
Development Tools
Web Development
GUI Toolkits/Desktop
Databases
Mail Systems
openSolaris
Eclipse Documentation
Techotopia.com
Virtuatopia.com

How To Guides
Virtualization
General System Admin
Linux Security
Linux Filesystems
Web Servers
Graphics & Desktop
PC Hardware
Windows
Problem Solutions
Privacy Policy

  




 

 

20.3 Floating Point Numbers

Most computer hardware has support for two different kinds of numbers: integers (...-3, -2, -1, 0, 1, 2, 3...) and floating-point numbers. Floating-point numbers have three parts: the mantissa, the exponent, and the sign bit. The real number represented by a floating-point value is given by (s ? -1 : 1) · 2^e · M where s is the sign bit, e the exponent, and M the mantissa. See Floating Point Concepts, for details. (It is possible to have a different base for the exponent, but all modern hardware uses 2.)

Floating-point numbers can represent a finite subset of the real numbers. While this subset is large enough for most purposes, it is important to remember that the only reals that can be represented exactly are rational numbers that have a terminating binary expansion shorter than the width of the mantissa. Even simple fractions such as 1/5 can only be approximated by floating point.

Mathematical operations and functions frequently need to produce values that are not representable. Often these values can be approximated closely enough for practical purposes, but sometimes they can't. Historically there was no way to tell when the results of a calculation were inaccurate. Modern computers implement the IEEE 754 standard for numerical computations, which defines a framework for indicating to the program when the results of calculation are not trustworthy. This framework consists of a set of exceptions that indicate why a result could not be represented, and the special values infinity and not a number (NaN).


 
 
  Published under the terms of the GNU General Public License Design by Interspire