🌊 What Does “Floating Point” Even Mean?
Let’s start simple.
When you write the number 12345, that’s an integer — it doesn’t have any fractional part.
But numbers like 3.14, 0.00045, or 2.71828 are real numbers — they include fractions or decimals.
Now, computers are great with integers, but real numbers can get very large or very tiny.
For example:
- Distance from Earth to Sun ≈ 1.496 × 10⁸ km
- Radius of an atom ≈ 5 × 10⁻¹¹ m
That’s a huge range!
Clearly, we can’t cover this whole range with a fixed-point format (a set number of digits before and after the point); we’d run out of bits!
So, computers use floating-point representation — where the decimal (or binary) point “floats” to where it’s needed.
🧮 The Floating-Point Format
A floating-point number is stored in this general scientific form:
$$
N = (-1)^S \times M \times 2^E
$$
Let’s break it down in plain English:
| Part | Name | Meaning |
|---|---|---|
| S | Sign bit | 0 means positive, 1 means negative |
| M | Mantissa (or Significand) | Represents the actual digits of the number |
| E | Exponent | Decides where the binary (or decimal) point “floats” |
🧠 Think of It Like This:
If we write 6.022 × 10²³, the number 6.022 is like the mantissa, and 23 is the exponent.
It’s the same idea in computers — just using base 2 (binary) instead of base 10.
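You can actually peek at this binary scientific notation from Python, whose built-in `float.hex()` method prints a float as a mantissa times a power of two (with the mantissa’s digits written in hex). A quick illustration:

```python
# float.hex() shows a float as (hex mantissa) * 2^exponent,
# i.e. binary scientific notation written with hex digits.
print((13.0).hex())     # 0x1.a000000000000p+3  ->  1.101 (binary) x 2^3
print((0.15625).hex())  # 0x1.4000000000000p-3  ->  1.01 (binary) x 2^-3
```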
💾 IEEE 754 Standard (The Common Format)
Almost every computer today uses the IEEE 754 standard for floating-point representation.
It defines two main types:
| Type | Total Bits | Sign | Exponent | Mantissa | Exponent Bias |
|---|---|---|---|---|---|
| Single Precision | 32 bits | 1 | 8 | 23 | 127 |
| Double Precision | 64 bits | 1 | 11 | 52 | 1023 |
(The bias is added to the true exponent before it’s stored; you’ll see it in action below.)
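Since Python’s `float` is an IEEE 754 double under the hood, you can confirm the double-precision numbers above directly (a minimal check using the standard `sys` module):

```python
import sys

# Python's float is an IEEE 754 double: 53 significant bits
# (52 stored + 1 hidden), hence roughly 15-16 decimal digits.
print(sys.float_info.mant_dig)  # 53
print(sys.float_info.max)       # 1.7976931348623157e+308
print(sys.float_info.epsilon)   # 2.220446049250313e-16
```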
🧩 Example: Single-Precision (32-bit) Format
| Sign | Exponent | Mantissa (Fraction) |
|---|---|---|
| 1 bit | 8 bits | 23 bits |
So a binary number like this:
0 10000010 10100000000000000000000
is interpreted as:
- Sign = 0 → Positive
- Exponent = 10000010₂ = 130₁₀, so the true exponent is 130 − 127 = 3
- Mantissa = 1.101₂ (the stored bits are 10100…0; the leading ‘1’ is hidden and always assumed)
- Actual number = 1.101₂ × 2³ = 1101₂ = 13.0
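You can double-check that decoding with Python’s standard `struct` module: pack the 32-bit pattern into 4 bytes, then reinterpret those bytes as a single-precision float. A quick sketch:

```python
import struct

# The 32-bit pattern from above: sign | exponent | mantissa
bits = 0b0_10000010_10100000000000000000000

# Reinterpret exactly those bits as an IEEE 754 single-precision float.
value = struct.unpack('>f', struct.pack('>I', bits))[0]
print(value)  # 13.0
```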
✏️ Step-by-Step Example
Let’s represent -5.75 in IEEE 754 (single precision):
- Convert to binary:
  5.75 = 101.11₂
- Normalize it:
  101.11₂ = 1.0111₂ × 2²
  → Mantissa = 0111 (padded with zeros to fill 23 bits)
  → Exponent = 2 + 127 = 129 = 10000001₂
  → Sign = 1 (since it’s negative)
- Combine all parts:
  1 | 10000001 | 01110000000000000000000
That’s your 32-bit floating-point representation of -5.75. ✅
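The same `struct` trick works in reverse to verify this by machine: pack −5.75 as a single-precision float and print its raw bits.

```python
import struct

# Pack -5.75 as a 32-bit float, then read back the raw bit pattern.
(bits,) = struct.unpack('>I', struct.pack('>f', -5.75))
s = format(bits, '032b')
print(s[0], s[1:9], s[9:])  # 1 10000001 01110000000000000000000
```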
⚙️ Floating-Point Arithmetic Operations
Just like integers, floating-point numbers can be added, subtracted, multiplied, or divided.
But since the point “floats,” the process takes a few extra steps.
Let’s explore them one by one in easy terms.
➕ Floating-Point Addition (and Subtraction)
- Align the exponents
  The numbers must have the same exponent before adding.
  Example: 1.23 × 10² + 4.56 × 10³ → rewrite 1.23 × 10² as 0.123 × 10³.
- Add (or subtract) the mantissas
  Once the exponents match, add the fractional parts normally.
- Normalize the result
  If the sum is not in normalized form (1.xxx × 2ⁿ), shift and adjust the exponent.
- Round if necessary
  Keep precision within 23 (or 52) mantissa bits.
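Here’s a toy sketch of those steps in Python. It isn’t how hardware does it (hardware works on the raw bit fields); it just mirrors the align → add → normalize sequence using the standard `math.frexp`/`math.ldexp` helpers:

```python
import math

def fp_add(a, b):
    """Toy model of floating-point addition: align, add, normalize."""
    ma, ea = math.frexp(a)  # a == ma * 2**ea, with 0.5 <= |ma| < 1
    mb, eb = math.frexp(b)
    # Step 1: align exponents by shifting the smaller number's mantissa.
    if ea < eb:
        ma, ea = ma / 2 ** (eb - ea), eb
    else:
        mb, eb = mb / 2 ** (ea - eb), ea
    # Step 2: add the mantissas now that the exponents match.
    m = ma + mb
    # Steps 3-4: ldexp rebuilds the float; the float type itself
    # renormalizes and rounds the result.
    return math.ldexp(m, ea)

print(fp_add(1.5, 0.375))  # 1.875
```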
✖️ Floating-Point Multiplication
- Add the exponents (subtracting the bias once, since each stored exponent already includes it)
- Multiply the mantissas
- Normalize and round
- Set sign bit based on rule:
(+) × (+) = (+)
(+) × (–) = (–)
(–) × (–) = (+)
Example:
(1.1 × 2³) × (1.0 × 2²) = (1.1 × 1.0) × 2⁵ = 1.1 × 2⁵
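Sketched in Python with `math.frexp` (which hands back an already unbiased exponent, so plain addition works):

```python
import math

def fp_mul(a, b):
    """Toy model: multiply mantissas, add exponents, renormalize."""
    ma, ea = math.frexp(a)  # a == ma * 2**ea
    mb, eb = math.frexp(b)
    return math.ldexp(ma * mb, ea + eb)

print(fp_mul(12.0, 10.0))  # 120.0
```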
➗ Floating-Point Division
- Subtract the exponents
- Divide the mantissas
- Normalize and round the result
- Set the sign accordingly
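And the mirror-image sketch for division, under the same toy-model caveat:

```python
import math

def fp_div(a, b):
    """Toy model: divide mantissas, subtract exponents."""
    ma, ea = math.frexp(a)
    mb, eb = math.frexp(b)
    return math.ldexp(ma / mb, ea - eb)

print(fp_div(120.0, 10.0))  # 12.0
```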
⚖️ Precision and Rounding
Because mantissa bits are limited (23 or 52), not all decimal numbers can be represented exactly.
For example, 0.1 (decimal) becomes an infinitely repeating binary fraction: 0.000110011001100…₂.
So computers round off the result to fit the available bits — leading to tiny errors called rounding errors.
That’s why in programming you’ll sometimes see weird outputs like 0.1 + 0.2 printing as 0.30000000000000004. It’s not a bug, it’s just rounding at work!
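You can reproduce this in any language that uses IEEE 754 doubles; in Python, the standard `decimal` module even shows the exact binary value that gets stored:

```python
from decimal import Decimal

print(0.1 + 0.2)         # 0.30000000000000004
print(0.1 + 0.2 == 0.3)  # False
# Decimal reveals the exact value held by the double nearest to 0.1:
print(Decimal(0.1))
# 0.1000000000000000055511151231257827021181583404541015625
```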
🧭 Floating-Point Representation Diagram
Here’s a simple visual showing how a floating-point number is structured:
```
+--------------+-------------------+---------------------+
| Sign (1 bit) | Exponent (8 bits) | Mantissa (23 bits)  |
+--------------+-------------------+---------------------+
       ↓                 ↓                    ↓
   Negative        Controls “float”     Holds the digits
  or Positive       (power of 2)         of the number
```
And in formula form:
$$
\text{Value} = (-1)^{\text{Sign}} \times (1 + \text{Fraction}) \times 2^{\text{Exponent} - \text{Bias}}
$$
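Putting that formula directly into code: a small sketch that slices a 32-bit pattern into its three fields and evaluates the expression above (assuming a normalized number; zero, subnormals, infinity, and NaN follow special rules not handled here):

```python
def decode_float32(bits):
    """Evaluate the IEEE 754 value formula on a normalized 32-bit pattern."""
    sign = (bits >> 31) & 0x1
    exponent = (bits >> 23) & 0xFF
    fraction = (bits & 0x7FFFFF) / 2 ** 23  # stored bits as a fraction < 1
    return (-1) ** sign * (1 + fraction) * 2.0 ** (exponent - 127)

print(decode_float32(0b0_10000010_10100000000000000000000))  # 13.0
print(decode_float32(0b1_10000001_01110000000000000000000))  # -5.75
```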
💬 Everyday Analogy
Think of floating-point numbers like scientific notation in your calculator.
When your calculator shows 3.14E+02, it really means 3.14 × 10² = 314.
Computers do the same — but with base 2 (binary) and fixed-size memory spaces.