Chomsky Normal Form — Theory of Computation
🌱 What Is Chomsky Normal Form?
A CFG is said to be in Chomsky Normal Form (CNF) when every production rule follows one of only two shapes:
✔ 1. A → BC
A non-terminal produces two non-terminals.
✔ 2. A → a
A non-terminal produces one terminal symbol.
✔ Special case:
The start symbol may also allow:
S → ε
(only if the language includes the empty string)
That’s all.
No long rules.
No mixing terminals with variables.
No shortcuts.
CNF makes grammar extremely structured.
💡 Why Do We Convert Grammars into CNF?
Even though CNF may look restrictive, it gives us many advantages:
⭐ 1. Parsing becomes mechanical
Algorithms like CYK work only with CNF.
⭐ 2. Parse trees become perfectly binary
Every split produces exactly two branches, making tree shapes uniform.
⭐ 3. It helps in theoretical proofs
Many results in automata theory rely on CNF’s clean structure.
⭐ 4. It removes unnecessary mess
Rules become short, clear, and easy to follow.
Think of CNF like turning messy handwritten notes into a clean, organized digital document.
🎯 A Grammar That is NOT in CNF
Let’s take a small CFG:
S → aBB
B → bB | bb
This breaks CNF rules because:
aBBmixes terminals and variablesbBmixes terminal + variablebbcontains two terminals- Some rules have more than two symbols
CNF allows none of that.
🔧 How CNF Fixes the Grammar
When converting a grammar to CNF, we break down complex rules into small, neat pieces.
For example:
S → aBB
would eventually be rewritten using new helper symbols:
S → X B
X → a B
These new non-terminals exist purely to enforce CNF structure.
The meaning stays the same — only the shape changes.
🌳 A Simple CNF Grammar and Parse Tree
Consider the following clean CNF grammar:
S → AB
A → a
B → b
It generates the string "ab".
Here is its parse tree:
S
/ \
A B
| |
a b
Every internal node splits into two children.
Every leaf is a terminal.
This neat binary shape is exactly what CNF aims for.
🧠 Important Things to Remember
Here’s a quick checklist in plain English:
✔ Productions must be either A → BC or A → a
Nothing else is allowed.
✔ No mixing terminals and non-terminals on the same right side
So rules like A → aB are not allowed.
✔ Long rules must be broken into pieces
A production like A → BCD must be rewritten step-by-step.
✔ Only the start symbol may produce ε
And only if the language includes the empty string.
