Automata and Formal Languages
1. Finite Automata
1.1 Deterministic Finite Automaton (DFA)
A DFA is a 5-tuple where:
- = finite set of states
- = input alphabet
- = transition function
- = start state
- = set of accepting states
Language of :
Example: DFA for strings ending in “ab” over :
States: {q0, q1, q2}q0 → (a) → q1, q0 → (b) → q0q1 → (a) → q1, q1 → (b) → q2q2 → (a) → q1, q2 → (b) → q0Start: q0, Accept: q2Extended transition function:
1.2 Nondeterministic Finite Automaton (NFA)
An NFA is a 5-tuple where:
- = transition function (returns a set of states)
Key difference from DFA: Multiple transitions per symbol, -transitions allowed.
Example: NFA for strings containing “aba” over :
States: {q0, q1, q2, q3}q0 → (a) → {q0, q1}q1 → (b) → q2q2 → (a) → q3Start: q0, Accept: q31.3 NFA to DFA Conversion (Subset Construction)
Convert NFA to equivalent DFA :
SUBSET_CONSTRUCTION(N): D.states = 2^Q D.start = ε-closure({q0}) mark D.start as unexplored while unexplored states exist: pick unexplored state S for each symbol a in Σ: T = ε-closure(∪ δ(q, a) for each q in S) D.transition[S][a] = T if T not yet in D.states: add T as unexplored mark S as explored D.accept = {S : S ∩ F ≠ ∅}Time: states in worst case. Often much smaller in practice.
1.4 DFA Minimization
Minimize a DFA to the smallest equivalent DFA.
MINIMIZE(DFA): // Remove unreachable states remove states not reachable from start
// Partition into accepting and non-accepting P = {F, Q \ F}
repeat: P" = {} for each group G in P: split G into subgroups where for all a in Σ: δ(p,a) and δ(q,a) are in same group of P add subgroups to P' until P' == P return DFA with P' as statesResult: Unique minimal DFA (up to isomorphism). with Hopcroft’s algorithm.
1.5 DFA vs NFA Equivalence
Theorem: DFAs and NFAs recognize exactly the same class of languages: the regular languages.
2. Regular Expressions
2.1 Definition
Regular expressions over alphabet :
- : Empty language.
- : Language .
- (for ): Language .
- (concatenation): .
- (union/alternation): .
- (Kleene star): .
- : One or more repetitions.
2.2 RE to NFA Conversion (Thompson’s Construction)
| RE Form | Construction |
|---|---|
| Single non-accepting state | |
| Start = accept (single state) | |
| Start → → accept | |
| New start with to starts of and ; accepts merge via | |
| Connect accept of to start of via | |
| New start/accept with loops through |
Properties:
- RE has at most states in the NFA.
- NFA may have -transitions but no more than 2 outgoing transitions per state.
2.3 DFA to RE Conversion
DFA_TO_RE(DFA): rename states q1, q2, ..., qn add new start state qs and new accept state qf for each pair (qi, qj), eliminate states one by one update transition labels using state elimination formula:
// Eliminating state qk: // For each path qi → qk → qj: // new label = (label(qi,qk)) (label(qk,qk))* (label(qk,qj)) ∪ label(qi,qj)2.4 Closure Properties of Regular Languages
Regular languages are closed under:
| Operation | Proof Method |
|---|---|
| Union | RE: ; NFA: parallel construction |
| Intersection | DFA: product construction |
| Complement | DFA: swap accept/non-accept states |
| Concatenation | RE: |
| Kleene star | RE: |
| Reversal | RE: reverse expression; NFA: reverse arrows |
| Difference |
3. Pumping Lemma for Regular Languages
3.1 Statement
If is a regular language, then there exists a constant (the pumping length) such that any string with can be divided into satisfying:
- (y is non-empty)
- for all
3.2 Using the Pumping Lemma (Proof by Contradiction)
To prove is not regular:
- Assume is regular. Let be the pumping length.
- Choose a string with (strategically).
- Show that for any division with , , some .
- Contradiction → is not regular.
Example:
Let . Since , for some . Then . Therefore is not regular.
Example:
Same approach. Not regular.
Example:
Choose . Pumping (within first symbols) breaks the palindrome structure.
3.3 Common Non-Regular Languages
| Language | Not Regular Because |
|---|---|
| Need to count ‘s vs ‘s | |
| Need to match three counts | |
| Need unbounded memory | |
| Primes are non-regular | |
| Need to count |
4. Context-Free Grammars
4.1 Definition
A CFG is a 4-tuple where:
- = finite set of variables (nonterminals)
- = finite set of terminals (alphabet)
- = finite set of production rules of form where ,
- = start variable
Language of :
4.2 Examples
Balanced parentheses:
S → (S)S → SSS → ε:
S → aSbS → εis NOT context-free (proven by pumping lemma for CFLs).
4.3 Chomsky Normal Form (CNF)
Every production is of the form:
- (two variables)
- (one terminal)
- (only for start symbol)
Conversion:
- Eliminate -productions (except ).
- Eliminate unit productions ().
- Eliminate useless symbols (unreachable and non-generating).
- Convert remaining productions to CNF.
4.4 Parse Trees
A parse tree for a string generated by a CFG:
- Root: labeled
- Internal nodes: labeled with variables
- Leaves: labeled with terminals (left to right form )
- Children of node : labeled with the right-hand side of a production
4.5 Ambiguity
A CFG is ambiguous if some string has two or more different parse trees (equivalently, different leftmost derivations).
Example:
E → E + EE → E * EE → (E)E → aThe string has two parse trees: and .
Unambiguous version (precedence):
E → E + T | TT → T * F | FF → (E) | aInherent ambiguity: Some CFLs have no unambiguous grammar (e.g., ).
5. Pushdown Automata
5.1 Definition
A PDA is a 6-tuple where:
- = stack alphabet
Each transition: read input symbol (or ), pop stack symbol (or ), push symbol(s) (or ), change state.
Deterministic PDA (DPDA): For every configuration, at most one move is possible.
5.2 PDA for
Q = {q0, q1, q2}, F = {q2}, Σ = {a,b}, Γ = {S, $}
δ(q0, a, $) = {(q0, a$)} // push a for each a readδ(q0, a, a) = {(q0, aa)}δ(q0, b, a) = {(q1, ε)} // pop a for each b readδ(q1, b, a) = {(q1, ε)}δ(q1, ε, $) = {(q2, $)} // accept when stack has only $5.3 Equivalence
Theorem: CFGs and PDAs recognize the same class of languages: the context-free languages.
Key difference from regular languages: DPDAs are strictly weaker than nondeterministic PDAs.
5.4 Closure Properties of CFLs
| Operation | Closed? |
|---|---|
| Union | Yes |
| Concatenation | Yes |
| Kleene star | Yes |
| Intersection | No* |
| Complement | No |
| Intersection with regular | Yes |
*Not closed under general intersection, but closed under intersection with a regular language (using product of PDA and DFA).
6. Pumping Lemma for Context-Free Languages
6.1 Statement
If is a CFL, then there exists such that any with can be divided into satisfying:
- (v and y not both empty)
- for all
6.2 Application
Example: is not context-free.
Let . Since , the substring cannot span all three letter types.
- If contains only ‘s and ‘s: pumping breaks the -- balance.
- If contains only ‘s and ‘s: same issue.
- If contains only one letter type: pumping breaks all three counts.
6.3 CFLs vs Non-CFLs
Not CFL (require three matching counts or more):
| Language | Why Not CFL |
|---|---|
| Three matching counts | |
| Three ordered counts | |
| Copy language (non-CFL) |
7. Turing Machines
7.1 Definition
A Turing machine is a 7-tuple where:
- = states
- = input alphabet (, blank symbol)
- = tape alphabet
- = start, = halting states
Operation: Read symbol under head, write symbol, move left/right, change state. Computation halts on accept or reject.
7.2 TM Variants
All are equivalent in power to the standard TM:
| Variant | Description |
|---|---|
| Multi-tape TM | Multiple tapes, each with its own head |
| Nondeterministic TM | Multiple possible transitions |
| Enumerators | Print out strings of a language |
| Multi-dimensional tape | Tape extended to 2D+ grid |
Theorem: Multi-tape TMs, nondeterministic TMs, and standard TMs recognize the same class of languages.
7.3 Turing-Recognizable (Recursively Enumerable)
A language is Turing-recognizable (RE) if some TM accepts it (may loop on non-members).
7.4 Turing-Decidable (Recursive)
A language is decidable if some TM halts on all inputs (accepts members, rejects non-members).
8. Church-Turing Thesis
8.1 Statement
Any algorithmic procedure can be simulated by a Turing machine.
This is a thesis (not a theorem) — it cannot be proven because “algorithmic procedure” is an informal notion.
8.2 Implications
- TM defines the boundary of what is computable.
- No more powerful computational model exists (in terms of computable functions).
- Equivalent models: lambda calculus, -recursive functions, Post systems, register machines.
9. Decidability
9.1 Decidable Languages
| Language | Decidable? | Algorithm |
|---|---|---|
| Yes | Simulate DFA on | |
| Yes | CYK algorithm, $O(n^3 | |
| Yes | Check reachable accepting states | |
| Yes | Check for non-emptiness | |
| No | Diagonalization proof |
9.2 The Halting Problem
is undecidable.
Proof (diagonalization):
Assume H is a decider for A_TM.Define D(M): run H(M, <M>) if H accepts: reject if H rejects: accept
Run D on D: D(D) accepts iff D(D) rejects.Contradiction. Therefore H cannot exist.9.3 Reductions for Undecidability
To prove is undecidable, reduce a known undecidable language to .
Common undecidable problems:
| Problem | Reduction From |
|---|---|
| PCP (Post Correspondence Problem) | |
| Hilbert’s 10th Problem (Diophantine equations) |
9.4 Rice’s Theorem
Theorem: Any non-trivial property of the language of a Turing machine is undecidable.
A property is non-trivial if it holds for some TMs and not for others.
Examples of undecidable properties (by Rice’s theorem):
- Does accept any string?
- Is finite?
- Is regular?
- Does ?
Not covered by Rice’s theorem: Properties about the machine’s behavior (not its language), such as “does run for more than 100 steps on input ?“
10. Common Pitfalls
Confusing NFA and DFA complexity. NFA simulation requires tracking multiple states simultaneously ( worst case for subset construction), but the NFA itself may be exponentially smaller than any equivalent DFA.
Wrong string choice for pumping lemma proofs. The string must be long enough () and specifically chosen to make pumping impossible. A poor choice leads to failing the proof.
Forgetting that CFLs are not closed under complement and intersection. Unlike regular languages, CFLs lose these closure properties. of a CFL may not be context-free.
Assuming decidability based on the existence of an algorithm. An algorithm that may not halt (e.g., simulating a TM) does not establish decidability. The TM must halt on all inputs.
Misapplying Rice’s theorem. Rice’s theorem applies only to properties of the language recognized by a TM, not properties of the TM’s code or computation behavior.
Confusing Turing-recognizable and decidable. Every decidable language is Turing-recognizable, but not vice versa. The halting problem is recognizable but not decidable.
Assuming more tape heads make TMs more powerful. Multi-tape TMs are equivalent to single-tape TMs. No variant of the standard TM increases computational power (only efficiency).
Worked Examples
Example 1: Designing a Finite Automaton
Problem: Design a DFA over {0, 1} that accepts strings containing the substring “010”. Solution: States: q0 (start, scanning for first 0), q1 (saw 0, waiting for 1), q2 (saw 01, waiting for 0), q3 (accepting, saw 010). Transitions: q0 on 0 -> q1, q0 on 1 -> q0. q1 on 1 -> q2, q1 on 0 -> q1. q2 on 0 -> q3, q2 on 1 -> q0. q3 on 0 -> q3, q3 on 1 -> q3. The DFA has 4 states and transitions back to earlier states when partial matches are broken.
Example 2: Context-Free Grammar for a Language
Problem: Write a CFG that generates the language L = {a^n b^n c^n : n >= 1}. Solution: S -> aBC. B -> aBB (this ensures more a’s push B’s onto the middle). C -> cD. D -> cDD (this ensures more c’s match). B -> b (terminal). D -> d (terminal). Wait — this generates a^n b^n c^m which is wrong. The language a^n b^n c^n is not context-free (proven by the pumping lemma for CFLs). No CFG exists for this language. This is a common exam trick question.
Summary
- DFA and NFA recognize the same regular languages, convertible via subset construction.
- Regular expressions are equivalent to finite automata (Thompson’s construction).
- Pumping lemma for regular languages proves non-regularity by contradiction.
- CFGs generate context-free languages, equivalent to PDAs.
- Ambiguity means multiple parse trees; some CFLs are inherently ambiguous.
- Pumping lemma for CFLs proves non-context-freeness (e.g., ).
- Turing machines are the most powerful model; equivalent to lambda calculus and recursive functions.
- Church-Turing thesis: TMs capture all algorithmic computation.
- Halting problem is undecidable (diagonalization). Rice’s theorem generalizes undecidability.
Cross-References
| Topic | Link |
|---|---|
| Compilers | View |
| Complexity Theory | View |
| Algorithm Design | View |