Regular Languages
2.1 Finite Automata
A deterministic finite automaton (DFA) is a 5-tuple where:
- is a finite set of states.
- is the input alphabet.
- is the transition function.
- is the start state.
- is the set of accepting (final) states.
accepts a string if the sequence of states ends in a state in .
Example. DFA for strings over containing the substring 01:
| State | ||
|---|---|---|
A nondeterministic finite automaton (NFA) allows multiple transitions from a state on the same Symbol, and -transitions (transitions without consuming input). Formally, .
accepts if there exists some path of transitions consuming that ends in a state in .
Worked Example: DFA for binary numbers divisible by 3
Design a DFA over that accepts exactly those strings (interpreted as binary Numbers, most significant bit first) that are divisible by 3.
We track the value of the number read so far, modulo 3. After reading prefix Let . Reading a new bit appends to the right: the new value is .
States: (remainder 0), (remainder 1), (remainder 2). Start state: (the empty prefix has value 0). Accept state: .
| State | ||
|---|---|---|
Correctness. By induction on input length. Base: , DFA is in . Step: if after the DFA is in (where ), Then reading moves to Which equals .
Worked Example: NFA for strings ending in `01`
Construct an NFA over that accepts strings ending in the substring 01.
States: (start), (have seen a 0), (accept, have seen 01).
| State | ||
|---|---|---|
Accept: . The NFA is nondeterministic because from on input 0It can either stay In or move to .
2.2 Equivalence of DFA and NFA
Theorem 2.1. For every NFA There exists a DFA such that .
Proof (subset construction). Given NFA Construct DFA where:
- (each state of is a subset of states of ).
- .
- for .
- .
Every string accepted by has some path through ‘s states. The subset construction tracks all Possible states could be in after reading each symbol. accepts exactly when at least one Possibility in leads to acceptance.
Corollary 2.2. NFA, DFA, and NFA with -transitions all recognise exactly the class of Regular languages.
Worked Example: Subset construction (NFA to DFA)
Convert the NFA from the “strings ending in 01” example to a DFA via the subset construction.
NFA states: , , No -transitions.
Start state: .
| DFA State | Accept? | ||
|---|---|---|---|
| No | |||
| No | |||
| Yes |
The DFA has 3 reachable states. , And are unreachable.
2.3 Regular Expressions
Definition. Regular expressions over are defined inductively:
- (empty set), (empty string), and for each are regex.
- If and are regex, then , And are regex.
- Nothing else is a regex.
Shorthand: means . means .
Examples:
- : strings containing
00. - : strings with exactly two
0S. - : all strings over .
- : the empty language.
Regex identities. The following algebraic laws hold for regular expressions and can be used To simplify or reason about them:
| Identity | Description |
|---|---|
| Annihilator for concatenation | |
| Identity for concatenation | |
| Identity for union | |
| Kleene star of empty set | |
| Kleene star of empty string | |
| Idempotence of Kleene star | |
| Associativity of concatenation | |
| Commutativity of union | |
| Associativity of union | |
| Idempotence of union | |
| Distributivity |
These identities can be used to prove that two regex describe the same language, analogous to Simplifying algebraic expressions. Two regex and are equivalent (written ) If .
2.4 Equivalence of Regular Expressions and Finite Automata
Theorem 2.3 (Kleene). A language is regular if and only if it is described by some regular Expression.
Proof (sketch).
Regex to NFA: Induction on the structure of the regex.
- Base cases: , , each have trivial NFAs.
- Union: Combine two NFAs with a new start state and -transitions to each.
- Concatenation: Connect the accept states of the first NFA to the start state of the second via -transitions.
- Kleene star: Add -transitions from a new start state to the NFA’s start and from the NFA’s accept states back to the start state.
NFA to regex: State elimination. Systematically eliminate states of the NFA one at a time, Updating the transition labels (which are regex) between remaining states. The final label from start To accept is the equivalent regex.
Thompson’s construction. The regex-to-NFA translation can be made fully explicit. For each Sub-expression of the regex, we build a small NFA fragment with exactly one entry and one exit State, connected by -transitions. The construction guarantees that the NFA has at most One accept state, no transitions into the start state, and no transitions out of the accept state.
Theorem 2.3a (Thompson’s construction correctness). For every regular expression over Thompson’s construction produces an NFA with And has States and transitions.
Proof. By structural induction on .
Base cases:
: has a start state and an accept state with no transitions. .
: has a start state, an accept state, and an -transition between them. .
for : has a start state, an accept state, and a transition on . .
Inductive cases:
: By IH, and . Thompson adds a new start and accept with -transitions to the start states of and And -transitions from their accept states to . Any accepting path goes through exactly one sub-NFA, so .
: Thompson connects the accept state of to the start state of via -transitions. A string iff where and I.e., .
: Thompson adds a new start and accept With -transitions from to (allowing zero repetitions) and from to the start of And from the accept of back to . Any accepting path corresponds to zero or more traversals of So .
In all cases, the number of states added is a constant per operator, so .
2.5 DFA Minimisation and the Myhill-Nerode Theorem
Theorem 2.4 (Myhill-Nerode). The following are equivalent for a language :
- is regular.
- The Myhill-Nerode equivalence relation has finitely many equivalence classes.
- is the union of some of the equivalence classes.
Definition. Two strings are distinguishable with respect to if there exists Such that exactly one of is in . The Myhill-Nerode equivalence is: iff and are not distinguishable.
Proof of Theorem 2.4.
(1) (2): Let be a DFA for . Define iff (i.e., reaches the same state on and ). Then has at most equivalence classes. We show . If Then for All , So iff Meaning . Conversely, if There exists with and (or vice versa), so Hence .
(2) (3): Trivial, since consists of all strings whose equivalence class is one That contains at least one string in .
(3) (1): Suppose has finitely many equivalence classes . Construct a DFA with one state per equivalence class, start state for Transition And accept states = those classes contained in . This DFA is Well-defined because is a right congruence: if Then for all .
Worked Example: Myhill-Nerode classes for $L = \{0^n 1^n : n \geq 0\}$
We show is not regular by exhibiting infinitely many pairwise distinguishable strings.
Claim: the strings are pairwise distinguishable with respect to .
Proof. For with Consider the suffix . Then:
- .
- (since ).
Therefore . Since there are infinitely many such strings, has Infinitely many equivalence classes, so is not regular by the Myhill-Nerode theorem.
This …/1-number-and-algebra/3_proof-and-logic technique is often cleaner than the pumping lemma, as it avoids case analysis over Decompositions.
Corollary 2.5. The minimum-state DFA for has exactly as many states as there are Equivalence classes of .
Minimisation algorithm:
1. Start with two partitions: F (accepting states) and Q \ F (non-accepting).2. Repeatedly refine: for each pair (p, q) in the same partition, if delta(p, a) and delta(q, a) are in different partitions for some a, split p and q into different partitions.3. Repeat until no further splitting occurs.4. Each partition becomes a state in the minimised DFA.This algorithm runs in time where is the number of states and .
2.6 Pumping Lemma for Regular Languages
Theorem 2.6 (Pumping Lemma). If is regular, then there exists a constant (the pumping Length) such that for every string with , can be decomposed as satisfying:
- (the pumped part is non-empty).
- (the pumpable part is within the first symbols).
- for all .
Proof. Let be a DFA for with states. Set . For any string of length The path through visits at least states, so by the pigeonhole principle, some state is Visited twice within the first symbols. The substring between the two visits can be repeated Or removed () without affecting acceptance.
Using the pumping lemma to prove non-regularity. To show is not regular, assume it is, let Be the pumping length, and exhibit a string with such that every Decomposition satisfying (1) and (2) has some with .
Example. is not regular.
Proof. Assume is regular with pumping length . Let . By (2), So consists only of 0S. Let . Then (since ). Contradiction.
Example. is not regular.
Proof. Assume pumping length . Let . Since , for Some . Then (the two halves have different lengths).
Example. is not regular.
Proof. Assume is regular with pumping length . Since regular languages are closed under Complement, would also be regular. Then would be regular (since is regular And regular languages are closed under intersection). But is not regular — Contradiction.
Worked Example: Proving $\{w : n_0(w) = n_1(w)\}$ is not regular
Let .
Proof. Assume is regular with pumping length . Let . By (2), So for some . Then Which has zeros and ones. Since So . Contradiction.
Worked Example: $L = \{a^{n!} : n \geq 0\}$ is not regular
Proof. Assume pumping length . Let with . By (2), So for some . Choose (an integer since ). Then . But is not a factorial for (since for And for any ). Hence .
:::caution Common Pitfall The pumping lemma says that for every decomposition satisfying (1) and (2), there exists a Pumping that fails. To prove non-regularity, you must show that all valid decompositions lead To a contradiction. A single decomposition that works is insufficient to disprove the lemma. The converse of the pumping lemma is false: if a language satisfies the pumping condition, it is Not necessarily regular.
2.7 Closure Properties of Regular Languages
Regular languages are closed under:
| Operation | Proof technique |
|---|---|
| Union | NFA union construction |
| Intersection | DFA product construction |
| Complement | Swap accepting and non-accepting states |
| Concatenation | NFA concatenation construction |
| Kleene star | NFA star construction |
| Reversal | Reverse all transitions, swap start/F |
| Difference | |
| Homomorphism | Apply the homomorphism to each symbol |
| Inverse homomorphism | Pre-image construction |
Theorem 2.7 (Intersection via product construction). If and are regular, then is regular.
Proof. Let and . Construct where . Then accepts iff both and accept I.e., .
Theorem 2.7. If is regular and is not regular, then may or may not be Regular. Closure properties do not apply when one operand is non-regular.
:::