The
Limits
& The Logic
Before we build computers, we must ask: What can be computed? From simple state machines to the universal Turing Machine.
Fundamental Concepts
Before diving into machines, we must define the mathematical building blocks of computation: Sets, Alphabets, and Languages.
Sets
A collection with a countable number of distinct elements.
S = {a, b, c}
A collection with an endless number of elements.
Z = {..., -1, 0, 1, ...}
Alphabets & Strings
A finite, non-empty set of symbols.
Σ = {0, 1}
A finite sequence of symbols from an alphabet.
w = 01011
Formal Language
In TOC, a Language is simply a set of strings chosen from some alphabet Σ*.
- Σ*: The set of all possible strings over Σ.
- Language L: Any subset of Σ*.
Note: A language can be finite (e.g., {"cat", "dog"}) or infinite (e.g., all words starting with 'a').
Common Notation Reference
Set of symbols (e.g., {0, 1})
Empty string (length 0)
Language with no strings
Zero or more concat
Transition Function
Number of chars in w
w ∈ L (w is in L)
L1 ⊆ L2
Finite Automata (DFA)
A machine with a limited amount of memory. It is always in one of a finite number of states. It reads input character by character and transitions between states.
Interactive DFA: Odd Number of 1s
Problem: Construct a DFA over Σ = {0, 1} that accepts all strings ending with "01".
Logic
- We need to track the progress of the suffix "01".
- q0 (Start): Haven't seen '0' yet.
- q1: Just saw '0'. (Potential start of match)
- q2 (Accept): Saw '0' then '1'. (Match found!)
Transition Table
| State | Input 0 | Input 1 |
|---|---|---|
| -> q0 | q1 | q0 |
| q1 | q1 | q2 |
| * q2 | q1 | q0 |
Formal Definition
A DFA is a 5-tuple (Q, Σ, δ, q0, F):
- Q: Finite set of states.
- Σ: Finite set of input symbols (Alphabet).
- δ: Transition function (Q × Σ → Q).
- q0: Start state.
- F: Set of accept states.
Non-Deterministic Finite Automata (NFA)
An NFA is like a DFA but with superpowers: it can be in multiple states at once. It can transition to multiple states on the same input, or move without any input at all (Epsilon transitions).
Epsilon Transitions (ε)
The ability to jump between states without consuming any input character. This models "spontaneous" state changes.
Equivalence & Power
Surprisingly, every NFA has an equivalent DFA.
Intuition (Subset Construction): A single state in the equivalent DFA corresponds to a set of states the NFA could be in at once. If the NFA has N states, the DFA could have 2N states.
Applications of Automata
- Lexical Analysis: Compilers use DFAs to scan code and identify keywords (if, while, int).
- Text Search: Algorithms like KMP use finite automata for fast string matching.
- Protocol Verification: Verifying network protocols by modeling states.
Regular Expressions & Limits
Any language accepted by a DFA is a "Regular Language". These can be described concisely using Regular Expressions (RegEx). But regular languages have limits—they cannot memory-dependent tasks like counting.
Operations
ab: Concatenation (a then b)a|b: Union (a or b)a*: Kleene Star (zero or more a's)
Example
Email validation pattern:
^[a-z0-9]+@[a-z]+\.[a-z]{2,3}$
Live Regex Tester
Problem: Write a Regular Expression for the set of all strings over {a, b} that start with 'a' and end with 'b'.
- a: Must start with 'a'.
- (a|b)*: Any combination of 'a's and 'b's in the middle (zero or more).
- b: Must end with 'b'.
Concept: Star (*) vs. Plus (+)
The Kleene Star (*) means "zero or more", while the Kleene Plus (+) means "one or more".
Relation: L⁺ = L L* (or L* minus ε)
The Pumping Lemma
A rigorous proof technique for showing non-regularity.
Show: L = { 0n1n | n ≥ 0 } is not regular.
- Assume L is regular. Let p be the pumping length.
- Choose string w = 0p1p. (Note: |w| = 2p ≥ p).
- Split w into xyz such that |xy| ≤ p and |y| > 0.
- Since |xy| ≤ p, the substring y must consist entirely of 0s.
- Pump it: Consider xy2z. This string has more 0s than 1s.
- Contradiction: xy2z ∉ L, but Pumping Lemma says it must be. Therefore, L is not regular.
Context-Free Languages & PDAs
DFAs cannot count (e.g., matching open/close parentheses). For this, we need a Pushdown Automaton (PDA)—a machine with an infinite stack memory. These correspond to Context-Free Languages (CFLs).
Context-Free Grammars (CFG)
A CFG is defined by 4-tuple (V, T, P, S). It defines valid strings by substitution rules.
S → ε
Generates: "", "ab", "aabb", "aaabbb"...
A grammar is "ambiguous" if a string has more than one valid parse tree. This is bad for compilers (dangling else problem)!
Pushdown Automata (PDA)
A nondeterministic finite automaton with an extra component: a stack. The stack provides "LIFO" memory, allowing the machine to remember an arbitrary amount of information.
LIFO (Last In, First Out)
Intuition: A PDA can non-deterministically "guess" transitions and use the stack to verify if the guess was correct.
Chomsky Normal Form (CNF)
A simplified way to write CFGs. In CNF, every rule must be either:
- A non-terminal producing two non-terminals (A → BC)
- A non-terminal producing a terminal (A → a)
Useful for algorithms like CYK (parsing).
Applications of CFGs
- Programming Syntax: Most programming languages (Java, C++) are defined by CFGs.
- Parsers: Browsers use CFGs to parse HTML properties.
- XML/JSON: Nested data structures are context-free.
Problem: Given the grammar S → 0S1 | ε, derive the string "0011".
Result: The string "0011" is generated by this grammar (Language is 0n1n).
The Chomsky Hierarchy
Proposed by Noam Chomsky, this categorizes grammars by their power. Each level strictly contains the one below it.
| Type | Language Class | Automaton | Production Rules |
|---|---|---|---|
| Type 0 | Recursively Enumerable Unrestricted Grammar | Turing Machine |
α → β (No restrictions) |
| Type 1 | Context-Sensitive CSG | Linear Bounded |
αAβ → αγβ (Length increasing) |
| Type 2 | Context-Free CFG | Pushdown Automaton |
A → γ (LHS is single variable) |
| Type 3 | Regular Regular Grammar | Finite Automaton |
A → a | A → aB (Right/Left Linear) |
The Turing Machine
A mathematical model of a general purpose computer. If a problem can be solved by an algorithm, it can be solved by a Turing Machine.
"Everything that is intuitively computable can be computed by a Turing Machine." This connects physical computing to mathematical logic.
Recursive Languages (Decidable)
A language is Recursive if a Turing Machine exists that will accept valid strings and always halt and reject invalid strings.
Recursively Enumerable
A language is Recursively Enumerable if a TM accepts valid strings, but might run forever (loop) on invalid strings.
Universal Turing Machine (UTM)
A "standard" TM takes an input string and runs. A Universal TM takes two inputs: a description of machine M and an input w, and simulates M on w.
This is the theoretical basis for software. Your computer (a UTM) runs programs (input machines).
Problem: Trace a Turing Machine designed to increment a binary number (e.g., transform "101" to "110").
| Step | State | Tape Content (Head is Bold) | Action |
|---|---|---|---|
| 1 | q_start | 101 | Move Right to end |
| 2 | q_right | 101 | Move Right |
| 3 | q_right | 101 | Move Right |
| 4 | q_carry | 101 | Read 1 -> Write 0, Carry Left |
| 5 | q_carry | 100 | Read 0 -> Write 1, Done |
| 6 | q_halt | 110 | HALT |
Linear Bounded Automata (LBA)
A restricted Turing Machine where the tape size is limited to a constant multiple of the input size.
Significance: LBAs recognize Context-Sensitive Languages (Type 1). They are less powerful than standard TMs but more powerful than PDAs.
Turing Machine: Binary Flips
Undecidability
The Halting Problem
Is there an algorithm that can inspect any code and determine if it runs forever?
Answer: No. Turing proved this is logically impossible.
Post Correspondence Problem (PCP)
Another famous undecidable problem. Given a set of dominoes with strings on top and bottom, can you arrange them so the top string equals the bottom string?
a
ba
For an arbitrary set, there is no algorithm to decide this.
P vs NP
Class P
Problems solvable in polynomial time.
- Shortest Path
- Matrix Multiplication
- Alphabetic Sorting
Class NP
Problems verifiable in polynomial time.
- Sudoku
- Protein Folding
- Traveling Salesman
The Millennium Prize Problems
In 2000, the Clay Mathematics Institute designated seven "Millennium Prize Problems" as the most significant open questions in mathematics.
- P vs NP (Computer Science)
- Riemann Hypothesis (Number Theory)
- Navier-Stokes Existence (Fluid Dynamics)
- Poincaré Conjecture (Topology - Solved by Perelman, 2003)
- Yang-Mills Existence (Physics)
- Birch and Swinnerton-Dyer Conjecture (Algebraic Geometry)
- Hodge Conjecture (Algebraic Geometry)
Solving any of these awards a $1,000,000 prize. P vs NP is often considered the most important problem in theoretical computer science, with profound implications for cryptography, AI, and optimization.
Key Definitions
Automata Theory
The study of abstract machines and what problems they can solve.
Finite Automata (DFA)
A machine with limited memory that is always in one of a finite number of states.
Regular Expression (Regex)
A sequence of characters that describes a search pattern.
Context-Free Grammar (CFG)
A system of rules for generating valid strings in a language, supporting nesting.
Pushdown Automaton
A finite automaton with an infinite stack memory.
Turing Machine
A mathematical model of a general-purpose computer with an infinite tape.
Halting Problem
The problem of determining whether a program will finish running or run forever (proven undecidable).
P vs NP
The open question of whether every problem whose solution can be quickly verified can also be quickly solved.
Chomsky Hierarchy
A containment hierarchy of classes of formal grammars (Regular < Context-Free < Context-Sensitive < Recursively Enumerable).
Undecidability
The property of a decision problem for which it is impossible to construct an algorithm that always leads to a correct yes-or-no answer.