Theory of Computation

The Limits
& The Logic

Before we build computers, we must ask: What can be computed? From simple state machines to the universal Turing Machine.

00 / Basics

Fundamental Concepts

Before diving into machines, we must define the mathematical building blocks of computation: Sets, Alphabets, and Languages.

Sets

Finite Set

A collection with a countable number of distinct elements.

S = {a, b, c}
Infinite Set

A collection with an endless number of elements.

Z = {..., -1, 0, 1, ...}

Alphabets & Strings

Alphabet (Σ)

A finite, non-empty set of symbols.

Σ = {0, 1}
String

A finite sequence of symbols from an alphabet.

w = 01011

Formal Language

In TOC, a Language is simply a set of strings chosen from some alphabet Σ*.

  • Σ*: The set of all possible strings over Σ.
  • Language L: Any subset of Σ*.

Note: A language can be finite (e.g., {"cat", "dog"}) or infinite (e.g., all words starting with 'a').

Common Notation Reference

Σ Alphabet

Set of symbols (e.g., {0, 1})

ε Epsilon

Empty string (length 0)

Empty Set

Language with no strings

Σ* Kleene Star

Zero or more concat

δ Delta

Transition Function

|w| Length

Number of chars in w

Element Of

w ∈ L (w is in L)

Subset

L1 ⊆ L2

01 / Finite Memory

Finite Automata (DFA)

A machine with a limited amount of memory. It is always in one of a finite number of states. It reads input character by character and transitions between states.

Interactive DFA: Odd Number of 1s

STATE: EVEN
Input Tape
$
Controls
Solved Example Design a DFA

Problem: Construct a DFA over Σ = {0, 1} that accepts all strings ending with "01".

Logic
  • We need to track the progress of the suffix "01".
  • q0 (Start): Haven't seen '0' yet.
  • q1: Just saw '0'. (Potential start of match)
  • q2 (Accept): Saw '0' then '1'. (Match found!)
Transition Table
State Input 0 Input 1
-> q0 q1 q0
q1 q1 q2
* q2 q1 q0

Formal Definition

A DFA is a 5-tuple (Q, Σ, δ, q0, F):

  • Q: Finite set of states.
  • Σ: Finite set of input symbols (Alphabet).
  • δ: Transition function (Q × Σ → Q).
  • q0: Start state.
  • F: Set of accept states.

Non-Deterministic Finite Automata (NFA)

An NFA is like a DFA but with superpowers: it can be in multiple states at once. It can transition to multiple states on the same input, or move without any input at all (Epsilon transitions).

Epsilon Transitions (ε)

The ability to jump between states without consuming any input character. This models "spontaneous" state changes.

q0 --(ε)--> q1

Equivalence & Power

Surprisingly, every NFA has an equivalent DFA.

Intuition (Subset Construction): A single state in the equivalent DFA corresponds to a set of states the NFA could be in at once. If the NFA has N states, the DFA could have 2N states.

Applications of Automata

  • Lexical Analysis: Compilers use DFAs to scan code and identify keywords (if, while, int).
  • Text Search: Algorithms like KMP use finite automata for fast string matching.
  • Protocol Verification: Verifying network protocols by modeling states.
01 / Patterns

Regular Expressions & Limits

Any language accepted by a DFA is a "Regular Language". These can be described concisely using Regular Expressions (RegEx). But regular languages have limits—they cannot memory-dependent tasks like counting.

Operations

  • ab : Concatenation (a then b)
  • a|b : Union (a or b)
  • a* : Kleene Star (zero or more a's)

Example

Email validation pattern:

^[a-z0-9]+@[a-z]+\.[a-z]{2,3}$

Live Regex Tester

Result: ...
Solved Example Constructing Regex

Problem: Write a Regular Expression for the set of all strings over {a, b} that start with 'a' and end with 'b'.

a(a|b)*b
  • a: Must start with 'a'.
  • (a|b)*: Any combination of 'a's and 'b's in the middle (zero or more).
  • b: Must end with 'b'.
💡

Concept: Star (*) vs. Plus (+)

The Kleene Star (*) means "zero or more", while the Kleene Plus (+) means "one or more".

(a|b)*
Includes ε (empty)
{ ε, a, b, aa, ab... }
(a|b)⁺
Excludes ε
{ a, b, aa, ab... }

Relation: L⁺ = L L*   (or L* minus ε)

The Pumping Lemma

A rigorous proof technique for showing non-regularity.

Proof Example

Show: L = { 0n1n | n ≥ 0 } is not regular.

  1. Assume L is regular. Let p be the pumping length.
  2. Choose string w = 0p1p. (Note: |w| = 2p ≥ p).
  3. Split w into xyz such that |xy| ≤ p and |y| > 0.
  4. Since |xy| ≤ p, the substring y must consist entirely of 0s.
  5. Pump it: Consider xy2z. This string has more 0s than 1s.
  6. Contradiction: xy2z ∉ L, but Pumping Lemma says it must be. Therefore, L is not regular.
02 / Architecture

Context-Free Languages & PDAs

DFAs cannot count (e.g., matching open/close parentheses). For this, we need a Pushdown Automaton (PDA)—a machine with an infinite stack memory. These correspond to Context-Free Languages (CFLs).

Context-Free Grammars (CFG)

A CFG is defined by 4-tuple (V, T, P, S). It defines valid strings by substitution rules.

S → aSb
S → ε

Generates: "", "ab", "aabb", "aaabbb"...

Concept: Ambiguity

A grammar is "ambiguous" if a string has more than one valid parse tree. This is bad for compilers (dangling else problem)!

Pushdown Automata (PDA)

A nondeterministic finite automaton with an extra component: a stack. The stack provides "LIFO" memory, allowing the machine to remember an arbitrary amount of information.

Z₀ (Bottom)

LIFO (Last In, First Out)

Intuition: A PDA can non-deterministically "guess" transitions and use the stack to verify if the guess was correct.

Chomsky Normal Form (CNF)

A simplified way to write CFGs. In CNF, every rule must be either:

  • A non-terminal producing two non-terminals (A → BC)
  • A non-terminal producing a terminal (A → a)

Useful for algorithms like CYK (parsing).

Applications of CFGs

  • Programming Syntax: Most programming languages (Java, C++) are defined by CFGs.
  • Parsers: Browsers use CFGs to parse HTML properties.
  • XML/JSON: Nested data structures are context-free.
Solved Example Leftmost Derivation

Problem: Given the grammar S → 0S1 | ε, derive the string "0011".

Step 1:
S ⇒ 0S1 (Apply S → 0S1)
Step 2:
0S1 ⇒ 0(0S1)1 = 00S11 (Apply S → 0S1 again)
Step 3:
00S11 ⇒ 00(ε)11 = 0011 (Apply S → ε)

Result: The string "0011" is generated by this grammar (Language is 0n1n).

02 / Hierarchy

The Chomsky Hierarchy

Proposed by Noam Chomsky, this categorizes grammars by their power. Each level strictly contains the one below it.

Level 0: Recursively Enumerable Turing Machines
Level 1: Context-Sensitive Linear Bounded Automata (LBA)
Level 2: Context-Free Pushdown Automata (Syntax)
Level 3: Regular Finite Automata (Regex)
Type Language Class Automaton Production Rules
Type 0 Recursively Enumerable Unrestricted Grammar Turing Machine α → β
(No restrictions)
Type 1 Context-Sensitive CSG Linear Bounded αAβ → αγβ
(Length increasing)
Type 2 Context-Free CFG Pushdown Automaton A → γ
(LHS is single variable)
Type 3 Regular Regular Grammar Finite Automaton A → a | A → aB
(Right/Left Linear)
03 / Universality

The Turing Machine

A mathematical model of a general purpose computer. If a problem can be solved by an algorithm, it can be solved by a Turing Machine.

Church-Turing Thesis

"Everything that is intuitively computable can be computed by a Turing Machine." This connects physical computing to mathematical logic.

Recursive Languages (Decidable)

A language is Recursive if a Turing Machine exists that will accept valid strings and always halt and reject invalid strings.

Recursively Enumerable

A language is Recursively Enumerable if a TM accepts valid strings, but might run forever (loop) on invalid strings.

Universal Turing Machine (UTM)

A "standard" TM takes an input string and runs. A Universal TM takes two inputs: a description of machine M and an input w, and simulates M on w.

This is the theoretical basis for software. Your computer (a UTM) runs programs (input machines).

Solved Example TM Trace

Problem: Trace a Turing Machine designed to increment a binary number (e.g., transform "101" to "110").

Step State Tape Content (Head is Bold) Action
1 q_start 101 Move Right to end
2 q_right 101 Move Right
3 q_right 101 Move Right
4 q_carry 101 Read 1 -> Write 0, Carry Left
5 q_carry 100 Read 0 -> Write 1, Done
6 q_halt 110 HALT

Linear Bounded Automata (LBA)

A restricted Turing Machine where the tape size is limited to a constant multiple of the input size.

Significance: LBAs recognize Context-Sensitive Languages (Type 1). They are less powerful than standard TMs but more powerful than PDAs.

Turing Machine: Binary Flips

State q0
03 / Limits

Undecidability

The Halting Problem

Is there an algorithm that can inspect any code and determine if it runs forever?

Answer: No. Turing proved this is logically impossible.

Post Correspondence Problem (PCP)

Another famous undecidable problem. Given a set of dominoes with strings on top and bottom, can you arrange them so the top string equals the bottom string?

ab
a
b
ba
...

For an arbitrary set, there is no algorithm to decide this.

04 / Complexity

P vs NP

Class P

Problems solvable in polynomial time.

  • Shortest Path
  • Matrix Multiplication
  • Alphabetic Sorting

Class NP

Problems verifiable in polynomial time.

  • Sudoku
  • Protein Folding
  • Traveling Salesman

The Millennium Prize Problems

In 2000, the Clay Mathematics Institute designated seven "Millennium Prize Problems" as the most significant open questions in mathematics.

  • P vs NP (Computer Science)
  • Riemann Hypothesis (Number Theory)
  • Navier-Stokes Existence (Fluid Dynamics)
  • Poincaré Conjecture (Topology - Solved by Perelman, 2003)
  • Yang-Mills Existence (Physics)
  • Birch and Swinnerton-Dyer Conjecture (Algebraic Geometry)
  • Hodge Conjecture (Algebraic Geometry)

Solving any of these awards a $1,000,000 prize. P vs NP is often considered the most important problem in theoretical computer science, with profound implications for cryptography, AI, and optimization.

Glossary

Key Definitions

Automata Theory

The study of abstract machines and what problems they can solve.

Finite Automata (DFA)

A machine with limited memory that is always in one of a finite number of states.

Regular Expression (Regex)

A sequence of characters that describes a search pattern.

Context-Free Grammar (CFG)

A system of rules for generating valid strings in a language, supporting nesting.

Pushdown Automaton

A finite automaton with an infinite stack memory.

Turing Machine

A mathematical model of a general-purpose computer with an infinite tape.

Halting Problem

The problem of determining whether a program will finish running or run forever (proven undecidable).

P vs NP

The open question of whether every problem whose solution can be quickly verified can also be quickly solved.

Chomsky Hierarchy

A containment hierarchy of classes of formal grammars (Regular < Context-Free < Context-Sensitive < Recursively Enumerable).

Undecidability

The property of a decision problem for which it is impossible to construct an algorithm that always leads to a correct yes-or-no answer.

06 / Assessment

Knowledge Check