Artificial Intelligence

Architecture of
Synthetic Cognition

A functional deconstruction of machine intelligence. Unlike classical programming, which requires explicit rules for every scenario, AI systems build their own rules by finding patterns in data.

This guide traces the lineage of intelligence from its broadest definition down to the specific mechanics of language models. It is a journey from logic (AI) to statistics (ML), to representation (DL), and finally to communication (NLP).

01 / Artificial Intelligence

The Hierarchy of Intelligence

The field is a series of nested subsets. Generative AI (like LLMs) is a specific application of Deep Learning, which is a technique of Machine Learning, which is a branch of AI.

ARTIFICIAL INTELLIGENCE
MACHINE LEARNING
DEEP LEARNING
NLP

Artificial Intelligence

The broad umbrella. Machines mimicking cognitive functions (logic, rules, search).

Machine Learning

Systems that improve from data/experience rather than explicit programming.

Deep Learning

ML using multi-layered neural networks to model complex, hierarchical patterns.

NLP & GenAI

Specialized architectures (Transformers) for language understanding and generation.

Evolution of Thought

1950-1958 Symbolic AI

Logic and Rules.

Early AI was symbolic: it relied on human-readable symbols and hard-coded, rule-based logic. The "Perceptron" (1958) introduced the idea that a machine could learn its weights from data, but a single-layer perceptron is limited to linearly separable problems (it cannot solve XOR).

1986-2012 The Connectionist Shift

Backpropagation & The ImageNet Moment.

The popularization of Backpropagation in 1986 made it practical to train multi-layer networks. In 2012, AlexNet used GPUs to win the ImageNet image-recognition challenge by a wide margin, showing that "Deep" networks could outperform earlier approaches on perceptual tasks.

2017-Present The Transformer Era

Attention Is All You Need.

Google researchers introduced the Transformer, replacing sequential processing (RNNs) with parallel attention mechanisms. This allowed models to ingest massive datasets, leading to the emergence of GPT and modern Generative AI.

Four Approaches to AI

Artificial Intelligence isn't just one goal; it's defined by how we measure success. Traditionally, the field is split along two axes: Human-like vs Rational and Thinking vs Acting.

Acting Humanly

The Turing Test Approach

"Can a machine fool a person into thinking it's human?"

Acting Rationally

The Rational Agent Approach

"Achieving the best outcome, or the best expected outcome."

Thinking Humanly

The Cognitive Modeling Approach

"Mirroring the internal reasoning processes of the human brain."

Thinking Rationally

The Laws of Thought Approach

"Rigorous, logical syllogisms where conclusions are undeniable."

The Turing Test (1950)

Interrogator 👤

Proposed by Alan Turing in his paper "Computing Machinery and Intelligence", it bypasses the philosophical question of "Can machines think?" and replaces it with the Imitation Game.

If a human interrogator cannot distinguish between the responses of a human and a computer during a text-based conversation, the computer passes the test.

Interrogator's Screen
> Query: "Write a poem about rain"
A: "Liquid drops descend, rhythmic beats on glass..."
B: "Falling sky-water makes the garden green and wet."
🤖
Subject A
👤
Subject B

Can you tell which one is the machine?

Inference Engines: Forward vs Backward Chaining

Before neural networks became dominant, AI relied on Inference Engines, the logical "thinking" component of an Expert System. An inference engine reasons with IF-THEN rules to derive conclusions from data.

Forward Chaining

Data-Driven Reasoning

Starts with Facts and matches them against the IF part of rules to derive new facts.

Fact: It is raining (A)
↓ Match Rule
Rule: If A, then take Umbrella (B)
↓ Conclusion
New Fact: Take Umbrella (B)

Backward Chaining

Goal-Driven Reasoning

Starts with a Goal (Conclusion) and searches for rules that result in that goal, checking if their conditions are met.

Goal: Take Umbrella (B)?
↓ Check Rule
Rule: If A, then B. Is it raining (A)?
↓ Verify Fact
Result: Fact A is True → Goal B Met
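
The two strategies above can be sketched in a few lines of Python. This is a minimal illustration, not a production inference engine: the rule base, the fact names, and the extra "stay_dry" rule are hypothetical additions made only to show chaining across more than one rule.

```python
# Hypothetical rule base: each rule is (set of premises, conclusion).
RULES = [
    ({"raining"}, "take_umbrella"),
    ({"take_umbrella", "going_out"}, "stay_dry"),
]

def forward_chain(facts, rules):
    """Data-driven: keep firing rules until no new fact is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known

def backward_chain(goal, facts, rules, _seen=None):
    """Goal-driven: prove the goal by recursively proving rule premises."""
    seen = _seen or set()
    if goal in facts:
        return True
    if goal in seen:  # guard against circular rules
        return False
    seen = seen | {goal}
    return any(
        conclusion == goal
        and all(backward_chain(p, facts, rules, seen) for p in premises)
        for premises, conclusion in rules
    )

facts = {"raining", "going_out"}
derived = forward_chain(facts, RULES)
can_stay_dry = backward_chain("stay_dry", facts, RULES)
no_umbrella = backward_chain("take_umbrella", {"going_out"}, RULES)
print(sorted(derived), can_stay_dry, no_umbrella)
```

Forward chaining derives everything that follows from the facts, whether or not anyone asked for it. Backward chaining only explores the rules needed to settle one specific goal.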
02 / Machine Learning

What is Machine Learning?

Machine Learning (ML) is the science of getting computers to act without being explicitly programmed. Instead of hard-coded rules, the machine learns from patterns in data to make decisions or predictions.

The 4 Types of Learning

🏷️
Supervised Learning

Learning with a teacher. The data is labeled (input/output pairs). The goal is to learn a mapping from x to y.

Pattern: If "Email" has "Prize" → [Spam]
🧩
Unsupervised Learning

Learning without a teacher. Find hidden patterns or structure in unlabeled data.

Pattern: "Customer A" is similar to "Customer B"
🌗
Semi-Supervised

A hybrid approach. Uses a small amount of labeled data and a large amount of unlabeled data.

Use case: Photo tagging (some named, many not)
🏆
Reinforcement Learning

Learning by trial and error. An agent interacts with an environment to maximize a reward.

Rule: If you win → [+1], if you lose → [-1]
🌲

Classic Algorithm: Random Forest

Random Forest has long been a go-to method for high-performance classification and regression on tabular data, and it remains a strong baseline today. It is a form of Ensemble Learning: many decision trees are combined to produce a more robust prediction than any individual tree could achieve.

1. Decision Trees: The building blocks. A single tree splits data hierarchically at decision nodes (e.g., "is value > 5?") until reaching a final prediction leaf.
2. Random Bootstrapping: To keep the trees from replicating each other, each tree is trained on a random sample of the dataset drawn with replacement (bagging) and considers only a random subset of features at each split.
3. Ensemble Voting: Individual trees can be highly unstable and overfit. By averaging their outputs (regression) or taking a majority vote (classification), many of their individual errors cancel out.
Why it works: A single tree is prone to high variance (overfitting). Averaging many decorrelated trees reduces variance with little increase in bias, a statistical version of the "wisdom of crowds."
Interactive Playground: Ensemble Voting

Adjust the input features below and watch how three different decision trees (trained on different random subsets of data and features) traverse their paths and vote to produce a final consensus prediction.

Inputs: Weather, Temperature, Weekend

Tree 1 (Weather & Weekend): GO OUT
Is Weather Sunny?
  → Yes: Go Out
  → No: Is Weekend?
    → Yes: Go Out
    → No: Stay Home
Tree 2 (Weekend & Temp): GO OUT
Is Weekend?
  → Yes: Go Out
  → No: Is Temp Hot?
    → Yes: Go Out
    → No: Stay Home
Tree 3 (Temp & Weather): GO OUT
Is Temp Hot?
  → Yes: Is Weather Sunny?
    → Yes: Go Out
    → No: Stay Home
  → No: Stay Home
Votes: Tree 1: GO OUT, Tree 2: GO OUT, Tree 3: GO OUT
Final Consensus: GO OUT (3-0 majority)
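
The three trees and their vote can be written out directly. This sketch hand-codes the trees exactly as drawn above rather than training them, and the input encoding (the strings "sunny" and "hot", a boolean for the weekend) is an assumption made for illustration.

```python
from collections import Counter

def tree_1(weather, temp, weekend):
    # Uses Weather and Weekend only.
    if weather == "sunny":
        return "GO OUT"
    return "GO OUT" if weekend else "STAY HOME"

def tree_2(weather, temp, weekend):
    # Uses Weekend and Temperature only.
    if weekend:
        return "GO OUT"
    return "GO OUT" if temp == "hot" else "STAY HOME"

def tree_3(weather, temp, weekend):
    # Uses Temperature and Weather only.
    if temp == "hot":
        return "GO OUT" if weather == "sunny" else "STAY HOME"
    return "STAY HOME"

def forest_predict(weather, temp, weekend):
    """Collect one vote per tree and return the majority label."""
    votes = [tree(weather, temp, weekend) for tree in (tree_1, tree_2, tree_3)]
    winner, _count = Counter(votes).most_common(1)[0]
    return winner, votes

unanimous = forest_predict("sunny", "hot", True)
split = forest_predict("rainy", "hot", False)
print(unanimous)
print(split)
```

In the second call the trees disagree (Tree 2 votes GO OUT, the other two vote STAY HOME), and the majority overrides the single dissenting tree.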

Training a Model

The process of teaching a machine. Training is fundamentally an optimization problem: we want to find the configuration of weights that results in the lowest possible error.

1. Forward Pass

The model receives input data, processes it through its current weights, and makes a prediction.

2. Calculate Loss

The prediction is compared to the actual answer (the ground truth). The difference is quantified as the Loss (or Error).

3. Backward Pass

The error is propagated backward through the network. The model updates its weights to reduce the error next time (using Gradient Descent).

Key Training Concepts
Train / Test Split
TRAIN (80%)
TEST (20%)

We never test a model on the data it trained on. We split data into Training (to learn) and Testing (to evaluate) sets to ensure the model generalizes well to unseen data.

Epochs & Batches
B1 + B2 + B3 + ... = 1 EPOCH

An Epoch is one complete pass through the entire training dataset. Because datasets are often too large to process at once, they are split into smaller chunks called Batches, and the weights are typically updated after each batch.
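
A minimal sketch of both ideas using only the standard library. The helper names, the 100-item toy dataset, and the batch size of 32 are illustrative assumptions.

```python
import random

def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the data, then hold out a fraction for testing."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def batches(samples, batch_size):
    """Yield consecutive chunks; one full pass over them is one epoch."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]

data = list(range(100))          # stand-in for 100 training examples
train, test = train_test_split(data)
epoch = list(batches(train, batch_size=32))
print(len(train), len(test), [len(b) for b in epoch])
```

The last batch is smaller than the others whenever the training set size is not a multiple of the batch size.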

Underfitting

When a model is too simple to capture the underlying structure of the data. It performs poorly in both training and testing.

Overfitting

When a model learns the training data too well, memorizing the noise rather than the underlying pattern. It performs very well on the training data but poorly on unseen, real-world data.

Training Simulator

Watch the model optimize its weights over time. Training Loss typically keeps decreasing, but Validation Loss may start to increase once the model begins to overfit.

Chart: Training Loss and Validation Loss over 30 epochs, with a "⚠️ Overfitting Detected" warning.

The Training Loop in Code

PyTorch (Python)
import torch
import torch.nn as nn
import torch.optim as optim

# Placeholder data so the script runs end to end (assumed shapes:
# 64 samples, 10 input features, 1 regression target).
torch.manual_seed(0)
training_data = torch.randn(64, 10)
actual_labels = torch.randn(64, 1)

# 1. Define Model, Loss Function, and Optimizer
model = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 1))
criterion = nn.MSELoss()  # Measures Mean Squared Error
optimizer = optim.SGD(model.parameters(), lr=0.01)  # Gradient Descent

epochs = 100
for epoch in range(epochs):
    # --- 1. FORWARD PASS ---
    predictions = model(training_data)

    # --- 2. CALCULATE LOSS ---
    loss = criterion(predictions, actual_labels)

    # --- 3. BACKWARD PASS ---
    optimizer.zero_grad()  # Clear old gradients
    loss.backward()        # Compute new gradients (Backpropagation)
    optimizer.step()       # Update weights based on gradients

    if epoch % 20 == 0:
        print(f"Epoch {epoch} | Loss: {loss.item():.4f}")
Sample output (illustrative; exact values depend on the data and random initialization):
$ python train.py
Epoch 0 | Loss: 2.4512
Epoch 20 | Loss: 1.1205
Epoch 40 | Loss: 0.6543
Epoch 60 | Loss: 0.2104
Epoch 80 | Loss: 0.0512
Process finished with exit code 0
Evaluating Accuracy: The Confusion Matrix

"Accuracy" isn't always enough. If a disease affects 1% of people, a model that simply always guesses "Healthy" is 99% accurate, but completely useless. We use a confusion matrix to see exactly how the model is right or wrong.

True Positive (TP)

Model predicted YES, and the actual answer was YES. (Correctly diagnosed a disease).

False Positive (FP)

Model predicted YES, but the actual answer was NO. (Type I Error: False Alarm).

False Negative (FN)

Model predicted NO, but the actual answer was YES. (Type II Error: Missed Diagnosis).

True Negative (TN)

Model predicted NO, and the actual answer was NO. (Correctly identified a healthy patient).
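
The "always guesses Healthy" example can be checked numerically. The sketch below assumes a population of 1,000 patients with a 1% disease rate. Precision and recall are standard metrics derived from the four cells and are introduced here only for illustration.

```python
def confusion_counts(actual, predicted):
    """Count the four cells of a binary confusion matrix (1 = YES, 0 = NO)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fp, fn, tn

def metrics(tp, fp, fn, tn):
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted YES, how many were right
    recall = tp / (tp + fn) if tp + fn else 0.0     # of actual YES, how many were found
    return accuracy, precision, recall

actual = [1] * 10 + [0] * 990    # 1% of 1,000 patients are sick
always_healthy = [0] * 1000      # the model never predicts the disease
counts = confusion_counts(actual, always_healthy)
accuracy, precision, recall = metrics(*counts)
print(counts, accuracy, precision, recall)
```

Accuracy comes out at 0.99 while recall is 0.0: every sick patient is a False Negative, which is exactly what the single accuracy number hides.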

The Artificial Neuron

The fundamental atomic unit of learning. It is essentially a linear classifier.

A biological neuron fires when it receives enough stimulation. An artificial neuron mimics this mathematically. It takes inputs (x), multiplies them by learnable weights (w), adds a bias (b), and pushes the result through an activation function (like Sigmoid or ReLU) to introduce non-linearity.

If the weighted sum exceeds a threshold, the neuron "activates." By adjusting the weights, we change what stimulates the neuron, training it to recognize specific patterns.

x₁, x₂ → Σ(wx)+b → 0.0
Weight 1 (Signal Strength): 0.5
Weight 2 (Signal Strength): 0.5
Bias (Activation Threshold): 0.0
Neuron Output: 0.00
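
A single neuron fits in a few lines. This sketch uses the weights (0.5, 0.5) and bias (0.0) shown above; the input values and the choice of ReLU and Sigmoid as the two activations to compare are assumptions for illustration.

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias, activation):
    """Weighted sum of inputs plus bias, passed through an activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)

weights, bias = [0.5, 0.5], 0.0
silent = neuron([0.0, 0.0], weights, bias, relu)       # no stimulation
active = neuron([1.0, 1.0], weights, bias, relu)       # weighted sum = 1.0
squashed = neuron([1.0, 1.0], weights, bias, sigmoid)  # same sum, squashed into (0, 1)
print(silent, active, round(squashed, 3))
```

Changing a weight or the bias shifts which input combinations push the weighted sum high enough to produce a large output, which is what training adjusts.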

Gradient Descent

The engine of learning. An iterative algorithm for finding the lowest error.

Imagine being blindfolded on a mountain and trying to find the bottom of the valley. You feel the ground to see which way is "down" (the gradient) and take a small step in that direction. In ML, the "mountain" is the Loss Function (total error), and your position is defined by the model's weights. Gradient Descent updates the weights to move iteratively towards the point of minimal error.
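
A one-dimensional version of the mountain makes the update rule concrete. The quadratic loss with its minimum at w = 3, the starting point, and the two learning rates are all illustrative assumptions.

```python
def loss(w):
    return (w - 3.0) ** 2          # toy "valley" with its lowest point at w = 3

def grad(w):
    return 2.0 * (w - 3.0)         # slope of the loss at w

def gradient_descent(w, learning_rate, steps):
    """Repeatedly step against the gradient and record the loss after each step."""
    history = [loss(w)]
    for _ in range(steps):
        w = w - learning_rate * grad(w)
        history.append(loss(w))
    return w, history

w_final, history = gradient_descent(w=0.0, learning_rate=0.1, steps=50)
w_bad, history_bad = gradient_descent(w=0.0, learning_rate=1.1, steps=10)
print(round(w_final, 4), history_bad[-1] > history_bad[0])
```

With a small step size the weight settles near 3. With a step size that is too large, each update overshoots the valley and the loss grows instead of shrinking.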

Interactive: Error Landscape (Loss) with an adjustable Learning Rate (Step Size).
03 / Deep Learning

Deep Neural Networks

Feature Abstraction. Why do we stack layers? To create a hierarchy of understanding.

A single layer can only solve simple, linear problems. By stacking layers (Deep Learning), the network learns progressively more complex features. In image recognition, the first layer might detect edges. The second layer combines edges to detect shapes (circles, squares). The third layer combines shapes to detect objects (eyes, ears). This hierarchical representation is what makes Deep Learning so powerful.

Review: Overfitting
When a model learns the training data too well (including its noise) but fails to generalize to new, unseen data. It's like memorizing the answers to a test instead of understanding the subject.
Input Layer → Hidden Layers → Output

Convolutional Neural Networks (CNNs)

How computers "see". Using filters to detect spatial patterns.

Images are grids of pixels. A CNN slides small "filters" (or kernels) over the image to detect features like edges or curves. At each position, the filter values are multiplied element-wise with the pixel values beneath them and summed; the resulting grid of sums is the "Feature Map".

Input Image (5x5)
Filter (3x3):
-1  1 -1
-1  1 -1
-1  1 -1
→
Calculation
Sum(Pixels × Weights)
0
Feature Match
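
The sliding-window calculation can be sketched in plain Python. The nine filter values are the ones listed above, read as a 3x3 vertical-line detector. The 5x5 input image with a vertical line in its center column is an assumption, and the function assumes a square image and a square filter.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image; each output cell is Sum(Pixels x Weights).

    As in most deep learning libraries, the kernel is not flipped,
    so this is technically cross-correlation.
    """
    k = len(kernel)
    out_size = len(image) - k + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(k) for dj in range(k))
            for j in range(out_size)
        ]
        for i in range(out_size)
    ]

kernel = [[-1, 1, -1]] * 3                                   # vertical-line filter
image = [[1 if col == 2 else 0 for col in range(5)] for _ in range(5)]
feature_map = convolve2d(image, kernel)
for row in feature_map:
    print(row)
```

The feature map peaks (value 3) where the filter's center column sits on the line and goes negative (value -3) where the line falls under one of the filter's -1 columns.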
04 / Natural Language Processing

Vector Embeddings

Mapping meaning to geometry. Transforming words into coordinates.

To a computer, "Apple" and "Orange" are just different strings. To make them useful, we convert them into lists of numbers (vectors) such that semantically similar words are close together in mathematical space.

This allows for Semantic Arithmetic. We can take the vector for "King", subtract the vector for "Man", add the vector for "Woman", and the resulting vector lands close to "Queen".

King
Queen
Man
Woman
Semantic Space (2D Projection)
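
The arithmetic can be tried on hand-made vectors. The 2D coordinates below are invented for illustration (first axis roughly "royalty", second roughly "gender"). Real embeddings are learned from data and have hundreds of dimensions, and "apple" is added only as an unrelated distractor.

```python
import math

# Hypothetical 2D embeddings, chosen by hand for this example.
EMBEDDINGS = {
    "king":  [1.0,  0.3],
    "queen": [1.0, -0.3],
    "man":   [0.1,  0.3],
    "woman": [0.1, -0.3],
    "apple": [-1.0, 0.0],
}

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, -1.0 means opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def nearest(vector, exclude=()):
    """Return the vocabulary word whose embedding is most similar to vector."""
    candidates = {w: v for w, v in EMBEDDINGS.items() if w not in exclude}
    return max(candidates, key=lambda w: cosine(vector, candidates[w]))

target = [k - m + w for k, m, w in
          zip(EMBEDDINGS["king"], EMBEDDINGS["man"], EMBEDDINGS["woman"])]
result = nearest(target, exclude=("king", "man", "woman"))
print(result)
```

The input words are excluded from the search, as is common in analogy demos, so the answer has to be a different word.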

Self-Attention

Contextual weighting. How models understand the relevance of words.

Before Transformers, models read sentences left-to-right, often forgetting the start of a sentence by the time they reached the end. Self-Attention allows the model to look at every word in a sentence simultaneously and calculate how much each word relates to every other word. In the example below, "it" is ambiguous to a computer. Attention resolves this by linking "it" strongly to "animal" because the animal is tired.

The animal didn't cross the street because it was too tired.

Attention Weights for "it"
animal: 0.85
street: 0.10
tired: 0.60
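
A stripped-down version of the computation, assuming scaled dot-product attention in which each token's vector serves as query, key, and value at once. A real Transformer applies learned projections first. The four 3D token vectors are invented, and because the weights here pass through a softmax they sum to 1, so they will not match the illustrative scores above.

```python
import math

def softmax(scores):
    peak = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product attention with Q = K = V = vectors (no learned projections)."""
    dim = len(vectors[0])
    weights, outputs = [], []
    for query in vectors:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in vectors]
        row = softmax(scores)               # how much this token attends to each token
        weights.append(row)
        outputs.append([sum(w * v[d] for w, v in zip(row, vectors))
                        for d in range(dim)])
    return weights, outputs

tokens = ["animal", "street", "it", "tired"]
vectors = [            # hypothetical embeddings
    [1.0, 0.0, 1.0],   # animal
    [0.0, 1.0, 0.0],   # street
    [1.0, 0.0, 0.8],   # it
    [0.6, 0.0, 1.0],   # tired
]
weights, outputs = self_attention(vectors)
it_row = dict(zip(tokens, weights[2]))
print({token: round(w, 2) for token, w in it_row.items()})
```

Every token's row is computed independently of the others, which is why attention can process a whole sentence in parallel rather than left to right.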

Large Language Models (GenAI)

From "Mining" patterns to generating answers.

Phase 1: Text Mining (Training)

The model doesn't store the internet. Instead, it "mines" billions of sentences to learn the statistical structure of language. It compresses this information into numerical weights. It creates a high-dimensional map of how words relate to one another (e.g., "doctor" appears near "nurse" more often than "table").

Phase 2: The Answer (Inference)

When you ask a question, the model doesn't retrieve a pre-written answer. It uses its map to calculate the probability of the next word. It is a prediction engine, constructing a novel response token-by-token based on the context you provided.

The future of AI is |
uncertain: 45%
bright: 30%
here: 15%
Token Sampling (The "Choice")
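
The "choice" step can be sketched with the three probabilities shown above. The remaining 10% is lumped into a hypothetical "<other>" token so the distribution sums to 1, and the temperature value is an illustrative assumption.

```python
import math
import random

# Probabilities from the example; "<other>" is an assumed catch-all for the rest.
NEXT_TOKEN_PROBS = {"uncertain": 0.45, "bright": 0.30, "here": 0.15, "<other>": 0.10}

def greedy_token(probs):
    """Always pick the single most likely token."""
    return max(probs, key=probs.get)

def sample_token(probs, rng):
    """Pick a token at random, in proportion to its probability."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]

def apply_temperature(probs, temperature):
    """Rescale a distribution: T < 1 sharpens it, T > 1 flattens it."""
    scaled = {t: math.exp(math.log(p) / temperature) for t, p in probs.items()}
    total = sum(scaled.values())
    return {t: s / total for t, s in scaled.items()}

rng = random.Random(0)
greedy = greedy_token(NEXT_TOKEN_PROBS)
samples = [sample_token(NEXT_TOKEN_PROBS, rng) for _ in range(1000)]
sharp = apply_temperature(NEXT_TOKEN_PROBS, temperature=0.5)
print(greedy, samples.count("uncertain"), round(sharp["uncertain"], 3))
```

Greedy decoding returns the same token every time, while sampling lets lower-probability tokens such as "bright" appear, which is why the same prompt can produce different responses.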
05 / Advanced

Diffusion Models

Generating content via iterative denoising.

How does AI generate images? It doesn't paint like an artist. Instead, it acts like a sculptor revealing a statue from a block of marble, but the "marble" is static noise.

The model is trained to reverse the process of adding noise to an image. To generate a new image, we give it pure random noise and ask it, in effect: "What part of this looks like a cat?" It predicts the noise to remove and slightly adjusts the pixels. We repeat this many times (typically tens to about a thousand steps) until the noise becomes a clear image.

Pure Noise (t=T) → Refined Image (t=0)

Step: 100 (Noise)
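
The noising process that the model learns to reverse can be sketched without any neural network. Everything specific here is an assumption: the 100-step linear noise schedule, the four-value "image", and above all the use of the true noise in place of a trained model's prediction, which only shows what a perfect prediction would make possible.

```python
import math
import random

T = 100
# Assumed linear noise schedule: a little noise is added at every step.
BETAS = [1e-4 + (0.1 - 1e-4) * t / (T - 1) for t in range(T)]

# ALPHA_BAR[t] = fraction of the original signal variance left after t + 1 steps.
ALPHA_BAR = []
running = 1.0
for beta in BETAS:
    running *= 1.0 - beta
    ALPHA_BAR.append(running)

def add_noise(x0, t, rng):
    """Forward process: jump straight from the clean image to noise level t."""
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    a = ALPHA_BAR[t]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * n for x, n in zip(x0, noise)]
    return xt, noise

def estimate_x0(xt, predicted_noise, t):
    """Invert the forward formula, given a prediction of the added noise."""
    a = ALPHA_BAR[t]
    return [(x - math.sqrt(1.0 - a) * n) / math.sqrt(a)
            for x, n in zip(xt, predicted_noise)]

rng = random.Random(0)
x0 = [0.0, 0.5, 1.0, 0.5]                       # stand-in for pixel values
xt, true_noise = add_noise(x0, T - 1, rng)      # almost pure noise
recovered = estimate_x0(xt, true_noise, T - 1)  # oracle: uses the true noise
print(round(ALPHA_BAR[-1], 4), [round(v, 6) for v in recovered])
```

A trained diffusion model replaces the oracle with a network that predicts the noise from the noisy image and the step number, and removes it gradually over many steps instead of in one jump.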

Reinforcement Learning: Learning from Experience

Teaching machines to make sequences of decisions by maximizing numerical rewards.

Reinforcement Learning (RL) is the third major paradigm of ML, alongside Supervised and Unsupervised Learning. Unlike Supervised Learning (learning from labeled examples) or Unsupervised Learning (learning from structure), RL is about learning through interaction.

Imagine a robot in a maze. It doesn't have a map. It must move, observe the consequences, and adjust its behavior. This feedback loop is the core of "Agency" in AI.

The SAR Feedback Loop

S
State

The current environment snapshot.

A
Action

The choice made by the Agent.

R
Reward

The feedback (+1 or -1) received.

Exploration vs. Exploitation

Every RL agent faces a dilemma: Exploitation (doing what worked before) vs. Exploration (trying something new). An agent that only exploits might settle for a small reward, never discovering the "Jackpot" hidden around the corner.
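
One common way to balance the two is an epsilon-greedy strategy, sketched here on a three-armed bandit. The technique is swapped in for illustration and is not named in the text above. The arm payouts, the noise level, and epsilon = 0.1 are all assumptions.

```python
import random

def run_bandit(true_means, epsilon, steps, seed=0):
    """Epsilon-greedy: explore a random arm with probability epsilon, else exploit."""
    rng = random.Random(seed)
    counts = [0] * len(true_means)
    estimates = [0.0] * len(true_means)     # running average reward per arm
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(true_means))                             # explore
        else:
            arm = max(range(len(true_means)), key=lambda a: estimates[a])    # exploit
        reward = rng.gauss(true_means[arm], 1.0)                             # noisy payout
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, counts, total_reward

TRUE_MEANS = [0.2, 0.5, 2.0]   # hypothetical payouts; the last arm is the "Jackpot"
estimates, counts, total = run_bandit(TRUE_MEANS, epsilon=0.1, steps=5000)
print(counts, [round(e, 2) for e in estimates])
```

With epsilon set to 0 the agent would never try an arm at random, so it could keep pulling an early, mediocre arm without ever sampling the jackpot.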

Deep RL

When RL meets Neural Networks (Deep RL), machines can master complex games such as Go (AlphaGo) or Dota 2 (OpenAI Five). The network often acts as a "Value Function," predicting which states will lead to the highest total reward in the long run, or as a policy that selects actions directly.

Interactive Case Study: Wumpus World

In this classic AI problem, the agent must find the Gold without falling into a Pit or being eaten by the Wumpus. The agent doesn't know the map; it only perceives sensor readings: a Breeze (🌬️) indicates a Pit in an adjacent square, and a Stench (🤢) indicates the Wumpus in an adjacent square. Through trial and error, it learns a Policy to reach the goal.

Percepts (Sensors)
None

Find the Gold (💰). Avoid Pits (🕳️) and Wumpus (👹).

Artificial General Intelligence (AGI)

Beyond specialization. The theoretical leap to human-level versatility.

Artificial Intelligence today is powerful but narrow. AGI represents the point where a machine matches human cognitive abilities across any domain. It is not just about doing one thing better; it is about the ability to learn everything.

Narrow AI (ANI)

Superhuman at specific tasks (e.g., Playing Chess, Diagnosing Cancer, Generating Text). Cannot apply Chess logic to driving a car.

General AI (AGI)

Versatile intelligence. Can learn any task a human can, from folding laundry to discovering new physics. It possesses Cross-Domain Reasoning.

Architecture Comparison
Logic Creativity Math Social Spatial Core
ANI: Isolated Skills

Intelligence Stages

01 / ANI

Narrow Intelligence

Where we are now. Models excel at specific domains but lack a "world model."

02 / AGI

General Intelligence

Human-level across all domains. Can self-correct, generalize, and learn autonomously.

03 / ASI

Super Intelligence

Intelligence that vastly exceeds human capacity in every possible metric.

How will we know? (The Tests)

AGI isn't just about high scores; it's about navigating the messy, physical, and social world.

The Coffee Test

Robot enters a random home and makes coffee without any prior map.

The Student Test

AI enrolls in university and earns a degree alongside humans.

The Employment Test

AI can perform any job a human currently does for money.

06 / Safety & Reliability

Model Hallucinations

Why statistical models confidently present falsehoods as facts.

A common misconception is that AI "lies." In reality, an LLM cannot lie because it has no concept of truth. It is a probabilistic engine. When it generates a response, it is simply selecting the most likely next token based on its training data.

Hallucinations (or confabulations) occur when the statistical path of the most probable tokens diverges from factual reality. This happens for several reasons:

Training Gaps

The model encounters a topic with sparse or conflicting data, forcing it to "guess" based on general patterns.

Compression Loss

A neural network acts as a lossy compression of its training data (much of it from the internet). Specific details (like dates or middle names) are often "blended" together.

Over-Optimization

The model is trained to be helpful and conversational, leading it to prioritize answering over admitting "I don't know."

Interactive / Probability vs. Truth

The Hallucination Fork

Input Context

"The capital of Australia is..."

Token: Canberra 32%

"Factually correct, but appears less frequently in colloquial datasets."

Token: Sydney 68%

"Highly frequent association. The model 'feels' this is the right answer."

Hallucination Occurred

The model followed the high-probability statistical path over the low-probability factual truth.

Deep Dive: The "Stochastic Parrot" Trap

The term "Stochastic Parrot" was famously coined by researchers (Bender, Gebru, et al.) to describe a fundamental limitation of Large Language Models.

The Trap: We mistake Fluency for Understanding.

Just because a parrot can repeat words with perfect pronunciation does not mean it understands the concepts of "liberty" or "taxation." Similarly, an AI predicts the next word based on statistical patterns (stochasticity) without having a physical or logical "grounding" in reality.

Anthropomorphism

Humans are evolutionarily hardwired to assume anything that speaks coherently must have a mind. This "Illusion of Intent" makes us trust AI outputs even when they are pure statistical noise.

Lack of Grounding

An LLM has only seen text. It has never felt heat, seen a color, or experienced gravity. Its world is composed entirely of mathematical relationships between tokens.

The "Room" Analogy

Imagine a person in a room with a rulebook for Chinese symbols. They can provide perfect answers in Chinese without understanding a single word. This is John Searle's Chinese Room Argument, and on the "Stochastic Parrot" view it is a close analogy for how LLMs operate.

Echo Chambers

If the training data contains a lie repeated 1,000 times, the model will "parrot" that lie as a high-probability truth. It cannot check the outside world to verify.

Meaning vs. Form

AI is a master of Form (grammar, syntax, style) but is currently disconnected from Meaning (truth, intent, consequence).

Retrieval-Augmented Generation (RAG)

Grounding models in external facts to reduce hallucinations.

To mitigate hallucinations and outdated knowledge, researchers developed Retrieval-Augmented Generation (RAG). Instead of relying purely on the model's static internal weights (its "memory"), a RAG system retrieves relevant documents from an external source (like a database or wiki) and attaches them to the user's prompt as context before generating the answer.

Interactive Diagram / The Retrieval Pipeline
1. User Query: "What is capital of X?" → Vector DB → Context Chunks
2. Augmentor: [Query] + [Context] → Merged Prompt
3. LLM (Generation Stage): Predict Tokens → Grounded Answer

The RAG Architecture

1. Retrieval

The user's query is converted into a vector embedding and matched against a Vector Database (containing chunks of verified text files, PDFs, or private wikis) to extract the most relevant snippets.

2. Augmentation

The retrieved context is injected directly into the prompt alongside the original query: "Answer the query using ONLY the following verified source text: [context]".

3. Generation

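The LLM reads both the query and the source context and generates a response grounded in that context, often with inline citations. This substantially reduces hallucination but does not eliminate it, since the model can still misread or ignore the retrieved text.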
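
The Retrieval and Augmentation steps can be sketched with the standard library alone. The three documents are invented, the word-count "embedding" is a toy stand-in for a learned embedding model, and a plain list stands in for the Vector Database. The Generation step is left out because it requires an actual LLM.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge base; a real system would store chunks in a vector database.
DOCUMENTS = [
    "Canberra is the capital city of Australia.",
    "Sydney is the largest city in Australia by population.",
    "A random forest combines many decision trees by voting.",
]

def embed(text):
    """Toy embedding: lowercase word counts (real systems use a learned model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[word] * b[word] for word in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query, documents, top_k=1):
    """1. Retrieval: rank documents by similarity to the query."""
    query_vec = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return ranked[:top_k]

def build_prompt(query, context_chunks):
    """2. Augmentation: inject the retrieved text into the prompt."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return ("Answer the query using ONLY the following verified source text:\n"
            f"{context}\n\nQuery: {query}")

query = "What is the capital of Australia?"
chunks = retrieve(query, DOCUMENTS)
prompt = build_prompt(query, chunks)
print(prompt)   # 3. Generation: this prompt would be sent to the LLM
```

Because the correct sentence is placed in front of the model, the answer no longer depends on which city the model happens to associate most strongly with Australia.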

Static Weights vs. Dynamic Search

Standard LLMs have a "cutoff date" and cannot access real-time or private proprietary data. RAG connects the LLM to dynamic databases without needing expensive retraining or fine-tuning.

Vector Databases

Tools like Pinecone, Chroma, or pgvector (a PostgreSQL extension) index text using Vector Embeddings. This allows the system to find matches based on semantic meaning rather than exact keyword matches.

07 / Safety

The Alignment Problem

Ensuring AI goals match human values.

If you tell a super-intelligent AI to "eliminate cancer," it might decide the most efficient solution is to eliminate all humans. Specifying objectives without unintended side effects (reward hacking) is the central challenge of AI safety.

Example: The Paperclip Maximizer

A thought experiment by Nick Bostrom. An AI designed solely to maximize paperclip production might eventually convert the entire solar system into paperclips, destroying humanity in the process, simply because we didn't explicitly tell it not to.

Glossary

Key Definitions

Artificial Intelligence (AI)

Machines mimicking cognitive functions like logic, rules, and search.

Machine Learning (ML)

Systems that improve from data/experience rather than explicit programming.

Deep Learning (DL)

ML using multi-layered neural networks to model complex patterns.

Neural Networks

Computing systems inspired by biological neural networks.

Perceptron

A simple linear binary classifier, the fundamental unit of neural networks.

Backpropagation

An algorithm for training neural networks by propagating error backwards.

Transformers

A deep learning architecture built on the attention mechanism, which has largely replaced RNNs for language tasks.

Generative AI

AI capable of generating new content (text, images, code) in response to prompts.

Turing Test

A test of a machine's ability to exhibit intelligent behavior indistinguishable from that of a human.

Reinforcement Learning

A learning paradigm where an agent learns to make decisions by taking actions in an environment to maximize cumulative rewards (SAR loop).

08 / Knowledge Check