Data Base

Engines of
Information

A Database Management System (DBMS) is the software that interacts with end users, applications, and the database itself to capture and analyze data. This guide visualizes the hidden mechanics of storage, retrieval, and integrity.

00 / Foundation

SQL Queries

Structured Query Language (SQL) is the standard for communicating with databases. The four basic operations are CRUD: Create (INSERT), Read (SELECT), Update (UPDATE), and Delete (DELETE). Records are often identified by a Primary Key (a unique ID).

Statement Type
Analysis Functions
Terminal Output
// Ready
SELECT * FROM Students WHERE gpa > 3.5;

Retrieves data from the database. This query finds all students with a high GPA.

Live Database (Table: Students)
ID Name GPA
1 Alice 3.9
2 Bob 3.2
3 Charlie 3.7
* Updates are transient for demo purposes.
01 / Relations

SQL Joins

Visualizing how data from two distinct tables is combined based on a related column (Foreign Key).

Inner Join

Returns records that have matching values in both tables.

Left Join

Returns all records from the left table, and matched records from the right.

Right Join

Returns all records from the right table, and matched records from the left.

Full Outer Join

Returns all records when there is a match in either left or right table.

Self Join

T1 T1'

Joining a table with itself (e.g., matching employees to managers in the same table).

Users (T1)
ID Name
1 Alice
2 Bob
3 Charlie
4 David
Orders (T2)
ID User_ID Item
101 1 Laptop
102 2 Mouse
103 1 Monitor
104 5 Phone
Result Set
Name Item
Select Join Type
Waiting for query...
02 / Optimization

Indexing (B-Tree)

Without an index, the database must perform a "Full Table Scan" (O(N)). A B-Tree Index allows logarithmic time search (O(log N)).

Tree Idle.
08 / Structure

Normalization

Normalization is the process of organizing data to minimize redundancy. Move through the stages from Unnormalized (UNF) to Third Normal Form (3NF).

03 / Integrity

ACID Transactions

To ensure data validity, transactions must be Atomic (all or nothing), Consistent (valid state), Isolated (independent), and Durable (saved permanently).

Account A
$1000
Account B
$500
Transaction Log (WAL)
System Ready.
04 / Concurrency

Row-Level Locking

Row #42
Status: Available
Click a user to attempt a transaction.
10 / System Design

Architectures

How clients connect to the database. 2-Tier connects directly (e.g., Desktop App). 3-Tier uses an Application Server (e.g., Web App).

Client
App Server
Database
05 / Horizontal Scaling

Sharding

Shard 0
Shard 1
Shard 2
Partition Key: ID % 3
09 / Redundancy

Master-Slave Replication

Writes are sent to the Master node. The Master then copies (replicates) data to Read Replicas. This allows scaling reads but introduces replication lag.

M Master
Writes: 0
R1 Replica
R2 Replica
06 / Paradigms

SQL vs NoSQL

SQL (Relational) databases enforce strict schemas. NoSQL (Document) databases allow flexible, dynamic data structures.

New Data: +
Schema / Structure
Data Store
Ready.

Choosing the Right Tool

SQL (Relational)
  • Structure: Table-based with strict, pre-defined schemas.
  • Scaling: Vertical (Scale Up - adding RAM/CPU to one server).
  • Query Language: SQL (Structured Query Language).
  • Best For: Complex queries, multi-row transactions, ensuring data integrity (ACID).
  • Examples: MySQL, PostgreSQL, Oracle, SQL Server.
NoSQL (Non-Relational)
  • Structure: Flexible. Document (JSON), Key-Value, Graph, or Wide-Column.
  • Scaling: Horizontal (Scale Out - adding more servers/sharding).
  • Query Language: Dynamic schemas, often API-based.
  • Best For: Unstructured data, rapid prototyping, massive data volume, real-time apps.
  • Examples: MongoDB (Document), Redis (Key-Value), Cassandra (Wide-Column), Neo4j (Graph).
07 / Distributed Limits

CAP Theorem

In a distributed data store, you can only simultaneously guarantee two out of three: Consistency, Availability, and Partition Tolerance.

C A P Pick 2

Select two properties

Consistency: Every read receives the most recent write.
Availability: Every request receives a (non-error) response.
Partition Tolerance: The system continues to operate despite an arbitrary number of messages being dropped or delayed by the network between nodes.

Glossary

Key Definitions

DBMS

Database Management System software that interacts with end users, applications, and the database.

SQL

Structured Query Language, the standard for communicating with relational databases.

CRUD

Create, Read, Update, Delete - the four basic operations of persistent storage.

Primary Key

A unique identifier for a record in a table.

Foreign Key

A field that links to the primary key of another table, creating a relationship.

Normalization

Organizing data to minimize redundancy and dependency (e.g., 1NF, 2NF, 3NF).

ACID

Atomicity, Consistency, Isolation, Durability - properties that guarantee reliable transactions.

Index

A data structure (like B-Tree) that improves the speed of data retrieval operations.

Sharding

Horizontal scaling by splitting a database into smaller parts across multiple servers.

CAP Theorem

It is impossible for a distributed data store to simultaneously provide more than two out of three: Consistency, Availability, Partition Tolerance.

08 / Knowledge Check