How SQL queries execute in a database, using a real query example.

Understanding how an SQL query executes in a database is essential for performance tuning and system design. Here’s a step-by-step breakdown of what happens under the hood when you run an SQL query like:

SELECT name FROM employees WHERE department = 'Sales';

🧭 0. Query Input (Your SQL)

You submit the SQL query via:

SQL CLI / GUI (like pgAdmin, MySQL Workbench)
Application backend
Notebook (e.g., Databricks, Jupyter)

⚙️ Step-by-Step SQL Query Execution

🧩 Step 1: Parsing

The SQL Parser checks for syntax errors.
Converts SQL into an abstract syntax tree (AST).
Example: Confirms SELECT, FROM, WHERE are valid clauses.

✅ If there’s a syntax error → You get an error immediately.

📊 Step 2: Query Rewrite (Optional)

The query rewrite engine transforms the SQL to an equivalent but more efficient version.
Includes:
- View expansion (resolving a view into its base query)
- Subquery flattening
- Predicate pushdown
- Common subexpression elimination

Example:

SELECT * FROM view_emp WHERE dept = 'HR';

→ gets rewritten to underlying table query.

🧠 Step 3: Query Optimization

The query optimizer generates multiple query plans and picks the least-cost plan using:
- Table statistics (row count, distinct values)
- Index availability
- Join order and type
- Available keys, partitions

Example decision:

Should I scan the whole table or use an index on department?

✅ Outputs an execution plan (physical plan).

🛠️ Step 4: Query Plan Execution (Physical Operators)

The engine executes the physical plan with:

Scans (full table or index)
Filters (apply WHERE conditions)
Joins, Aggregations
Sorts, Grouping

All operators work in a pipeline, usually row-by-row or batch-by-batch.

💾 Step 5: Data Fetch / Buffer / Storage Layer

Fetches actual data blocks/pages from:
- Memory (buffer pool / cache)
- Disk (if not cached)
Applies filtering logic (e.g., department = 'Sales')

🔍 Uses indexes (if available) to avoid full scan.

📤 Step 6: Results Return

The output rows are passed back:
- Through client interface (ODBC, JDBC)
- Shown in your query editor / app
May be paginated or streamed depending on client

🗂️ Summary Flowchart

SQL Query 
   ↓
1. Parsing → Syntax Tree
   ↓
2. Rewrite (Views, Subqueries)
   ↓
3. Optimization (Plan + Cost Model)
   ↓
4. Execution Plan (Scan, Filter, Join, Aggregate)
   ↓
5. Fetch Data from Memory/Disk
   ↓
6. Return Result to User

🔍 Example: Optimized Plan for This Query

SELECT name FROM employees WHERE department = 'Sales';

Use index on department (if exists)
Perform index seek to locate matching rows
Retrieve name column using a covering index (if name is in index)
Avoid full scan

🧪 Tools to View Execution Plan

DB Engine	Command
PostgreSQL	`EXPLAIN ANALYZE SELECT ...`
MySQL	`EXPLAIN SELECT ...`
SQL Server	Use “Actual Execution Plan”
Databricks	`.explain(True)` or Visual Plan

🧠 Bonus: What Affects Execution?

Indexes and stats
Query complexity (joins, subqueries)
Table size and partitioning
Caching
Database engine (PostgreSQL vs MySQL vs Spark SQL)

HintsToday

recent posts

about