Understanding how an SQL query executes in a database is essential for performance tuning and system design. Here’s a step-by-step breakdown of what happens under the hood when you run an SQL query like:
SELECT name FROM employees WHERE department = 'Sales';
🧭 0. Query Input (Your SQL)
You submit the SQL query via:
- SQL CLI / GUI (like pgAdmin, MySQL Workbench)
- Application backend
- Notebook (e.g., Databricks, Jupyter)
⚙️ Step-by-Step SQL Query Execution
🧩 Step 1: Parsing
- The SQL Parser checks for syntax errors.
- Converts SQL into an abstract syntax tree (AST).
- Example: Confirms
SELECT
,FROM
,WHERE
are valid clauses.
✅ If there’s a syntax error → You get an error immediately.
📊 Step 2: Query Rewrite (Optional)
- The query rewrite engine transforms the SQL to an equivalent but more efficient version.
- Includes:
- View expansion (resolving a view into its base query)
- Subquery flattening
- Predicate pushdown
- Common subexpression elimination
Example:
SELECT * FROM view_emp WHERE dept = 'HR';
→ gets rewritten to underlying table query.
🧠 Step 3: Query Optimization
- The query optimizer generates multiple query plans and picks the least-cost plan using:
- Table statistics (row count, distinct values)
- Index availability
- Join order and type
- Available keys, partitions
Example decision:
Should I scan the whole table or use an index on
department
?
✅ Outputs an execution plan (physical plan).
🛠️ Step 4: Query Plan Execution (Physical Operators)
The engine executes the physical plan with:
- Scans (full table or index)
- Filters (apply WHERE conditions)
- Joins, Aggregations
- Sorts, Grouping
All operators work in a pipeline, usually row-by-row or batch-by-batch.
💾 Step 5: Data Fetch / Buffer / Storage Layer
- Fetches actual data blocks/pages from:
- Memory (buffer pool / cache)
- Disk (if not cached)
- Applies filtering logic (e.g.,
department = 'Sales'
)
🔍 Uses indexes (if available) to avoid full scan.
📤 Step 6: Results Return
- The output rows are passed back:
- Through client interface (ODBC, JDBC)
- Shown in your query editor / app
- May be paginated or streamed depending on client
🗂️ Summary Flowchart
SQL Query
↓
1. Parsing → Syntax Tree
↓
2. Rewrite (Views, Subqueries)
↓
3. Optimization (Plan + Cost Model)
↓
4. Execution Plan (Scan, Filter, Join, Aggregate)
↓
5. Fetch Data from Memory/Disk
↓
6. Return Result to User
🔍 Example: Optimized Plan for This Query
SELECT name FROM employees WHERE department = 'Sales';
- Use index on
department
(if exists) - Perform index seek to locate matching rows
- Retrieve
name
column using a covering index (ifname
is in index) - Avoid full scan
🧪 Tools to View Execution Plan
DB Engine | Command |
---|---|
PostgreSQL | EXPLAIN ANALYZE SELECT ... |
MySQL | EXPLAIN SELECT ... |
SQL Server | Use “Actual Execution Plan” |
Databricks | .explain(True) or Visual Plan |
🧠 Bonus: What Affects Execution?
- Indexes and stats
- Query complexity (joins, subqueries)
- Table size and partitioning
- Caching
- Database engine (PostgreSQL vs MySQL vs Spark SQL)
Leave a Reply