Understanding how an SQL query executes in a database is essential for performance tuning and system design. Here’s a step-by-step breakdown of what happens under the hood when you run an SQL query like:

SELECT name FROM employees WHERE department = 'Sales';

🧭 0. Query Input (Your SQL)

You submit the SQL query via:

  • SQL CLI / GUI (like pgAdmin, MySQL Workbench)
  • Application backend
  • Notebook (e.g., Databricks, Jupyter)

⚙️ Step-by-Step SQL Query Execution

🧩 Step 1: Parsing

  • The SQL Parser checks for syntax errors.
  • Converts SQL into an abstract syntax tree (AST).
  • Example: Confirms SELECT, FROM, WHERE are valid clauses.

✅ If there’s a syntax error → You get an error immediately.


📊 Step 2: Query Rewrite (Optional)

  • The query rewrite engine transforms the SQL to an equivalent but more efficient version.
  • Includes:
    • View expansion (resolving a view into its base query)
    • Subquery flattening
    • Predicate pushdown
    • Common subexpression elimination

Example:

SELECT * FROM view_emp WHERE dept = 'HR';

→ gets rewritten to underlying table query.


🧠 Step 3: Query Optimization

  • The query optimizer generates multiple query plans and picks the least-cost plan using:
    • Table statistics (row count, distinct values)
    • Index availability
    • Join order and type
    • Available keys, partitions

Example decision:

Should I scan the whole table or use an index on department?

✅ Outputs an execution plan (physical plan).


🛠️ Step 4: Query Plan Execution (Physical Operators)

The engine executes the physical plan with:

  • Scans (full table or index)
  • Filters (apply WHERE conditions)
  • Joins, Aggregations
  • Sorts, Grouping

All operators work in a pipeline, usually row-by-row or batch-by-batch.


💾 Step 5: Data Fetch / Buffer / Storage Layer

  • Fetches actual data blocks/pages from:
    • Memory (buffer pool / cache)
    • Disk (if not cached)
  • Applies filtering logic (e.g., department = 'Sales')

🔍 Uses indexes (if available) to avoid full scan.


📤 Step 6: Results Return

  • The output rows are passed back:
    • Through client interface (ODBC, JDBC)
    • Shown in your query editor / app
  • May be paginated or streamed depending on client

🗂️ Summary Flowchart

SQL Query 
   ↓
1. Parsing → Syntax Tree
   ↓
2. Rewrite (Views, Subqueries)
   ↓
3. Optimization (Plan + Cost Model)
   ↓
4. Execution Plan (Scan, Filter, Join, Aggregate)
   ↓
5. Fetch Data from Memory/Disk
   ↓
6. Return Result to User

🔍 Example: Optimized Plan for This Query

SELECT name FROM employees WHERE department = 'Sales';
  • Use index on department (if exists)
  • Perform index seek to locate matching rows
  • Retrieve name column using a covering index (if name is in index)
  • Avoid full scan

🧪 Tools to View Execution Plan

DB EngineCommand
PostgreSQLEXPLAIN ANALYZE SELECT ...
MySQLEXPLAIN SELECT ...
SQL ServerUse “Actual Execution Plan”
Databricks.explain(True) or Visual Plan

🧠 Bonus: What Affects Execution?

  • Indexes and stats
  • Query complexity (joins, subqueries)
  • Table size and partitioning
  • Caching
  • Database engine (PostgreSQL vs MySQL vs Spark SQL)

Pages: 1 2


Discover more from HintsToday

Subscribe to get the latest posts sent to your email.

Posted in

Leave a Reply

Discover more from HintsToday

Subscribe now to keep reading and get access to the full archive.

Continue reading