Great — let’s deep dive into “Execution Plan & Query Optimization” in an interview-ready format, with simple examples, key concepts, and ready answers to common questions.
✅ 1. What is an Execution Plan?
An Execution Plan is the step-by-step strategy used by the SQL engine to retrieve data. It shows how a query will run: which tables it touches, in what order, whether indexes are used, what joins it performs, etc.
You get the plan using:
EXPLAIN PLAN FOR <your-query>;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
-- or just
EXPLAIN <your-query>; -- in MySQL, PostgreSQL
✅ 2. Important Components of Execution Plan
Term | Meaning |
---|---|
TABLE ACCESS FULL | Full table scan (slow if data is huge) |
INDEX RANGE SCAN | Efficient scan using an index (range predicate like > , < , BETWEEN ) |
INDEX UNIQUE SCAN | Fast lookup using unique index (e.g., primary key) |
NESTED LOOPS | Best for small data sets or indexed joins |
HASH JOIN | Used for large datasets without indexes |
MERGE JOIN | Sort + join, good when both datasets are sorted |
SORT / FILTER | Sort or filter operations (costly if not optimized) |
✅ 3. Interview Tip: How to Debug a Slow Query?
📌 Sample Interview Answer:
To debug a slow query, I would:
- Run EXPLAIN PLAN to understand the path taken.
- Look for Full Table Scans, Cartesian Joins, Sorts, or Filters.
- Check if indexes are being used (INDEX RANGE SCAN / INDEX UNIQUE SCAN).
- Refresh statistics if they are outdated, since stale statistics mislead the cost-based optimizer.
- Rewrite subqueries / joins or use WITH (CTE) for better readability and reuse.
- Consider adding index hints or rewriting filters for better selectivity.
✅ 4. Index Hints (Oracle Example)
Force the optimizer to use an index:
SELECT /*+ INDEX(emp emp_idx1) */ * FROM emp WHERE deptno = 10;
Other hints:
FULL(emp) – force a full table scan
USE_NL(a b) – use a nested loop join
LEADING(emp dept) – force the join order
❗Use hints only after verifying the optimizer isn’t picking the best path.
✅ 5. Types of Index Scans in Execution Plan
Scan Type | When Used | Speed |
---|---|---|
Full Table Scan | No useful index or small table | ❌ Slow |
Index Range Scan | WHERE condition uses a range (> , < , etc.) | ✅ Fast |
Index Unique Scan | WHERE uses primary key or unique index | ✅✅ Very Fast |
Index Full Scan | Reads entire index (no filtering) | Medium |
Index Skip Scan | Uses non-leading column of a composite index (see sketch below) | Conditional |
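For instance, a composite index can sometimes serve a filter on its non-leading column via a skip scan — a hedged sketch using the emp practice table from section 9 (index name hypothetical):
-- Composite index leading on dept_id, then salary
CREATE INDEX idx_emp_dept_sal ON emp (dept_id, salary);
-- A filter on salary alone may still use INDEX SKIP SCAN,
-- depending on the optimizer and the number of distinct dept_id values
SELECT * FROM emp WHERE salary > 50000;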
✅ 6. Sample Scenario – Optimization
🐢 Slow Query:
SELECT * FROM orders WHERE status = 'Delivered';
Assume status is not indexed, and the table has 10 million rows.
⚡ Optimized:
-- Create index
CREATE INDEX idx_status ON orders(status);
-- Rerun query, now index is used:
SELECT * FROM orders WHERE status = 'Delivered';
Use:
EXPLAIN PLAN FOR SELECT * FROM orders WHERE status = 'Delivered';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
You’ll now see INDEX RANGE SCAN instead of TABLE ACCESS FULL.
✅ 7. Cost-Based Optimizer (CBO)
Oracle and most modern RDBMS use CBO. It evaluates multiple execution strategies and chooses the one with the lowest cost, based on:
- Table/index statistics
- Row counts, cardinality
- Join order & types
- Predicate selectivity
📌 After inserting or deleting large volumes of data, always run:
EXEC DBMS_STATS.GATHER_TABLE_STATS('schema', 'table_name');
Otherwise, execution plans may be built on stale statistics and pick a poor path.
✅ 8. Final Checklist to Optimize Queries
Checkpoint | Fix |
---|---|
Full Table Scan | Add index or rewrite filter condition |
Missing or outdated stats | Gather statistics |
Bad join order | Use LEADING or restructure query |
Cartesian joins | Add missing join conditions |
Large sort/filter steps | Add indexes or materialized views |
Non-sargable WHERE clauses | Avoid functions on columns (WHERE UPPER(name) → bad for index; see the sketch below) |
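To make the last row concrete — a hedged sketch (assuming an emp table with a name column; index name hypothetical):
-- Non-sargable: the function on the column blocks a plain index on name
SELECT * FROM emp WHERE UPPER(name) = 'ALICE';
-- Fix 1: compare in the stored case so a plain index on name applies
SELECT * FROM emp WHERE name = 'Alice';
-- Fix 2 (Oracle): a function-based index makes the original predicate indexable
CREATE INDEX idx_emp_upper_name ON emp (UPPER(name));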
✅ 9. Practice Dataset for Demo
You can simulate this in Oracle or PostgreSQL:
CREATE TABLE emp (
emp_id INT PRIMARY KEY,
name VARCHAR(100),
dept_id INT,
salary INT
);
CREATE INDEX idx_salary ON emp(salary);
EXPLAIN PLAN FOR
SELECT * FROM emp WHERE salary > 50000;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
Absolutely — this is an important and often-overlooked interview topic, especially for backend, full-stack, or data engineers involved in application database access. Here’s the full deep dive on SQL Injection & Security Best Practices with real-world examples, interview Q&A, and practical tips:
🔐 SQL Injection & Security Best Practices
🚨 What is SQL Injection?
SQL Injection is a security vulnerability that allows an attacker to interfere with the queries an application makes to its database. It can be used to:
- View data they shouldn’t (unauthorized access)
- Modify or delete data
- Execute administrative operations
- Bypass authentication
- Exfiltrate data (data leaks)
🧪 Classic Example of SQL Injection
Let’s say a backend login form takes user input and generates SQL:
SELECT * FROM users WHERE username = 'admin' AND password = '1234';
Now an attacker enters this:
Username: ' OR 1=1 --
Password: anything
The resulting query becomes:
SELECT * FROM users WHERE username = '' OR 1=1 --' AND password = 'anything';
➡️ This returns all rows, bypassing login.
🛡️ Best Practices to Prevent SQL Injection
✅ 1. Use Parameterized Queries / Bind Variables
Language-independent, secure practice
Java (JDBC):
String query = "SELECT * FROM users WHERE username = ? AND password = ?";
PreparedStatement pstmt = conn.prepareStatement(query);
pstmt.setString(1, username);
pstmt.setString(2, password);
ResultSet rs = pstmt.executeQuery();
Python (psycopg2 / SQLite3):
cursor.execute("SELECT * FROM users WHERE username = %s AND password = %s", (username, password))
Oracle Bind Variable (PL/SQL):
SELECT * FROM employees WHERE department_id = :dept_id;
✅ Safe from SQL injection as values are treated separately from SQL logic.
✅ 2. Use ORM Tools (Hibernate, Sequelize, SQLAlchemy)
ORMs abstract raw SQL and automatically use bind variables under the hood.
# SQLAlchemy (Python)
session.query(User).filter_by(username='john', password='secret').first()
✅ 3. Avoid Dynamic SQL Unless Absolutely Necessary
Never do this:
EXECUTE IMMEDIATE 'SELECT * FROM emp WHERE empno = ' || p_empno;
Instead use:
EXECUTE IMMEDIATE 'SELECT * FROM emp WHERE empno = :1' USING p_empno;
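For completeness, a minimal runnable PL/SQL sketch of the safe pattern (table and column names illustrative):
DECLARE
  v_name VARCHAR2(100);
BEGIN
  -- SQL text stays fixed; the value arrives through a bind, never by concatenation
  EXECUTE IMMEDIATE 'SELECT ename FROM emp WHERE empno = :1'
    INTO v_name
    USING 1001;
  DBMS_OUTPUT.PUT_LINE(v_name);
END;
/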
✅ 4. Validate and Sanitize Inputs
- Enforce strong data types
- Use whitelisting (e.g., dropdowns or regex)
- Avoid accepting SQL syntax as input
✅ 5. Minimize DB Permissions
- Least Privilege Principle
- Separate DB roles: admin vs app user
- App should not have DROP, ALTER, or GRANT access
✅ 6. Monitor and Audit Logs
- Enable SQL audit trails (Oracle: AUDIT command)
- Use Web Application Firewalls (WAFs) to detect injection patterns
🧠 Interview Question + Answer
Q1. How do you prevent SQL injection in a production-grade application?
✅ Answer:
We always use parameterized queries or bind variables through prepared statements. This separates code from data, ensuring inputs are treated as literals. We also avoid dynamic SQL unless absolutely required, validate inputs at both front-end and back-end, and follow least-privilege for database roles. If ORMs are used, we ensure all custom queries still follow the same pattern.
Q2. What if you absolutely need dynamic SQL? How do you keep it safe?
✅ Answer:
If dynamic SQL is necessary (e.g., dynamic table names), we do not concatenate input directly. Instead, we sanitize it and still use bind variables for the parameter parts. In Oracle, we use EXECUTE IMMEDIATE with USING.
Q3. What is the difference between bind variables and hardcoded values?
✅ Answer:
Bind variables are placeholders used in SQL that the database compiles once and executes multiple times with different values. Hardcoded values cause the SQL to be parsed every time, and if unescaped, may also expose the system to SQL injection.
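A small illustration of the difference:
-- Hard-coded literals: each distinct value is parsed as a new statement
SELECT * FROM employees WHERE department_id = 10;
SELECT * FROM employees WHERE department_id = 20;
-- Bind variable: one cursor, parsed once, reused with different values
SELECT * FROM employees WHERE department_id = :dept_id;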
💡 Real-World Use Case
Login System Code Review
Before:
cursor.execute("SELECT * FROM users WHERE username = '" + uname + "' AND password = '" + pwd + "';")
After (Secure):
cursor.execute("SELECT * FROM users WHERE username = %s AND password = %s", (uname, pwd))
🚫 Common Mistakes
Mistake | Explanation |
---|---|
Using string concatenation in queries | Opens up to injection attacks |
Blind trust in front-end validations | Attackers can bypass browser controls |
Giving full DB privileges to apps | Increases damage in case of breach |
Not logging failed login attempts | Misses early indicators of attacks |
✅ Security Checklist for Interviews
- Use Prepared/Parameterized Statements
- Validate Inputs (length, type, format)
- Minimize DB Privileges
- Use Stored Procedures for business logic
- Enable logging, audit trails
- Avoid raw SQL unless absolutely necessary
- Educate team on OWASP SQLi risks
Here’s a detailed explanation of Correlated vs Non-Correlated Subqueries — covering definition, use cases, performance, interview tips, and hands-on SQL examples with sample data.
🔍 What is a Subquery?
A subquery is a query nested inside another SQL query (main/outer query). It can be in:
- SELECT clause
- WHERE clause
- FROM clause
🧠 Types of Subqueries
Feature | Non-Correlated Subquery | Correlated Subquery |
---|---|---|
Dependency on outer query | ❌ Independent | ✅ Depends on outer query row-by-row |
Execution Count | ✅ Once | ❌ Multiple times (once per outer row) |
Performance | ✅ Faster (in general) | ❌ Slower (may not scale well) |
Use Case | Aggregates, comparisons | Row-by-row checks (e.g., existence, condition) |
🧪 Sample Dataset
employees
emp_id | name | department_id | salary |
---|---|---|---|
101 | Alice | 10 | 5000 |
102 | Bob | 20 | 6000 |
103 | Charlie | 10 | 7000 |
104 | David | 30 | 4500 |
105 | Eva | 20 | 6500 |
✅ Example: Non-Correlated Subquery
Goal: Get all employees whose salary is greater than the average salary.
SELECT name, salary
FROM employees
WHERE salary > (
SELECT AVG(salary) FROM employees
);
🧾 Explanation:
- Subquery: returns a single scalar value (avg salary = 5800).
- Outer query uses this value to filter.
🧠 Interview Tip: “Use non-correlated when your subquery returns an aggregate or independent filter value.”
🔁 Example: Correlated Subquery
Goal: Get employees whose salary is above the average salary of their department.
SELECT e.name, e.department_id, e.salary
FROM employees e
WHERE e.salary > (
SELECT AVG(salary)
FROM employees
WHERE department_id = e.department_id
);
🧾 Explanation:
- Subquery runs once for each row in employees.
- It calculates the average salary per department, using the outer reference e.department_id.
🔧 Performance Implication:
- Can be expensive on large datasets due to row-wise execution.
🚀 Optimize Correlated Subqueries
Replace with JOIN + GROUP BY if possible:
WITH dept_avg AS (
SELECT department_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
)
SELECT e.name, e.salary, e.department_id
FROM employees e
JOIN dept_avg d
ON e.department_id = d.department_id
WHERE e.salary > d.avg_salary;
✅ Better performance and scalability.
🎯 Real Interview Scenario
Q: Find employees earning more than the average salary of their department. Can you do it without a correlated subquery?
🧠 Expected Answer:
- “Yes, we can first pre-aggregate department averages and then join with the original table.”
- Then write the WITH + JOIN version shown above.
✍ Summary
Topic | Notes |
---|---|
Correlated Subquery | Subquery depends on outer query. Slower. Good for per-row checks. |
Non-Correlated Subquery | Subquery independent. Faster. Used for single result filters. |
Interview Tip | Rewrite correlated subqueries using JOINs or CTEs for optimization. |
SQL Optimization | Always watch for correlated subqueries inside large datasets — they slow down execution significantly. Use EXPLAIN PLAN . |
Here’s a comprehensive guide on handling JSON and XML in Oracle SQL, focusing on advanced use cases using JSON_TABLE, JSON_VALUE, XMLTABLE, etc. This is especially important for cloud-based, microservices, or NoSQL+SQL hybrid environments where structured and semi-structured data coexist.
🔷 Part 1: JSON Handling in Oracle SQL
Oracle supports native JSON data starting from 12c. You can store JSON in VARCHAR2, CLOB, or the dedicated JSON data type (from Oracle 21c onwards).
✅ 1. Extract Single Value: JSON_VALUE
Purpose: Extract a scalar (string, number, boolean) value.
SELECT JSON_VALUE('{"id": 101, "name": "Rajeev"}', '$.name') AS name
FROM dual;
-- Output: Rajeev
Supports options like:
JSON_VALUE(json_column, '$.price' RETURNING NUMBER)
✅ 2. Extract JSON Object/Array as Text: JSON_QUERY
Purpose: Extract non-scalar JSON (objects or arrays) as text.
SELECT JSON_QUERY('{"id": 1, "skills": ["SQL", "Python"]}', '$.skills')
FROM dual;
-- Output: ["SQL", "Python"]
✅ 3. Extract Multiple Rows: JSON_TABLE
Purpose: Convert JSON arrays/objects into relational rows and columns.
SELECT *
FROM JSON_TABLE(
'{
"employees": [
{"id": 1, "name": "Raj"},
{"id": 2, "name": "Amit"}
]
}',
'$.employees[*]'
COLUMNS (
emp_id NUMBER PATH '$.id',
emp_name VARCHAR2(50) PATH '$.name'
)
);
📌 Think of JSON_TABLE as unpivoting JSON arrays into relational rows.
✅ 4. Filter JSON Data: IS JSON / JSON_EXISTS
-- Validate a JSON column
SELECT *
FROM employees
WHERE json_col IS JSON;
-- Filter only if a JSON path exists
SELECT *
FROM orders
WHERE JSON_EXISTS(json_col, '$.customer.address.zip');
🔷 Part 2: XML Handling in Oracle SQL
✅ 1. Extract Single Value: EXTRACTVALUE (deprecated) — use XMLQUERY / XMLTABLE instead
SELECT XMLTYPE('<person><name>Rajeev</name></person>').EXTRACT('//name/text()').getStringVal()
FROM dual;
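For newer code, the same lookup can be written with XMLQUERY — a minimal sketch:
SELECT XMLQUERY('/person/name/text()'
         PASSING XMLTYPE('<person><name>Rajeev</name></person>')
         RETURNING CONTENT) AS name
FROM dual;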
✅ 2. Use XMLTABLE for Relational Output
Example: Parse XML and turn into rows:
SELECT *
FROM XMLTABLE(
'/employees/employee'
PASSING XMLTYPE('
<employees>
<employee><id>1</id><name>Raj</name></employee>
<employee><id>2</id><name>Amit</name></employee>
</employees>'
)
COLUMNS
emp_id NUMBER PATH 'id',
emp_name VARCHAR2(50) PATH 'name'
);
🔷 Advanced Use Cases (Hybrid / Real-World)
🔸1. Nested JSON with JSON_TABLE
SELECT *
FROM JSON_TABLE(
'{
"orders": [
{"id": 1, "items": [{"product": "Pen", "qty": 10}, {"product": "Book", "qty": 5}]},
{"id": 2, "items": [{"product": "Pencil", "qty": 3}]}
]
}',
'$.orders[*]'
COLUMNS (
order_id NUMBER PATH '$.id',
NESTED PATH '$.items[*]'
COLUMNS (
product_name VARCHAR2(50) PATH '$.product',
quantity NUMBER PATH '$.qty'
)
)
);
🔸2. XMLTABLE with Nested Structures
SELECT *
FROM XMLTABLE(
'/orders/order'
PASSING XMLTYPE('
<orders>
<order>
<id>1</id>
<items>
<item><product>Pen</product><qty>10</qty></item>
<item><product>Book</product><qty>5</qty></item>
</items>
</order>
</orders>'
)
COLUMNS
order_id NUMBER PATH 'id',
items XMLTYPE PATH 'items'
) o,
XMLTABLE(
'/items/item'
PASSING o.items
COLUMNS
product VARCHAR2(50) PATH 'product',
qty NUMBER PATH 'qty'
);
🔷 Real Interview Scenarios
Use Case | SQL Feature | Description |
---|---|---|
Convert JSON API response to table | JSON_TABLE | For ETL on REST data |
Validate JSON schema | IS JSON , JSON_EXISTS | Check for mandatory keys |
Flatten XML invoice feed | XMLTABLE | Common in legacy systems |
Store dynamic attributes | JSON/XML column | Semi-structured, searchable |
Hybrid system (RDBMS + NoSQL) | JSON + SQL | Use JSON_QUERY , JSON_VALUE |
🔷 Summary Cheat Sheet
Feature | JSON | XML |
---|---|---|
Single value | JSON_VALUE | EXTRACTVALUE (old), XMLQUERY |
Full object | JSON_QUERY | XMLTYPE , EXTRACT() |
Rows/tables | JSON_TABLE | XMLTABLE |
Validate | IS JSON , JSON_EXISTS | XMLIsWellFormed() |
Nested data | Nested JSON_TABLE | Join XMLTABLE results |
🔷 Bonus: Indexing JSON/XML Columns
-- JSON search index (Oracle 12.2+)
CREATE SEARCH INDEX idx_orders_json ON orders (json_col) FOR JSON;
-- Functional index
CREATE INDEX idx_customer_city ON customers(JSON_VALUE(json_col, '$.address.city'));
Absolutely — the MERGE statement (aka upsert) is critical in ETL, data warehousing, slowly changing dimensions (SCD), and CDC (Change Data Capture) logic.
🔷 MERGE INTO: Syntax & Core Concept
The MERGE statement performs:
- UPDATE if the record exists (matched),
- INSERT if it doesn’t exist (not matched).
✅ Basic Syntax:
MERGE INTO target_table t
USING source_table s
ON (t.id = s.id)
WHEN MATCHED THEN
UPDATE SET t.name = s.name, t.salary = s.salary
WHEN NOT MATCHED THEN
INSERT (id, name, salary)
VALUES (s.id, s.name, s.salary);
📌 Think of MERGE as:
IF EXISTS (match)
THEN UPDATE
ELSE
INSERT
🔷 Practical Example
🧱 Tables:
-- Target Table (e.g. Data Warehouse Table)
CREATE TABLE emp_dw (
emp_id NUMBER PRIMARY KEY,
emp_name VARCHAR2(100),
salary NUMBER
);
-- Source Table (e.g. Staging Table)
CREATE TABLE emp_stage (
emp_id NUMBER,
emp_name VARCHAR2(100),
salary NUMBER
);
🧪 Data:
-- Data in target (DW)
1, 'Raj', 50000
-- Data in source (Stage)
1, 'Rajeev', 55000 -- should update
2, 'Amit', 40000 -- should insert
🔁 MERGE Logic:
MERGE INTO emp_dw dw
USING emp_stage stg
ON (dw.emp_id = stg.emp_id)
WHEN MATCHED THEN
UPDATE SET dw.emp_name = stg.emp_name,
dw.salary = stg.salary
WHEN NOT MATCHED THEN
INSERT (emp_id, emp_name, salary)
VALUES (stg.emp_id, stg.emp_name, stg.salary);
🔷 Advanced Use Cases
🔸1. Add Filters (e.g., Update Only If Salary Changes)
WHEN MATCHED AND dw.salary != stg.salary THEN
UPDATE SET dw.salary = stg.salary
🔸2. Insert Only Select Columns
INSERT (emp_id, salary) VALUES (stg.emp_id, stg.salary)
🔸3. SCD Type 1 (Overwrite Changes)
MERGE INTO dim_customer d
USING stg_customer s
ON (d.cust_id = s.cust_id)
WHEN MATCHED THEN
UPDATE SET d.name = s.name, d.city = s.city
WHEN NOT MATCHED THEN
INSERT (cust_id, name, city)
VALUES (s.cust_id, s.name, s.city);
🔸4. SCD Type 2 (Preserve History)
SCD Type 2 needs additional logic (valid_from, valid_to, is_current). MERGE alone only inserts new records; expiring old versions is done via UPDATE — see the sketch below.
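A minimal two-step sketch, assuming dim_customer carries valid_from, valid_to, and is_current (column names illustrative):
-- Step 1: expire the current version when tracked attributes changed
UPDATE dim_customer d
SET d.valid_to = SYSDATE,
    d.is_current = 'N'
WHERE d.is_current = 'Y'
  AND EXISTS (SELECT 1
              FROM stg_customer s
              WHERE s.cust_id = d.cust_id
                AND (s.name <> d.name OR s.city <> d.city));
-- Step 2: insert a fresh current version for changed or brand-new customers
INSERT INTO dim_customer (cust_id, name, city, valid_from, valid_to, is_current)
SELECT s.cust_id, s.name, s.city, SYSDATE, NULL, 'Y'
FROM stg_customer s
WHERE NOT EXISTS (SELECT 1
                  FROM dim_customer d
                  WHERE d.cust_id = s.cust_id
                    AND d.is_current = 'Y');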
🔸5. UPSERT from JSON or API into Table (Oracle 21c+ JSON)
MERGE INTO customer t
USING JSON_TABLE(:json_payload, '$[*]'
COLUMNS (id PATH '$.id', name PATH '$.name')) s
ON (t.id = s.id)
WHEN MATCHED THEN
UPDATE SET t.name = s.name
WHEN NOT MATCHED THEN
INSERT (id, name) VALUES (s.id, s.name);
🔷 Interview Tip
💬 “How do you perform UPSERT in Oracle SQL?”
Answer:
Using the MERGE statement. It compares rows between a target and source using a join condition. If a match is found, it updates; otherwise, it inserts. This is commonly used in ETL pipelines and data warehouse dimensions.
🔷 Alternatives (Non-MERGE)
In some environments (e.g., Snowflake, PostgreSQL), you may use:
DBMS | Alternative |
---|---|
Oracle | MERGE |
PostgreSQL | INSERT ... ON CONFLICT DO UPDATE |
MySQL | INSERT ... ON DUPLICATE KEY UPDATE |
SQL Server | MERGE (with caution, has edge cases) |
Snowflake | MERGE |
BigQuery | MERGE |
✅ Summary: MERGE Highlights
Scenario | MERGE Feature |
---|---|
UPSERT Logic | ✔️ |
ETL Staging | ✔️ |
Data Warehouse Dims (SCD) | ✔️ |
Performance | Efficient on indexed joins |
Conditional Updates | Yes via WHEN MATCHED AND ... |
JSON Payloads | Yes (from 21c using JSON_TABLE ) |
Here’s a crisp yet deep dive into Partitioning Techniques — particularly for Oracle, BigQuery, and Redshift — focusing on Range, Hash, List partitioning, pruning, sub-partitioning, and indexing strategies.
🧠 Why Partitioning Matters
Partitioning improves:
- Query performance (via partition pruning)
- Data management (purging, archiving)
- Parallelism and load balancing
- Scalability for large datasets
🔷 Types of Partitioning
Type | Description | Best For |
---|---|---|
🔢 Range | Based on ranges (e.g. date, salary) | Time-series, archiving |
#️⃣ Hash | Based on a hash function | Uniform distribution |
📋 List | Based on discrete values (e.g. region=’US’) | Categorical data |
📊 Composite | Combination (e.g., Range + Hash) | Complex use cases |
🔷 1. Oracle Partitioning
Oracle supports all types: Range, Hash, List, and Composite (Sub-Partitioning)
✅ a. Range Partition Example
CREATE TABLE sales (
sale_id NUMBER,
sale_date DATE,
amount NUMBER
)
PARTITION BY RANGE (sale_date) (
PARTITION p2023q1 VALUES LESS THAN (TO_DATE('2023-04-01', 'YYYY-MM-DD')),
PARTITION p2023q2 VALUES LESS THAN (TO_DATE('2023-07-01', 'YYYY-MM-DD'))
);
✅ b. Hash Partition
CREATE TABLE users (
user_id NUMBER,
name VARCHAR2(100)
)
PARTITION BY HASH(user_id)
PARTITIONS 4;
✅ c. List Partition
CREATE TABLE customer_data (
cust_id NUMBER,
region VARCHAR2(10)
)
PARTITION BY LIST (region) (
PARTITION p_north VALUES ('NORTH'),
PARTITION p_south VALUES ('SOUTH')
);
✅ d. Composite Partition (Range + Hash)
CREATE TABLE orders (
order_id NUMBER,
order_date DATE,
cust_id NUMBER
)
PARTITION BY RANGE (order_date)
SUBPARTITION BY HASH (cust_id)
SUBPARTITIONS 4
(
PARTITION p2024 VALUES LESS THAN (TO_DATE('2025-01-01', 'YYYY-MM-DD')),
PARTITION p2025 VALUES LESS THAN (MAXVALUE)
);
⚡ Oracle Partition Pruning
Oracle automatically skips partitions if WHERE clause filters allow it.
SELECT * FROM sales WHERE sale_date >= DATE '2023-01-01';
-- Prunes all irrelevant partitions
You can verify with:
EXPLAIN PLAN FOR ...
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
🔷 2. BigQuery Partitioning
BigQuery supports:
- Ingestion time partitioning
- Partitioned Tables (by column)
- Clustering (secondary filtering)
✅ a. Partition by DATE or TIMESTAMP
CREATE TABLE sales_partitioned (
sale_id INT64,
sale_date DATE
)
PARTITION BY sale_date;
✅ b. Pseudo Hash via Clustering (No true HASH partition)
CREATE TABLE logs_partitioned (
  log_time TIMESTAMP,
  user_id INT64
)
PARTITION BY DATE(log_time)
CLUSTER BY user_id;
📌 Clustering helps optimize filtering/sorting, especially when partitioning alone isn’t enough.
⚡ Partition Pruning in BigQuery
-- This is efficient (partition prune)
SELECT * FROM sales_partitioned
WHERE sale_date = '2023-01-01';
-- This disables pruning (bad):
WHERE EXTRACT(YEAR FROM sale_date) = 2023;
🔷 3. Redshift Partitioning (via Distribution + Sort Keys)
Redshift doesn’t support traditional partitioning, but uses:
- Distribution Keys (DISTKEY) = like HASH partition
- Sort Keys = like Range partition
✅ a. Hash Distribution
CREATE TABLE sales (
sale_id INT,
customer_id INT,
sale_date DATE
)
DISTSTYLE KEY
DISTKEY(customer_id);
📌 Ensures all same customer data goes to the same node.
✅ b. Range-like Behavior: SORTKEY
CREATE TABLE sales (
sale_id INT,
sale_date DATE
)
SORTKEY(sale_date);
📌 Makes range queries fast (BETWEEN, >, <, etc.)
⚡ Redshift Best Practices
Scenario | Strategy |
---|---|
Large joins | Use same DISTKEY |
Time-series data | Use SORTKEY(date) |
Small lookup tables | Use DISTSTYLE ALL (replicated) |
🔷 Indexing with Partitioning
DB | Strategy |
---|---|
Oracle | Local indexes (partition-wise) are common; global indexes work too |
BigQuery | No indexes; relies on partition + clustering |
Redshift | No indexes; sortkey + distkey act as physical optimizers |
🔷 Summary Table
Feature | Oracle | BigQuery | Redshift |
---|---|---|---|
Range Partitioning | ✅ Yes | ✅ Yes | ❌ (Sort Key only) |
Hash Partitioning | ✅ Yes | ❌ (Cluster-only) | ✅ (DISTKEY) |
List Partitioning | ✅ Yes | ❌ | ❌ |
Composite (Subpart) | ✅ Yes | ❌ | ❌ |
Partition Pruning | ✅ Automatic | ✅ Column-aware | ❌ (Handled via sortkey) |
Index Support | ✅ (Local/Global) | ❌ (None) | ❌ (None) |
💬 Interview Insight
“How would you partition a multi-billion row table in Oracle/BigQuery?”
Answer:
I’d use range partitioning on a time column (e.g., sale_date), and if further optimization is needed, apply sub-partitioning (e.g., hash on customer_id). This supports fast partition pruning, parallel processing, and easy purging of old partitions.
Excellent! You’ve now covered a complete, enterprise-grade SQL interview roadmap.
Below is a clean summary of points 7–10 with quick interview insights, real use cases, and tips — ideal for last-minute revision or documentation.
✅ 7. Temporal Tables (System-Versioned Tables)
📌 What it is:
A table that automatically tracks historical changes — records “when” each row was valid.
🧠 Core Concepts:
- Two timestamps: valid_from, valid_to
- Common in auditing, slowly changing dimensions (SCD2), and regulatory logs
- Also known as system-versioned tables
📌 Supported in:
SQL Server (SYSTEM_VERSIONING = ON), Oracle (via custom triggers or Flashback), PostgreSQL (manual), BigQuery (using FOR SYSTEM_TIME AS OF)
✅ Use Case:
-- Query as of a previous point in time
SELECT * FROM employees
FOR SYSTEM_TIME AS OF TIMESTAMP '2024-12-31 23:59:59';
💡 Interview Tip:
Be ready to simulate temporal behavior manually if the RDBMS doesn’t support it natively (e.g., keep history tables, use triggers) — a minimal manual sketch follows.
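A minimal manual sketch (table and column names illustrative): keep a history table and close the open row on every change.
-- History table mirroring employees, plus validity columns
CREATE TABLE employees_hist (
  emp_id     NUMBER,
  name       VARCHAR2(100),
  salary     NUMBER,
  valid_from TIMESTAMP,
  valid_to   TIMESTAMP   -- NULL marks the current version
);
-- On each change: expire the open row, then write the new version
UPDATE employees_hist
SET valid_to = SYSTIMESTAMP
WHERE emp_id = 101 AND valid_to IS NULL;
INSERT INTO employees_hist (emp_id, name, salary, valid_from, valid_to)
VALUES (101, 'Alice', 5500, SYSTIMESTAMP, NULL);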
✅ 8. Hierarchical Queries
📌 What it is:
Used to model tree structures like org charts, category hierarchies, file systems, etc.
🧠 Syntax Across RDBMS:
DB | Syntax |
---|---|
Oracle | CONNECT BY PRIOR + LEVEL |
PostgreSQL / MySQL 8+ | WITH RECURSIVE CTE |
SQL Server | WITH recursive CTE (same shape, but without the RECURSIVE keyword) |
✅ Example (PostgreSQL):
WITH RECURSIVE org_chart AS (
SELECT emp_id, manager_id, emp_name, 1 AS level
FROM employees
WHERE manager_id IS NULL
UNION ALL
SELECT e.emp_id, e.manager_id, e.emp_name, oc.level + 1
FROM employees e
JOIN org_chart oc ON e.manager_id = oc.emp_id
)
SELECT * FROM org_chart;
💡 Interview Tip:
Interviewers love this for BOM (Bill of Materials) questions. Know both the Oracle (CONNECT BY) and CTE styles — a minimal CONNECT BY sketch follows.
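The Oracle equivalent of the recursive CTE above, assuming the same employees table:
SELECT emp_id, manager_id, emp_name, LEVEL AS lvl
FROM employees
START WITH manager_id IS NULL
CONNECT BY PRIOR emp_id = manager_id;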
✅ 9. Role of SQL in Data Pipelines & Orchestration
📌 Context:
SQL powers:
- Extract: Pulling data from source systems
- Transform: Cleansing, joins, validations
- Load: Inserting into reporting/marts
💡 Orchestration Context:
- Airflow DAGs: SQL used in operators like PostgresOperator, BigQueryOperator
- Spark SQL Jobs: SQL queries over DataFrames or Hive tables
- Validation Checks: Row counts, null checks, duplicates, reconciliation
✅ Sample Validation SQL:
-- Check record mismatch: returns a row only when the counts differ
SELECT COUNT(*) FROM src_table
MINUS
SELECT COUNT(*) FROM target_table;
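Two further hedged checks along the same lines (table/column names illustrative):
-- Duplicate keys in the target
SELECT emp_id, COUNT(*)
FROM target_table
GROUP BY emp_id
HAVING COUNT(*) > 1;
-- A mandatory column should never be NULL
SELECT COUNT(*) AS null_names
FROM target_table
WHERE name IS NULL;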
💡 Interview Tip:
Be ready to explain how you used SQL for quality checks or task dependencies in orchestration.
✅ 10. SQL Coding Best Practices
📌 Why Important:
Clean SQL → Better maintenance → Fewer bugs in pipelines/reports
✅ Best Practices Summary:
Aspect | Best Practice |
---|---|
Naming | Use snake_case or camelCase , meaningful aliases |
NULLs | Use COALESCE , beware of NULL != NULL , handle CASE WHEN col IS NULL explicitly |
Casting | Always cast explicitly in calculations |
Joins | Use aliasing and qualify columns: a.id = b.id |
Reusability | Use CTEs (WITH ) for modular, readable SQL |
Performance | Avoid SELECT * , use indexed columns in filters, monitor query plans |
Comments | Always explain logic in complex CTEs or subqueries |
💡 Interview Tip:
They often ask you to “optimize” a given messy SQL. Follow these rules and explain your cleanup rationale — a small before/after sketch follows.
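A before/after sketch of such a cleanup (query is illustrative):
-- Before: implicit join, SELECT *, non-sargable filter
SELECT *
FROM orders o, customers c
WHERE UPPER(c.country) = 'US'
  AND o.cust_id = c.cust_id;
-- After: explicit JOIN, named columns, sargable filter
SELECT o.order_id, o.order_date, c.name
FROM orders o
JOIN customers c ON o.cust_id = c.cust_id
WHERE c.country = 'US';   -- assumes country is stored upper-case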