SQL Tutorial for Interviews - Types of SQL / Spark SQL Commands - DDL, DML, TCL, CRUD in SQL

Let's take a deep dive into "Execution Plan & Query Optimization" in an interview-ready format, with simple examples, key concepts, and ready answers to common questions.


✅ 1. What is an Execution Plan?

An Execution Plan is the step-by-step strategy used by the SQL engine to retrieve data. It shows how a query will run: which tables it touches, in what order, whether indexes are used, what joins it performs, etc.

You get the plan using:

EXPLAIN PLAN FOR <your-query>;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
-- or just
EXPLAIN <your-query>; -- in MySQL, PostgreSQL

✅ 2. Important Components of Execution Plan

| Term | Meaning |
| --- | --- |
| TABLE ACCESS FULL | Full table scan (slow if data is huge) |
| INDEX RANGE SCAN | Efficient scan using an index (range predicate like >, <, BETWEEN) |
| INDEX UNIQUE SCAN | Fast lookup using a unique index (e.g., primary key) |
| NESTED LOOPS | Best for small data sets or indexed joins |
| HASH JOIN | Used for large datasets without indexes |
| MERGE JOIN | Sort + join; good when both datasets are sorted |
| SORT / FILTER | Sort or filter operations (costly if not optimized) |

✅ 3. Interview Tip: How to Debug a Slow Query?

📌 Sample Interview Answer:

To debug a slow query, I would:

  1. Run EXPLAIN PLAN to understand the path taken.
  2. Look for Full Table Scans, Cartesian Joins, Sorts, or Filters.
  3. Check if indexes are being used (INDEX RANGE SCAN / INDEX UNIQUE SCAN).
  4. Refresh statistics if they are outdated, since stale stats mislead the cost-based optimizer.
  5. Rewrite subqueries / joins or use WITH (CTE) for better readability and reuse.
  6. Consider adding index hints or rewriting filters for better selectivity.
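
Concretely, steps 1 and 4 look like this in Oracle (a sketch; the orders table is hypothetical):

EXPLAIN PLAN FOR
SELECT * FROM orders WHERE status = 'Delivered';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- Refresh stale statistics so the cost-based optimizer prices plans correctly
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'ORDERS');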

✅ 4. Index Hints (Oracle Example)

Force the optimizer to use an index:

SELECT /*+ INDEX(emp emp_idx1) */ * FROM emp WHERE deptno = 10;

Other hints:

  • FULL(emp) – force full table scan
  • USE_NL(a b) – use nested loop join
  • LEADING(emp dept) – force join order

❗Use hints only after verifying the optimizer isn’t picking the best path.
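
A sketch combining several hints (emp/dept are the classic Oracle demo tables; the hint set is illustrative, not a recommendation):

-- LEADING fixes the join order (emp first); USE_NL requests a nested-loop join
SELECT /*+ LEADING(e d) USE_NL(e d) */ e.ename, d.dname
FROM emp e
JOIN dept d ON e.deptno = d.deptno
WHERE d.deptno = 10;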


✅ 5. Types of Index Scans in Execution Plan

| Scan Type | When Used | Speed |
| --- | --- | --- |
| Full Table Scan | No useful index or small table | ❌ Slow |
| Index Range Scan | WHERE condition uses a range (>, <, etc.) | ✅ Fast |
| Index Unique Scan | WHERE uses primary key or unique index | ✅✅ Very Fast |
| Index Full Scan | Reads entire index (no filtering) | Medium |
| Index Skip Scan | Uses non-leading column of a composite index | Conditional |
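
A quick sketch of how a skip scan can arise (hypothetical index; whether the optimizer actually picks a skip scan depends on statistics and the leading column's cardinality):

-- Composite index whose leading column has few distinct values
CREATE INDEX idx_emp_gender_dept ON emp(gender, dept_id);

-- Filtering only on the non-leading column may show INDEX SKIP SCAN in the plan
SELECT * FROM emp WHERE dept_id = 10;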

✅ 6. Sample Scenario – Optimization

🐢 Slow Query:

SELECT * FROM orders WHERE status = 'Delivered';

Assume status is not indexed, and the table has 10 million rows.

⚡ Optimized:

-- Create index
CREATE INDEX idx_status ON orders(status);

-- Rerun query, now index is used:
SELECT * FROM orders WHERE status = 'Delivered';

Use:

EXPLAIN PLAN FOR SELECT * FROM orders WHERE status = 'Delivered';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

You’ll now see: INDEX RANGE SCAN instead of TABLE ACCESS FULL.


✅ 7. Cost-Based Optimizer (CBO)

Oracle and most modern RDBMS use CBO. It evaluates multiple execution strategies and chooses the one with the lowest cost, based on:

  • Table/index statistics
  • Row counts, cardinality
  • Join order & types
  • Predicate selectivity

📌 Always make sure to run:

EXEC DBMS_STATS.GATHER_TABLE_STATS('schema', 'table_name');

after inserting or deleting large volumes of data; otherwise, execution plans may be based on stale statistics and turn out suboptimal.


✅ 8. Final Checklist to Optimize Queries

| Checkpoint | Fix |
| --- | --- |
| Full Table Scan | Add index or rewrite filter condition |
| Missing or outdated stats | Gather statistics |
| Bad join order | Use LEADING or restructure query |
| Cartesian joins | Add missing join conditions |
| Large sort/filter steps | Add indexes or materialized views |
| Non-sargable WHERE clauses | Avoid functions on columns (WHERE UPPER(name) defeats a plain index); see the sketch below |
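
The non-sargable row deserves a concrete example (a sketch; the customers table is hypothetical):

-- Non-sargable: the function wrapped around the column blocks a plain index
SELECT * FROM customers WHERE UPPER(name) = 'ALICE';

-- Fix A: store and compare consistently cased data
SELECT * FROM customers WHERE name = 'Alice';

-- Fix B (Oracle): a function-based index makes the original predicate indexable
CREATE INDEX idx_cust_name_upper ON customers(UPPER(name));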

✅ 9. Practice Dataset for Demo

You can simulate this in Oracle or PostgreSQL:

CREATE TABLE emp (
  emp_id INT PRIMARY KEY,
  name VARCHAR(100),
  dept_id INT,
  salary INT
);

CREATE INDEX idx_salary ON emp(salary);

EXPLAIN PLAN FOR
SELECT * FROM emp WHERE salary > 50000;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

SQL Injection & Security Best Practices is an important and often-overlooked interview topic, especially for backend, full-stack, or data engineers involved in application database access. Here is the full deep dive, with real-world examples, interview Q&A, and practical tips:


🔐 SQL Injection & Security Best Practices


🚨 What is SQL Injection?

SQL Injection is a security vulnerability that allows an attacker to interfere with the queries an application makes to its database. It can be used to:

  • View data they shouldn’t (unauthorized access)
  • Modify or delete data
  • Execute administrative operations
  • Bypass authentication
  • Exfiltrate data (data leaks)

🧪 Classic Example of SQL Injection

Let’s say a backend login form takes user input and generates SQL:

SELECT * FROM users WHERE username = 'admin' AND password = '1234';

Now an attacker enters this:

Username: ' OR 1=1 --
Password: anything

The resulting query becomes:

SELECT * FROM users WHERE username = '' OR 1=1 --' AND password = 'anything';

➡️ This returns all rows, bypassing login.


🛡️ Best Practices to Prevent SQL Injection


✅ 1. Use Parameterized Queries / Bind Variables

Language-independent, secure practice

Java (JDBC):

String query = "SELECT * FROM users WHERE username = ? AND password = ?";
PreparedStatement pstmt = conn.prepareStatement(query);
pstmt.setString(1, username);
pstmt.setString(2, password);
ResultSet rs = pstmt.executeQuery();

Python (psycopg2; note that the sqlite3 module uses ? placeholders instead of %s):

cursor.execute("SELECT * FROM users WHERE username = %s AND password = %s", (username, password))

Oracle Bind Variable (PL/SQL):

SELECT * FROM employees WHERE department_id = :dept_id;

✅ Safe from SQL injection as values are treated separately from SQL logic.


✅ 2. Use ORM Tools (Hibernate, Sequelize, SQLAlchemy)

ORMs abstract raw SQL and automatically use bind variables under the hood.

# SQLAlchemy (Python)
session.query(User).filter_by(username='john', password='secret').first()

✅ 3. Avoid Dynamic SQL Unless Absolutely Necessary

Never do this:

EXECUTE IMMEDIATE 'SELECT * FROM emp WHERE empno = ' || p_empno;

Instead use a bind variable (v_ename here is a previously declared PL/SQL variable that receives the result):

EXECUTE IMMEDIATE 'SELECT ename FROM emp WHERE empno = :1' INTO v_ename USING p_empno;

✅ 4. Validate and Sanitize Inputs

  • Enforce strong data types
  • Use whitelisting (e.g., dropdowns or regex)
  • Avoid accepting SQL syntax as input

✅ 5. Minimize DB Permissions

  • Least Privilege Principle
  • Separate DB roles: admin vs app user
  • App should not have DROP, ALTER, or GRANT access
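
A minimal least-privilege setup in Oracle might look like this (a sketch; user and object names are hypothetical):

CREATE USER app_user IDENTIFIED BY "change_me_immediately";
GRANT CREATE SESSION TO app_user;
-- DML only, and only on the tables the application actually uses
GRANT SELECT, INSERT, UPDATE ON app_schema.orders TO app_user;
-- Deliberately no DROP, ALTER, or GRANT privileges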

✅ 6. Monitor and Audit Logs

  • Enable SQL audit trails (Oracle: AUDIT command)
  • Use Web Application Firewalls (WAFs) to detect injection patterns
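
For Oracle's traditional (pre-unified) auditing, a minimal sketch (object names hypothetical):

-- Record failed reads/writes on a sensitive table, one audit record per access
AUDIT SELECT, UPDATE, DELETE ON app_schema.orders BY ACCESS WHENEVER NOT SUCCESSFUL;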

🧠 Interview Question + Answer


Q1. How do you prevent SQL injection in a production-grade application?

Answer:

We always use parameterized queries or bind variables through prepared statements. This separates code from data, ensuring inputs are treated as literals. We also avoid dynamic SQL unless absolutely required, validate inputs at both front-end and back-end, and follow least-privilege for database roles. If ORMs are used, we ensure all custom queries still follow the same pattern.


Q2. What if you absolutely need dynamic SQL? How do you keep it safe?

Answer:

If dynamic SQL is necessary (e.g., dynamic table names), we do not concatenate input directly. Instead, we sanitize it and still use bind variables for the parameter parts. In Oracle, we use EXECUTE IMMEDIATE with USING.
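
A sketch of that pattern in Oracle PL/SQL (p_table_name, p_status, and v_count are hypothetical variables in an enclosing block; DBMS_ASSERT.SIMPLE_SQL_NAME raises an error if the input is not a simple SQL identifier):

EXECUTE IMMEDIATE
  'SELECT COUNT(*) FROM ' || DBMS_ASSERT.SIMPLE_SQL_NAME(p_table_name) ||
  ' WHERE status = :1'
  INTO v_count
  USING p_status;  -- the value still travels as a bind variable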


Q3. What is the difference between bind variables and hardcoded values?

Answer:

Bind variables are placeholders in the SQL text; the database hard-parses the statement once and reuses the plan across executions with different values. Hardcoded literals make each distinct statement parse separately, and if they come unescaped from user input, they also expose the system to SQL injection.


💡 Real-World Use Case

Login System Code Review

Before:

cursor.execute("SELECT * FROM users WHERE username = '" + uname + "' AND password = '" + pwd + "';")

After (Secure):

cursor.execute("SELECT * FROM users WHERE username = %s AND password = %s", (uname, pwd))

🚫 Common Mistakes

| Mistake | Explanation |
| --- | --- |
| Using string concatenation in queries | Opens up to injection attacks |
| Blind trust in front-end validations | Attackers can bypass browser controls |
| Giving full DB privileges to apps | Increases damage in case of breach |
| Not logging failed login attempts | Misses early indicators of attacks |

✅ Security Checklist for Interviews

  • Use Prepared/Parameterized Statements
  • Validate Inputs (length, type, format)
  • Minimize DB Privileges
  • Use Stored Procedures for business logic
  • Enable logging, audit trails
  • Avoid raw SQL unless absolutely necessary
  • Educate team on OWASP SQLi risks

Here’s a detailed explanation of Correlated vs Non-Correlated Subqueries — covering definition, use cases, performance, interview tips, and hands-on SQL examples with sample data.


🔍 What is a Subquery?

A subquery is a query nested inside another SQL query (main/outer query). It can be in:

  • SELECT clause
  • WHERE clause
  • FROM clause

🧠 Types of Subqueries

| Feature | Non-Correlated Subquery | Correlated Subquery |
| --- | --- | --- |
| Dependency on outer query | ❌ Independent | ✅ Depends on outer query row-by-row |
| Execution count | ✅ Once | ❌ Multiple times (once per outer row) |
| Performance | ✅ Faster (in general) | ❌ Slower (may not scale well) |
| Use case | Aggregates, comparisons | Row-by-row checks (e.g., existence, condition) |

🧪 Sample Dataset

employees

| emp_id | name | department_id | salary |
| --- | --- | --- | --- |
| 101 | Alice | 10 | 5000 |
| 102 | Bob | 20 | 6000 |
| 103 | Charlie | 10 | 7000 |
| 104 | David | 30 | 4500 |
| 105 | Eva | 20 | 6500 |

✅ Example: Non-Correlated Subquery

Goal: Get all employees whose salary is greater than the average salary.

SELECT name, salary
FROM employees
WHERE salary > (
    SELECT AVG(salary) FROM employees
);

🧾 Explanation:

  • Subquery: returns a single scalar value (avg salary = 5800).
  • Outer query uses this value to filter.

🧠 Interview Tip: “Use non-correlated when your subquery returns an aggregate or independent filter value.”


🔁 Example: Correlated Subquery

Goal: Get employees whose salary is above the average salary of their department.

SELECT e.name, e.department_id, e.salary
FROM employees e
WHERE e.salary > (
    SELECT AVG(salary)
    FROM employees
    WHERE department_id = e.department_id
);

🧾 Explanation:

  • Subquery runs once for each row in employees.
  • It calculates average salary per department, using outer reference e.department_id.

🔧 Performance Implication:

  • Can be expensive on large datasets due to row-wise execution.

🚀 Optimize Correlated Subqueries

Replace with JOIN + GROUP BY if possible:

WITH dept_avg AS (
    SELECT department_id, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department_id
)
SELECT e.name, e.salary, e.department_id
FROM employees e
JOIN dept_avg d
  ON e.department_id = d.department_id
WHERE e.salary > d.avg_salary;

✅ Better performance and scalability.


🎯 Real Interview Scenario

Q: Find employees earning more than the average salary of their department. Can you do it without a correlated subquery?

🧠 Expected Answer:

  • “Yes, we can first pre-aggregate department averages and then join with the original table.”
  • Then write the WITH + JOIN version shown above.

✍ Summary

| Topic | Notes |
| --- | --- |
| Correlated subquery | Depends on the outer query. Slower. Good for per-row checks. |
| Non-correlated subquery | Independent of the outer query. Faster. Used for single-result filters. |
| Interview tip | Rewrite correlated subqueries using JOINs or CTEs for optimization. |
| SQL optimization | Watch for correlated subqueries over large datasets; they slow down execution significantly. Use EXPLAIN PLAN. |

Here’s a comprehensive guide on handling JSON and XML in Oracle SQL, focusing on advanced use cases using JSON_TABLE, JSON_VALUE, XMLTABLE, etc. This is especially important for cloud-based, microservices, or NoSQL+SQL hybrid environments where structured and semi-structured data coexist.


🔷 Part 1: JSON Handling in Oracle SQL

Oracle supports native JSON data starting from 12c. You can store JSON in VARCHAR2, CLOB, or the new JSON data type (from Oracle 21c onwards).

✅ 1. Extract Single Value: JSON_VALUE

Purpose: Extract a scalar (string, number, boolean) value.

SELECT JSON_VALUE('{"id": 101, "name": "Rajeev"}', '$.name') AS name
FROM dual;
-- Output: Rajeev

Supports options like:

JSON_VALUE(json_column, '$.price' RETURNING NUMBER)

✅ 2. Extract JSON Object/Array as Text: JSON_QUERY

Purpose: Extract non-scalar JSON (objects or arrays) as text.

SELECT JSON_QUERY('{"id": 1, "skills": ["SQL", "Python"]}', '$.skills')
FROM dual;
-- Output: ["SQL", "Python"]

✅ 3. Extract Multiple Rows: JSON_TABLE

Purpose: Convert JSON arrays/objects into relational rows and columns.

SELECT *
FROM JSON_TABLE(
  '{
     "employees": [
        {"id": 1, "name": "Raj"},
        {"id": 2, "name": "Amit"}
     ]
   }',
   '$.employees[*]'
   COLUMNS (
     emp_id   NUMBER  PATH '$.id',
     emp_name VARCHAR2(50) PATH '$.name'
   )
);

📌 Think of JSON_TABLE as unpivoting JSON arrays into relational rows.


✅ 4. Filter JSON Data: IS JSON / JSON_EXISTS

-- Validate a JSON column
SELECT *
FROM employees
WHERE json_col IS JSON;

-- Filter only if a JSON path exists
SELECT *
FROM orders
WHERE JSON_EXISTS(json_col, '$.customer.address.zip');

🔷 Part 2: XML Handling in Oracle SQL


✅ 1. Extract Single Value: EXTRACTVALUE (deprecated), use XMLQUERY / XMLTABLE

-- Deprecated but still widely seen style:
SELECT XMLTYPE('<person><name>Rajeev</name></person>').EXTRACT('//name/text()').getStringVal()
FROM dual;
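
Since EXTRACT is deprecated, the same value can be fetched in the modern style with XMLQUERY plus XMLCAST (a sketch):

SELECT XMLCAST(
         XMLQUERY('/person/name/text()'
                  PASSING XMLTYPE('<person><name>Rajeev</name></person>')
                  RETURNING CONTENT)
         AS VARCHAR2(50)) AS name
FROM dual;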

✅ 2. Use XMLTABLE for Relational Output

Example: Parse XML and turn into rows:

SELECT *
FROM XMLTABLE(
  '/employees/employee'
  PASSING XMLTYPE('
    <employees>
      <employee><id>1</id><name>Raj</name></employee>
      <employee><id>2</id><name>Amit</name></employee>
    </employees>'
  )
  COLUMNS
    emp_id   NUMBER       PATH 'id',
    emp_name VARCHAR2(50) PATH 'name'
);

🔷 Advanced Use Cases (Hybrid / Real-World)

🔸1. Nested JSON with JSON_TABLE

SELECT *
FROM JSON_TABLE(
  '{
    "orders": [
      {"id": 1, "items": [{"product": "Pen", "qty": 10}, {"product": "Book", "qty": 5}]},
      {"id": 2, "items": [{"product": "Pencil", "qty": 3}]}
    ]
  }',
  '$.orders[*]'
  COLUMNS (
    order_id NUMBER PATH '$.id',
    NESTED PATH '$.items[*]'
    COLUMNS (
      product_name VARCHAR2(50) PATH '$.product',
      quantity     NUMBER       PATH '$.qty'
    )
  )
);

🔸2. XMLTABLE with Nested Structures

SELECT *
FROM XMLTABLE(
  '/orders/order'
  PASSING XMLTYPE('
    <orders>
      <order>
        <id>1</id>
        <items>
          <item><product>Pen</product><qty>10</qty></item>
          <item><product>Book</product><qty>5</qty></item>
        </items>
      </order>
    </orders>'
  )
  COLUMNS
    order_id  NUMBER PATH 'id',
    items XMLTYPE PATH 'items'
) o,
XMLTABLE(
  '/items/item'
  PASSING o.items
  COLUMNS
    product VARCHAR2(50) PATH 'product',
    qty     NUMBER       PATH 'qty'
);

🔷 Real Interview Scenarios

| Use Case | SQL Feature | Description |
| --- | --- | --- |
| Convert JSON API response to table | JSON_TABLE | For ETL on REST data |
| Validate JSON schema | IS JSON, JSON_EXISTS | Check for mandatory keys |
| Flatten XML invoice feed | XMLTABLE | Common in legacy systems |
| Store dynamic attributes | JSON/XML column | Semi-structured, searchable |
| Hybrid system (RDBMS + NoSQL) | JSON + SQL | Use JSON_QUERY, JSON_VALUE |

🔷 Summary Cheat Sheet

| Feature | JSON | XML |
| --- | --- | --- |
| Single value | JSON_VALUE | EXTRACTVALUE (old), XMLQUERY |
| Full object | JSON_QUERY | XMLTYPE, EXTRACT() |
| Rows/tables | JSON_TABLE | XMLTABLE |
| Validate | IS JSON, JSON_EXISTS | XMLIsWellFormed() |
| Nested data | Nested JSON_TABLE | Join XMLTABLE results |

🔷 Bonus: Indexing JSON/XML Columns

-- JSON search index (Oracle 12.2+)
CREATE SEARCH INDEX idx_orders_json ON orders (json_col) FOR JSON;

-- Functional index
CREATE INDEX idx_customer_city ON customers(JSON_VALUE(json_col, '$.address.city'));

The MERGE statement (a.k.a. upsert) is critical in ETL, data warehousing, slowly changing dimensions (SCD), and CDC (Change Data Capture) logic.


🔷 MERGE INTO: Syntax & Core Concept

The MERGE statement performs:

  • UPDATE if the record exists (matched),
  • INSERT if it doesn’t exist (not matched).

✅ Basic Syntax:

MERGE INTO target_table t
USING source_table s
ON (t.id = s.id)
WHEN MATCHED THEN
  UPDATE SET t.name = s.name, t.salary = s.salary
WHEN NOT MATCHED THEN
  INSERT (id, name, salary)
  VALUES (s.id, s.name, s.salary);

📌 Think of MERGE as:

IF EXISTS (match)
  THEN UPDATE
ELSE
  INSERT

🔷 Practical Example

🧱 Tables:

-- Target Table (e.g. Data Warehouse Table)
CREATE TABLE emp_dw (
  emp_id   NUMBER PRIMARY KEY,
  emp_name VARCHAR2(100),
  salary   NUMBER
);

-- Source Table (e.g. Staging Table)
CREATE TABLE emp_stage (
  emp_id   NUMBER,
  emp_name VARCHAR2(100),
  salary   NUMBER
);

🧪 Data:

-- Data in target (DW)
1, 'Raj', 50000

-- Data in source (Stage)
1, 'Rajeev', 55000   -- should update
2, 'Amit', 40000     -- should insert

🔁 MERGE Logic:

MERGE INTO emp_dw dw
USING emp_stage stg
ON (dw.emp_id = stg.emp_id)
WHEN MATCHED THEN
  UPDATE SET dw.emp_name = stg.emp_name,
             dw.salary = stg.salary
WHEN NOT MATCHED THEN
  INSERT (emp_id, emp_name, salary)
  VALUES (stg.emp_id, stg.emp_name, stg.salary);

🔷 Advanced Use Cases


🔸1. Add Filters (e.g., Update Only If Salary Changes)

Standard SQL and SQL Server allow a condition on the match branch:

WHEN MATCHED AND dw.salary != stg.salary THEN
  UPDATE SET dw.salary = stg.salary

Oracle does not support WHEN MATCHED AND; it expresses the same filter with a WHERE clause on the UPDATE:

WHEN MATCHED THEN
  UPDATE SET dw.salary = stg.salary
  WHERE dw.salary != stg.salary

🔸2. Insert Only Select Columns

INSERT (emp_id, salary) VALUES (stg.emp_id, stg.salary)

🔸3. SCD Type 1 (Overwrite Changes)

MERGE INTO dim_customer d
USING stg_customer s
ON (d.cust_id = s.cust_id)
WHEN MATCHED THEN
  UPDATE SET d.name = s.name, d.city = s.city
WHEN NOT MATCHED THEN
  INSERT (cust_id, name, city)
  VALUES (s.cust_id, s.name, s.city);

🔸4. SCD Type 2 (Preserve History)

SCD Type 2 needs additional logic (valid_from, valid_to, is_current). MERGE is used only for insert new records and expire old ones via UPDATE.
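
A minimal SCD Type 2 sketch (the valid_from/valid_to/is_current columns and the change-detection predicate are illustrative):

-- Step 1: expire the current row when a tracked attribute changed
MERGE INTO dim_customer d
USING stg_customer s
ON (d.cust_id = s.cust_id AND d.is_current = 'Y')
WHEN MATCHED THEN
  UPDATE SET d.valid_to = SYSDATE, d.is_current = 'N'
  WHERE d.name != s.name OR d.city != s.city;

-- Step 2: insert the new current version (covers new and just-expired customers)
INSERT INTO dim_customer (cust_id, name, city, valid_from, valid_to, is_current)
SELECT s.cust_id, s.name, s.city, SYSDATE, NULL, 'Y'
FROM stg_customer s
WHERE NOT EXISTS (
  SELECT 1 FROM dim_customer d
  WHERE d.cust_id = s.cust_id AND d.is_current = 'Y'
);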


🔸5. UPSERT from JSON or API into Table (Oracle 21c+ JSON)

MERGE INTO customer t
USING (
  -- wrapped in a subquery so MERGE's USING clause sees a plain row source
  SELECT id, name
  FROM JSON_TABLE(:json_payload, '$[*]'
         COLUMNS (id   NUMBER        PATH '$.id',
                  name VARCHAR2(100) PATH '$.name'))
) s
ON (t.id = s.id)
WHEN MATCHED THEN
  UPDATE SET t.name = s.name
WHEN NOT MATCHED THEN
  INSERT (id, name) VALUES (s.id, s.name);

🔷 Interview Tip

💬 “How do you perform UPSERT in Oracle SQL?”

Answer:

Using the MERGE statement. It compares rows between a target and source using a join condition. If a match is found, it updates; otherwise, it inserts. This is commonly used in ETL pipelines and data warehouse dimensions.


🔷 Alternatives (Non-MERGE)

In some environments (e.g., Snowflake, PostgreSQL), you may use:

| DBMS | Alternative |
| --- | --- |
| Oracle | MERGE |
| PostgreSQL | INSERT ... ON CONFLICT DO UPDATE |
| MySQL | INSERT ... ON DUPLICATE KEY UPDATE |
| SQL Server | MERGE (with caution, has edge cases) |
| Snowflake | MERGE |
| BigQuery | MERGE |

✅ Summary: MERGE Highlights

| Scenario | MERGE Feature |
| --- | --- |
| UPSERT logic | ✔️ |
| ETL staging | ✔️ |
| Data warehouse dims (SCD) | ✔️ |
| Performance | Efficient on indexed joins |
| Conditional updates | Yes (WHEN MATCHED AND ... in standard SQL/SQL Server; UPDATE ... WHERE in Oracle) |
| JSON payloads | Yes (from 21c using JSON_TABLE) |

Here's a crisp yet deep dive into Partitioning Techniques, particularly for Oracle, BigQuery, and Redshift, focusing on Range, Hash, and List partitioning, pruning, sub-partitioning, and indexing strategies.


🧠 Why Partitioning Matters

Partitioning improves:

  • Query performance (via partition pruning)
  • Data management (purging, archiving)
  • Parallelism and load balancing
  • Scalability for large datasets

🔷 Types of Partitioning

| Type | Description | Best For |
| --- | --- | --- |
| 🔢 Range | Based on ranges (e.g., date, salary) | Time-series, archiving |
| #️⃣ Hash | Based on a hash function | Uniform distribution |
| 📋 List | Based on discrete values (e.g., region = 'US') | Categorical data |
| 📊 Composite | Combination (e.g., Range + Hash) | Complex use cases |

🔷 1. Oracle Partitioning

Oracle supports all types: Range, Hash, List, and Composite (Sub-Partitioning)


✅ a. Range Partition Example

CREATE TABLE sales (
  sale_id     NUMBER,
  sale_date   DATE,
  amount      NUMBER
)
PARTITION BY RANGE (sale_date) (
  PARTITION p2023q1 VALUES LESS THAN (TO_DATE('2023-04-01', 'YYYY-MM-DD')),
  PARTITION p2023q2 VALUES LESS THAN (TO_DATE('2023-07-01', 'YYYY-MM-DD'))
);

✅ b. Hash Partition

CREATE TABLE users (
  user_id NUMBER,
  name    VARCHAR2(100)
)
PARTITION BY HASH(user_id)
PARTITIONS 4;

✅ c. List Partition

CREATE TABLE customer_data (
  cust_id    NUMBER,
  region     VARCHAR2(10)
)
PARTITION BY LIST (region) (
  PARTITION p_north VALUES ('NORTH'),
  PARTITION p_south VALUES ('SOUTH')
);

✅ d. Composite Partition (Range + Hash)

CREATE TABLE orders (
  order_id   NUMBER,
  order_date DATE,
  cust_id    NUMBER
)
PARTITION BY RANGE (order_date)
SUBPARTITION BY HASH (cust_id)
SUBPARTITIONS 4
(
  PARTITION p2024 VALUES LESS THAN (TO_DATE('2025-01-01', 'YYYY-MM-DD')),
  PARTITION p2025 VALUES LESS THAN (MAXVALUE)
);

⚡ Oracle Partition Pruning

Oracle automatically skips partitions if WHERE clause filters allow it.

SELECT * FROM sales WHERE sale_date >= DATE '2023-01-01';
-- Partitions that fall entirely outside the filter are skipped

You can verify with:

EXPLAIN PLAN FOR ...
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

🔷 2. BigQuery Partitioning

BigQuery supports:

  • Ingestion time partitioning
  • Partitioned Tables (by column)
  • Clustering (secondary filtering)

✅ a. Partition by DATE or TIMESTAMP

CREATE TABLE sales_partitioned (
  sale_id INT64,
  sale_date DATE
)
PARTITION BY sale_date;

✅ b. Pseudo Hash via Clustering (No true HASH partition)

CREATE TABLE logs_partitioned (
  log_time TIMESTAMP,
  user_id  STRING
)
PARTITION BY DATE(log_time)
CLUSTER BY user_id;

📌 Clustering helps optimize filtering/sorting, especially when partitioning alone isn’t enough.


⚡ Partition Pruning in BigQuery

-- This is efficient (partition pruning applies)
SELECT * FROM sales_partitioned
WHERE sale_date = '2023-01-01';

-- This disables pruning (bad): a function wraps the partition column
SELECT * FROM sales_partitioned
WHERE EXTRACT(YEAR FROM sale_date) = 2023;

🔷 3. Redshift Partitioning (via Distribution + Sort Keys)

Redshift doesn’t support traditional partitioning, but uses:

  • Distribution Keys (DISTKEY) = like HASH partition
  • Sort Keys = like Range partition

✅ a. Hash Distribution

CREATE TABLE sales (
  sale_id INT,
  customer_id INT,
  sale_date DATE
)
DISTSTYLE KEY
DISTKEY(customer_id);

📌 Ensures all same customer data goes to the same node.


✅ b. Range-like Behavior: SORTKEY

CREATE TABLE sales (
  sale_id INT,
  sale_date DATE
)
SORTKEY(sale_date);

📌 Makes range queries fast (BETWEEN, >, <, etc.)


⚡ Redshift Best Practices

| Scenario | Strategy |
| --- | --- |
| Large joins | Use the same DISTKEY |
| Time-series data | Use SORTKEY(date) |
| Small lookup tables | Use DISTSTYLE ALL (replicated) |

🔷 Indexing with Partitioning

| DB | Strategy |
| --- | --- |
| Oracle | Local indexes (partition-wise) are common; global indexes work too |
| BigQuery | No indexes; relies on partitioning + clustering |
| Redshift | No indexes; SORTKEY + DISTKEY act as physical optimizers |

🔷 Summary Table

| Feature | Oracle | BigQuery | Redshift |
| --- | --- | --- | --- |
| Range partitioning | ✅ Yes | ✅ Yes | ❌ (sort key only) |
| Hash partitioning | ✅ Yes | ❌ (clustering only) | ✅ (DISTKEY) |
| List partitioning | ✅ Yes | ❌ | ❌ |
| Composite (subpartitioning) | ✅ Yes | ❌ | ❌ |
| Partition pruning | ✅ Automatic | ✅ Column-aware | ❌ (handled via sort key) |
| Index support | ✅ (local/global) | ❌ (none) | ❌ (none) |

💬 Interview Insight

“How would you partition a multi-billion row table in Oracle/BigQuery?”

Answer:

I’d use range partitioning on a time column (e.g., sale_date), and if further optimization is needed, apply sub-partitioning (e.g., hash on customer_id). This supports fast partition pruning, parallel processing, and easy purging of old partitions.


With that, you've covered a complete, enterprise-grade SQL interview roadmap.

Below is a clean summary of points 7–10 with quick interview insights, real use cases, and tips — ideal for last-minute revision or documentation.


7. Temporal Tables (System-Versioned Tables)

📌 What it is:
A table that automatically tracks historical changes — records “when” each row was valid.

🧠 Core Concepts:

  • Two timestamps: valid_from, valid_to
  • Common in auditing, slowly changing dimensions (SCD2), and regulatory logs
  • Also known as system-versioned tables

📌 Supported in:
SQL Server (SYSTEM_VERSIONING = ON), Oracle (via custom triggers or Flashback), PostgreSQL (manual), BigQuery (using FOR SYSTEM_TIME AS OF)

✅ Use Case:

-- Query as of a previous point in time
SELECT * FROM employees
FOR SYSTEM_TIME AS OF TIMESTAMP '2024-12-31 23:59:59';

💡 Interview Tip:
Be ready to simulate temporal behavior manually if the RDBMS doesn’t support it natively (e.g., keep history tables, use triggers).
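
A minimal manual simulation in Oracle (a sketch; the employees columns are assumed):

-- History table capturing old row images
CREATE TABLE employees_hist (
  emp_id     INT,
  name       VARCHAR(100),
  salary     INT,
  changed_at DATE
);

-- Archive the pre-change version of each row on update or delete
CREATE OR REPLACE TRIGGER trg_emp_history
AFTER UPDATE OR DELETE ON employees
FOR EACH ROW
BEGIN
  INSERT INTO employees_hist (emp_id, name, salary, changed_at)
  VALUES (:OLD.emp_id, :OLD.name, :OLD.salary, SYSDATE);
END;
/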


8. Hierarchical Queries

📌 What it is:
Used to model tree structures like org charts, category hierarchies, file systems, etc.

🧠 Syntax Across RDBMS:

| DB | Syntax |
| --- | --- |
| Oracle | CONNECT BY PRIOR + LEVEL |
| PostgreSQL / MySQL 8+ | WITH RECURSIVE CTE |
| SQL Server | WITH CTE (recursive CTEs; the RECURSIVE keyword is not used) |

✅ Example (PostgreSQL):

WITH RECURSIVE org_chart AS (
  SELECT emp_id, manager_id, emp_name, 1 AS level
  FROM employees
  WHERE manager_id IS NULL
  UNION ALL
  SELECT e.emp_id, e.manager_id, e.emp_name, oc.level + 1
  FROM employees e
  JOIN org_chart oc ON e.manager_id = oc.emp_id
)
SELECT * FROM org_chart;

💡 Interview Tip:
Interviewers love this for BOM (Bill of Materials) questions. Know both Oracle (CONNECT BY) and CTE style.
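
For reference, the Oracle equivalent of the recursive CTE above (same hypothetical employees table):

SELECT emp_id, manager_id, emp_name, LEVEL
FROM employees
START WITH manager_id IS NULL
CONNECT BY PRIOR emp_id = manager_id
ORDER SIBLINGS BY emp_name;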


9. Role of SQL in Data Pipelines & Orchestration

📌 Context:
SQL powers:

  • Extract: Pulling data from source systems
  • Transform: Cleansing, joins, validations
  • Load: Inserting into reporting/marts

💡 Orchestration Context:

  • Airflow DAGs: SQL used in operators like PostgresOperator, BigQueryOperator
  • Spark SQL Jobs: SQL queries over DataFrames or Hive tables
  • Validation Checks: Row counts, null checks, duplicates, reconciliation

✅ Sample Validation SQL:

-- Quick row-count reconciliation (Oracle MINUS; use EXCEPT in PostgreSQL/SQL Server);
-- returns a row only when the counts differ
SELECT COUNT(*) FROM src_table
MINUS
SELECT COUNT(*) FROM target_table;
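
Two more checks that come up constantly (sketches against a hypothetical orders table):

-- Duplicate business keys
SELECT order_id, COUNT(*)
FROM orders
GROUP BY order_id
HAVING COUNT(*) > 1;

-- Mandatory-column completeness
SELECT COUNT(*) AS missing_customer
FROM orders
WHERE customer_id IS NULL;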

💡 Interview Tip:
Be ready to explain how you used SQL for quality checks or task dependencies in orchestration.


10. SQL Coding Best Practices

📌 Why Important:
Clean SQL → Better maintenance → Fewer bugs in pipelines/reports

✅ Best Practices Summary:

| Aspect | Best Practice |
| --- | --- |
| Naming | Use snake_case or camelCase, meaningful aliases |
| NULLs | Use COALESCE; beware of NULL != NULL; handle CASE WHEN col IS NULL explicitly |
| Casting | Always cast explicitly in calculations |
| Joins | Use aliasing and qualify columns: a.id = b.id |
| Reusability | Use CTEs (WITH) for modular, readable SQL |
| Performance | Avoid SELECT *; use indexed columns in filters; monitor query plans |
| Comments | Always explain the logic in complex CTEs or subqueries |

💡 Interview Tip:
They often ask you to “optimize” a given messy SQL. Follow these rules and explain your cleanup rationale.
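
A sketch of the kind of cleanup they expect (the messy query itself is hypothetical):

-- Before: implicit join, SELECT *, non-sargable filter
SELECT *
FROM orders o, customers c
WHERE o.cust_id = c.cust_id
  AND UPPER(c.region) = 'WEST';

-- After: explicit join, named columns, sargable filter, modular CTE
WITH west_customers AS (
  SELECT cust_id, name
  FROM customers
  WHERE region = 'West'
)
SELECT o.order_id, o.amount, w.name
FROM orders o
JOIN west_customers w ON o.cust_id = w.cust_id;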

