Great — let’s deep dive into “Execution Plan & Query Optimization” in an interview-ready format, with simple examples, key concepts, and ready answers to common questions.
✅ 1. What is an Execution Plan?
An Execution Plan is the step-by-step strategy used by the SQL engine to retrieve data. It shows how a query will run: which tables it touches, in what order, whether indexes are used, what joins it performs, etc.
You get the plan using:
EXPLAIN PLAN FOR <your-query>;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
-- or just
EXPLAIN <your-query>; -- in MySQL, PostgreSQL
✅ 2. Important Components of Execution Plan
Term | Meaning |
---|---|
TABLE ACCESS FULL | Full table scan (slow if data is huge) |
INDEX RANGE SCAN | Efficient scan using an index (range predicate like > , < , BETWEEN ) |
INDEX UNIQUE SCAN | Fast lookup using unique index (e.g., primary key) |
NESTED LOOPS | Best for small data sets or indexed joins |
HASH JOIN | Used for large datasets without indexes |
MERGE JOIN | Sort + join, good when both datasets are sorted |
SORT / FILTER | Sort or filter operations (costly if not optimized) |
✅ 3. Interview Tip: How to Debug a Slow Query?
📌 Sample Interview Answer:
To debug a slow query, I would:
- Run EXPLAIN PLAN to understand the path taken.
- Look for Full Table Scans, Cartesian Joins, Sorts, or Filters.
- Check if indexes are being used (INDEX RANGE SCAN / INDEX UNIQUE SCAN).
- Refresh statistics if they are outdated, since stale statistics mislead the cost-based optimizer.
- Rewrite subqueries / joins or use WITH (CTE) for better readability and reuse.
- Consider adding index hints or rewriting filters for better selectivity.
✅ 4. Index Hints (Oracle Example)
Force the optimizer to use an index:
SELECT /*+ INDEX(emp emp_idx1) */ * FROM emp WHERE deptno = 10;
Other hints:
FULL(emp) – force a full table scan
USE_NL(a b) – use a nested loop join
LEADING(emp dept) – force the join order
❗Use hints only after verifying the optimizer isn’t picking the best path.
✅ 5. Types of Index Scans in Execution Plan
Scan Type | When Used | Speed |
---|---|---|
Full Table Scan | No useful index or small table | ❌ Slow |
Index Range Scan | WHERE condition uses a range (> , < , etc.) | ✅ Fast |
Index Unique Scan | WHERE uses primary key or unique index | ✅✅ Very Fast |
Index Full Scan | Reads entire index (no filtering) | Medium |
Index Skip Scan | Uses non-leading column of a composite index (see sketch below) | Conditional |
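For instance, a composite index can sometimes serve a filter on its non-leading column via a skip scan — a hedged sketch using the emp practice table from section 9 (index name hypothetical):
-- Composite index leading on dept_id, then salary
CREATE INDEX idx_emp_dept_sal ON emp (dept_id, salary);
-- A filter on salary alone may still use INDEX SKIP SCAN,
-- depending on the optimizer and the number of distinct dept_id values
SELECT * FROM emp WHERE salary > 50000;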
✅ 6. Sample Scenario – Optimization
🐢 Slow Query:
SELECT * FROM orders WHERE status = 'Delivered';
Assume status is not indexed, and the table has 10 million rows.
⚡ Optimized:
-- Create index
CREATE INDEX idx_status ON orders(status);
-- Rerun query, now index is used:
SELECT * FROM orders WHERE status = 'Delivered';
Use:
EXPLAIN PLAN FOR SELECT * FROM orders WHERE status = 'Delivered';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
You’ll now see INDEX RANGE SCAN instead of TABLE ACCESS FULL.
✅ 7. Cost-Based Optimizer (CBO)
Oracle and most modern RDBMS use CBO. It evaluates multiple execution strategies and chooses the one with the lowest cost, based on:
- Table/index statistics
- Row counts, cardinality
- Join order & types
- Predicate selectivity
📌 After inserting or deleting large volumes of data, always run:
EXEC DBMS_STATS.GATHER_TABLE_STATS('schema', 'table_name');
Otherwise, execution plans may be built on stale statistics and pick a poor path.
✅ 8. Final Checklist to Optimize Queries
Checkpoint | Fix |
---|---|
Full Table Scan | Add index or rewrite filter condition |
Missing or outdated stats | Gather statistics |
Bad join order | Use LEADING or restructure query |
Cartesian joins | Add missing join conditions |
Large sort/filter steps | Add indexes or materialized views |
Non-sargable WHERE clauses | Avoid functions on columns (WHERE UPPER(name) → bad for index; see the sketch below) |
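To make the last row concrete — a hedged sketch (assuming an emp table with a name column; index name hypothetical):
-- Non-sargable: the function on the column blocks a plain index on name
SELECT * FROM emp WHERE UPPER(name) = 'ALICE';
-- Fix 1: compare in the stored case so a plain index on name applies
SELECT * FROM emp WHERE name = 'Alice';
-- Fix 2 (Oracle): a function-based index makes the original predicate indexable
CREATE INDEX idx_emp_upper_name ON emp (UPPER(name));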
✅ 9. Practice Dataset for Demo
You can simulate this in Oracle or PostgreSQL:
CREATE TABLE emp (
emp_id INT PRIMARY KEY,
name VARCHAR(100),
dept_id INT,
salary INT
);
CREATE INDEX idx_salary ON emp(salary);
EXPLAIN PLAN FOR
SELECT * FROM emp WHERE salary > 50000;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
Absolutely — this is an important and often-overlooked interview topic, especially for backend, full-stack, or data engineers involved in application database access. Here’s the full deep dive on SQL Injection & Security Best Practices with real-world examples, interview Q&A, and practical tips:
🔐 SQL Injection & Security Best Practices
🚨 What is SQL Injection?
SQL Injection is a security vulnerability that allows an attacker to interfere with the queries an application makes to its database. It can be used to:
- View data they shouldn’t (unauthorized access)
- Modify or delete data
- Execute administrative operations
- Bypass authentication
- Exfiltrate data (data leaks)
🧪 Classic Example of SQL Injection
Let’s say a backend login form takes user input and generates SQL:
SELECT * FROM users WHERE username = 'admin' AND password = '1234';
Now an attacker enters this:
Username: ' OR 1=1 --
Password: anything
The resulting query becomes:
SELECT * FROM users WHERE username = '' OR 1=1 --' AND password = 'anything';
➡️ This returns all rows, bypassing login.
🛡️ Best Practices to Prevent SQL Injection
✅ 1. Use Parameterized Queries / Bind Variables
Language-independent, secure practice
Java (JDBC):
String query = "SELECT * FROM users WHERE username = ? AND password = ?";
PreparedStatement pstmt = conn.prepareStatement(query);
pstmt.setString(1, username);
pstmt.setString(2, password);
ResultSet rs = pstmt.executeQuery();
Python (psycopg2 / SQLite3):
cursor.execute("SELECT * FROM users WHERE username = %s AND password = %s", (username, password))
Oracle Bind Variable (PL/SQL):
SELECT * FROM employees WHERE department_id = :dept_id;
✅ Safe from SQL injection as values are treated separately from SQL logic.
✅ 2. Use ORM Tools (Hibernate, Sequelize, SQLAlchemy)
ORMs abstract raw SQL and automatically use bind variables under the hood.
# SQLAlchemy (Python)
session.query(User).filter_by(username='john', password='secret').first()
✅ 3. Avoid Dynamic SQL Unless Absolutely Necessary
Never do this:
EXECUTE IMMEDIATE 'SELECT * FROM emp WHERE empno = ' || p_empno;
Instead use:
EXECUTE IMMEDIATE 'SELECT * FROM emp WHERE empno = :1' USING p_empno;
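For completeness, a minimal runnable PL/SQL sketch of the safe pattern (table and column names illustrative):
DECLARE
  v_name VARCHAR2(100);
BEGIN
  -- SQL text stays fixed; the value arrives through a bind, never by concatenation
  EXECUTE IMMEDIATE 'SELECT ename FROM emp WHERE empno = :1'
    INTO v_name
    USING 1001;
  DBMS_OUTPUT.PUT_LINE(v_name);
END;
/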
✅ 4. Validate and Sanitize Inputs
- Enforce strong data types
- Use whitelisting (e.g., dropdowns or regex)
- Avoid accepting SQL syntax as input
✅ 5. Minimize DB Permissions
- Least Privilege Principle
- Separate DB roles: admin vs app user
- App should not have DROP, ALTER, or GRANT access
✅ 6. Monitor and Audit Logs
- Enable SQL audit trails (Oracle: AUDIT command)
- Use Web Application Firewalls (WAFs) to detect injection patterns
🧠 Interview Question + Answer
Q1. How do you prevent SQL injection in a production-grade application?
✅ Answer:
We always use parameterized queries or bind variables through prepared statements. This separates code from data, ensuring inputs are treated as literals. We also avoid dynamic SQL unless absolutely required, validate inputs at both front-end and back-end, and follow least-privilege for database roles. If ORMs are used, we ensure all custom queries still follow the same pattern.
Q2. What if you absolutely need dynamic SQL? How do you keep it safe?
✅ Answer:
If dynamic SQL is necessary (e.g., dynamic table names), we do not concatenate input directly. Instead, we sanitize it and still use bind variables for the parameter parts. In Oracle, we use EXECUTE IMMEDIATE with USING.
Q3. What is the difference between bind variables and hardcoded values?
✅ Answer:
Bind variables are placeholders used in SQL that the database compiles once and executes multiple times with different values. Hardcoded values cause the SQL to be parsed every time, and if unescaped, may also expose the system to SQL injection.
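A small illustration of the difference:
-- Hard-coded literals: each distinct value is parsed as a new statement
SELECT * FROM employees WHERE department_id = 10;
SELECT * FROM employees WHERE department_id = 20;
-- Bind variable: one cursor, parsed once, reused with different values
SELECT * FROM employees WHERE department_id = :dept_id;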
💡 Real-World Use Case
Login System Code Review
Before:
cursor.execute("SELECT * FROM users WHERE username = '" + uname + "' AND password = '" + pwd + "';")
After (Secure):
cursor.execute("SELECT * FROM users WHERE username = %s AND password = %s", (uname, pwd))
🚫 Common Mistakes
Mistake | Explanation |
---|---|
Using string concatenation in queries | Opens up to injection attacks |
Blind trust in front-end validations | Attackers can bypass browser controls |
Giving full DB privileges to apps | Increases damage in case of breach |
Not logging failed login attempts | Misses early indicators of attacks |
✅ Security Checklist for Interviews
- Use Prepared/Parameterized Statements
- Validate Inputs (length, type, format)
- Minimize DB Privileges
- Use Stored Procedures for business logic
- Enable logging, audit trails
- Avoid raw SQL unless absolutely necessary
- Educate team on OWASP SQLi risks
Here’s a detailed explanation of Correlated vs Non-Correlated Subqueries — covering definition, use cases, performance, interview tips, and hands-on SQL examples with sample data.
🔍 What is a Subquery?
A subquery is a query nested inside another SQL query (main/outer query). It can be in:
- SELECT clause
- WHERE clause
- FROM clause
🧠 Types of Subqueries
Feature | Non-Correlated Subquery | Correlated Subquery |
---|---|---|
Dependency on outer query | ❌ Independent | ✅ Depends on outer query row-by-row |
Execution Count | ✅ Once | ❌ Multiple times (once per outer row) |
Performance | ✅ Faster (in general) | ❌ Slower (may not scale well) |
Use Case | Aggregates, comparisons | Row-by-row checks (e.g., existence, condition) |
🧪 Sample Dataset
employees
emp_id | name | department_id | salary |
---|---|---|---|
101 | Alice | 10 | 5000 |
102 | Bob | 20 | 6000 |
103 | Charlie | 10 | 7000 |
104 | David | 30 | 4500 |
105 | Eva | 20 | 6500 |
✅ Example: Non-Correlated Subquery
Goal: Get all employees whose salary is greater than the average salary.
SELECT name, salary
FROM employees
WHERE salary > (
SELECT AVG(salary) FROM employees
);
🧾 Explanation:
- Subquery: returns a single scalar value (avg salary = 5800).
- Outer query uses this value to filter.
🧠 Interview Tip: “Use non-correlated when your subquery returns an aggregate or independent filter value.”
🔁 Example: Correlated Subquery
Goal: Get employees whose salary is above the average salary of their department.
SELECT e.name, e.department_id, e.salary
FROM employees e
WHERE e.salary > (
SELECT AVG(salary)
FROM employees
WHERE department_id = e.department_id
);
🧾 Explanation:
- Subquery runs once for each row in employees.
- It calculates the average salary per department, using the outer reference e.department_id.
🔧 Performance Implication:
- Can be expensive on large datasets due to row-wise execution.
🚀 Optimize Correlated Subqueries
Replace with JOIN + GROUP BY if possible:
WITH dept_avg AS (
SELECT department_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
)
SELECT e.name, e.salary, e.department_id
FROM employees e
JOIN dept_avg d
ON e.department_id = d.department_id
WHERE e.salary > d.avg_salary;
✅ Better performance and scalability.
🎯 Real Interview Scenario
Q: Find employees earning more than the average salary of their department. Can you do it without a correlated subquery?
🧠 Expected Answer:
- “Yes, we can first pre-aggregate department averages and then join with the original table.”
- Then write the WITH + JOIN version shown above.
✍ Summary
Topic | Notes |
---|---|
Correlated Subquery | Subquery depends on outer query. Slower. Good for per-row checks. |
Non-Correlated Subquery | Subquery independent. Faster. Used for single result filters. |
Interview Tip | Rewrite correlated subqueries using JOINs or CTEs for optimization. |
SQL Optimization | Always watch for correlated subqueries inside large datasets — they slow down execution significantly. Use EXPLAIN PLAN . |
Here’s a comprehensive guide on handling JSON and XML in Oracle SQL, focusing on advanced use cases using JSON_TABLE, JSON_VALUE, XMLTABLE, etc. This is especially important for cloud-based, microservices, or NoSQL+SQL hybrid environments where structured and semi-structured data coexist.
🔷 Part 1: JSON Handling in Oracle SQL
Oracle supports native JSON data starting from 12c. You can store JSON in VARCHAR2, CLOB, or the dedicated JSON data type (from Oracle 21c onwards).
✅ 1. Extract Single Value: JSON_VALUE
Purpose: Extract a scalar (string, number, boolean) value.
SELECT JSON_VALUE('{"id": 101, "name": "Rajeev"}', '$.name') AS name
FROM dual;
-- Output: Rajeev
Supports options like:
JSON_VALUE(json_column, '$.price' RETURNING NUMBER)
✅ 2. Extract JSON Object/Array as Text: JSON_QUERY
Purpose: Extract non-scalar JSON (objects or arrays) as text.
SELECT JSON_QUERY('{"id": 1, "skills": ["SQL", "Python"]}', '$.skills')
FROM dual;
-- Output: ["SQL", "Python"]
✅ 3. Extract Multiple Rows: JSON_TABLE
Purpose: Convert JSON arrays/objects into relational rows and columns.
SELECT *
FROM JSON_TABLE(
'{
"employees": [
{"id": 1, "name": "Raj"},
{"id": 2, "name": "Amit"}
]
}',
'$.employees[*]'
COLUMNS (
emp_id NUMBER PATH '$.id',
emp_name VARCHAR2(50) PATH '$.name'
)
);
📌 Think of JSON_TABLE as unpivoting JSON arrays into relational rows.
✅ 4. Filter JSON Data: IS JSON / JSON_EXISTS
-- Validate a JSON column
SELECT *
FROM employees
WHERE json_col IS JSON;
-- Filter only if a JSON path exists
SELECT *
FROM orders
WHERE JSON_EXISTS(json_col, '$.customer.address.zip');
🔷 Part 2: XML Handling in Oracle SQL
✅ 1. Extract Single Value: EXTRACTVALUE (deprecated) — use XMLQUERY / XMLTABLE instead
SELECT XMLTYPE('<person><name>Rajeev</name></person>').EXTRACT('//name/text()').getStringVal()
FROM dual;
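For newer code, the same lookup can be written with XMLQUERY — a minimal sketch:
SELECT XMLQUERY('/person/name/text()'
         PASSING XMLTYPE('<person><name>Rajeev</name></person>')
         RETURNING CONTENT) AS name
FROM dual;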
✅ 2. Use XMLTABLE for Relational Output
Example: Parse XML and turn into rows:
SELECT *
FROM XMLTABLE(
'/employees/employee'
PASSING XMLTYPE('
<employees>
<employee><id>1</id><name>Raj</name></employee>
<employee><id>2</id><name>Amit</name></employee>
</employees>'
)
COLUMNS
emp_id NUMBER PATH 'id',
emp_name VARCHAR2(50) PATH 'name'
);
🔷 Advanced Use Cases (Hybrid / Real-World)
🔸1. Nested JSON with JSON_TABLE
SELECT *
FROM JSON_TABLE(
'{
"orders": [
{"id": 1, "items": [{"product": "Pen", "qty": 10}, {"product": "Book", "qty": 5}]},
{"id": 2, "items": [{"product": "Pencil", "qty": 3}]}
]
}',
'$.orders[*]'
COLUMNS (
order_id NUMBER PATH '$.id',
NESTED PATH '$.items[*]'
COLUMNS (
product_name VARCHAR2(50) PATH '$.product',
quantity NUMBER PATH '$.qty'
)
)
);
🔸2. XMLTABLE with Nested Structures
SELECT *
FROM XMLTABLE(
'/orders/order'
PASSING XMLTYPE('
<orders>
<order>
<id>1</id>
<items>
<item><product>Pen</product><qty>10</qty></item>
<item><product>Book</product><qty>5</qty></item>
</items>
</order>
</orders>'
)
COLUMNS
order_id NUMBER PATH 'id',
items XMLTYPE PATH 'items'
) o,
XMLTABLE(
'/items/item'
PASSING o.items
COLUMNS
product VARCHAR2(50) PATH 'product',
qty NUMBER PATH 'qty'
);
🔷 Real Interview Scenarios
Use Case | SQL Feature | Description |
---|---|---|
Convert JSON API response to table | JSON_TABLE | For ETL on REST data |
Validate JSON schema | IS JSON , JSON_EXISTS | Check for mandatory keys |
Flatten XML invoice feed | XMLTABLE | Common in legacy systems |
Store dynamic attributes | JSON/XML column | Semi-structured, searchable |
Hybrid system (RDBMS + NoSQL) | JSON + SQL | Use JSON_QUERY , JSON_VALUE |
🔷 Summary Cheat Sheet
Feature | JSON | XML |
---|---|---|
Single value | JSON_VALUE | EXTRACTVALUE (old), XMLQUERY |
Full object | JSON_QUERY | XMLTYPE , EXTRACT() |
Rows/tables | JSON_TABLE | XMLTABLE |
Validate | IS JSON , JSON_EXISTS | XMLIsWellFormed() |
Nested data | Nested JSON_TABLE | Join XMLTABLE results |
🔷 Bonus: Indexing JSON/XML Columns
-- JSON search index (Oracle 12.2+)
CREATE SEARCH INDEX idx_orders_json ON orders (json_col) FOR JSON;
-- Functional index
CREATE INDEX idx_customer_city ON customers(JSON_VALUE(json_col, '$.address.city'));
Absolutely — the MERGE statement (aka upsert) is critical in ETL, data warehousing, slowly changing dimensions (SCD), and CDC (Change Data Capture) logic.
🔷 MERGE INTO: Syntax & Core Concept
The MERGE statement performs:
- UPDATE if the record exists (matched),
- INSERT if it doesn’t exist (not matched).
✅ Basic Syntax:
MERGE INTO target_table t
USING source_table s
ON (t.id = s.id)
WHEN MATCHED THEN
UPDATE SET t.name = s.name, t.salary = s.salary
WHEN NOT MATCHED THEN
INSERT (id, name, salary)
VALUES (s.id, s.name, s.salary);
📌 Think of MERGE as:
IF EXISTS (match)
THEN UPDATE
ELSE
INSERT
🔷 Practical Example
🧱 Tables:
-- Target Table (e.g. Data Warehouse Table)
CREATE TABLE emp_dw (
emp_id NUMBER PRIMARY KEY,
emp_name VARCHAR2(100),
salary NUMBER
);
-- Source Table (e.g. Staging Table)
CREATE TABLE emp_stage (
emp_id NUMBER,
emp_name VARCHAR2(100),
salary NUMBER
);
🧪 Data:
-- Data in target (DW)
1, 'Raj', 50000
-- Data in source (Stage)
1, 'Rajeev', 55000 -- should update
2, 'Amit', 40000 -- should insert
🔁 MERGE Logic:
MERGE INTO emp_dw dw
USING emp_stage stg
ON (dw.emp_id = stg.emp_id)
WHEN MATCHED THEN
UPDATE SET dw.emp_name = stg.emp_name,
dw.salary = stg.salary
WHEN NOT MATCHED THEN
INSERT (emp_id, emp_name, salary)
VALUES (stg.emp_id, stg.emp_name, stg.salary);
🔷 Advanced Use Cases
🔸1. Add Filters (e.g., Update Only If Salary Changes)
WHEN MATCHED AND dw.salary != stg.salary THEN
UPDATE SET dw.salary = stg.salary
🔸2. Insert Only Select Columns
INSERT (emp_id, salary) VALUES (stg.emp_id, stg.salary)
🔸3. SCD Type 1 (Overwrite Changes)
MERGE INTO dim_customer d
USING stg_customer s
ON (d.cust_id = s.cust_id)
WHEN MATCHED THEN
UPDATE SET d.name = s.name, d.city = s.city
WHEN NOT MATCHED THEN
INSERT (cust_id, name, city)
VALUES (s.cust_id, s.name, s.city);
🔸4. SCD Type 2 (Preserve History)
SCD Type 2 needs additional logic (valid_from, valid_to, is_current). MERGE alone only inserts new records; expiring old versions is done via UPDATE — see the sketch below.
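A minimal two-step sketch, assuming dim_customer carries valid_from, valid_to, and is_current (column names illustrative):
-- Step 1: expire the current version when tracked attributes changed
UPDATE dim_customer d
SET d.valid_to = SYSDATE,
    d.is_current = 'N'
WHERE d.is_current = 'Y'
  AND EXISTS (SELECT 1
              FROM stg_customer s
              WHERE s.cust_id = d.cust_id
                AND (s.name <> d.name OR s.city <> d.city));
-- Step 2: insert a fresh current version for changed or brand-new customers
INSERT INTO dim_customer (cust_id, name, city, valid_from, valid_to, is_current)
SELECT s.cust_id, s.name, s.city, SYSDATE, NULL, 'Y'
FROM stg_customer s
WHERE NOT EXISTS (SELECT 1
                  FROM dim_customer d
                  WHERE d.cust_id = s.cust_id
                    AND d.is_current = 'Y');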
🔸5. UPSERT from JSON or API into Table (Oracle 21c+ JSON)
MERGE INTO customer t
USING JSON_TABLE(:json_payload, '$[*]'
COLUMNS (id PATH '$.id', name PATH '$.name')) s
ON (t.id = s.id)
WHEN MATCHED THEN
UPDATE SET t.name = s.name
WHEN NOT MATCHED THEN
INSERT (id, name) VALUES (s.id, s.name);
🔷 Interview Tip
💬 “How do you perform UPSERT in Oracle SQL?”
Answer:
Using the MERGE statement. It compares rows between a target and source using a join condition. If a match is found, it updates; otherwise, it inserts. This is commonly used in ETL pipelines and data warehouse dimensions.
🔷 Alternatives (Non-MERGE)
In some environments (e.g., Snowflake, PostgreSQL), you may use:
DBMS | Alternative |
---|---|
Oracle | MERGE |
PostgreSQL | INSERT ... ON CONFLICT DO UPDATE |
MySQL | INSERT ... ON DUPLICATE KEY UPDATE |
SQL Server | MERGE (with caution, has edge cases) |
Snowflake | MERGE |
BigQuery | MERGE |
✅ Summary: MERGE Highlights
Scenario | MERGE Feature |
---|---|
UPSERT Logic | ✔️ |
ETL Staging | ✔️ |
Data Warehouse Dims (SCD) | ✔️ |
Performance | Efficient on indexed joins |
Conditional Updates | Yes via WHEN MATCHED AND ... |
JSON Payloads | Yes (from 21c using JSON_TABLE ) |
Here’s a crisp yet deep dive into Partitioning Techniques — particularly for Oracle, BigQuery, and Redshift — focusing on Range, Hash, List partitioning, pruning, sub-partitioning, and indexing strategies.
🧠 Why Partitioning Matters
Partitioning improves:
- Query performance (via partition pruning)
- Data management (purging, archiving)
- Parallelism and load balancing
- Scalability for large datasets
🔷 Types of Partitioning
Type | Description | Best For |
---|---|---|
🔢 Range | Based on ranges (e.g. date, salary) | Time-series, archiving |
#️⃣ Hash | Based on a hash function | Uniform distribution |
📋 List | Based on discrete values (e.g. region=’US’) | Categorical data |
📊 Composite | Combination (e.g., Range + Hash) | Complex use cases |
🔷 1. Oracle Partitioning
Oracle supports all types: Range, Hash, List, and Composite (Sub-Partitioning)
✅ a. Range Partition Example
CREATE TABLE sales (
sale_id NUMBER,
sale_date DATE,
amount NUMBER
)
PARTITION BY RANGE (sale_date) (
PARTITION p2023q1 VALUES LESS THAN (TO_DATE('2023-04-01', 'YYYY-MM-DD')),
PARTITION p2023q2 VALUES LESS THAN (TO_DATE('2023-07-01', 'YYYY-MM-DD'))
);
✅ b. Hash Partition
CREATE TABLE users (
user_id NUMBER,
name VARCHAR2(100)
)
PARTITION BY HASH(user_id)
PARTITIONS 4;
✅ c. List Partition
CREATE TABLE customer_data (
cust_id NUMBER,
region VARCHAR2(10)
)
PARTITION BY LIST (region) (
PARTITION p_north VALUES ('NORTH'),
PARTITION p_south VALUES ('SOUTH')
);
✅ d. Composite Partition (Range + Hash)
CREATE TABLE orders (
order_id NUMBER,
order_date DATE,
cust_id NUMBER
)
PARTITION BY RANGE (order_date)
SUBPARTITION BY HASH (cust_id)
SUBPARTITIONS 4
(
PARTITION p2024 VALUES LESS THAN (TO_DATE('2025-01-01', 'YYYY-MM-DD')),
PARTITION p2025 VALUES LESS THAN (MAXVALUE)
);
⚡ Oracle Partition Pruning
Oracle automatically skips partitions if WHERE clause filters allow it.
SELECT * FROM sales WHERE sale_date >= DATE '2023-01-01';
-- Prunes all irrelevant partitions
You can verify with:
EXPLAIN PLAN FOR ...
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
🔷 2. BigQuery Partitioning
BigQuery supports:
- Ingestion time partitioning
- Partitioned Tables (by column)
- Clustering (secondary filtering)
✅ a. Partition by DATE or TIMESTAMP
CREATE TABLE sales_partitioned (
sale_id INT64,
sale_date DATE
)
PARTITION BY sale_date;
✅ b. Pseudo Hash via Clustering (No true HASH partition)
CREATE TABLE logs_partitioned (
  log_time TIMESTAMP,
  user_id INT64
)
PARTITION BY DATE(log_time)
CLUSTER BY user_id;
📌 Clustering helps optimize filtering/sorting, especially when partitioning alone isn’t enough.
⚡ Partition Pruning in BigQuery
-- This is efficient (partition prune)
SELECT * FROM sales_partitioned
WHERE sale_date = '2023-01-01';
-- This disables pruning (bad):
WHERE EXTRACT(YEAR FROM sale_date) = 2023;
🔷 3. Redshift Partitioning (via Distribution + Sort Keys)
Redshift doesn’t support traditional partitioning, but uses:
- Distribution Keys (DISTKEY) = like HASH partition
- Sort Keys = like Range partition
✅ a. Hash Distribution
CREATE TABLE sales (
sale_id INT,
customer_id INT,
sale_date DATE
)
DISTSTYLE KEY
DISTKEY(customer_id);
📌 Ensures all same customer data goes to the same node.
✅ b. Range-like Behavior: SORTKEY
CREATE TABLE sales (
sale_id INT,
sale_date DATE
)
SORTKEY(sale_date);
📌 Makes range queries fast (BETWEEN, >, <, etc.)
⚡ Redshift Best Practices
Scenario | Strategy |
---|---|
Large joins | Use same DISTKEY |
Time-series data | Use SORTKEY(date) |
Small lookup tables | Use DISTSTYLE ALL (replicated) |
🔷 Indexing with Partitioning
DB | Strategy |
---|---|
Oracle | Local indexes (partition-wise) are common; global indexes work too |
BigQuery | No indexes; relies on partition + clustering |
Redshift | No indexes; sortkey + distkey act as physical optimizers |
🔷 Summary Table
Feature | Oracle | BigQuery | Redshift |
---|---|---|---|
Range Partitioning | ✅ Yes | ✅ Yes | ❌ (Sort Key only) |
Hash Partitioning | ✅ Yes | ❌ (Cluster-only) | ✅ (DISTKEY) |
List Partitioning | ✅ Yes | ❌ | ❌ |
Composite (Subpart) | ✅ Yes | ❌ | ❌ |
Partition Pruning | ✅ Automatic | ✅ Column-aware | ❌ (Handled via sortkey) |
Index Support | ✅ (Local/Global) | ❌ (None) | ❌ (None) |
💬 Interview Insight
“How would you partition a multi-billion row table in Oracle/BigQuery?”
Answer:
I’d use range partitioning on a time column (e.g., sale_date), and if further optimization is needed, apply sub-partitioning (e.g., hash on customer_id). This supports fast partition pruning, parallel processing, and easy purging of old partitions.
Excellent! You’ve now covered a complete, enterprise-grade SQL interview roadmap.
Below is a clean summary of points 7–10 with quick interview insights, real use cases, and tips — ideal for last-minute revision or documentation.
✅ 7. Temporal Tables (System-Versioned Tables)
📌 What it is:
A table that automatically tracks historical changes — records “when” each row was valid.
🧠 Core Concepts:
- Two timestamps: valid_from, valid_to
- Common in auditing, slowly changing dimensions (SCD2), and regulatory logs
- Also known as system-versioned tables
📌 Supported in:
SQL Server (SYSTEM_VERSIONING = ON), Oracle (via custom triggers or Flashback), PostgreSQL (manual), BigQuery (using FOR SYSTEM_TIME AS OF)
✅ Use Case:
-- Query as of a previous point in time
SELECT * FROM employees
FOR SYSTEM_TIME AS OF TIMESTAMP '2024-12-31 23:59:59';
💡 Interview Tip:
Be ready to simulate temporal behavior manually if the RDBMS doesn’t support it natively (e.g., keep history tables, use triggers) — a minimal manual sketch follows.
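A minimal manual sketch (table and column names illustrative): keep a history table and close the open row on every change.
-- History table mirroring employees, plus validity columns
CREATE TABLE employees_hist (
  emp_id     NUMBER,
  name       VARCHAR2(100),
  salary     NUMBER,
  valid_from TIMESTAMP,
  valid_to   TIMESTAMP   -- NULL marks the current version
);
-- On each change: expire the open row, then write the new version
UPDATE employees_hist
SET valid_to = SYSTIMESTAMP
WHERE emp_id = 101 AND valid_to IS NULL;
INSERT INTO employees_hist (emp_id, name, salary, valid_from, valid_to)
VALUES (101, 'Alice', 5500, SYSTIMESTAMP, NULL);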
✅ 8. Hierarchical Queries
📌 What it is:
Used to model tree structures like org charts, category hierarchies, file systems, etc.
🧠 Syntax Across RDBMS:
DB | Syntax |
---|---|
Oracle | CONNECT BY PRIOR + LEVEL |
PostgreSQL / MySQL 8+ | WITH RECURSIVE CTE |
SQL Server | WITH recursive CTE (same shape, but without the RECURSIVE keyword) |
✅ Example (PostgreSQL):
WITH RECURSIVE org_chart AS (
SELECT emp_id, manager_id, emp_name, 1 AS level
FROM employees
WHERE manager_id IS NULL
UNION ALL
SELECT e.emp_id, e.manager_id, e.emp_name, oc.level + 1
FROM employees e
JOIN org_chart oc ON e.manager_id = oc.emp_id
)
SELECT * FROM org_chart;
💡 Interview Tip:
Interviewers love this for BOM (Bill of Materials) questions. Know both the Oracle (CONNECT BY) and CTE styles — a minimal CONNECT BY sketch follows.
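The Oracle equivalent of the recursive CTE above, assuming the same employees table:
SELECT emp_id, manager_id, emp_name, LEVEL AS lvl
FROM employees
START WITH manager_id IS NULL
CONNECT BY PRIOR emp_id = manager_id;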
✅ 9. Role of SQL in Data Pipelines & Orchestration
📌 Context:
SQL powers:
- Extract: Pulling data from source systems
- Transform: Cleansing, joins, validations
- Load: Inserting into reporting/marts
💡 Orchestration Context:
- Airflow DAGs: SQL used in operators like PostgresOperator, BigQueryOperator
- Spark SQL Jobs: SQL queries over DataFrames or Hive tables
- Validation Checks: Row counts, null checks, duplicates, reconciliation
✅ Sample Validation SQL:
-- Check record mismatch: returns a row only when the counts differ
SELECT COUNT(*) FROM src_table
MINUS
SELECT COUNT(*) FROM target_table;
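Two further hedged checks along the same lines (table/column names illustrative):
-- Duplicate keys in the target
SELECT emp_id, COUNT(*)
FROM target_table
GROUP BY emp_id
HAVING COUNT(*) > 1;
-- A mandatory column should never be NULL
SELECT COUNT(*) AS null_names
FROM target_table
WHERE name IS NULL;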
💡 Interview Tip:
Be ready to explain how you used SQL for quality checks or task dependencies in orchestration.
✅ 10. SQL Coding Best Practices
📌 Why Important:
Clean SQL → Better maintenance → Fewer bugs in pipelines/reports
✅ Best Practices Summary:
Aspect | Best Practice |
---|---|
Naming | Use snake_case or camelCase , meaningful aliases |
NULLs | Use COALESCE , beware of NULL != NULL , handle CASE WHEN col IS NULL explicitly |
Casting | Always cast explicitly in calculations |
Joins | Use aliasing and qualify columns: a.id = b.id |
Reusability | Use CTEs (WITH ) for modular, readable SQL |
Performance | Avoid SELECT * , use indexed columns in filters, monitor query plans |
Comments | Always explain logic in complex CTEs or subqueries |
💡 Interview Tip:
They often ask you to “optimize” a given messy SQL. Follow these rules and explain your cleanup rationale — a small before/after sketch follows.
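A before/after sketch of such a cleanup (query is illustrative):
-- Before: implicit join, SELECT *, non-sargable filter
SELECT *
FROM orders o, customers c
WHERE UPPER(c.country) = 'US'
  AND o.cust_id = c.cust_id;
-- After: explicit JOIN, named columns, sargable filter
SELECT o.order_id, o.order_date, c.name
FROM orders o
JOIN customers c ON o.cust_id = c.cust_id
WHERE c.country = 'US';   -- assumes country is stored upper-case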