Indexing is one of the most important SQL performance optimization techniques. When used correctly, it drastically improves read/query speed—especially for large tables.


✅ What is an Index in SQL?

An index is a data structure (usually B-Tree) that allows fast access to rows based on key column values, like a book’s index.

Think of it like a lookup table: instead of scanning all rows, the DB engine jumps to the location directly.


🧪 1. Basic Syntax to Create an Index

CREATE INDEX index_name
ON table_name (column1, column2, ...);

📘 Example:

CREATE INDEX idx_customer_email
ON customers (email);

Now, queries like:

SELECT * FROM customers WHERE email = 'abc@example.com';

…will use the index and avoid a full table scan.


✅ 2. Use Cases Where Indexes Help

🔍 a. WHERE clause filters

SELECT * FROM orders WHERE order_id = 12345;

✅ Index on order_id makes this instant.


🔍 b. JOIN conditions

SELECT * FROM customers c
JOIN orders o ON c.customer_id = o.customer_id;

✅ Index on customer_id in both tables speeds up joins.


🔍 c. GROUP BY / ORDER BY columns

SELECT category, COUNT(*) FROM products GROUP BY category;
SELECT * FROM orders ORDER BY order_date DESC;

✅ Index on category or order_date improves performance.


🔍 d. Foreign key columns

Foreign keys are frequently used in joins and filters, so indexing them is good practice.

ALTER TABLE orders
ADD CONSTRAINT fk_customer FOREIGN KEY (customer_id) REFERENCES customers(id);

✅ Indexing customer_id in orders improves performance.


⚙️ 3. Composite Index (Multi-Column Index)

CREATE INDEX idx_order_customer_date
ON orders (customer_id, order_date);

Best when queries filter by leading columns:

-- ✅ uses index
SELECT * FROM orders WHERE customer_id = 5;

-- ✅ uses index
SELECT * FROM orders WHERE customer_id = 5 AND order_date > '2023-01-01';

-- ❌ won’t use index effectively
SELECT * FROM orders WHERE order_date > '2023-01-01';

🧠 Index is only used when the query includes the left-most column(s).


⚠️ 4. Downsides of Indexes

ConcernExplanation
Slower INSERT/UPDATE/DELETEBecause the index must also be updated
Takes disk spaceEspecially for large tables
Over-indexingToo many indexes can confuse the optimizer

🔍 5. Check if Index Is Used (Query Plan)

EXPLAIN SELECT * FROM customers WHERE email = 'abc@example.com';
  • In PostgreSQL: look for Index Scan
  • In MySQL: look for Using index or Using where

This helps validate that your query uses the index and is optimized.


🧱 6. Unique Index

Also prevents duplicates:

CREATE UNIQUE INDEX idx_unique_email
ON users (email);

🧊 7. Index on Expressions / Functions (DB-specific)

Some DBs allow function-based indexing:

-- Example in PostgreSQL
CREATE INDEX idx_lower_email ON users (LOWER(email));

Used when queries use functions:

SELECT * FROM users WHERE LOWER(email) = 'abc@example.com';

🧮 8. Partial Indexes (PostgreSQL, SQLite)

CREATE INDEX idx_active_users ON users (last_login)
WHERE is_active = true;

✅ Efficient for selective queries.


🗑️ 9. Drop Index

DROP INDEX idx_customer_email;  -- PostgreSQL / SQLite

-- MySQL
DROP INDEX idx_customer_email ON customers;

🧠 Summary Table

Use CaseIndex TypeExample Syntax
Fast lookupsSingle-column indexCREATE INDEX ON t(col);
Filter + order/groupMulti-column (composite) indexCREATE INDEX ON t(col1, col2);
Uniqueness enforcementUnique indexCREATE UNIQUE INDEX ON t(col);
Function searchFunctional indexCREATE INDEX ON t(LOWER(col));
Only some rowsPartial index (PostgreSQL)CREATE INDEX ON t(col) WHERE cond;
Query optimizationExplain planEXPLAIN SELECT...
Index cleanupDrop indexDROP INDEX ...

Great! Here’s how we can demonstrate indexing across the three platforms with hands-on examples:


✅ 1. 🔧 PostgreSQL Indexing Demo

Step 1: Setup a sample table

CREATE TABLE customers (
  id SERIAL PRIMARY KEY,
  name VARCHAR(100),
  email VARCHAR(100),
  city VARCHAR(50),
  created_at DATE
);

Step 2: Add some sample data (bulk insert or generate rows)

INSERT INTO customers (name, email, city, created_at)
SELECT 
  'Name' || i, 
  'user' || i || '@test.com', 
  CASE WHEN i % 2 = 0 THEN 'Delhi' ELSE 'Mumbai' END,
  NOW() - (i || ' days')::interval
FROM generate_series(1, 100000) AS i;

Step 3: Create an index

CREATE INDEX idx_city ON customers(city);

Step 4: Compare with EXPLAIN ANALYZE

EXPLAIN ANALYZE 
SELECT * FROM customers WHERE city = 'Delhi';

You should see Bitmap Index Scan or Index Scan in the plan.


✅ 2. ⚙️ MySQL Indexing Demo

Step 1: Create a table

CREATE TABLE orders (
  id INT AUTO_INCREMENT PRIMARY KEY,
  customer_id INT,
  amount DECIMAL(10, 2),
  order_date DATE
);

Step 2: Insert test data

DELIMITER $$
CREATE PROCEDURE insert_orders()
BEGIN
  DECLARE i INT DEFAULT 1;
  WHILE i <= 100000 DO
    INSERT INTO orders (customer_id, amount, order_date)
    VALUES (
      FLOOR(RAND()*1000),
      RAND()*1000,
      DATE_SUB(CURDATE(), INTERVAL FLOOR(RAND()*365) DAY)
    );
    SET i = i + 1;
  END WHILE;
END$$
DELIMITER ;

CALL insert_orders();

Step 3: Create Index

CREATE INDEX idx_order_date ON orders(order_date);

Step 4: Check performance

EXPLAIN SELECT * FROM orders WHERE order_date = '2024-01-01';

✅ Look for “Using index” or “ref” in EXPLAIN.


✅ 3. 🧊 PySpark + Delta Lake Z-Ordering

Z-Ordering optimizes read performance in Delta Lake tables by co-locating related data.

Step 1: Create a Delta Table

from pyspark.sql.functions import expr
df = spark.range(0, 1000000).withColumn("order_id", expr("id % 1000")) \
                             .withColumn("country", expr("CASE WHEN id % 2 = 0 THEN 'India' ELSE 'US' END")) \
                             .withColumn("amount", expr("rand() * 1000"))

df.write.format("delta").mode("overwrite").save("/mnt/delta/orders")
spark.sql("DROP TABLE IF EXISTS delta_orders")
spark.sql("CREATE TABLE delta_orders USING DELTA LOCATION '/mnt/delta/orders'")

Step 2: Optimize the table using Z-Order

OPTIMIZE delta_orders ZORDER BY (country);

Step 3: Query and observe performance

spark.sql("SELECT * FROM delta_orders WHERE country = 'India'").explain(True)

✅ Performance gains are observed on large datasets after OPTIMIZE.


🧠 Summary

TechIndexing MethodSyntaxPerformance Hint
PostgreSQLB-Tree IndexCREATE INDEXEXPLAIN ANALYZE
MySQLB-Tree IndexCREATE INDEXEXPLAIN
PySpark + Delta LakeZ-OrderingOPTIMIZE ... ZORDER BYspark.sql(...).explain(True)

Pages: 1 2


Discover more from HintsToday

Subscribe to get the latest posts sent to your email.

Posted in

Leave a Reply

Discover more from HintsToday

Subscribe now to keep reading and get access to the full archive.

Continue reading