Indexing is one of the most important SQL performance optimization techniques. When used correctly, it drastically improves read/query speed—especially for large tables.
✅ What is an Index in SQL?
An index is a data structure (usually B-Tree) that allows fast access to rows based on key column values, like a book’s index.
Think of it like a lookup table: instead of scanning all rows, the DB engine jumps to the location directly.
🧪 1. Basic Syntax to Create an Index
CREATE INDEX index_name
ON table_name (column1, column2, ...);
📘 Example:
CREATE INDEX idx_customer_email
ON customers (email);
Now, queries like:
SELECT * FROM customers WHERE email = 'abc@example.com';
…will use the index and avoid a full table scan.
✅ 2. Use Cases Where Indexes Help
🔍 a. WHERE clause filters
SELECT * FROM orders WHERE order_id = 12345;
✅ Index on order_id
makes this instant.
🔍 b. JOIN conditions
SELECT * FROM customers c
JOIN orders o ON c.customer_id = o.customer_id;
✅ Index on customer_id
in both tables speeds up joins.
🔍 c. GROUP BY / ORDER BY columns
SELECT category, COUNT(*) FROM products GROUP BY category;
SELECT * FROM orders ORDER BY order_date DESC;
✅ Index on category
or order_date
improves performance.
🔍 d. Foreign key columns
Foreign keys are frequently used in joins and filters, so indexing them is good practice.
ALTER TABLE orders
ADD CONSTRAINT fk_customer FOREIGN KEY (customer_id) REFERENCES customers(id);
✅ Indexing customer_id
in orders
improves performance.
⚙️ 3. Composite Index (Multi-Column Index)
CREATE INDEX idx_order_customer_date
ON orders (customer_id, order_date);
Best when queries filter by leading columns:
-- ✅ uses index
SELECT * FROM orders WHERE customer_id = 5;
-- ✅ uses index
SELECT * FROM orders WHERE customer_id = 5 AND order_date > '2023-01-01';
-- ❌ won’t use index effectively
SELECT * FROM orders WHERE order_date > '2023-01-01';
🧠 Index is only used when the query includes the left-most column(s).
⚠️ 4. Downsides of Indexes
Concern | Explanation |
---|---|
Slower INSERT/UPDATE/DELETE | Because the index must also be updated |
Takes disk space | Especially for large tables |
Over-indexing | Too many indexes can confuse the optimizer |
🔍 5. Check if Index Is Used (Query Plan)
EXPLAIN SELECT * FROM customers WHERE email = 'abc@example.com';
- In PostgreSQL: look for
Index Scan
- In MySQL: look for
Using index
orUsing where
This helps validate that your query uses the index and is optimized.
🧱 6. Unique Index
Also prevents duplicates:
CREATE UNIQUE INDEX idx_unique_email
ON users (email);
🧊 7. Index on Expressions / Functions (DB-specific)
Some DBs allow function-based indexing:
-- Example in PostgreSQL
CREATE INDEX idx_lower_email ON users (LOWER(email));
Used when queries use functions:
SELECT * FROM users WHERE LOWER(email) = 'abc@example.com';
🧮 8. Partial Indexes (PostgreSQL, SQLite)
CREATE INDEX idx_active_users ON users (last_login)
WHERE is_active = true;
✅ Efficient for selective queries.
🗑️ 9. Drop Index
DROP INDEX idx_customer_email; -- PostgreSQL / SQLite
-- MySQL
DROP INDEX idx_customer_email ON customers;
🧠 Summary Table
Use Case | Index Type | Example Syntax |
---|---|---|
Fast lookups | Single-column index | CREATE INDEX ON t(col); |
Filter + order/group | Multi-column (composite) index | CREATE INDEX ON t(col1, col2); |
Uniqueness enforcement | Unique index | CREATE UNIQUE INDEX ON t(col); |
Function search | Functional index | CREATE INDEX ON t(LOWER(col)); |
Only some rows | Partial index (PostgreSQL) | CREATE INDEX ON t(col) WHERE cond; |
Query optimization | Explain plan | EXPLAIN SELECT... |
Index cleanup | Drop index | DROP INDEX ... |
Great! Here’s how we can demonstrate indexing across the three platforms with hands-on examples:
✅ 1. 🔧 PostgreSQL Indexing Demo
Step 1: Setup a sample table
CREATE TABLE customers (
id SERIAL PRIMARY KEY,
name VARCHAR(100),
email VARCHAR(100),
city VARCHAR(50),
created_at DATE
);
Step 2: Add some sample data (bulk insert or generate rows)
INSERT INTO customers (name, email, city, created_at)
SELECT
'Name' || i,
'user' || i || '@test.com',
CASE WHEN i % 2 = 0 THEN 'Delhi' ELSE 'Mumbai' END,
NOW() - (i || ' days')::interval
FROM generate_series(1, 100000) AS i;
Step 3: Create an index
CREATE INDEX idx_city ON customers(city);
Step 4: Compare with EXPLAIN ANALYZE
EXPLAIN ANALYZE
SELECT * FROM customers WHERE city = 'Delhi';
You should see Bitmap Index Scan
or Index Scan
in the plan.
✅ 2. ⚙️ MySQL Indexing Demo
Step 1: Create a table
CREATE TABLE orders (
id INT AUTO_INCREMENT PRIMARY KEY,
customer_id INT,
amount DECIMAL(10, 2),
order_date DATE
);
Step 2: Insert test data
DELIMITER $$
CREATE PROCEDURE insert_orders()
BEGIN
DECLARE i INT DEFAULT 1;
WHILE i <= 100000 DO
INSERT INTO orders (customer_id, amount, order_date)
VALUES (
FLOOR(RAND()*1000),
RAND()*1000,
DATE_SUB(CURDATE(), INTERVAL FLOOR(RAND()*365) DAY)
);
SET i = i + 1;
END WHILE;
END$$
DELIMITER ;
CALL insert_orders();
Step 3: Create Index
CREATE INDEX idx_order_date ON orders(order_date);
Step 4: Check performance
EXPLAIN SELECT * FROM orders WHERE order_date = '2024-01-01';
✅ Look for “Using index” or “ref” in EXPLAIN
.
✅ 3. 🧊 PySpark + Delta Lake Z-Ordering
Z-Ordering optimizes read performance in Delta Lake tables by co-locating related data.
Step 1: Create a Delta Table
from pyspark.sql.functions import expr
df = spark.range(0, 1000000).withColumn("order_id", expr("id % 1000")) \
.withColumn("country", expr("CASE WHEN id % 2 = 0 THEN 'India' ELSE 'US' END")) \
.withColumn("amount", expr("rand() * 1000"))
df.write.format("delta").mode("overwrite").save("/mnt/delta/orders")
spark.sql("DROP TABLE IF EXISTS delta_orders")
spark.sql("CREATE TABLE delta_orders USING DELTA LOCATION '/mnt/delta/orders'")
Step 2: Optimize the table using Z-Order
OPTIMIZE delta_orders ZORDER BY (country);
Step 3: Query and observe performance
spark.sql("SELECT * FROM delta_orders WHERE country = 'India'").explain(True)
✅ Performance gains are observed on large datasets after OPTIMIZE
.
🧠 Summary
Tech | Indexing Method | Syntax | Performance Hint |
---|---|---|---|
PostgreSQL | B-Tree Index | CREATE INDEX | EXPLAIN ANALYZE |
MySQL | B-Tree Index | CREATE INDEX | EXPLAIN |
PySpark + Delta Lake | Z-Ordering | OPTIMIZE ... ZORDER BY | spark.sql(...).explain(True) |
Leave a Reply