Here’s a comprehensive guide to important and tricky conceptual issues in SQL, including NULL behavior, joins, filters, grouping, ordering, and subqueries.
✅ 1. NULLs: The #1 source of confusion
a. NULL
≠ NULL
NULL = NULL
→UNKNOWN
(notTRUE
)- Must use:
IS NULL
,IS NOT NULL
b. NOT IN
with NULL
- If subquery has a single
NULL
, the wholeNOT IN
fails. - ✅ Use
LEFT ANTI JOIN
instead.
c. Arithmetic with NULL
5 + NULL
=NULL
- Any arithmetic with NULL gives NULL
✅ 2. JOIN Issues
a. INNER JOIN
drops unmatched rows.
b. LEFT JOIN
keeps all rows from the left, even unmatched.
- Add
WHERE right_col IS NULL
to find “non-matching” entries.
c. Beware of duplicate matches
- If join keys aren’t unique, you’ll get row explosion.
- Use aggregation or row number if necessary.
✅ 3. WHERE vs HAVING
a. WHERE
filters before aggregation
b. HAVING
filters after aggregation
SELECT department, COUNT(*) as emp_count
FROM employees
GROUP BY department
HAVING COUNT(*) > 10;
✅ 4. GROUP BY Pitfalls
a. Only select aggregated columns or grouped columns
SELECT dept, COUNT(*) FROM employees GROUP BY dept; ✅
SELECT name, COUNT(*) FROM employees GROUP BY dept; ❌
✅ 5. ORDER BY and LIMIT
a. ORDER BY
applies after SELECT and GROUP BY
- Often used with
LIMIT
to find top N
SELECT * FROM sales ORDER BY revenue DESC LIMIT 5;
✅ 6. Subquery Issues
a. NOT IN (SELECT NULL...)
⇒ fails
b. Correlated subqueries run per row
SELECT e.*
FROM employees e
WHERE salary > (
SELECT AVG(salary) FROM employees WHERE department = e.department
);
⚠️ Can be slow if not optimized
✅ 7. DISTINCT vs GROUP BY
a. DISTINCT
removes duplicate rows
SELECT DISTINCT dept FROM employees;
b. GROUP BY
groups rows and aggregates
SELECT dept, COUNT(*) FROM employees GROUP BY dept;
✅ 8. BETWEEN is inclusive
WHERE date BETWEEN '2023-01-01' AND '2023-01-31'
Includes both '2023-01-01'
and '2023-01-31'
✅ 9. NULL-safe Equality (MySQL specific)
SELECT * FROM t WHERE a <=> b; -- True even if both NULL
Not supported in most engines; use with caution.
✅ 10. CASE and COALESCE
a. CASE
for conditional logic
SELECT
CASE
WHEN score > 90 THEN 'A'
WHEN score > 70 THEN 'B'
ELSE 'C'
END as grade
b. COALESCE()
returns first non-NULL
SELECT COALESCE(nickname, name, 'N/A') FROM users;
✅ 11. Window Functions (OVER clause)
Powerful for rankings, moving averages, cumulative sums:
SELECT name, salary,
RANK() OVER (PARTITION BY dept ORDER BY salary DESC) as dept_rank
FROM employees;
✅ 12. UNION vs UNION ALL
UNION
removes duplicatesUNION ALL
keeps all rows (faster)
✅ 13. Cartesian Product (Cross Join by mistake)
If you forget ON
clause in join, you get:
SELECT * FROM A, B; -- Big disaster!
✅ 14. Filtering on Aggregates: Don’t put aggregates in WHERE
SELECT dept, AVG(salary)
FROM employees
WHERE AVG(salary) > 50000; ❌ Invalid
✅ Use HAVING
✅ 15. Boolean logic (3-valued)
TRUE
,FALSE
, andUNKNOWN
- Comparisons with NULL yield UNKNOWN
- WHERE filters only
TRUE
✅ 16. Common performance tips
- Use indexed columns in joins/filters
- Avoid
SELECT *
in production - Filter early (pushdown)
- Watch out for implicit type conversion (slows joins)
✅ 17. Date functions differ by dialect
- Spark SQL:
date_sub()
,current_date()
- MySQL:
DATE_SUB(NOW(), INTERVAL 7 DAY)
- Standard SQL:
CURRENT_DATE - INTERVAL '7 days'
Leave a Reply