Here’s a comprehensive guide to important and tricky conceptual issues in SQL, including NULL behavior, joins, filters, grouping, ordering, and subqueries.


1. NULLs: The #1 source of confusion

a. NULLNULL

  • NULL = NULLUNKNOWN (not TRUE)
  • Must use: IS NULL, IS NOT NULL

b. NOT IN with NULL

  • If subquery has a single NULL, the whole NOT IN fails.
  • ✅ Use LEFT ANTI JOIN instead.

c. Arithmetic with NULL

  • 5 + NULL = NULL
  • Any arithmetic with NULL gives NULL

2. JOIN Issues

a. INNER JOIN drops unmatched rows.

b. LEFT JOIN keeps all rows from the left, even unmatched.

  • Add WHERE right_col IS NULL to find “non-matching” entries.

c. Beware of duplicate matches

  • If join keys aren’t unique, you’ll get row explosion.
  • Use aggregation or row number if necessary.

3. WHERE vs HAVING

a. WHERE filters before aggregation

b. HAVING filters after aggregation

SELECT department, COUNT(*) as emp_count
FROM employees
GROUP BY department
HAVING COUNT(*) > 10;

4. GROUP BY Pitfalls

a. Only select aggregated columns or grouped columns

SELECT dept, COUNT(*) FROM employees GROUP BY dept; ✅
SELECT name, COUNT(*) FROM employees GROUP BY dept; ❌

5. ORDER BY and LIMIT

a. ORDER BY applies after SELECT and GROUP BY

  • Often used with LIMIT to find top N
SELECT * FROM sales ORDER BY revenue DESC LIMIT 5;

6. Subquery Issues

a. NOT IN (SELECT NULL...) ⇒ fails

b. Correlated subqueries run per row

SELECT e.*
FROM employees e
WHERE salary > (
  SELECT AVG(salary) FROM employees WHERE department = e.department
);

⚠️ Can be slow if not optimized


7. DISTINCT vs GROUP BY

a. DISTINCT removes duplicate rows

SELECT DISTINCT dept FROM employees;

b. GROUP BY groups rows and aggregates

SELECT dept, COUNT(*) FROM employees GROUP BY dept;

8. BETWEEN is inclusive

WHERE date BETWEEN '2023-01-01' AND '2023-01-31'

Includes both '2023-01-01' and '2023-01-31'


9. NULL-safe Equality (MySQL specific)

SELECT * FROM t WHERE a <=> b;  -- True even if both NULL

Not supported in most engines; use with caution.


10. CASE and COALESCE

a. CASE for conditional logic

SELECT
  CASE
    WHEN score > 90 THEN 'A'
    WHEN score > 70 THEN 'B'
    ELSE 'C'
  END as grade

b. COALESCE() returns first non-NULL

SELECT COALESCE(nickname, name, 'N/A') FROM users;

11. Window Functions (OVER clause)

Powerful for rankings, moving averages, cumulative sums:

SELECT name, salary,
  RANK() OVER (PARTITION BY dept ORDER BY salary DESC) as dept_rank
FROM employees;

12. UNION vs UNION ALL

  • UNION removes duplicates
  • UNION ALL keeps all rows (faster)

13. Cartesian Product (Cross Join by mistake)

If you forget ON clause in join, you get:

SELECT * FROM A, B;  -- Big disaster!

14. Filtering on Aggregates: Don’t put aggregates in WHERE

SELECT dept, AVG(salary)
FROM employees
WHERE AVG(salary) > 50000;  ❌ Invalid

✅ Use HAVING


15. Boolean logic (3-valued)

  • TRUE, FALSE, and UNKNOWN
  • Comparisons with NULL yield UNKNOWN
  • WHERE filters only TRUE

16. Common performance tips

  • Use indexed columns in joins/filters
  • Avoid SELECT * in production
  • Filter early (pushdown)
  • Watch out for implicit type conversion (slows joins)

17. Date functions differ by dialect

  • Spark SQL: date_sub(), current_date()
  • MySQL: DATE_SUB(NOW(), INTERVAL 7 DAY)
  • Standard SQL: CURRENT_DATE - INTERVAL '7 days'

Pages: 1 2 3


Discover more from HintsToday

Subscribe to get the latest posts sent to your email.

Posted in

Leave a Reply

Discover more from HintsToday

Subscribe now to keep reading and get access to the full archive.

Continue reading

Subscribe