1. CTEs for Readability (Not Just Elegance)
A WITH clause isn’t fancy syntax—it’s a readability tool. Break complex logic into digestible steps.
Without CTE (hard to follow):
SELECT
customer_id,
SUM(amount) as total_spent,
COUNT(*) as num_orders
FROM orders
WHERE order_date >= '2026-01-01'
AND customer_id IN (
SELECT id FROM customers
WHERE created_at <= '2025-01-01'
)
GROUP BY customer_id
HAVING COUNT(*) > 5
With CTE (clear intent):
WITH established_customers AS (
SELECT id
FROM customers
WHERE created_at <= '2025-01-01'
),
recent_orders AS (
SELECT customer_id, amount
FROM orders
WHERE order_date >= '2026-01-01'
)
SELECT
customer_id,
SUM(amount) as total_spent,
COUNT(*) as num_orders
FROM recent_orders
WHERE customer_id IN (SELECT id FROM established_customers)
GROUP BY customer_id
HAVING COUNT(*) > 5
Each CTE answers one question. Future-you (and teammates) will understand instantly.
2. GROUP BY: Watch Your Aggregations
GROUP BY is straightforward, but easy to mess up. Two things to watch:
Gotcha 1: Non-aggregated columns
-- ❌ WRONG (some databases will error, others guess)
SELECT
customer_id,
email, -- This isn't in GROUP BY and isn't aggregated!
SUM(amount) as total
FROM orders
GROUP BY customer_id
-- ✓ RIGHT
SELECT
customer_id,
MAX(email) as email, -- Or use DISTINCT if you know it's one per customer
SUM(amount) as total
FROM orders
GROUP BY customer_id
Gotcha 2: NULL handling
GROUP BY treats NULL as a distinct value—all NULLs group together:
SELECT
category, -- If NULL values exist, they'll be grouped as one row
COUNT(*) as count
FROM products
GROUP BY category
If you have 50 products with no category, they’ll appear as one NULL row with count 50. Handle intentionally:
SELECT
COALESCE(category, 'Uncategorized') as category,
COUNT(*) as count
FROM products
GROUP BY COALESCE(category, 'Uncategorized')
3. Query Optimization: Execution Time Wins
You don’t need to be a DBA—focus on structure, not tuning.
Pattern 1: Filter Early
Move WHERE clauses before JOINs when possible. Filter the largest tables first.
-- ❌ SLOW (joins millions of rows, then filters)
SELECT o.*, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.created_at >= '2026-01-01'
-- ✓ FAST (filters orders first, then joins)
SELECT o.*, c.name
FROM (
SELECT * FROM orders
WHERE created_at >= '2026-01-01'
) o
JOIN customers c ON o.customer_id = c.id
Pattern 2: Use UNION Instead of OR
OR conditions force the database to evaluate every row. UNION can use separate indexes:
-- ❌ SLOWER
SELECT * FROM orders
WHERE status = 'completed' OR status = 'pending'
-- ✓ FASTER
SELECT * FROM orders WHERE status = 'completed'
UNION ALL
SELECT * FROM orders WHERE status = 'pending'
Pattern 3: Check Execution Time (EXPLAIN ANALYZE)
Before optimizing, measure:
EXPLAIN ANALYZE
SELECT customer_id, SUM(amount)
FROM orders
GROUP BY customer_id
Look for:
- Seq Scan = full table scan (slow if table is huge)
- Index Scan = using an index (fast)
- Actual vs. Planned rows = big difference = bad estimate
If you see sequential scans on large tables, an index might help. Ask your DBA.
The Pattern
These three techniques work together:
- CTEs = clear, readable structure
- GROUP BY = correct aggregations, handle NULLs
- Optimization = filter early, measure, optimize intentionally
Write for clarity first. Optimize only if it’s slow.