SQL Joins Explained: LEFT JOIN, INNER JOIN, OUTER JOIN Guide 2025

data career skills data engineering database fundementals query optimisation sql Aug 21, 2025
SQL Joins Explained blog header covering LEFT JOIN INNER JOIN OUTER JOIN guide 2025 by Tom Mitchell

The Complete Guide to SQL JOINs: From Beginner to Expert

Without a doubt, if you plan to work with SQL in any capacity, you'll be required to join data tables regularly. Unfortunately, this is a topic that most beginners find difficult to wrap their heads around. The truth is, once you've grasped the basics and done it a few times, it will become second nature.

Here's what you'll learn in this article:

  • All major join types (INNER, LEFT, RIGHT, FULL OUTER) with real examples from my own experience
  • Multi-table join strategies for 3, 4, or even 10+ tables
  • Performance optimisation techniques that can cut query time by 90%+
  • Common join pitfalls and how to debug them quickly
  • PostgreSQL-specific tips for production-scale application

By the end, you'll write better queries and be able to handle data from multiple tables with confidence.

💡 Quick Answer: If you're here wondering "what's the difference between LEFT JOIN and INNER JOIN?" - INNER JOIN only shows rows that exist in both tables, while LEFT JOIN shows all rows from the first table plus matching rows from the second. But stick around, because there's much more to master.

SQL JOIN Types: Quick Reference

Before we dive into Luigi's story, here's what you'll master:

JOIN Type What It Does When Luigi Uses It
INNER JOIN Only matched data "Show customers who actually ordered"
LEFT JOIN All left table + matches "Show ALL customers, even non-buyers"
RIGHT JOIN All right table + matches "Show all orders, even if customer deleted"
FULL OUTER JOIN Everything from both tables "Show customers AND orders, matched or not"

Now, let's meet Luigi...

Table of Contents

  1. What Are SQL Joins?
    • Why we need joins
    • Real-world examples from e-commerce and finance
  2. SQL LEFT JOIN vs INNER JOIN vs OUTER JOIN: Complete Guide
    • INNER JOIN: Getting matched data only
    • LEFT JOIN: Keep all left table rows
    • RIGHT JOIN: Keep all right table rows
    • FULL OUTER JOIN: Keep everything
    • When to use each type
  3. How to Join Multiple Tables Step-by-Step
    • 2-table joins with examples
    • 3-table joins (customers → orders → products)
    • 4+ table joins for complex data
    • Join order and performance
  4. SQL Join on Multiple Columns
    • When you need composite keys
    • Syntax and examples
    • Performance considerations
  5. SQL JOIN Performance: Making Your Queries Lightning Fast
    • Index optimisation
    • Join order strategies
    • Production tips
  6. Advanced Join Techniques
    • Self joins explained
    • Using subqueries with joins
    • Complex join conditions
  7. Common Join Mistakes & How to Fix Them
    • Cartesian products (too many rows)
    • Missing join conditions
    • Wrong join types
    • Debugging techniques
  8. Best Practices & Tips
    • Writing readable join queries
    • Testing strategies
    • Production considerations
  9. Frequently Asked Questions: SQL JOINs
  10. TL;DR Summary

1. What Are SQL Joins?

Meet Luigi. He owns a busy pizzeria and wants to answer simple questions like "Which customers ordered pepperoni pizza last week?" or "How much did Maria spend this month?"

Luigi stores his data in separate tables: one for customers (names, phone numbers), one for orders (dates, totals), and one for pizzas (names, prices).

To answer his questions, he needs SQL joins to connect these tables and get the complete picture.

Without joins, Luigi would need to look through each table separately, then manually match customer IDs to order IDs to pizza IDs.

That's exactly what joins do automatically—they connect related data across multiple tables.

Why We Need Joins

Imagine if Luigi stored everything in one giant table. Every time someone ordered a Margherita pizza, he'd need to save their full name, address, phone number, and pizza details again and again. With hundreds of orders, this creates messy, repeated data.

Instead, smart databases split information into focused tables. This is called normalisation, and it keeps data clean and storage efficient.

What does this look like in the real world?

When I worked at ASOS, we had tables for orders, geographical information, different customer tables (depending on what gets collected you might need to split it up), product tables and inventory so we could see how much of each product we had:

  • users table (customer name, email, address)
  • orders table (order date, total amount, status)
  • products table (item name, price, product image)
  • inventory table (stock volume, where that stock was held)

Without joins, you'd need separate queries for each table, then manually match everything up in your code. That's slow, messy, and breaks easily.

Key Takeaway: Joins let you combine related data from multiple tables in a single query, turning scattered information into useful results.

2. SQL LEFT JOIN vs INNER JOIN vs OUTER JOIN: Complete Guide

Luigi needs different types of joins for different questions. The key is understanding when you want to exclude data (INNER JOIN) versus when you want to keep everything (OUTER JOINs).

Here's the golden rule: Use INNER JOIN to filter data. Use LEFT JOIN to preserve your main data while adding optional information.

Here are Luigi's main tables:

  • customers (customer_id, name, phone, email)
  • orders (order_id, customer_id, order_date, total_amount)
  • pizzas (pizza_id, order_id, pizza_name, size, price)

INNER JOIN: Getting Matched Data Only

When Luigi asks: "Show me customers who have actually ordered pizza this month."

An INNER JOIN only shows rows where data exists in both tables. Think of it as the overlap in a Venn diagram.

SELECT c.name, c.phone, o.order_date, o.order_amt
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date >= '2025-01-01';

💡 Pro Tip: Notice the c and o after the table names? These are called aliases - shortcuts that make your code cleaner. Instead of writing customers.name, you can write c.name. Much easier to read and type, especially with longer table names!

This query only returns customers who have actually placed orders. If Maria signed up but never ordered, she won't appear in the results.

When to use INNER JOIN:

  • You only want data that exists in both tables
  • Finding active customers, completed transactions, matched records
  • Most common join type (about 80% of use cases)

LEFT JOIN: Keep All Left Table Rows

When Luigi asks: "Show me ALL customers, even those who haven't ordered yet."

A LEFT JOIN (also called left outer join) keeps every row from the left table, even if there's no match in the right table.

SELECT c.name, c.phone, o.order_date, o.order_amt
FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id;

This shows all customers. For customers who haven't ordered, the order columns will show NULL values.

When to use LEFT JOIN:

  • You want all records from the main table
  • Finding customers who haven't purchased, missing data, optional relationships
  • Very common for reporting and analysis

RIGHT JOIN: Keep All Right Table Rows

When Luigi asks: "Show me all orders, even if customer data got deleted."

A RIGHT JOIN keeps every row from the right table, even if there's no match in the left table.

SELECT c.name, c.phone, o.order_date, o.order_amt
FROM customers c
RIGHT JOIN orders o ON c.customer_id = o.customer_id;

This shows all orders. If an order exists but the customer record was deleted, the customer columns will show NULL.

When to use RIGHT JOIN:

  • Less common than LEFT JOIN
  • Usually better to flip tables and use LEFT JOIN instead
  • Useful for data quality checks

FULL OUTER JOIN: Keep Everything

When Luigi asks: "Show me all customers AND all orders, whether they match or not."

A FULL OUTER JOIN keeps every row from both tables, matching where possible.

SELECT c.name, c.phone, o.order_date, o.order_amt
FROM customers c
FULL OUTER JOIN orders o ON c.customer_id = o.customer_id;

This shows customers without orders AND orders without customer data.

When to use FULL OUTER JOIN:

  • Data analysis and quality checks
  • Finding orphaned records
  • Less common in day-to-day queries

LEFT JOIN vs INNER JOIN: The Critical Difference

This trips up everyone starting with SQL. Here's Luigi's lightbulb moment:

INNER JOIN = Strict matching only

-- INNER JOIN: "Only show customers who have actually ordered"
SELECT c.name, o.total_amount
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id;
-- Result: 87 rows (only customers with orders)

LEFT JOIN = Keep your main data, add optional extras

-- LEFT JOIN: "Show ALL customers, with order info if they have it"
SELECT c.name, o.total_amount
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id;
-- Result: 120 rows (all customers, nulls for non-buyers)

🎯 Pro Tip: When in doubt, use LEFT JOIN. It's safer because you won't accidentally lose data from your main table. You can always filter out the NULLs or change the join type later if needed.

Quick Reference: When to Use Each Join

Join Type Use When Example
INNER Only want matched data Active customers with orders
LEFT Want all from main table All customers (ordered or not)
RIGHT Want all from second table All orders (valid customer or not)
FULL OUTER Want everything Data quality analysis

Key Takeaway: Use INNER JOIN to filter rows. LEFT JOIN to keep all your main data while adding optional information from other tables.

3. How to Join Multiple Tables Step-by-Step

Luigi's business is growing! Now he wants to answer more complex questions like "Which customers ordered pepperoni pizza in January?" This needs data from three tables: customers, orders, AND pizzas.

Here's how to build up from simple to complex joins step-by-step.

2-Table Joins: The Foundation

We already saw this with customers and orders. It's the building block for everything else:

-- Show customers who ordered this month
SELECT c.name, o.order_date, o.total_amount
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date >= '2025-01-01';

Simple and clean. Now let's take it up a level.

3-Table Joins: Adding More Detail

Luigi wants to see which customers ordered which specific pizzas. Now we need three tables:

-- Show customers, their orders, AND what pizzas they ordered
SELECT c.name, c.phone, o.order_date, p.pizza_name, p.size
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
INNER JOIN pizzas p ON o.order_id = p.order_id
WHERE o.order_date >= '2025-01-01'
ORDER BY c.name, o.order_date;

Notice the pattern: each JOIN connects two tables using their relationship. We build the chain step by step.

4+ Table Joins: Complex Business Questions

Luigi wants to know: "Which customers are allergic to ingredients in the pizzas they ordered?" Now we need four tables:

  • customers (basic info + allergies)
  • orders (order details)
  • pizzas (pizza details)
  • ingredients (what's in each pizza)
-- Find customers who might have ordered pizzas with ingredients they're allergic to
SELECT 
    c.name,
    c.allergies,
    p.pizza_name,
    i.ingredient_name,
    o.order_date
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
INNER JOIN pizzas p ON o.order_id = p.order_id
INNER JOIN ingredients i ON p.pizza_id = i.pizza_id
WHERE c.allergies IS NOT NULL
  AND o.order_date >= '2025-01-01'
ORDER BY c.name;

Join Order and Performance

The order in which you write your joins can influence readability and, in some cases, performance.

Luigi learned this the hard way when his dinner rush queries suddenly slowed from milliseconds to several seconds.

Best Practices

1. Filter early
Apply restrictive filters as soon as possible, either in the WHERE clause or in a CTE, to reduce the number of rows the database has to process.

-- Readable version with early filter
SELECT c.name, p.pizza_name, o.order_date
FROM orders o  -- start with filtered orders
INNER JOIN customers c ON o.customer_id = c.customer_id
INNER JOIN pizzas p ON o.order_id = p.order_id
WHERE o.order_date = '2025-01-15';

2. Use table aliases
Short aliases (single letters aren't always the easiest to read, but good for simple queries) make multi-table queries easier to read. 

E.g. At Revolut we had security_trade which I shortened to st.

3. Join in a logical order
Typically: main table → related table → detail table.

4. Consider table size
When joining multiple tables, starting with the table that has fewer rows or is most restrictive can sometimes improve performance, though modern SQL optimisers often handle this automatically.

Note: Modern databases are smart — the optimiser often pushes filters down and chooses the best join order. Writing queries with clear, logical join order and early filters mainly helps readability and maintainability, but can also aid performance in complex queries.

Real-World Multi-Table Example

Here's a real scenario from my time at ASOS: "Show me customers who bought items that are now running low on inventory, so we can recommend similar products."

This needs data from four tables:

  • users (customer name, email, address)
  • orders (order date, total amount, status)
  • products (item name, price, product image)
  • inventory (stock volume, where stock is held)
SELECT 
    u.name,
    u.email,
    p.item_name,
    p.price,
    o.order_date,
    i.stock_volume,
    i.warehouse_location
FROM users u
INNER JOIN orders o ON u.user_id = o.user_id
INNER JOIN products p ON o.product_id = p.product_id
INNER JOIN inventory i ON p.product_id = i.product_id
WHERE i.stock_volume < 10  -- Low stock threshold
  AND o.order_date >= '2025-01-01'
  AND o.status = 'completed'
ORDER BY i.stock_volume ASC, o.order_date DESC

This query helps the merchandising team identify customers to target with "similar items" recommendations when their previously purchased products are running low.

Notice how we joined four tables in sequence, filtering for completed orders and low stock items. At ASOS scale (millions of orders), join order and proper indexing were critical for performance.

💡 Pro Tip: Data Quality Checking Before running complex joins on large datasets, always check your data quality:

-- Check row counts before joining
SELECT 'users' as table_name, COUNT(*) as row_count FROM users
UNION ALL
SELECT 'orders', COUNT(*) FROM orders
UNION ALL  
SELECT 'products', COUNT(*) FROM products;

-- Check for duplicates that could inflate your results
SELECT product_id, COUNT(*) 
FROM inventory 
GROUP BY product_id 
HAVING COUNT(*) > 1;  -- Shows duplicate product IDs

If your join returns way more rows than expected, you probably have duplicate keys creating a Cartesian product!

Key Takeaway: Build complex joins step by step. Start with two tables, get that working, then add the third table, and so on. This makes debugging much easier.

4. SQL Join on Multiple Columns

Sometimes one column isn't enough to uniquely identify a relationship. Luigi discovered this when he started offering different pizza prices for dine-in vs delivery, and during lunch vs dinner hours.

Luigi now has a pricing table that stores different prices based on both the pizza AND the time period:

-- pricing table structure
pizza_id | time_period | location_type | price
---------|-------------|---------------|-------
1        | lunch       | dine_in       | 12.99
1        | lunch       | delivery      | 14.99  
1        | dinner      | dine_in       | 15.99
1        | dinner      | delivery      | 17.99

To get the correct price, Luigi needs to join on multiple columns.

When You Need Composite Keys

Use multiple column joins when:

  • No single column uniquely identifies the relationship
  • You have time-based or location-based data variations
  • Your primary key spans multiple fields

Syntax and Examples

Here's how to join on multiple columns:

-- Basic syntax for multiple column joins
SELECT 
    o.order_id,
    p.pizza_name,
    pr.price,
    pr.time_period,
    pr.location_type
FROM orders o
INNER JOIN pizzas p ON o.order_id = p.order_id
INNER JOIN pricing pr ON p.pizza_id = pr.pizza_id 
                     AND o.time_period = pr.time_period
                     AND o.location_type = pr.location_type;

Real example: Luigi wants to see all lunch orders with correct pricing:

SELECT 
    c.name,
    p.pizza_name,
    p.size,
    pr.price,
    o.order_date
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
INNER JOIN pizzas p ON o.order_id = p.order_id  
INNER JOIN pricing pr ON p.pizza_id = pr.pizza_id
                     AND o.time_period = pr.time_period
                     AND o.location_type = pr.location_type
WHERE o.time_period = 'lunch'
  AND o.order_date >= '2025-01-01';

Performance Considerations

Multiple column joins can be slower because the database needs to check more conditions. Here's how to optimise:

1. Create composite indexes:

-- Index on the pricing table for faster lookups
CREATE INDEX idx_pricing_composite 
ON pricing (pizza_id, time_period, location_type);

2. Order your join conditions by selectivity:

-- Put most selective conditions first
INNER JOIN pricing pr ON p.pizza_id = pr.pizza_id          -- Most selective
                     AND o.location_type = pr.location_type -- Medium selective  
                     AND o.time_period = pr.time_period     -- Least selective

3. Consider using EXISTS for complex conditions:

-- Sometimes EXISTS is faster than multiple column joins
SELECT c.name, p.pizza_name
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
INNER JOIN pizzas p ON o.order_id = p.order_id
WHERE EXISTS (
    SELECT 1 FROM pricing pr 
    WHERE pr.pizza_id = p.pizza_id
      AND pr.time_period = o.time_period
      AND pr.location_type = o.location_type
      AND pr.price > 15.00
);

💡 Pro Tip: EXISTS vs IN Performance EXISTS stops searching as soon as it finds the first matching row, making it faster than IN or multiple joins when you just need to check if a relationship exists. IN has to collect all matching values before returning results.

4. Use CTEs for complex multi-column logic:

-- Common Table Expressions make complex joins more readable
WITH lunch_pricing AS (
    SELECT pizza_id, location_type, price
    FROM pricing 
    WHERE time_period = 'lunch'
),
recent_orders AS (
    SELECT order_id, customer_id, location_type, order_date
    FROM orders 
    WHERE order_date >= '2025-01-01'
)
SELECT c.name, p.pizza_name, lp.price
FROM customers c
INNER JOIN recent_orders ro ON c.customer_id = ro.customer_id
INNER JOIN pizzas p ON ro.order_id = p.order_id
INNER JOIN lunch_pricing lp ON p.pizza_id = lp.pizza_id 
                           AND ro.location_type = lp.location_type;

CTEs break complex logic into readable chunks and can improve performance by pre-filtering data.

Common Patterns

Date range joins:

-- Join based on date ranges
SELECT c.name, d.discount_rate
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
INNER JOIN discounts d ON c.customer_tier = d.customer_tier
                      AND o.order_date BETWEEN d.start_date AND d.end_date;

Geographic joins:

-- Join based on location and postal code
SELECT c.name, dt.delivery_fee
FROM customers c  
INNER JOIN delivery_zones dt ON c.postal_code = dt.postal_code
                            AND c.city = dt.city;

Key Takeaway: Multiple column joins are powerful but can hurt performance. Always create appropriate indexes and test with realistic data volumes.

5. SQL JOIN Performance: Making Your Queries Lightning Fast

Luigi learned this lesson during his busiest Saturday night when customers were waiting 30 seconds for the "order history" page to load. Here's what transformed his 10-second queries into milliseconds:

Indexing really does matter

-- Before: Large table, sluggish
SELECT c.name, COUNT(o.order_id)
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name;

-- After adding indexes: engine knows where to look, faster
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
CREATE INDEX idx_customers_id ON customers(customer_id);

💡 Pro Tip: If the join column is already a primary key (like customers.customer_id), it’s already indexed — no extra index is needed.

Key Takeaway: Always index the columns you use for joins. This lets the database quickly locate matching rows instead of scanning the entire table. On large tables, this change can make a dramatic difference.

JOIN Order Matters More Than You Think

-- Slow: Starts with the biggest table
SELECT c.name, p.pizza_name
FROM customers c  -- 10,000 rows
JOIN orders o ON c.customer_id = o.customer_id  -- 50,000 rows
JOIN pizzas p ON o.order_id = p.order_id        -- 100,000 rows
WHERE o.order_date = '2025-01-15';  -- Only 100 matching rows!

-- Fast: Start with the filter first
SELECT c.name, p.pizza_name
FROM orders o     -- Start here with the filter
JOIN customers c ON o.customer_id = c.customer_id
JOIN pizzas p ON o.order_id = p.order_id
WHERE o.order_date = '2025-01-15';  -- Reduces data early

Essential Performance Tips

1. Plan for data growth:

Every year we collect more data than ever before. Joins that worked fine with small data will become slow eventually.

Strategies for scaling:

  • Partition large tables by date or customer segment
  • Archive old data that's rarely queried
  • Use summary tables for common aggregations
  • Consider read replicas for reporting queries

2. Use EXPLAIN to understand what's happening:

-- PostgreSQL: See what the database is actually doing
EXPLAIN (ANALYZE, BUFFERS) 
SELECT c.name, COUNT(*)
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name;

3. Consider query structure for large datasets:

-- Sometimes EXISTS is faster than JOIN for large tables
SELECT c.name, c.email
FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o 
    WHERE o.customer_id = c.customer_id 
    AND o.order_date >= '2025-01-01'
);
-- vs
SELECT DISTINCT c.name, c.email
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date >= '2025-01-01';

Key Takeaway: Performance optimisation is about understanding your data and using the right tool for each situation.

6. Advanced Join Techniques

Now that you've mastered the basics, let's explore some advanced techniques.

Self Joins Explained

A self join is when a table joins to itself. This sounds odd but it's actually useful for comparing rows within the same table.

Luigi's scenario: He wants to find customers who live in the same city, so he can offer group delivery discounts.

-- Find customers who share the same city
SELECT 
    c1.name AS customer1,
    c2.name AS customer2,
    c1.city
FROM customers c1
INNER JOIN customers c2 ON c1.city = c2.city
                       AND c1.customer_id < c2.customer_id  -- Avoid duplicates
WHERE c1.city IS NOT NULL
ORDER BY c1.city;

Notice how we use c1 and c2 as aliases for the same table. The condition c1.customer_id < c2.customer_id prevents showing the same pair twice.

Another example: Luigi wants to find pizzas with similar prices (within £2 of each other):

-- Find pizzas with similar pricing
SELECT 
    p1.pizza_name AS pizza1,
    p1.price AS price1,
    p2.pizza_name AS pizza2,
    p2.price AS price2,
    ABS(p1.price - p2.price) AS price_difference
FROM pizzas p1
INNER JOIN pizzas p2 ON p1.pizza_id != p2.pizza_id  -- Different pizzas
WHERE ABS(p1.price - p2.price) <= 2.00
ORDER BY price_difference;

INNER JOIN vs LEFT JOIN vs FULL OUTER JOIN

Luigi handles these scenarios daily. Here's when he uses each:

Scenario 1: Monthly sales report (INNER JOIN)

-- "Show only customers who bought something this month"
SELECT c.name, SUM(o.total_amount) as monthly_spend
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date >= '2025-01-01'
GROUP BY c.customer_id, c.name;
-- Perfect for sales reports - only shows active customers

Scenario 2: Marketing email list (LEFT JOIN)

-- "Show ALL customers with their order history for targeted emails"
SELECT c.name, c.email, COUNT(o.order_id) as total_orders
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name, c.email;
-- Includes customers who never ordered - perfect for win-back campaigns

Scenario 3: Data audit (FULL OUTER JOIN)

-- "Find all mismatched data between systems"
SELECT 
    COALESCE(c.customer_id, o.customer_id) as id,
    c.name,
    COUNT(o.order_id) as order_count
FROM customers c
FULL OUTER JOIN orders o ON c.customer_id = o.customer_id
WHERE c.customer_id IS NULL OR o.customer_id IS NULL;
-- Shows orphaned records that need cleanup

Using Subqueries with Joins

Subqueries let you filter or calculate data before joining, making complex queries more readable and often faster.

Luigi's challenge: "Show me customers who spent more than the average customer this month."

-- Using a subquery to calculate average first
SELECT 
    c.name,
    c.email,
    SUM(o.total_amount) as total_spent
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date >= '2025-01-01'
GROUP BY c.customer_id, c.name, c.email
HAVING SUM(o.total_amount) > (
    SELECT AVG(monthly_total) 
    FROM (
        SELECT SUM(total_amount) as monthly_total
        FROM orders 
        WHERE order_date >= '2025-01-01'
        GROUP BY customer_id
    ) avg_calc
)
ORDER BY total_spent DESC;

Better approach using CTEs:

-- Cleaner with CTEs
WITH monthly_spending AS (
    SELECT 
        customer_id,
        SUM(total_amount) as total_spent
    FROM orders 
    WHERE order_date >= '2025-01-01'
    GROUP BY customer_id
),
average_spending AS (
    SELECT AVG(total_spent) as avg_amount
    FROM monthly_spending
)
SELECT 
    c.name,
    c.email,
    ms.total_spent
FROM customers c
INNER JOIN monthly_spending ms ON c.customer_id = ms.customer_id
CROSS JOIN average_spending avg
WHERE ms.total_spent > avg.avg_amount
ORDER BY ms.total_spent DESC;

💡 Pro Tip: A CROSS JOIN simply combines every row from one table with every row from another table. It can produce a lot of rows, so use it carefully.

Complex Join Conditions

Sometimes you need more than simple equality conditions.

Here are some patterns I've used in production environments:

Range joins: Depending on how much a customer spends, we apply a discount based on ranges.

-- Join orders to discount tiers based on amount ranges
SELECT 
    c.name,
    o.total_amount,
    dt.discount_name,
    dt.discount_percent
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
INNER JOIN discount_tiers dt ON o.total_amount BETWEEN dt.min_amount AND dt.max_amount
WHERE o.order_date >= '2025-01-01';

Conditional joins: Different logic based on customer type:

-- Different join logic for VIP vs regular customers
SELECT 
    c.name,
    c.customer_type,
    CASE 
        WHEN c.customer_type = 'VIP' THEN vip.special_price
        ELSE p.regular_price
    END as final_price
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
INNER JOIN pizzas p ON o.order_id = p.order_id
LEFT JOIN vip_pricing vip ON p.pizza_id = vip.pizza_id 
                         AND c.customer_type = 'VIP';

Date-based joins with intervals:

-- Find customers who haven't ordered in 30 days
SELECT DISTINCT
    c.name,
    c.email,
    last_order.last_order_date
FROM customers c
INNER JOIN (
    SELECT 
        customer_id,
        MAX(order_date) as last_order_date
    FROM orders
    GROUP BY customer_id
) last_order ON c.customer_id = last_order.customer_id
WHERE last_order.last_order_date < CURRENT_DATE - INTERVAL '30 days';

Key Takeaway: Advanced join techniques solve complex business problems, but always test performance with realistic data volumes. Sometimes the simplest solution is the best one.

7. Common Join Mistakes & How to Fix Them

Let's be honest—we've all written queries that returned millions of rows when we expected thousands, or got zero results when we knew data existed.

I still do, even after working with SQL for ~9 years.

Here are the most common mistakes I've seen (and made), plus how to fix them fast.

Cartesian Products (Too Many Rows)

This is the big one. A Cartesian product happens when you accidentally match every row from one table with every row from another table.

Luigi's disaster: He wanted to see all customers and all pizzas, so he wrote:

-- WRONG: This creates a Cartesian product
SELECT c.name, p.pizza_name
FROM customers c, pizzas p;  -- Missing JOIN condition

If Luigi has 100 customers and 20 pizza types, this returns 2,000 rows instead of the useful data he wanted.

The fix: Always use explicit JOIN syntax with proper conditions:

-- RIGHT: Proper join with condition
SELECT c.name, p.pizza_name
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
INNER JOIN pizzas p ON o.order_id = p.order_id;

How to spot Cartesian products:

  • Your query returns way more rows than expected
  • Row count = (table1 rows × table2 rows × table3 rows...)
  • Query runs very slowly

Quick check:

-- Count rows before joining
SELECT COUNT(*) FROM customers;  -- 100 rows
SELECT COUNT(*) FROM pizzas;     -- 20 rows
-- If your join returns 2,000 rows, you have a Cartesian product!

Missing Join Conditions

This usually happens when you add a new table to an existing query but forget to connect it properly.

Luigi's mistake: Adding ingredients table without proper join:

-- WRONG: Missing join between pizzas and ingredients
SELECT c.name, p.pizza_name, i.ingredient_name
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
INNER JOIN pizzas p ON o.order_id = p.order_id
INNER JOIN ingredients i;  -- MISSING: ON condition

The fix: Every table needs a join condition:

-- RIGHT: All tables properly connected
SELECT c.name, p.pizza_name, i.ingredient_name
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
INNER JOIN pizzas p ON o.order_id = p.order_id
INNER JOIN ingredients i ON p.pizza_id = i.pizza_id;  -- Fixed

Wrong Join Types

Using the wrong join type is subtle but can completely change your results.

Luigi's confusion: He wanted all customers for a marketing email, including those who never ordered:

-- WRONG: INNER JOIN excludes customers without orders
SELECT c.name, c.email, COUNT(o.order_id) as order_count
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id  -- Wrong join type
GROUP BY c.customer_id, c.name, c.email;

This only shows customers who ordered, missing potential targets for his marketing campaign.

The fix: Use LEFT JOIN to include all customers:

-- RIGHT: LEFT JOIN includes all customers
SELECT c.name, c.email, COUNT(o.order_id) as order_count
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id  -- Fixed
GROUP BY c.customer_id, c.name, c.email;

Common join type mistakes:

  • Using INNER JOIN when you need LEFT JOIN (excludes null relationships)
  • Using LEFT JOIN when you need INNER JOIN (includes unwanted nulls)
  • Using RIGHT JOIN instead of flipping tables and using LEFT JOIN

Debugging Techniques

When your joins go wrong, here's how to debug them systematically:

1. Check row counts at each step:

-- Build joins one table at a time
SELECT COUNT(*) FROM customers;                    -- Step 1: 100 rows
SELECT COUNT(*) FROM customers c 
INNER JOIN orders o ON c.customer_id = o.customer_id;  -- Step 2: 150 rows
-- If step 2 suddenly shows 15,000 rows, you have a problem

2. Use LIMIT during development:

-- Always limit rows when testing joins
SELECT c.name, p.pizza_name
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
INNER JOIN pizzas p ON o.order_id = p.order_id
LIMIT 10;  -- See sample results first

3. Check for duplicate keys:

-- Find duplicate foreign keys that cause row multiplication
SELECT customer_id, COUNT(*)
FROM orders
GROUP BY customer_id
HAVING COUNT(*) > 1;

-- Check for duplicate primary keys (shouldn't happen but...)
SELECT pizza_id, COUNT(*)
FROM pizzas
GROUP BY pizza_id
HAVING COUNT(*) > 1;

4. Use DISTINCT to identify the problem:

-- If this returns fewer rows than your full query, you have duplicates
SELECT DISTINCT c.customer_id, c.name
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
INNER JOIN pizzas p ON o.order_id = p.order_id;

5. Examine the execution plan:

-- PostgreSQL: See what the database is actually doing
EXPLAIN (ANALYZE, BUFFERS) 
SELECT c.name, COUNT(*)
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name;

Quick Debugging Checklist

When your join results look wrong:

  • Count rows - Does the result make sense?
  • Check join conditions - Is every table properly connected?
  • Verify join types - Do you want ALL records from the left table?
  • Look for duplicates - Are there duplicate keys multiplying rows?
  • Test with LIMIT - Examine a small sample first
  • Use DISTINCT - Does this change your row count?

Real Production Bug Story

At Revolut, we were creating a presentation table for regulatory reporting that should have had around 500,000 rows but was returning 1.2 million rows instead. After investigating, it turned out there was miscommunication between teams. We believed one of our reference tables was unique (one ID per row), but it wasn't due to an accounting rule that allowed multiple entries for the same account under certain conditions.

The business logic was correct, but our join assumptions were wrong. Once we understood the actual data structure, we had to adjust our joins to handle the duplicates properly.

Key Takeaway: Most join problems are syntax errors, not logic errors. Check your join conditions first, then worry about the business logic.

8. Best Practices & Tips

After working with SQL joins for years, here are the habits that separate good developers from great ones.

Writing Readable Join Queries

1. Use consistent formatting and indentation:

-- Hard to read
SELECT c.name,o.order_date,p.pizza_name FROM customers c INNER JOIN orders o ON c.customer_id=o.customer_id INNER JOIN pizzas p ON o.order_id=p.order_id WHERE o.order_date>='2025-01-01';

-- Much better
SELECT 
    c.name,
    o.order_date,
    p.pizza_name
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
INNER JOIN pizzas p ON o.order_id = p.order_id
WHERE o.order_date >= '2025-01-01';

2. Use meaningful table aliases:

-- Confusing
SELECT a.name, b.total, c.pizza_name
FROM customers a
INNER JOIN orders b ON a.customer_id = b.customer_id
INNER JOIN pizzas c ON b.order_id = c.order_id;

-- Clear
SELECT 
    cust.name, 
    ord.total_amount, 
    pizza.pizza_name
FROM customers cust
INNER JOIN orders ord ON cust.customer_id = ord.customer_id
INNER JOIN pizzas pizza ON ord.order_id = pizza.order_id;

3. Add comments for complex join logic:

-- Monthly revenue report with customer segmentation
SELECT 
    c.name,
    c.customer_tier,
    SUM(p.price) as monthly_spending
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
    -- Only include completed orders from current month
    AND o.status = 'completed'
    AND o.order_date >= DATE_TRUNC('month', CURRENT_DATE)
INNER JOIN pizzas p ON o.order_id = p.order_id
    -- Exclude promotional items from revenue calculation
    AND p.promotional = false
GROUP BY c.customer_id, c.name, c.customer_tier
ORDER BY monthly_spending DESC;

4. Break complex queries into CTEs:

-- Instead of one giant query, use readable CTEs
WITH completed_orders AS (
    SELECT customer_id, order_id, order_date, total_amount
    FROM orders 
    WHERE status = 'completed' 
      AND order_date >= '2025-01-01'
),
customer_spending AS (
    SELECT 
        c.customer_id,
        c.name,
        SUM(co.total_amount) as total_spent
    FROM customers c
    INNER JOIN completed_orders co ON c.customer_id = co.customer_id
    GROUP BY c.customer_id, c.name
)
SELECT * FROM customer_spending 
WHERE total_spent > 100
ORDER BY total_spent DESC;

Testing Strategies

1. Start small and build up:

-- Step 1: Test with one table
SELECT COUNT(*) FROM customers WHERE city = 'London';

-- Step 2: Add first join
SELECT COUNT(*) 
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
WHERE c.city = 'London';

-- Step 3: Add complexity gradually
-- (Add more joins one at a time)

2. Use sample data for development:

-- Create a small test dataset
WITH customers_test AS (
    SELECT * FROM customers LIMIT 100
)
-- Test your joins on small data first
SELECT c.name, COUNT(o.order_id)
FROM customers_test c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name;

3. Validate your results:

-- Sanity check: Do the numbers make sense?
SELECT 
    'Total customers' as metric,
    COUNT(*) as count
FROM customers
UNION ALL
SELECT 
    'Customers with orders',
    COUNT(DISTINCT customer_id)
FROM orders
UNION ALL
SELECT 
    'Customers in join result',
    COUNT(DISTINCT c.customer_id)
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id;

4. Test edge cases:

-- What happens with NULL values?
SELECT c.name, o.total_amount
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE o.total_amount IS NULL;  -- Customers who never ordered

-- What about duplicate keys?
SELECT customer_id, COUNT(*)
FROM orders
GROUP BY customer_id
HAVING COUNT(*) > 5;  -- Heavy customers

Production Considerations

1. Handle NULL values gracefully:

-- Be explicit about NULL handling
SELECT 
    COALESCE(c.name, 'Unknown Customer') as customer_name,
    COALESCE(o.total_amount, 0) as order_total
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id;

2. Document your tables:

/*
 * Customer Lifetime Value Report
 * 
 * Purpose: Calculate CLV for marketing segmentation
 * Dependencies: customers, orders, products tables
 * Performance: ~2 seconds with current data volume
 * 
 * Last updated: 2025-01-20
 * Author: Luigi ([email protected])
 */

9. Frequently Asked Questions: SQL JOINs

These are the questions Luigi wished someone had answered when he started:

Q: What's the difference between LEFT JOIN and LEFT OUTER JOIN?

A: They're exactly the same thing. LEFT JOIN is just shorthand for LEFT OUTER JOIN. Use whichever feels more natural—most developers prefer the shorter LEFT JOIN.

Q: When should I use INNER JOIN vs LEFT JOIN?

A: Use INNER JOIN when you only want data that exists in both tables (like "show customers who placed orders"). Use LEFT JOIN when you want all records from your main table, whether they have matches or not (like "show all customers and their order counts, including zero").

Q: Why is my JOIN returning way more rows than expected?

A: You probably have a Cartesian product from missing join conditions, or duplicate keys in one of your tables. Check your row counts and look for missing ON clauses.

Q: Can I join more than two tables at once?

A: Absolutely! Luigi regularly joins 4-5 tables for complex reports. Just add each table one at a time with proper join conditions. Start simple and build up.

Q: Should I use WHERE or ON in my join conditions?

A: Use ON for join logic (how tables connect) and WHERE for filtering results. This keeps your query readable and performs better.

Q: Is there a limit to how many tables I can join?

A: Technically no, but practically yes. More than 6-7 tables usually means you should reconsider your approach or use CTEs to break the query into readable chunks.

Q: What's the difference between CROSS JOIN and other joins?

A: CROSS JOIN creates a Cartesian product - every row from the first table paired with every row from the second table. It's rarely used intentionally but happens accidentally when you forget join conditions.

Q: How do I join tables with different column names?

A: Just specify both column names in your ON clause:

SELECT c.name, o.order_date
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.cust_id;  -- Different column names

Q: Can I join on calculated fields?

A: Yes, but it can be slow. Better to create the calculation in a CTE first:

WITH customer_segments AS (
    SELECT customer_id, 
           CASE WHEN total_spent > 1000 THEN 'VIP' ELSE 'Regular' END as segment
    FROM customer_totals
)
SELECT c.name, cs.segment
FROM customers c
JOIN customer_segments cs ON c.customer_id = cs.customer_id;

🎯 Luigi's Golden Rule: If you're asking "Should I use INNER or LEFT JOIN?" - try LEFT JOIN first. It's safer because you won't accidentally lose data.

TL;DR: Complete SQL JOINs Summary

Key takeaways:

  • LEFT JOIN is your most-used join type - keeps all records from your main table while adding optional data
  • INNER JOIN for matched data only - when you only want records that exist in both tables
  • Build multi-table joins step-by-step - start with 2 tables, then add complexity gradually
  • Always index your join columns - biggest performance win you'll get
  • Use table aliases and CTEs - makes complex queries readable and maintainable
  • Check for Cartesian products - if your results have way more rows than expected, you're missing a join condition
  • Test with realistic data volumes - queries that work with 1,000 rows might fail with 1 million
  • Create some debugging friends - Write reusable queries that show exactly what PostgreSQL is doing (Duplicate checks, NULL checks etc).

Quick reference for join types:

JOIN Type What It Does Luigi's Use Case
INNER JOIN Only matched records "Show customers who ordered"
LEFT JOIN All left + matches "Show ALL customers + order info"
RIGHT JOIN All right + matches "Show all orders + customer info"
FULL OUTER Everything "Show mismatched data for cleanup"

SQL joins might seem tricky at first, but they're a fundamental SQL skill and if I'm being completely honest, you won't get very far without them.

Start with simple two-table joins, understand your data relationships, and always test with realistic data volumes. Use Luigi's examples as templates, but don't be afraid to experiment with your own scenarios.

The difference between a junior and senior developer often comes down to these fundamentals: proper join types, good performance habits, and readable code that others can maintain.

Like what you've read?

Then you'll love this

Every Friday morning (9am UK time), I send a short, actionable email to over 13,000 professionals keen to develop high-paying data skills.

No fluff, no spam—just honest, transparent lessons from my 8+ year career.

If that sounds interesting, join us—it’s free!

Subscribe here 👇🏻

Unsubscribe at any time.