Skip to main content

🔍 Lesson 10: Subqueries & Nested Queries

Sometimes you need the answer to one question before you can ask another. "Show me all books priced above the average" — you need to know the average first, then use it as a filter. A subquery is a query inside another query. It lets you break complex questions into logical steps, all in a single SQL statement.

🎯 Learning Objectives

By the end of this lesson, you will be able to:

  • Write scalar subqueries that return a single value
  • Use subqueries with IN to match against a list of values
  • Use EXISTS to check for the presence of related rows
  • Write correlated subqueries that reference the outer query
  • Use subqueries in SELECT, WHERE, and FROM clauses
  • Know when a subquery is the right tool and when it isn't

Estimated Time: 50 minutes

Prerequisites: Aggregate functions & GROUP BY (Lesson 9)

📑 In This Lesson

What Is a Subquery?

A subquery (also called an inner query or nested query) is a complete SELECT statement enclosed in parentheses, placed inside another SQL statement. The outer statement is called the outer query.

graph LR subgraph Outer_Query["Outer Query"] A["SELECT title, price FROM books"] B["WHERE price >"] subgraph Inner_Query["Subquery"] C["(SELECT AVG(price) FROM books)"] end B --> C end

The subquery runs first and produces a result. That result is then used by the outer query. Think of it like filling in a blank:


-- Step 1: What's the average price?
SELECT AVG(price) FROM books;   -- Returns: 10.99

-- Step 2: Which books cost more than that?
SELECT title, price FROM books WHERE price > 10.99;

-- Combined into one query with a subquery:
SELECT title, price FROM books
WHERE price > (SELECT AVG(price) FROM books);
                

Subqueries can appear in several places:

Location What It Does Example
WHERE Filter rows based on a computed value WHERE price > (SELECT AVG(price) ...)
FROM Use a subquery result as a temporary table FROM (SELECT ... ) AS temp
SELECT Compute a value for each row SELECT (SELECT COUNT(*) ...)

Scalar Subqueries (Single Value)

A scalar subquery returns exactly one row and one column — a single value. This is the most common and simplest type.

Comparing Against an Aggregate


-- Books priced above average
SELECT title, price
FROM books
WHERE price > (SELECT AVG(price) FROM books);
                

Output:


+-----------------------+-------+
| title                 | price |
+-----------------------+-------+
| To Kill a Mockingbird | 11.99 |
| Dune                  | 14.99 |
| Lord of the Rings     | 15.99 |
+-----------------------+-------+
                    

MySQL first executes the subquery (SELECT AVG(price) FROM books) → 10.99, then substitutes it into the outer query: WHERE price > 10.99.


-- Books priced below average
SELECT title, price
FROM books
WHERE price < (SELECT AVG(price) FROM books);
                

Output:


+------------------------+-------+
| title                  | price |
+------------------------+-------+
| Pride and Prejudice    |  7.99 |
| The Great Gatsby       |  8.99 |
| The Catcher in the Rye |  8.99 |
| Brave New World        |  9.49 |
| 1984                   |  9.99 |
| Fahrenheit 451         | 10.49 |
| The Hobbit             | 10.99 |
+------------------------+-------+
                    

Using MIN and MAX in Subqueries


-- Which book has the highest price? (full row info!)
SELECT title, author, price
FROM books
WHERE price = (SELECT MAX(price) FROM books);
                

Output:


+-------------------+----------------+-------+
| title             | author         | price |
+-------------------+----------------+-------+
| Lord of the Rings | J.R.R. Tolkien | 15.99 |
+-------------------+----------------+-------+
                    

💡 Subquery vs ORDER BY + LIMIT

Remember how MIN(price) tells you the value but not which row? A subquery solves that. Compare these two equivalent approaches:


-- Approach 1: Subquery
SELECT * FROM books WHERE price = (SELECT MIN(price) FROM books);

-- Approach 2: ORDER BY + LIMIT
SELECT * FROM books ORDER BY price ASC LIMIT 1;
                    

Both work for "find the cheapest book." The subquery approach is more flexible — it can return all books tied at the minimum price, while LIMIT 1 only returns one.


-- Book(s) with the most pages
SELECT title, pages
FROM books
WHERE pages = (SELECT MAX(pages) FROM books);

-- Book(s) published earliest
SELECT title, author, published
FROM books
WHERE published = (SELECT MIN(published) FROM books);
                

Scalar Subquery with Arithmetic


-- Books priced more than DOUBLE the cheapest book
SELECT title, price
FROM books
WHERE price > (SELECT MIN(price) FROM books) * 2;
                

Output:


+-----------------------+-------+
| title                 | price |
+-----------------------+-------+
| To Kill a Mockingbird | 11.99 |
| Dune                  | 14.99 |
| Lord of the Rings     | 15.99 |
+-----------------------+-------+
                    

The cheapest book is $7.99, so double is $15.98 — only books above that price are returned.

Subqueries with IN (List of Values)

When a subquery returns multiple rows (but one column), you can't use = — you need IN. The subquery produces a list, and IN checks if a value is in that list.

Basic IN Subquery


-- Find customers who live in cities that have more than 1 customer
SELECT first_name, last_name, city
FROM customers
WHERE city IN (
    SELECT city
    FROM customers
    WHERE city IS NOT NULL
    GROUP BY city
    HAVING COUNT(*) > 1
);
                

Output:


+------------+-----------+----------+
| first_name | last_name | city     |
+------------+-----------+----------+
| Alice      | Johnson   | New York |
| Bob        | Smith     | Chicago  |
| Frank      | Miller    | New York |
| Grace      | Wilson    | Chicago  |
+------------+-----------+----------+
                    

The inner query returns the cities with more than one customer (New York, Chicago). The outer query then finds all customers in those cities.

💡 How It Works Step by Step

1. Inner query runs: SELECT city ... HAVING COUNT(*) > 1 → returns ('New York', 'Chicago')

2. Outer query becomes: WHERE city IN ('New York', 'Chicago')

3. All customers matching those cities are returned

NOT IN — Excluding Matches


-- Customers NOT in multi-customer cities
SELECT first_name, last_name, city
FROM customers
WHERE city IS NOT NULL
AND city NOT IN (
    SELECT city
    FROM customers
    WHERE city IS NOT NULL
    GROUP BY city
    HAVING COUNT(*) > 1
);
                

Output:


+------------+-----------+-------------+
| first_name | last_name | city        |
+------------+-----------+-------------+
| Carol      | Williams  | Los Angeles |
| Dave       | Brown     | Houston     |
+------------+-----------+-------------+
                    

⚠️ NOT IN and NULL — A Dangerous Trap

If the subquery returns any NULL values, NOT IN returns no rows at all. This is one of SQL's most notorious gotchas:


-- If the subquery returns ('New York', 'Chicago', NULL):
-- NOT IN ('New York', 'Chicago', NULL) → always FALSE!
-- The NULL makes every comparison UNKNOWN

-- Fix: Filter out NULLs in the subquery
WHERE city NOT IN (
    SELECT city FROM customers WHERE city IS NOT NULL ...
);
                    

Always add WHERE column IS NOT NULL inside subqueries used with NOT IN.

Cross-Table IN Subqueries

Subqueries are especially powerful when they reference a different table. Once we add more tables in later lessons (like orders and order_items), you'll see patterns like:


-- Conceptual: Find customers who have placed orders
-- (we'll build these tables in Module 4)
-- SELECT first_name, last_name FROM customers
-- WHERE id IN (SELECT customer_id FROM orders);

-- Conceptual: Find books that have never been ordered
-- SELECT title FROM books
-- WHERE id NOT IN (SELECT book_id FROM order_items);
                

For now, focus on the pattern — you'll use it constantly once tables are linked with foreign keys.

Subqueries with Comparison Operators

Beyond = and IN, you can use ANY, ALL, and SOME with subqueries that return multiple values.

ANY / SOME — Match At Least One

ANY (and its synonym SOME) returns TRUE if the comparison is true for at least one value from the subquery:


-- Books cheaper than ANY out-of-stock book
-- (i.e., cheaper than the MOST EXPENSIVE out-of-stock book)
SELECT title, price
FROM books
WHERE price < ANY (
    SELECT price FROM books WHERE in_stock = FALSE
);
                

💡 ANY Simplified

price < ANY (subquery) means: "price is less than at least one value in the list." This is equivalent to price < MAX(values_from_subquery).

ALL — Match Every Value

ALL returns TRUE only if the comparison is true for every value from the subquery:


-- Books cheaper than ALL out-of-stock books
-- (i.e., cheaper than the CHEAPEST out-of-stock book)
SELECT title, price
FROM books
WHERE price < ALL (
    SELECT price FROM books WHERE in_stock = FALSE
);
                

💡 ALL Simplified

price < ALL (subquery) means: "price is less than every single value in the list." This is equivalent to price < MIN(values_from_subquery).

Expression Equivalent To
= ANY (...) IN (...)
< ANY (...) < MAX(...)
> ANY (...) > MIN(...)
< ALL (...) < MIN(...)
> ALL (...) > MAX(...)

In practice, ANY and ALL are less common than IN and scalar subqueries. Most developers find the MIN/MAX equivalents easier to read.

EXISTS — Checking for Related Rows

EXISTS checks whether a subquery returns any rows at all. It doesn't care about the values — just whether rows exist or not. It returns TRUE if the subquery returns at least one row, FALSE if it returns none.


-- Do we have any books priced over $20?
SELECT EXISTS (SELECT 1 FROM books WHERE price > 20) AS has_expensive;
                

Output:


+---------------+
| has_expensive |
+---------------+
|             0 |
+---------------+
                    

-- Do we have any books priced over $15?
SELECT EXISTS (SELECT 1 FROM books WHERE price > 15) AS has_expensive;
                

Output:


+---------------+
| has_expensive |
+---------------+
|             1 |
+---------------+
                    

💡 Why SELECT 1?

Inside EXISTS, the actual columns selected don't matter — MySQL only checks if any rows are returned. SELECT 1 is a convention that makes this clear, but SELECT * or SELECT column would work identically.

EXISTS with Correlated Subqueries

EXISTS becomes truly powerful when combined with correlated subqueries (covered in the next section). Here's a preview:


-- Conceptual: Find customers who have placed at least one order
-- SELECT c.first_name, c.last_name
-- FROM customers c
-- WHERE EXISTS (
--     SELECT 1 FROM orders o WHERE o.customer_id = c.id
-- );
                

NOT EXISTS


-- Conceptual: Find customers who have NEVER placed an order
-- SELECT c.first_name, c.last_name
-- FROM customers c
-- WHERE NOT EXISTS (
--     SELECT 1 FROM orders o WHERE o.customer_id = c.id
-- );
                

✅ EXISTS vs IN

IN and EXISTS often solve the same problem, but they work differently:

  • IN — best when the subquery returns a small list of values
  • EXISTS — best when checking for the presence of related rows, especially with large tables
  • NOT EXISTS is generally safer than NOT IN because it handles NULLs correctly

Correlated Subqueries

A correlated subquery references a column from the outer query. Unlike a regular subquery that runs once, a correlated subquery runs once for each row of the outer query.

graph TD subgraph Regular["Regular Subquery"] A1["Inner query runs ONCE"] --> A2["Result used for ALL outer rows"] end subgraph Correlated["Correlated Subquery"] B1["Outer query processes Row 1"] --> B2["Inner query runs for Row 1"] B3["Outer query processes Row 2"] --> B4["Inner query runs for Row 2"] B5["Outer query processes Row 3"] --> B6["Inner query runs for Row 3"] B7["...repeat for each row..."] end

Example: Books Above Their Author's Average Price

Imagine an author has multiple books and you want to find books priced above that specific author's average (not the overall average):


-- Books priced above their author's average price
SELECT b1.title, b1.author, b1.price
FROM books b1
WHERE b1.price > (
    SELECT AVG(b2.price)
    FROM books b2
    WHERE b2.author = b1.author   -- references the outer query!
);
                

Notice b1.author in the subquery — it references the outer query's current row. For each book in the outer query, MySQL recalculates the average price for that specific author.

💡 Table Aliases Are Essential

When a correlated subquery references the same table as the outer query, you must use aliases to distinguish them. Here, b1 is the outer copy and b2 is the inner copy of the books table.

Example: Books With Above-Average Page Count for Their Price Range


-- Books with more pages than the average book in their price range
-- (price range = within $2 of the book's price)
SELECT b1.title, b1.price, b1.pages
FROM books b1
WHERE b1.pages > (
    SELECT AVG(b2.pages)
    FROM books b2
    WHERE b2.price BETWEEN b1.price - 2 AND b1.price + 2
);
                

⚠️ Performance Warning

Correlated subqueries can be slow on large tables because the inner query runs once per outer row. If your outer query returns 10,000 rows, the inner query runs 10,000 times. For performance-critical queries, JOINs (covered in Lesson 13) are usually faster.

For learning and small datasets, correlated subqueries are perfectly fine and often more readable.

Subqueries in the FROM Clause

A subquery in the FROM clause creates a derived table (also called an inline view) — a temporary result set that you can query like a regular table.


-- Create a summary, then query it
SELECT price_range, book_count, avg_price
FROM (
    SELECT
        CASE
            WHEN price < 10 THEN 'Budget'
            WHEN price < 13 THEN 'Mid-Range'
            ELSE 'Premium'
        END AS price_range,
        COUNT(*) AS book_count,
        ROUND(AVG(price), 2) AS avg_price
    FROM books
    GROUP BY price_range
) AS price_summary
WHERE book_count > 1
ORDER BY avg_price;
                

Output:


+-------------+------------+-----------+
| price_range | book_count | avg_price |
+-------------+------------+-----------+
| Budget      |          5 |      9.09 |
| Mid-Range   |          3 |     10.82 |
+-------------+------------+-----------+
                    

⚠️ Derived Tables Need an Alias

MySQL requires every derived table to have an alias. The AS price_summary part is mandatory — without it, you'll get a syntax error:


-- ⛔ WRONG — missing alias
SELECT * FROM (SELECT COUNT(*) FROM books);

-- ✅ CORRECT — alias provided
SELECT * FROM (SELECT COUNT(*) AS total FROM books) AS counts;
                    

Why Use FROM Subqueries?

They're useful when you need to:

  • Filter on aggregated results without HAVING (more flexible)
  • Perform calculations on aggregated data — like finding the average of averages
  • Pre-process data before joining it with another table

-- Average number of books per author
-- Step 1: Count books per author (inner query)
-- Step 2: Average those counts (outer query)
SELECT ROUND(AVG(book_count), 1) AS avg_books_per_author
FROM (
    SELECT author, COUNT(*) AS book_count
    FROM books
    GROUP BY author
) AS author_counts;
                

Output:


+----------------------+
| avg_books_per_author |
+----------------------+
|                  1.0 |
+----------------------+
                    

Subqueries in the SELECT Clause

A scalar subquery in the SELECT clause adds a computed column to each row. It must return exactly one value.


-- Show each book with the overall average for comparison
SELECT
    title,
    price,
    (SELECT ROUND(AVG(price), 2) FROM books) AS avg_price,
    ROUND(price - (SELECT AVG(price) FROM books), 2) AS diff_from_avg
FROM books
ORDER BY diff_from_avg DESC;
                

Output:


+-------------------------+-------+-----------+---------------+
| title                   | price | avg_price | diff_from_avg |
+-------------------------+-------+-----------+---------------+
| Lord of the Rings       | 15.99 |     10.99 |          5.00 |
| Dune                    | 14.99 |     10.99 |          4.00 |
| To Kill a Mockingbird   | 11.99 |     10.99 |          1.00 |
| The Hobbit              | 10.99 |     10.99 |          0.00 |
| Fahrenheit 451          | 10.49 |     10.99 |         -0.50 |
| 1984                    |  9.99 |     10.99 |         -1.00 |
| Brave New World         |  9.49 |     10.99 |         -1.50 |
| The Great Gatsby        |  8.99 |     10.99 |         -2.00 |
| The Catcher in the Rye  |  8.99 |     10.99 |         -2.00 |
| Pride and Prejudice     |  7.99 |     10.99 |         -3.00 |
+-------------------------+-------+-----------+---------------+
                    

-- Show each book with a percentage of total value
SELECT
    title,
    price,
    ROUND(price / (SELECT SUM(price) FROM books) * 100, 1) AS pct_of_total
FROM books
ORDER BY pct_of_total DESC;
                

Output:


+-------------------------+-------+--------------+
| title                   | price | pct_of_total |
+-------------------------+-------+--------------+
| Lord of the Rings       | 15.99 |         14.5 |
| Dune                    | 14.99 |         13.6 |
| To Kill a Mockingbird   | 11.99 |         10.9 |
| The Hobbit              | 10.99 |         10.0 |
| Fahrenheit 451          | 10.49 |          9.5 |
| 1984                    |  9.99 |          9.1 |
| Brave New World         |  9.49 |          8.6 |
| The Great Gatsby        |  8.99 |          8.2 |
| The Catcher in the Rye  |  8.99 |          8.2 |
| Pride and Prejudice     |  7.99 |          7.3 |
+-------------------------+-------+--------------+
                    

When to Use Subqueries

✅ Good Uses for Subqueries

  • Comparing to aggregates — "above average," "equal to the max," "below the median"
  • Filtering with IN/NOT IN — "customers in cities that have X"
  • Existence checks — "customers who have/haven't ordered" (with EXISTS)
  • Derived tables — when you need to aggregate, then filter or aggregate again
  • Adding context columns — showing each row alongside a global metric

⚠️ When JOINs Are Better

  • Combining columns from multiple tables — JOINs are clearer and usually faster
  • Correlated subqueries on large tables — JOINs avoid the "run once per row" penalty
  • When readability suffers — deeply nested subqueries become hard to debug

You'll learn JOINs in Lesson 13. Many problems can be solved with either subqueries or JOINs — experienced developers choose whichever is clearer for the specific situation.

Nesting Depth

Subqueries can be nested — a subquery inside a subquery. MySQL supports deep nesting, but readability drops fast:


-- Two levels of nesting (this is about the practical limit)
SELECT title, price
FROM books
WHERE price > (
    SELECT AVG(price) FROM books
    WHERE author IN (
        SELECT DISTINCT author FROM books WHERE in_stock = TRUE
    )
);
                

💡 Rule of Thumb

If you find yourself nesting more than 2 levels deep, step back and consider whether the query can be restructured with JOINs, temporary tables, or common table expressions (CTEs — an advanced topic for later).

Exercises

🏋️ Exercise 1: Scalar Subqueries

Write queries to find:

  1. All books priced above the average price
  2. The title and author of the book with the most pages
  3. All books that cost less than the cheapest out-of-stock book
  4. Books with more pages than the average page count
✅ Solution

-- 1
SELECT title, price FROM books
WHERE price > (SELECT AVG(price) FROM books);

-- 2
SELECT title, author, pages FROM books
WHERE pages = (SELECT MAX(pages) FROM books);

-- 3
SELECT title, price FROM books
WHERE price < (
    SELECT MIN(price) FROM books WHERE in_stock = FALSE
);

-- 4
SELECT title, pages FROM books
WHERE pages > (SELECT AVG(pages) FROM books);
                        

🏋️ Exercise 2: IN Subqueries

Write queries to:

  1. Find all customers who live in a city that has exactly 1 customer
  2. Find all books whose price matches any out-of-stock book's price
  3. Find customers who do NOT live in cities with multiple customers (exclude NULLs)
✅ Solution

-- 1
SELECT first_name, last_name, city FROM customers
WHERE city IN (
    SELECT city FROM customers
    WHERE city IS NOT NULL
    GROUP BY city
    HAVING COUNT(*) = 1
);

-- 2
SELECT title, price FROM books
WHERE price IN (
    SELECT price FROM books WHERE in_stock = FALSE
);

-- 3
SELECT first_name, last_name, city FROM customers
WHERE city IS NOT NULL
AND city NOT IN (
    SELECT city FROM customers
    WHERE city IS NOT NULL
    GROUP BY city
    HAVING COUNT(*) > 1
);
                        

🏋️ Exercise 3: SELECT & FROM Subqueries

Write queries to:

  1. Show each book's title, price, and its percentage of the total price of all books (rounded to 1 decimal)
  2. Show each book's title, price, and how far it is from the average (price - avg)
  3. Calculate the average number of customers per city (hint: count per city first, then average those counts)
✅ Solution

-- 1
SELECT
    title,
    price,
    ROUND(price / (SELECT SUM(price) FROM books) * 100, 1) AS pct_of_total
FROM books
ORDER BY pct_of_total DESC;

-- 2
SELECT
    title,
    price,
    ROUND(price - (SELECT AVG(price) FROM books), 2) AS diff_from_avg
FROM books
ORDER BY diff_from_avg DESC;

-- 3
SELECT ROUND(AVG(customer_count), 1) AS avg_customers_per_city
FROM (
    SELECT city, COUNT(*) AS customer_count
    FROM customers
    WHERE city IS NOT NULL
    GROUP BY city
) AS city_counts;
                        

🏋️ Exercise 4: Combined Challenge

Write a single query that shows all books that are:

  • Currently in stock
  • Priced above the average price of ALL books (including out-of-stock)
  • Include the title, author, price, and the average price for comparison
  • Sort by how far above the average they are (highest first)
✅ Solution

SELECT
    title,
    author,
    price,
    (SELECT ROUND(AVG(price), 2) FROM books) AS avg_price,
    ROUND(price - (SELECT AVG(price) FROM books), 2) AS above_avg
FROM books
WHERE in_stock = TRUE
  AND price > (SELECT AVG(price) FROM books)
ORDER BY above_avg DESC;
                        

🎯 Quick Quiz

Question 1: What type of subquery returns exactly one row and one column?

Question 2: What is the danger of using NOT IN with a subquery?

Question 3: How does a correlated subquery differ from a regular subquery?

Question 4: What is required when using a subquery in the FROM clause?

Summary

🎉 Key Takeaways

  • A subquery is a SELECT inside another SQL statement, enclosed in parentheses
  • Scalar subqueries return one value — use with =, >, <, etc.
  • Use IN when a subquery returns multiple values (a list)
  • NOT IN fails silently when the subquery returns NULL — always filter NULLs
  • EXISTS checks whether any rows exist — great for presence/absence checks
  • Correlated subqueries reference the outer query and run once per row
  • FROM subqueries (derived tables) require an alias and act like temporary tables
  • SELECT subqueries add computed columns alongside regular data
  • Avoid nesting more than 2 levels — consider JOINs or CTEs for complex queries
  • JOINs often outperform correlated subqueries on large datasets

Quick Reference

Pattern Syntax
Compare to aggregate WHERE col > (SELECT AVG(col) FROM t)
Match a value WHERE col = (SELECT MAX(col) FROM t)
Match a list WHERE col IN (SELECT col FROM t2)
Exclude a list WHERE col NOT IN (SELECT col FROM t2 WHERE col IS NOT NULL)
Check existence WHERE EXISTS (SELECT 1 FROM t2 WHERE ...)
Derived table FROM (SELECT ... ) AS alias
Computed column SELECT (SELECT AGG(col) FROM t) AS name

📚 Additional Resources

🚀 What's Next?

You've completed Module 3! You can now filter, sort, aggregate, and nest queries — that's a serious SQL toolkit. In Module 4: Relationships & Joins, you'll learn how real databases connect multiple tables together. First up: Table Relationships — understanding one-to-one, one-to-many, and many-to-many relationships, and why splitting data across tables is the foundation of good database design.

🎉 Module 3 Complete!

You've mastered querying data — filtering, sorting, aggregating, and nesting. Your SQL is now powerful enough to answer complex, multi-step questions about your data!