Subqueries & Nested Queries - MySQL Foundations

What Is a Subquery?

A subquery (also called an inner query or nested query) is a complete SELECT statement enclosed in parentheses, placed inside another SQL statement. The outer statement is called the outer query.

graph LR subgraph Outer_Query["Outer Query"] A["SELECT title, price FROM books"] B["WHERE price >"] subgraph Inner_Query["Subquery"] C["(SELECT AVG(price) FROM books)"] end B --> C end

The subquery runs first and produces a result. That result is then used by the outer query. Think of it like filling in a blank:


-- Step 1: What's the average price?
SELECT AVG(price) FROM books;   -- Returns: 10.99

-- Step 2: Which books cost more than that?
SELECT title, price FROM books WHERE price > 10.99;

-- Combined into one query with a subquery:
SELECT title, price FROM books
WHERE price > (SELECT AVG(price) FROM books);

Subqueries can appear in several places:

Location	What It Does	Example
`WHERE`	Filter rows based on a computed value	`WHERE price > (SELECT AVG(price) ...)`
`FROM`	Use a subquery result as a temporary table	`FROM (SELECT ... ) AS temp`
`SELECT`	Compute a value for each row	`SELECT (SELECT COUNT(*) ...)`

Scalar Subqueries (Single Value)

A scalar subquery returns exactly one row and one column — a single value. This is the most common and simplest type.

Comparing Against an Aggregate


-- Books priced above average
SELECT title, price
FROM books
WHERE price > (SELECT AVG(price) FROM books);

Output:


+-----------------------+-------+
| title                 | price |
+-----------------------+-------+
| To Kill a Mockingbird | 11.99 |
| Dune                  | 14.99 |
| Lord of the Rings     | 15.99 |
+-----------------------+-------+

MySQL first executes the subquery (SELECT AVG(price) FROM books) → 10.99, then substitutes it into the outer query: WHERE price > 10.99.


-- Books priced below average
SELECT title, price
FROM books
WHERE price < (SELECT AVG(price) FROM books);

Output:


+------------------------+-------+
| title                  | price |
+------------------------+-------+
| Pride and Prejudice    |  7.99 |
| The Great Gatsby       |  8.99 |
| The Catcher in the Rye |  8.99 |
| Brave New World        |  9.49 |
| 1984                   |  9.99 |
| Fahrenheit 451         | 10.49 |
| The Hobbit             | 10.99 |
+------------------------+-------+

Using MIN and MAX in Subqueries


-- Which book has the highest price? (full row info!)
SELECT title, author, price
FROM books
WHERE price = (SELECT MAX(price) FROM books);

Output:


+-------------------+----------------+-------+
| title             | author         | price |
+-------------------+----------------+-------+
| Lord of the Rings | J.R.R. Tolkien | 15.99 |
+-------------------+----------------+-------+

💡 Subquery vs ORDER BY + LIMIT

Remember how MIN(price) tells you the value but not which row? A subquery solves that. Compare these two equivalent approaches:


-- Approach 1: Subquery
SELECT * FROM books WHERE price = (SELECT MIN(price) FROM books);

-- Approach 2: ORDER BY + LIMIT
SELECT * FROM books ORDER BY price ASC LIMIT 1;

Both work for "find the cheapest book." The subquery approach is more flexible — it can return all books tied at the minimum price, while LIMIT 1 only returns one.


-- Book(s) with the most pages
SELECT title, pages
FROM books
WHERE pages = (SELECT MAX(pages) FROM books);

-- Book(s) published earliest
SELECT title, author, published
FROM books
WHERE published = (SELECT MIN(published) FROM books);

Scalar Subquery with Arithmetic


-- Books priced more than DOUBLE the cheapest book
SELECT title, price
FROM books
WHERE price > (SELECT MIN(price) FROM books) * 2;

Output:


+-----------------------+-------+
| title                 | price |
+-----------------------+-------+
| To Kill a Mockingbird | 11.99 |
| Dune                  | 14.99 |
| Lord of the Rings     | 15.99 |
+-----------------------+-------+

The cheapest book is $7.99, so double is $15.98 — only books above that price are returned.

Subqueries with IN (List of Values)

When a subquery returns multiple rows (but one column), you can't use = — you need IN. The subquery produces a list, and IN checks if a value is in that list.

Basic IN Subquery


-- Find customers who live in cities that have more than 1 customer
SELECT first_name, last_name, city
FROM customers
WHERE city IN (
    SELECT city
    FROM customers
    WHERE city IS NOT NULL
    GROUP BY city
    HAVING COUNT(*) > 1
);

Output:


+------------+-----------+----------+
| first_name | last_name | city     |
+------------+-----------+----------+
| Alice      | Johnson   | New York |
| Bob        | Smith     | Chicago  |
| Frank      | Miller    | New York |
| Grace      | Wilson    | Chicago  |
+------------+-----------+----------+

The inner query returns the cities with more than one customer (New York, Chicago). The outer query then finds all customers in those cities.

💡 How It Works Step by Step

1. Inner query runs: SELECT city ... HAVING COUNT(*) > 1 → returns ('New York', 'Chicago')

2. Outer query becomes: WHERE city IN ('New York', 'Chicago')

3. All customers matching those cities are returned

NOT IN — Excluding Matches


-- Customers NOT in multi-customer cities
SELECT first_name, last_name, city
FROM customers
WHERE city IS NOT NULL
AND city NOT IN (
    SELECT city
    FROM customers
    WHERE city IS NOT NULL
    GROUP BY city
    HAVING COUNT(*) > 1
);

Output:


+------------+-----------+-------------+
| first_name | last_name | city        |
+------------+-----------+-------------+
| Carol      | Williams  | Los Angeles |
| Dave       | Brown     | Houston     |
+------------+-----------+-------------+

⚠️ NOT IN and NULL — A Dangerous Trap

If the subquery returns any NULL values, NOT IN returns no rows at all. This is one of SQL's most notorious gotchas:


-- If the subquery returns ('New York', 'Chicago', NULL):
-- NOT IN ('New York', 'Chicago', NULL) → always FALSE!
-- The NULL makes every comparison UNKNOWN

-- Fix: Filter out NULLs in the subquery
WHERE city NOT IN (
    SELECT city FROM customers WHERE city IS NOT NULL ...
);

Always add WHERE column IS NOT NULL inside subqueries used with NOT IN.

Cross-Table IN Subqueries

Subqueries are especially powerful when they reference a different table. Once we add more tables in later lessons (like orders and order_items), you'll see patterns like:


-- Conceptual: Find customers who have placed orders
-- (we'll build these tables in Module 4)
-- SELECT first_name, last_name FROM customers
-- WHERE id IN (SELECT customer_id FROM orders);

-- Conceptual: Find books that have never been ordered
-- SELECT title FROM books
-- WHERE id NOT IN (SELECT book_id FROM order_items);

For now, focus on the pattern — you'll use it constantly once tables are linked with foreign keys.

Subqueries with Comparison Operators

Beyond = and IN, you can use ANY, ALL, and SOME with subqueries that return multiple values.

ANY / SOME — Match At Least One

ANY (and its synonym SOME) returns TRUE if the comparison is true for at least one value from the subquery:


-- Books cheaper than ANY out-of-stock book
-- (i.e., cheaper than the MOST EXPENSIVE out-of-stock book)
SELECT title, price
FROM books
WHERE price < ANY (
    SELECT price FROM books WHERE in_stock = FALSE
);

💡 ANY Simplified

price < ANY (subquery) means: "price is less than at least one value in the list." This is equivalent to price < MAX(values_from_subquery).

ALL — Match Every Value

ALL returns TRUE only if the comparison is true for every value from the subquery:


-- Books cheaper than ALL out-of-stock books
-- (i.e., cheaper than the CHEAPEST out-of-stock book)
SELECT title, price
FROM books
WHERE price < ALL (
    SELECT price FROM books WHERE in_stock = FALSE
);

💡 ALL Simplified

price < ALL (subquery) means: "price is less than every single value in the list." This is equivalent to price < MIN(values_from_subquery).

Expression	Equivalent To
`= ANY (...)`	`IN (...)`
`< ANY (...)`	`< MAX(...)`
`> ANY (...)`	`> MIN(...)`
`< ALL (...)`	`< MIN(...)`
`> ALL (...)`	`> MAX(...)`

In practice, ANY and ALL are less common than IN and scalar subqueries. Most developers find the MIN/MAX equivalents easier to read.

EXISTS — Checking for Related Rows

EXISTS checks whether a subquery returns any rows at all. It doesn't care about the values — just whether rows exist or not. It returns TRUE if the subquery returns at least one row, FALSE if it returns none.


-- Do we have any books priced over $20?
SELECT EXISTS (SELECT 1 FROM books WHERE price > 20) AS has_expensive;

Output:


+---------------+
| has_expensive |
+---------------+
|             0 |
+---------------+


-- Do we have any books priced over $15?
SELECT EXISTS (SELECT 1 FROM books WHERE price > 15) AS has_expensive;

Output:


+---------------+
| has_expensive |
+---------------+
|             1 |
+---------------+

💡 Why SELECT 1?

Inside EXISTS, the actual columns selected don't matter — MySQL only checks if any rows are returned. SELECT 1 is a convention that makes this clear, but SELECT * or SELECT column would work identically.

EXISTS with Correlated Subqueries

EXISTS becomes truly powerful when combined with correlated subqueries (covered in the next section). Here's a preview:


-- Conceptual: Find customers who have placed at least one order
-- SELECT c.first_name, c.last_name
-- FROM customers c
-- WHERE EXISTS (
--     SELECT 1 FROM orders o WHERE o.customer_id = c.id
-- );

NOT EXISTS


-- Conceptual: Find customers who have NEVER placed an order
-- SELECT c.first_name, c.last_name
-- FROM customers c
-- WHERE NOT EXISTS (
--     SELECT 1 FROM orders o WHERE o.customer_id = c.id
-- );

✅ EXISTS vs IN

IN and EXISTS often solve the same problem, but they work differently:

IN — best when the subquery returns a small list of values
EXISTS — best when checking for the presence of related rows, especially with large tables
NOT EXISTS is generally safer than NOT IN because it handles NULLs correctly

Correlated Subqueries

A correlated subquery references a column from the outer query. Unlike a regular subquery that runs once, a correlated subquery runs once for each row of the outer query.

graph TD subgraph Regular["Regular Subquery"] A1["Inner query runs ONCE"] --> A2["Result used for ALL outer rows"] end subgraph Correlated["Correlated Subquery"] B1["Outer query processes Row 1"] --> B2["Inner query runs for Row 1"] B3["Outer query processes Row 2"] --> B4["Inner query runs for Row 2"] B5["Outer query processes Row 3"] --> B6["Inner query runs for Row 3"] B7["...repeat for each row..."] end

Example: Books Above Their Author's Average Price

Imagine an author has multiple books and you want to find books priced above that specific author's average (not the overall average):


-- Books priced above their author's average price
SELECT b1.title, b1.author, b1.price
FROM books b1
WHERE b1.price > (
    SELECT AVG(b2.price)
    FROM books b2
    WHERE b2.author = b1.author   -- references the outer query!
);

Notice b1.author in the subquery — it references the outer query's current row. For each book in the outer query, MySQL recalculates the average price for that specific author.

💡 Table Aliases Are Essential

When a correlated subquery references the same table as the outer query, you must use aliases to distinguish them. Here, b1 is the outer copy and b2 is the inner copy of the books table.

Example: Books With Above-Average Page Count for Their Price Range


-- Books with more pages than the average book in their price range
-- (price range = within $2 of the book's price)
SELECT b1.title, b1.price, b1.pages
FROM books b1
WHERE b1.pages > (
    SELECT AVG(b2.pages)
    FROM books b2
    WHERE b2.price BETWEEN b1.price - 2 AND b1.price + 2
);

⚠️ Performance Warning

Correlated subqueries can be slow on large tables because the inner query runs once per outer row. If your outer query returns 10,000 rows, the inner query runs 10,000 times. For performance-critical queries, JOINs (covered in Lesson 13) are usually faster.

For learning and small datasets, correlated subqueries are perfectly fine and often more readable.

Subqueries in the FROM Clause

A subquery in the FROM clause creates a derived table (also called an inline view) — a temporary result set that you can query like a regular table.


-- Create a summary, then query it
SELECT price_range, book_count, avg_price
FROM (
    SELECT
        CASE
            WHEN price < 10 THEN 'Budget'
            WHEN price < 13 THEN 'Mid-Range'
            ELSE 'Premium'
        END AS price_range,
        COUNT(*) AS book_count,
        ROUND(AVG(price), 2) AS avg_price
    FROM books
    GROUP BY price_range
) AS price_summary
WHERE book_count > 1
ORDER BY avg_price;

Output:


+-------------+------------+-----------+
| price_range | book_count | avg_price |
+-------------+------------+-----------+
| Budget      |          5 |      9.09 |
| Mid-Range   |          3 |     10.82 |
+-------------+------------+-----------+

⚠️ Derived Tables Need an Alias

MySQL requires every derived table to have an alias. The AS price_summary part is mandatory — without it, you'll get a syntax error:


-- ⛔ WRONG — missing alias
SELECT * FROM (SELECT COUNT(*) FROM books);

-- ✅ CORRECT — alias provided
SELECT * FROM (SELECT COUNT(*) AS total FROM books) AS counts;

Why Use FROM Subqueries?

They're useful when you need to:

Filter on aggregated results without HAVING (more flexible)
Perform calculations on aggregated data — like finding the average of averages
Pre-process data before joining it with another table


-- Average number of books per author
-- Step 1: Count books per author (inner query)
-- Step 2: Average those counts (outer query)
SELECT ROUND(AVG(book_count), 1) AS avg_books_per_author
FROM (
    SELECT author, COUNT(*) AS book_count
    FROM books
    GROUP BY author
) AS author_counts;

Output:


+----------------------+
| avg_books_per_author |
+----------------------+
|                  1.0 |
+----------------------+

Subqueries in the SELECT Clause

A scalar subquery in the SELECT clause adds a computed column to each row. It must return exactly one value.


-- Show each book with the overall average for comparison
SELECT
    title,
    price,
    (SELECT ROUND(AVG(price), 2) FROM books) AS avg_price,
    ROUND(price - (SELECT AVG(price) FROM books), 2) AS diff_from_avg
FROM books
ORDER BY diff_from_avg DESC;

Output:


+-------------------------+-------+-----------+---------------+
| title                   | price | avg_price | diff_from_avg |
+-------------------------+-------+-----------+---------------+
| Lord of the Rings       | 15.99 |     10.99 |          5.00 |
| Dune                    | 14.99 |     10.99 |          4.00 |
| To Kill a Mockingbird   | 11.99 |     10.99 |          1.00 |
| The Hobbit              | 10.99 |     10.99 |          0.00 |
| Fahrenheit 451          | 10.49 |     10.99 |         -0.50 |
| 1984                    |  9.99 |     10.99 |         -1.00 |
| Brave New World         |  9.49 |     10.99 |         -1.50 |
| The Great Gatsby        |  8.99 |     10.99 |         -2.00 |
| The Catcher in the Rye  |  8.99 |     10.99 |         -2.00 |
| Pride and Prejudice     |  7.99 |     10.99 |         -3.00 |
+-------------------------+-------+-----------+---------------+


-- Show each book with a percentage of total value
SELECT
    title,
    price,
    ROUND(price / (SELECT SUM(price) FROM books) * 100, 1) AS pct_of_total
FROM books
ORDER BY pct_of_total DESC;

Output:


+-------------------------+-------+--------------+
| title                   | price | pct_of_total |
+-------------------------+-------+--------------+
| Lord of the Rings       | 15.99 |         14.5 |
| Dune                    | 14.99 |         13.6 |
| To Kill a Mockingbird   | 11.99 |         10.9 |
| The Hobbit              | 10.99 |         10.0 |
| Fahrenheit 451          | 10.49 |          9.5 |
| 1984                    |  9.99 |          9.1 |
| Brave New World         |  9.49 |          8.6 |
| The Great Gatsby        |  8.99 |          8.2 |
| The Catcher in the Rye  |  8.99 |          8.2 |
| Pride and Prejudice     |  7.99 |          7.3 |
+-------------------------+-------+--------------+

When to Use Subqueries

✅ Good Uses for Subqueries

Comparing to aggregates — "above average," "equal to the max," "below the median"
Filtering with IN/NOT IN — "customers in cities that have X"
Existence checks — "customers who have/haven't ordered" (with EXISTS)
Derived tables — when you need to aggregate, then filter or aggregate again
Adding context columns — showing each row alongside a global metric

⚠️ When JOINs Are Better

Combining columns from multiple tables — JOINs are clearer and usually faster
Correlated subqueries on large tables — JOINs avoid the "run once per row" penalty
When readability suffers — deeply nested subqueries become hard to debug

You'll learn JOINs in Lesson 13. Many problems can be solved with either subqueries or JOINs — experienced developers choose whichever is clearer for the specific situation.

Nesting Depth

Subqueries can be nested — a subquery inside a subquery. MySQL supports deep nesting, but readability drops fast:


-- Two levels of nesting (this is about the practical limit)
SELECT title, price
FROM books
WHERE price > (
    SELECT AVG(price) FROM books
    WHERE author IN (
        SELECT DISTINCT author FROM books WHERE in_stock = TRUE
    )
);

💡 Rule of Thumb

If you find yourself nesting more than 2 levels deep, step back and consider whether the query can be restructured with JOINs, temporary tables, or common table expressions (CTEs — an advanced topic for later).

Exercises

🏋️ Exercise 1: Scalar Subqueries

Write queries to find:

All books priced above the average price
The title and author of the book with the most pages
All books that cost less than the cheapest out-of-stock book
Books with more pages than the average page count

✅ Solution


-- 1
SELECT title, price FROM books
WHERE price > (SELECT AVG(price) FROM books);

-- 2
SELECT title, author, pages FROM books
WHERE pages = (SELECT MAX(pages) FROM books);

-- 3
SELECT title, price FROM books
WHERE price < (
    SELECT MIN(price) FROM books WHERE in_stock = FALSE
);

-- 4
SELECT title, pages FROM books
WHERE pages > (SELECT AVG(pages) FROM books);

🏋️ Exercise 2: IN Subqueries

Write queries to:

Find all customers who live in a city that has exactly 1 customer
Find all books whose price matches any out-of-stock book's price
Find customers who do NOT live in cities with multiple customers (exclude NULLs)

✅ Solution


-- 1
SELECT first_name, last_name, city FROM customers
WHERE city IN (
    SELECT city FROM customers
    WHERE city IS NOT NULL
    GROUP BY city
    HAVING COUNT(*) = 1
);

-- 2
SELECT title, price FROM books
WHERE price IN (
    SELECT price FROM books WHERE in_stock = FALSE
);

-- 3
SELECT first_name, last_name, city FROM customers
WHERE city IS NOT NULL
AND city NOT IN (
    SELECT city FROM customers
    WHERE city IS NOT NULL
    GROUP BY city
    HAVING COUNT(*) > 1
);

🏋️ Exercise 3: SELECT & FROM Subqueries

Write queries to:

Show each book's title, price, and its percentage of the total price of all books (rounded to 1 decimal)
Show each book's title, price, and how far it is from the average (price - avg)
Calculate the average number of customers per city (hint: count per city first, then average those counts)

✅ Solution


-- 1
SELECT
    title,
    price,
    ROUND(price / (SELECT SUM(price) FROM books) * 100, 1) AS pct_of_total
FROM books
ORDER BY pct_of_total DESC;

-- 2
SELECT
    title,
    price,
    ROUND(price - (SELECT AVG(price) FROM books), 2) AS diff_from_avg
FROM books
ORDER BY diff_from_avg DESC;

-- 3
SELECT ROUND(AVG(customer_count), 1) AS avg_customers_per_city
FROM (
    SELECT city, COUNT(*) AS customer_count
    FROM customers
    WHERE city IS NOT NULL
    GROUP BY city
) AS city_counts;

🏋️ Exercise 4: Combined Challenge

Write a single query that shows all books that are:

Currently in stock
Priced above the average price of ALL books (including out-of-stock)
Include the title, author, price, and the average price for comparison
Sort by how far above the average they are (highest first)

✅ Solution


SELECT
    title,
    author,
    price,
    (SELECT ROUND(AVG(price), 2) FROM books) AS avg_price,
    ROUND(price - (SELECT AVG(price) FROM books), 2) AS above_avg
FROM books
WHERE in_stock = TRUE
  AND price > (SELECT AVG(price) FROM books)
ORDER BY above_avg DESC;

🎯 Quick Quiz

Question 1: What type of subquery returns exactly one row and one column?

Question 2: What is the danger of using NOT IN with a subquery?

Question 3: How does a correlated subquery differ from a regular subquery?

Question 4: What is required when using a subquery in the FROM clause?

Summary

🎉 Key Takeaways

A subquery is a SELECT inside another SQL statement, enclosed in parentheses
Scalar subqueries return one value — use with =, >, <, etc.
Use IN when a subquery returns multiple values (a list)
NOT IN fails silently when the subquery returns NULL — always filter NULLs
EXISTS checks whether any rows exist — great for presence/absence checks
Correlated subqueries reference the outer query and run once per row
FROM subqueries (derived tables) require an alias and act like temporary tables
SELECT subqueries add computed columns alongside regular data
Avoid nesting more than 2 levels — consider JOINs or CTEs for complex queries
JOINs often outperform correlated subqueries on large datasets

Quick Reference

Pattern	Syntax
Compare to aggregate	`WHERE col > (SELECT AVG(col) FROM t)`
Match a value	`WHERE col = (SELECT MAX(col) FROM t)`
Match a list	`WHERE col IN (SELECT col FROM t2)`
Exclude a list	`WHERE col NOT IN (SELECT col FROM t2 WHERE col IS NOT NULL)`
Check existence	`WHERE EXISTS (SELECT 1 FROM t2 WHERE ...)`
Derived table	`FROM (SELECT ... ) AS alias`
Computed column	`SELECT (SELECT AGG(col) FROM t) AS name`

📚 Additional Resources

🚀 What's Next?

You've completed Module 3! You can now filter, sort, aggregate, and nest queries — that's a serious SQL toolkit. In Module 4: Relationships & Joins, you'll learn how real databases connect multiple tables together. First up: Table Relationships — understanding one-to-one, one-to-many, and many-to-many relationships, and why splitting data across tables is the foundation of good database design.

🎉 Module 3 Complete!

You've mastered querying data — filtering, sorting, aggregating, and nesting. Your SQL is now powerful enough to answer complex, multi-step questions about your data!

📑 In This Lesson

What Is a Subquery?

Scalar Subqueries (Single Value)

Comparing Against an Aggregate

Output:

Output:

Using MIN and MAX in Subqueries

Output:

💡 Subquery vs ORDER BY + LIMIT

Scalar Subquery with Arithmetic

Output:

Subqueries with IN (List of Values)

Basic IN Subquery

Output:

💡 How It Works Step by Step

NOT IN — Excluding Matches

Output:

⚠️ NOT IN and NULL — A Dangerous Trap

Cross-Table IN Subqueries

Subqueries with Comparison Operators

ANY / SOME — Match At Least One

💡 ANY Simplified

ALL — Match Every Value

💡 ALL Simplified

EXISTS — Checking for Related Rows

Output:

Output:

💡 Why SELECT 1?

EXISTS with Correlated Subqueries

NOT EXISTS

✅ EXISTS vs IN

Correlated Subqueries

Example: Books Above Their Author's Average Price

💡 Table Aliases Are Essential

Example: Books With Above-Average Page Count for Their Price Range

⚠️ Performance Warning

Subqueries in the FROM Clause

Output:

⚠️ Derived Tables Need an Alias

Why Use FROM Subqueries?

Output:

Subqueries in the SELECT Clause

Output:

Output:

When to Use Subqueries

✅ Good Uses for Subqueries

⚠️ When JOINs Are Better

Nesting Depth

💡 Rule of Thumb

Exercises

🏋️ Exercise 1: Scalar Subqueries

🏋️ Exercise 2: IN Subqueries

🏋️ Exercise 3: SELECT & FROM Subqueries

🏋️ Exercise 4: Combined Challenge

🎯 Quick Quiz

Summary

🎉 Key Takeaways

Quick Reference

📚 Additional Resources

🚀 What's Next?

🎉 Module 3 Complete!