🔍 Lesson 10: Subqueries & Nested Queries
Sometimes you need the answer to one question before you can ask another. "Show me all books priced above the average" — you need to know the average first, then use it as a filter. A subquery is a query inside another query. It lets you break complex questions into logical steps, all in a single SQL statement.
🎯 Learning Objectives
By the end of this lesson, you will be able to:
- Write scalar subqueries that return a single value
- Use subqueries with
INto match against a list of values - Use
EXISTSto check for the presence of related rows - Write correlated subqueries that reference the outer query
- Use subqueries in
SELECT,WHERE, andFROMclauses - Know when a subquery is the right tool and when it isn't
Estimated Time: 50 minutes
Prerequisites: Aggregate functions & GROUP BY (Lesson 9)
📑 In This Lesson
What Is a Subquery?
A subquery (also called an inner query or nested query) is a complete SELECT statement enclosed in parentheses, placed inside another SQL statement. The outer statement is called the outer query.
The subquery runs first and produces a result. That result is then used by the outer query. Think of it like filling in a blank:
-- Step 1: What's the average price?
SELECT AVG(price) FROM books; -- Returns: 10.99
-- Step 2: Which books cost more than that?
SELECT title, price FROM books WHERE price > 10.99;
-- Combined into one query with a subquery:
SELECT title, price FROM books
WHERE price > (SELECT AVG(price) FROM books);
Subqueries can appear in several places:
| Location | What It Does | Example |
|---|---|---|
WHERE |
Filter rows based on a computed value | WHERE price > (SELECT AVG(price) ...) |
FROM |
Use a subquery result as a temporary table | FROM (SELECT ... ) AS temp |
SELECT |
Compute a value for each row | SELECT (SELECT COUNT(*) ...) |
Scalar Subqueries (Single Value)
A scalar subquery returns exactly one row and one column — a single value. This is the most common and simplest type.
Comparing Against an Aggregate
-- Books priced above average
SELECT title, price
FROM books
WHERE price > (SELECT AVG(price) FROM books);
Output:
+-----------------------+-------+
| title | price |
+-----------------------+-------+
| To Kill a Mockingbird | 11.99 |
| Dune | 14.99 |
| Lord of the Rings | 15.99 |
+-----------------------+-------+
MySQL first executes the subquery (SELECT AVG(price) FROM books) → 10.99, then substitutes it into the outer query: WHERE price > 10.99.
-- Books priced below average
SELECT title, price
FROM books
WHERE price < (SELECT AVG(price) FROM books);
Output:
+------------------------+-------+
| title | price |
+------------------------+-------+
| Pride and Prejudice | 7.99 |
| The Great Gatsby | 8.99 |
| The Catcher in the Rye | 8.99 |
| Brave New World | 9.49 |
| 1984 | 9.99 |
| Fahrenheit 451 | 10.49 |
| The Hobbit | 10.99 |
+------------------------+-------+
Using MIN and MAX in Subqueries
-- Which book has the highest price? (full row info!)
SELECT title, author, price
FROM books
WHERE price = (SELECT MAX(price) FROM books);
Output:
+-------------------+----------------+-------+
| title | author | price |
+-------------------+----------------+-------+
| Lord of the Rings | J.R.R. Tolkien | 15.99 |
+-------------------+----------------+-------+
💡 Subquery vs ORDER BY + LIMIT
Remember how MIN(price) tells you the value but not which row? A subquery solves that. Compare these two equivalent approaches:
-- Approach 1: Subquery
SELECT * FROM books WHERE price = (SELECT MIN(price) FROM books);
-- Approach 2: ORDER BY + LIMIT
SELECT * FROM books ORDER BY price ASC LIMIT 1;
Both work for "find the cheapest book." The subquery approach is more flexible — it can return all books tied at the minimum price, while LIMIT 1 only returns one.
-- Book(s) with the most pages
SELECT title, pages
FROM books
WHERE pages = (SELECT MAX(pages) FROM books);
-- Book(s) published earliest
SELECT title, author, published
FROM books
WHERE published = (SELECT MIN(published) FROM books);
Scalar Subquery with Arithmetic
-- Books priced more than DOUBLE the cheapest book
SELECT title, price
FROM books
WHERE price > (SELECT MIN(price) FROM books) * 2;
Output:
+-----------------------+-------+
| title | price |
+-----------------------+-------+
| To Kill a Mockingbird | 11.99 |
| Dune | 14.99 |
| Lord of the Rings | 15.99 |
+-----------------------+-------+
The cheapest book is $7.99, so double is $15.98 — only books above that price are returned.
Subqueries with IN (List of Values)
When a subquery returns multiple rows (but one column), you can't use = — you need IN. The subquery produces a list, and IN checks if a value is in that list.
Basic IN Subquery
-- Find customers who live in cities that have more than 1 customer
SELECT first_name, last_name, city
FROM customers
WHERE city IN (
SELECT city
FROM customers
WHERE city IS NOT NULL
GROUP BY city
HAVING COUNT(*) > 1
);
Output:
+------------+-----------+----------+
| first_name | last_name | city |
+------------+-----------+----------+
| Alice | Johnson | New York |
| Bob | Smith | Chicago |
| Frank | Miller | New York |
| Grace | Wilson | Chicago |
+------------+-----------+----------+
The inner query returns the cities with more than one customer (New York, Chicago). The outer query then finds all customers in those cities.
💡 How It Works Step by Step
1. Inner query runs: SELECT city ... HAVING COUNT(*) > 1 → returns ('New York', 'Chicago')
2. Outer query becomes: WHERE city IN ('New York', 'Chicago')
3. All customers matching those cities are returned
NOT IN — Excluding Matches
-- Customers NOT in multi-customer cities
SELECT first_name, last_name, city
FROM customers
WHERE city IS NOT NULL
AND city NOT IN (
SELECT city
FROM customers
WHERE city IS NOT NULL
GROUP BY city
HAVING COUNT(*) > 1
);
Output:
+------------+-----------+-------------+
| first_name | last_name | city |
+------------+-----------+-------------+
| Carol | Williams | Los Angeles |
| Dave | Brown | Houston |
+------------+-----------+-------------+
⚠️ NOT IN and NULL — A Dangerous Trap
If the subquery returns any NULL values, NOT IN returns no rows at all. This is one of SQL's most notorious gotchas:
-- If the subquery returns ('New York', 'Chicago', NULL):
-- NOT IN ('New York', 'Chicago', NULL) → always FALSE!
-- The NULL makes every comparison UNKNOWN
-- Fix: Filter out NULLs in the subquery
WHERE city NOT IN (
SELECT city FROM customers WHERE city IS NOT NULL ...
);
Always add WHERE column IS NOT NULL inside subqueries used with NOT IN.
Cross-Table IN Subqueries
Subqueries are especially powerful when they reference a different table. Once we add more tables in later lessons (like orders and order_items), you'll see patterns like:
-- Conceptual: Find customers who have placed orders
-- (we'll build these tables in Module 4)
-- SELECT first_name, last_name FROM customers
-- WHERE id IN (SELECT customer_id FROM orders);
-- Conceptual: Find books that have never been ordered
-- SELECT title FROM books
-- WHERE id NOT IN (SELECT book_id FROM order_items);
For now, focus on the pattern — you'll use it constantly once tables are linked with foreign keys.
Subqueries with Comparison Operators
Beyond = and IN, you can use ANY, ALL, and SOME with subqueries that return multiple values.
ANY / SOME — Match At Least One
ANY (and its synonym SOME) returns TRUE if the comparison is true for at least one value from the subquery:
-- Books cheaper than ANY out-of-stock book
-- (i.e., cheaper than the MOST EXPENSIVE out-of-stock book)
SELECT title, price
FROM books
WHERE price < ANY (
SELECT price FROM books WHERE in_stock = FALSE
);
💡 ANY Simplified
price < ANY (subquery) means: "price is less than at least one value in the list." This is equivalent to price < MAX(values_from_subquery).
ALL — Match Every Value
ALL returns TRUE only if the comparison is true for every value from the subquery:
-- Books cheaper than ALL out-of-stock books
-- (i.e., cheaper than the CHEAPEST out-of-stock book)
SELECT title, price
FROM books
WHERE price < ALL (
SELECT price FROM books WHERE in_stock = FALSE
);
💡 ALL Simplified
price < ALL (subquery) means: "price is less than every single value in the list." This is equivalent to price < MIN(values_from_subquery).
| Expression | Equivalent To |
|---|---|
= ANY (...) |
IN (...) |
< ANY (...) |
< MAX(...) |
> ANY (...) |
> MIN(...) |
< ALL (...) |
< MIN(...) |
> ALL (...) |
> MAX(...) |
In practice, ANY and ALL are less common than IN and scalar subqueries. Most developers find the MIN/MAX equivalents easier to read.
EXISTS — Checking for Related Rows
EXISTS checks whether a subquery returns any rows at all. It doesn't care about the values — just whether rows exist or not. It returns TRUE if the subquery returns at least one row, FALSE if it returns none.
-- Do we have any books priced over $20?
SELECT EXISTS (SELECT 1 FROM books WHERE price > 20) AS has_expensive;
Output:
+---------------+
| has_expensive |
+---------------+
| 0 |
+---------------+
-- Do we have any books priced over $15?
SELECT EXISTS (SELECT 1 FROM books WHERE price > 15) AS has_expensive;
Output:
+---------------+
| has_expensive |
+---------------+
| 1 |
+---------------+
💡 Why SELECT 1?
Inside EXISTS, the actual columns selected don't matter — MySQL only checks if any rows are returned. SELECT 1 is a convention that makes this clear, but SELECT * or SELECT column would work identically.
EXISTS with Correlated Subqueries
EXISTS becomes truly powerful when combined with correlated subqueries (covered in the next section). Here's a preview:
-- Conceptual: Find customers who have placed at least one order
-- SELECT c.first_name, c.last_name
-- FROM customers c
-- WHERE EXISTS (
-- SELECT 1 FROM orders o WHERE o.customer_id = c.id
-- );
NOT EXISTS
-- Conceptual: Find customers who have NEVER placed an order
-- SELECT c.first_name, c.last_name
-- FROM customers c
-- WHERE NOT EXISTS (
-- SELECT 1 FROM orders o WHERE o.customer_id = c.id
-- );
✅ EXISTS vs IN
IN and EXISTS often solve the same problem, but they work differently:
- IN — best when the subquery returns a small list of values
- EXISTS — best when checking for the presence of related rows, especially with large tables
NOT EXISTSis generally safer thanNOT INbecause it handles NULLs correctly
Subqueries in the FROM Clause
A subquery in the FROM clause creates a derived table (also called an inline view) — a temporary result set that you can query like a regular table.
-- Create a summary, then query it
SELECT price_range, book_count, avg_price
FROM (
SELECT
CASE
WHEN price < 10 THEN 'Budget'
WHEN price < 13 THEN 'Mid-Range'
ELSE 'Premium'
END AS price_range,
COUNT(*) AS book_count,
ROUND(AVG(price), 2) AS avg_price
FROM books
GROUP BY price_range
) AS price_summary
WHERE book_count > 1
ORDER BY avg_price;
Output:
+-------------+------------+-----------+
| price_range | book_count | avg_price |
+-------------+------------+-----------+
| Budget | 5 | 9.09 |
| Mid-Range | 3 | 10.82 |
+-------------+------------+-----------+
⚠️ Derived Tables Need an Alias
MySQL requires every derived table to have an alias. The AS price_summary part is mandatory — without it, you'll get a syntax error:
-- ⛔ WRONG — missing alias
SELECT * FROM (SELECT COUNT(*) FROM books);
-- ✅ CORRECT — alias provided
SELECT * FROM (SELECT COUNT(*) AS total FROM books) AS counts;
Why Use FROM Subqueries?
They're useful when you need to:
- Filter on aggregated results without HAVING (more flexible)
- Perform calculations on aggregated data — like finding the average of averages
- Pre-process data before joining it with another table
-- Average number of books per author
-- Step 1: Count books per author (inner query)
-- Step 2: Average those counts (outer query)
SELECT ROUND(AVG(book_count), 1) AS avg_books_per_author
FROM (
SELECT author, COUNT(*) AS book_count
FROM books
GROUP BY author
) AS author_counts;
Output:
+----------------------+
| avg_books_per_author |
+----------------------+
| 1.0 |
+----------------------+
Subqueries in the SELECT Clause
A scalar subquery in the SELECT clause adds a computed column to each row. It must return exactly one value.
-- Show each book with the overall average for comparison
SELECT
title,
price,
(SELECT ROUND(AVG(price), 2) FROM books) AS avg_price,
ROUND(price - (SELECT AVG(price) FROM books), 2) AS diff_from_avg
FROM books
ORDER BY diff_from_avg DESC;
Output:
+-------------------------+-------+-----------+---------------+
| title | price | avg_price | diff_from_avg |
+-------------------------+-------+-----------+---------------+
| Lord of the Rings | 15.99 | 10.99 | 5.00 |
| Dune | 14.99 | 10.99 | 4.00 |
| To Kill a Mockingbird | 11.99 | 10.99 | 1.00 |
| The Hobbit | 10.99 | 10.99 | 0.00 |
| Fahrenheit 451 | 10.49 | 10.99 | -0.50 |
| 1984 | 9.99 | 10.99 | -1.00 |
| Brave New World | 9.49 | 10.99 | -1.50 |
| The Great Gatsby | 8.99 | 10.99 | -2.00 |
| The Catcher in the Rye | 8.99 | 10.99 | -2.00 |
| Pride and Prejudice | 7.99 | 10.99 | -3.00 |
+-------------------------+-------+-----------+---------------+
-- Show each book with a percentage of total value
SELECT
title,
price,
ROUND(price / (SELECT SUM(price) FROM books) * 100, 1) AS pct_of_total
FROM books
ORDER BY pct_of_total DESC;
Output:
+-------------------------+-------+--------------+
| title | price | pct_of_total |
+-------------------------+-------+--------------+
| Lord of the Rings | 15.99 | 14.5 |
| Dune | 14.99 | 13.6 |
| To Kill a Mockingbird | 11.99 | 10.9 |
| The Hobbit | 10.99 | 10.0 |
| Fahrenheit 451 | 10.49 | 9.5 |
| 1984 | 9.99 | 9.1 |
| Brave New World | 9.49 | 8.6 |
| The Great Gatsby | 8.99 | 8.2 |
| The Catcher in the Rye | 8.99 | 8.2 |
| Pride and Prejudice | 7.99 | 7.3 |
+-------------------------+-------+--------------+
When to Use Subqueries
✅ Good Uses for Subqueries
- Comparing to aggregates — "above average," "equal to the max," "below the median"
- Filtering with IN/NOT IN — "customers in cities that have X"
- Existence checks — "customers who have/haven't ordered" (with EXISTS)
- Derived tables — when you need to aggregate, then filter or aggregate again
- Adding context columns — showing each row alongside a global metric
⚠️ When JOINs Are Better
- Combining columns from multiple tables — JOINs are clearer and usually faster
- Correlated subqueries on large tables — JOINs avoid the "run once per row" penalty
- When readability suffers — deeply nested subqueries become hard to debug
You'll learn JOINs in Lesson 13. Many problems can be solved with either subqueries or JOINs — experienced developers choose whichever is clearer for the specific situation.
Nesting Depth
Subqueries can be nested — a subquery inside a subquery. MySQL supports deep nesting, but readability drops fast:
-- Two levels of nesting (this is about the practical limit)
SELECT title, price
FROM books
WHERE price > (
SELECT AVG(price) FROM books
WHERE author IN (
SELECT DISTINCT author FROM books WHERE in_stock = TRUE
)
);
💡 Rule of Thumb
If you find yourself nesting more than 2 levels deep, step back and consider whether the query can be restructured with JOINs, temporary tables, or common table expressions (CTEs — an advanced topic for later).
Exercises
🏋️ Exercise 1: Scalar Subqueries
Write queries to find:
- All books priced above the average price
- The title and author of the book with the most pages
- All books that cost less than the cheapest out-of-stock book
- Books with more pages than the average page count
✅ Solution
-- 1
SELECT title, price FROM books
WHERE price > (SELECT AVG(price) FROM books);
-- 2
SELECT title, author, pages FROM books
WHERE pages = (SELECT MAX(pages) FROM books);
-- 3
SELECT title, price FROM books
WHERE price < (
SELECT MIN(price) FROM books WHERE in_stock = FALSE
);
-- 4
SELECT title, pages FROM books
WHERE pages > (SELECT AVG(pages) FROM books);
🏋️ Exercise 2: IN Subqueries
Write queries to:
- Find all customers who live in a city that has exactly 1 customer
- Find all books whose price matches any out-of-stock book's price
- Find customers who do NOT live in cities with multiple customers (exclude NULLs)
✅ Solution
-- 1
SELECT first_name, last_name, city FROM customers
WHERE city IN (
SELECT city FROM customers
WHERE city IS NOT NULL
GROUP BY city
HAVING COUNT(*) = 1
);
-- 2
SELECT title, price FROM books
WHERE price IN (
SELECT price FROM books WHERE in_stock = FALSE
);
-- 3
SELECT first_name, last_name, city FROM customers
WHERE city IS NOT NULL
AND city NOT IN (
SELECT city FROM customers
WHERE city IS NOT NULL
GROUP BY city
HAVING COUNT(*) > 1
);
🏋️ Exercise 3: SELECT & FROM Subqueries
Write queries to:
- Show each book's title, price, and its percentage of the total price of all books (rounded to 1 decimal)
- Show each book's title, price, and how far it is from the average (price - avg)
- Calculate the average number of customers per city (hint: count per city first, then average those counts)
✅ Solution
-- 1
SELECT
title,
price,
ROUND(price / (SELECT SUM(price) FROM books) * 100, 1) AS pct_of_total
FROM books
ORDER BY pct_of_total DESC;
-- 2
SELECT
title,
price,
ROUND(price - (SELECT AVG(price) FROM books), 2) AS diff_from_avg
FROM books
ORDER BY diff_from_avg DESC;
-- 3
SELECT ROUND(AVG(customer_count), 1) AS avg_customers_per_city
FROM (
SELECT city, COUNT(*) AS customer_count
FROM customers
WHERE city IS NOT NULL
GROUP BY city
) AS city_counts;
🏋️ Exercise 4: Combined Challenge
Write a single query that shows all books that are:
- Currently in stock
- Priced above the average price of ALL books (including out-of-stock)
- Include the title, author, price, and the average price for comparison
- Sort by how far above the average they are (highest first)
✅ Solution
SELECT
title,
author,
price,
(SELECT ROUND(AVG(price), 2) FROM books) AS avg_price,
ROUND(price - (SELECT AVG(price) FROM books), 2) AS above_avg
FROM books
WHERE in_stock = TRUE
AND price > (SELECT AVG(price) FROM books)
ORDER BY above_avg DESC;
🎯 Quick Quiz
Question 1: What type of subquery returns exactly one row and one column?
Question 2: What is the danger of using NOT IN with a subquery?
Question 3: How does a correlated subquery differ from a regular subquery?
Question 4: What is required when using a subquery in the FROM clause?
Summary
🎉 Key Takeaways
- A subquery is a SELECT inside another SQL statement, enclosed in parentheses
- Scalar subqueries return one value — use with
=,>,<, etc. - Use IN when a subquery returns multiple values (a list)
- NOT IN fails silently when the subquery returns NULL — always filter NULLs
- EXISTS checks whether any rows exist — great for presence/absence checks
- Correlated subqueries reference the outer query and run once per row
- FROM subqueries (derived tables) require an alias and act like temporary tables
- SELECT subqueries add computed columns alongside regular data
- Avoid nesting more than 2 levels — consider JOINs or CTEs for complex queries
- JOINs often outperform correlated subqueries on large datasets
Quick Reference
| Pattern | Syntax |
|---|---|
| Compare to aggregate | WHERE col > (SELECT AVG(col) FROM t) |
| Match a value | WHERE col = (SELECT MAX(col) FROM t) |
| Match a list | WHERE col IN (SELECT col FROM t2) |
| Exclude a list | WHERE col NOT IN (SELECT col FROM t2 WHERE col IS NOT NULL) |
| Check existence | WHERE EXISTS (SELECT 1 FROM t2 WHERE ...) |
| Derived table | FROM (SELECT ... ) AS alias |
| Computed column | SELECT (SELECT AGG(col) FROM t) AS name |
📚 Additional Resources
🚀 What's Next?
You've completed Module 3! You can now filter, sort, aggregate, and nest queries — that's a serious SQL toolkit. In Module 4: Relationships & Joins, you'll learn how real databases connect multiple tables together. First up: Table Relationships — understanding one-to-one, one-to-many, and many-to-many relationships, and why splitting data across tables is the foundation of good database design.
🎉 Module 3 Complete!
You've mastered querying data — filtering, sorting, aggregating, and nesting. Your SQL is now powerful enough to answer complex, multi-step questions about your data!