๐ Lesson 9: Aggregate Functions & GROUP BY
So far you've been retrieving individual rows โ "show me this book" or "list those customers." But what about questions like "how many books do we have?" or "what's the average price?" Aggregate functions collapse many rows into a single answer. Add GROUP BY and you can answer those questions per category โ average price per author, total sales per city, count per genre.
๐ฏ Learning Objectives
By the end of this lesson, you will be able to:
- Use
COUNT,SUM,AVG,MIN, andMAXto summarize data - Understand how aggregate functions handle NULL values
- Group rows with
GROUP BYto get per-category summaries - Filter groups with
HAVING(and understand how it differs fromWHERE) - Combine aggregates with
WHERE,ORDER BY, andLIMIT - Avoid common aggregate mistakes
Estimated Time: 50 minutes
Prerequisites: Sorting & limiting results (Lesson 8)
๐ In This Lesson
What Are Aggregate Functions?
A regular SELECT returns one row per matching row in the table. An aggregate function takes multiple rows and collapses them into a single value โ a count, a sum, an average, a minimum, or a maximum.
MySQL provides five core aggregate functions:
| Function | What It Does | Example |
|---|---|---|
COUNT() |
Counts rows | How many books do we have? |
SUM() |
Adds up values | What's the total value of all books? |
AVG() |
Calculates the average | What's the average book price? |
MIN() |
Finds the smallest value | What's the cheapest book? |
MAX() |
Finds the largest value | What's the most expensive book? |
COUNT โ Counting Rows
COUNT is the most commonly used aggregate. It comes in two forms:
COUNT(*) โ Count All Rows
COUNT(*) counts every row, including rows with NULL values:
-- How many books do we have?
SELECT COUNT(*) AS total_books FROM books;
Output:
+-------------+
| total_books |
+-------------+
| 10 |
+-------------+
-- How many customers?
SELECT COUNT(*) AS total_customers FROM customers;
Output:
+-----------------+
| total_customers |
+-----------------+
| 8 |
+-----------------+
COUNT(column) โ Count Non-NULL Values
COUNT(column) only counts rows where that column is not NULL:
-- How many customers have a city listed?
SELECT COUNT(city) AS customers_with_city FROM customers;
Output:
+---------------------+
| customers_with_city |
+---------------------+
| 6 |
+---------------------+
We have 8 customers total, but only 6 have a city โ 2 have NULL cities (Eve and Hank from our earlier lessons).
COUNT(DISTINCT column) โ Count Unique Values
Add DISTINCT to count only unique (non-duplicate) values:
-- How many different authors do we have?
SELECT COUNT(DISTINCT author) AS unique_authors FROM books;
Output:
+----------------+
| unique_authors |
+----------------+
| 10 |
+----------------+
-- How many different cities are represented?
SELECT COUNT(DISTINCT city) AS unique_cities FROM customers;
Output:
+---------------+
| unique_cities |
+---------------+
| 4 |
+---------------+
COUNT with WHERE
Combine COUNT with WHERE to count a subset:
-- How many books are in stock?
SELECT COUNT(*) AS in_stock_count FROM books
WHERE in_stock = TRUE;
-- How many books cost more than $10?
SELECT COUNT(*) AS expensive_books FROM books
WHERE price > 10;
-- How many books were published after 1950?
SELECT COUNT(*) AS modern_books FROM books
WHERE published > '1950-01-01';
๐ก Always Use an Alias
Without an alias, MySQL labels the result column COUNT(*) โ which is ugly and hard to reference. Always give your aggregates a meaningful alias with AS.
SUM & AVG โ Totals and Averages
SUM โ Adding Up Values
SUM adds all values in a column. It works with numeric columns:
-- Total value of all books (if you bought one of each)
SELECT SUM(price) AS total_value FROM books;
Output:
+-------------+
| total_value |
+-------------+
| 109.90 |
+-------------+
-- Total pages across all books
SELECT SUM(pages) AS total_pages FROM books;
-- Total value of in-stock books only
SELECT SUM(price) AS in_stock_value FROM books
WHERE in_stock = TRUE;
AVG โ Calculating Averages
AVG returns the arithmetic mean (sum รท count):
-- Average book price
SELECT AVG(price) AS avg_price FROM books;
Output:
+-----------+
| avg_price |
+-----------+
| 10.990000 |
+-----------+
That's a lot of decimal places! Use ROUND to clean it up:
-- Average price, rounded to 2 decimal places
SELECT ROUND(AVG(price), 2) AS avg_price FROM books;
Output:
+-----------+
| avg_price |
+-----------+
| 10.99 |
+-----------+
-- Average page count
SELECT ROUND(AVG(pages)) AS avg_pages FROM books;
-- Average price of in-stock books
SELECT ROUND(AVG(price), 2) AS avg_in_stock_price FROM books
WHERE in_stock = TRUE;
Multiple Aggregates in One Query
You can use multiple aggregate functions in a single SELECT:
-- Get a complete price summary
SELECT
COUNT(*) AS total_books,
ROUND(SUM(price), 2) AS total_value,
ROUND(AVG(price), 2) AS avg_price,
MIN(price) AS cheapest,
MAX(price) AS most_expensive
FROM books;
Output:
+-------------+-------------+-----------+----------+----------------+
| total_books | total_value | avg_price | cheapest | most_expensive |
+-------------+-------------+-----------+----------+----------------+
| 10 | 109.90 | 10.99 | 7.99 | 15.99 |
+-------------+-------------+-----------+----------+----------------+
MIN & MAX โ Extremes
MIN and MAX find the smallest and largest values. They work with numbers, text, and dates:
With Numbers
-- Cheapest and most expensive books
SELECT
MIN(price) AS cheapest,
MAX(price) AS most_expensive
FROM books;
Output:
+----------+----------------+
| cheapest | most_expensive |
+----------+----------------+
| 7.99 | 15.99 |
+----------+----------------+
With Dates
-- Oldest and newest books
SELECT
MIN(published) AS oldest_published,
MAX(published) AS newest_published
FROM books;
Output:
+------------------+------------------+
| oldest_published | newest_published |
+------------------+------------------+
| 1813-01-28 | 1965-08-01 |
+------------------+------------------+
With Text
For text, MIN returns the first alphabetically and MAX returns the last:
-- First and last book titles alphabetically
SELECT
MIN(title) AS first_alpha,
MAX(title) AS last_alpha
FROM books;
Output:
+-------------+---------------------+
| first_alpha | last_alpha |
+-------------+---------------------+
| 1984 | To Kill a Mockingbird |
+-------------+---------------------+
โ ๏ธ MIN/MAX Don't Tell You Which Row
MIN(price) tells you the cheapest price is $7.99, but it doesn't tell you which book has that price. To get the full row, use ORDER BY + LIMIT 1 instead:
-- Which book is cheapest? (full row info)
SELECT title, price FROM books
ORDER BY price ASC LIMIT 1;
In Lesson 10, you'll learn subqueries which let you combine both approaches.
Aggregates and NULL
This is one of the most important (and most misunderstood) behaviors in SQL. Here's the rule:
๐ The NULL Rule
All aggregate functions ignore NULL values โ except COUNT(*), which counts all rows regardless.
Let's see what this means in practice:
-- COUNT(*) vs COUNT(city) on the customers table
SELECT
COUNT(*) AS total_rows,
COUNT(city) AS rows_with_city
FROM customers;
Output:
+------------+----------------+
| total_rows | rows_with_city |
+------------+----------------+
| 8 | 6 |
+------------+----------------+
COUNT(*) says 8 because it counts all rows. COUNT(city) says 6 because it skips the 2 rows where city is NULL.
AVG and NULL โ A Subtle Gotcha
Because AVG ignores NULLs, the average only considers non-NULL values:
-- Imagine a ratings column where some books haven't been rated yet:
-- Book A: 5, Book B: 3, Book C: NULL, Book D: 4
-- AVG(rating) = (5 + 3 + 4) / 3 = 4.0 (NOT divided by 4!)
-- The NULL is skipped entirely โ both from the sum AND the count
๐ก If You Want NULLs Treated as Zero
Use COALESCE or IFNULL to replace NULLs before aggregating:
-- Treat NULL ratings as 0
SELECT AVG(IFNULL(rating, 0)) AS avg_with_zeros FROM books;
-- COALESCE does the same thing
SELECT AVG(COALESCE(rating, 0)) AS avg_with_zeros FROM books;
GROUP BY โ Per-Category Summaries
Without GROUP BY, aggregate functions summarize the entire table. With GROUP BY, they summarize each group separately.
Basic GROUP BY
-- How many books does each author have?
SELECT author, COUNT(*) AS book_count
FROM books
GROUP BY author;
Output:
+----------------------+------------+
| author | book_count |
+----------------------+------------+
| Aldous Huxley | 1 |
| F. Scott Fitzgerald | 1 |
| Frank Herbert | 1 |
| George Orwell | 1 |
| Harper Lee | 1 |
| J.D. Salinger | 1 |
| J.R.R. Tolkien | 1 |
| Jane Austen | 1 |
| Ray Bradbury | 1 |
| J.R.R. Tolkien | 1 |
+----------------------+------------+
-- How many customers are in each city?
SELECT city, COUNT(*) AS customer_count
FROM customers
WHERE city IS NOT NULL
GROUP BY city;
Output:
+-------------+----------------+
| city | customer_count |
+-------------+----------------+
| Chicago | 2 |
| Houston | 1 |
| Los Angeles | 1 |
| New York | 2 |
+-------------+----------------+
GROUP BY with Multiple Aggregates
-- Price stats by in_stock status
SELECT
in_stock,
COUNT(*) AS book_count,
ROUND(AVG(price), 2) AS avg_price,
MIN(price) AS cheapest,
MAX(price) AS most_expensive
FROM books
GROUP BY in_stock;
Output:
+----------+------------+-----------+----------+----------------+
| in_stock | book_count | avg_price | cheapest | most_expensive |
+----------+------------+-----------+----------+----------------+
| 0 | 3 | 10.49 | 8.99 | 11.99 |
| 1 | 7 | 11.13 | 7.99 | 15.99 |
+----------+------------+-----------+----------+----------------+
GROUP BY with ORDER BY
You can sort the grouped results โ often you'll sort by the aggregate value:
-- Cities with the most customers first
SELECT city, COUNT(*) AS customer_count
FROM customers
WHERE city IS NOT NULL
GROUP BY city
ORDER BY customer_count DESC;
-- Authors ranked by average book price
SELECT
author,
COUNT(*) AS books,
ROUND(AVG(price), 2) AS avg_price
FROM books
GROUP BY author
ORDER BY avg_price DESC;
GROUP BY with LIMIT
-- Top 3 cities by customer count
SELECT city, COUNT(*) AS customer_count
FROM customers
WHERE city IS NOT NULL
GROUP BY city
ORDER BY customer_count DESC
LIMIT 3;
Grouping by Multiple Columns
You can group by more than one column for finer-grained categories:
-- Count books by in_stock status AND price range
SELECT
in_stock,
CASE
WHEN price < 10 THEN 'Under $10'
WHEN price < 15 THEN '$10-$15'
ELSE '$15+'
END AS price_range,
COUNT(*) AS book_count
FROM books
GROUP BY in_stock, price_range
ORDER BY in_stock DESC, price_range;
๐ก The CASE Expression
Don't worry about CASE too much right now โ it's like an if/else in SQL. We used it here to create price range labels. The important thing is that GROUP BY can group by any expression, not just raw column names.
HAVING โ Filtering Groups
WHERE filters individual rows before grouping. HAVING filters groups after grouping. This is one of the most important distinctions in SQL.
Basic HAVING
-- Show only cities with MORE than 1 customer
SELECT city, COUNT(*) AS customer_count
FROM customers
WHERE city IS NOT NULL
GROUP BY city
HAVING customer_count > 1;
Output:
+----------+----------------+
| city | customer_count |
+----------+----------------+
| Chicago | 2 |
| New York | 2 |
+----------+----------------+
Houston (1 customer) and Los Angeles (1 customer) are excluded because they don't satisfy HAVING customer_count > 1.
WHERE vs HAVING
This is a critical distinction:
| Feature | WHERE | HAVING |
|---|---|---|
| Filters | Individual rows | Groups (after GROUP BY) |
| When it runs | Before grouping | After grouping |
| Can use aggregates? | โ No | โ Yes |
| Can use column values? | โ Yes | โ Yes (but only grouped columns) |
-- โ WRONG โ you can't use an aggregate in WHERE
SELECT city, COUNT(*) AS customer_count
FROM customers
WHERE COUNT(*) > 1 -- ERROR! Aggregates not allowed in WHERE
GROUP BY city;
-- โ
CORRECT โ use HAVING for aggregate conditions
SELECT city, COUNT(*) AS customer_count
FROM customers
GROUP BY city
HAVING COUNT(*) > 1;
Using WHERE and HAVING Together
Often you need both โ WHERE to filter rows before grouping, and HAVING to filter groups after:
-- Among in-stock books, find price ranges with more than 2 books
SELECT
CASE
WHEN price < 10 THEN 'Under $10'
WHEN price < 13 THEN '$10-$13'
ELSE '$13+'
END AS price_range,
COUNT(*) AS book_count,
ROUND(AVG(price), 2) AS avg_price
FROM books
WHERE in_stock = TRUE -- filter rows FIRST
GROUP BY price_range
HAVING book_count > 2 -- then filter groups
ORDER BY avg_price;
The execution order is: WHERE removes out-of-stock books โ GROUP BY groups the remaining books by price range โ HAVING removes groups with 2 or fewer books.
More HAVING Examples
-- Authors whose average book price is above $12
SELECT
author,
ROUND(AVG(price), 2) AS avg_price
FROM books
GROUP BY author
HAVING avg_price > 12
ORDER BY avg_price DESC;
-- Cities where total customer spending would exceed $100
-- (hypothetical example with an orders table)
-- SELECT city, SUM(order_total) AS total_spent
-- FROM customers JOIN orders ON customers.id = orders.customer_id
-- GROUP BY city
-- HAVING total_spent > 100;
โ Rule of Thumb
If your condition involves a column value (like price > 10), use WHERE.
If your condition involves an aggregate (like COUNT(*) > 1 or AVG(price) > 12), use HAVING.
Updated Clause Order
With GROUP BY and HAVING added, the full clause order becomes:
-- The full pattern in action:
SELECT city, COUNT(*) AS customer_count
FROM customers
WHERE city IS NOT NULL -- 1. Filter rows
GROUP BY city -- 2. Group remaining rows
HAVING customer_count > 1 -- 3. Filter groups
ORDER BY customer_count DESC -- 4. Sort results
LIMIT 5; -- 5. Limit output
โ Updated Memory Aid
The full clause order: S-F-W-G-H-O-L
Select โ From โ Where โ Group By โ Having โ Order By โ Limit
Or remember: "Silly Frogs Will Gobble Hamburgers On Lily-pads."
Common Mistakes
Mistake 1: Non-Aggregated Columns Without GROUP BY
-- โ WRONG โ title is not aggregated or grouped
SELECT title, COUNT(*) FROM books;
-- โ
CORRECT โ group by the non-aggregated column
SELECT title, COUNT(*) AS count FROM books
GROUP BY title;
-- โ
Also correct โ aggregate everything, no group needed
SELECT COUNT(*) AS total FROM books;
โ ๏ธ The GROUP BY Rule
Every column in SELECT must either be: (a) inside an aggregate function, or (b) listed in GROUP BY. MySQL's default mode may allow violations of this rule without an error, but the results will be unpredictable. Always follow this rule.
Mistake 2: Using Aggregates in WHERE
-- โ WRONG โ WHERE can't use aggregates
SELECT author FROM books
WHERE COUNT(*) > 1
GROUP BY author;
-- โ
CORRECT โ use HAVING
SELECT author, COUNT(*) AS book_count FROM books
GROUP BY author
HAVING book_count > 1;
Mistake 3: Forgetting That AVG Ignores NULL
-- If you have ratings: 5, 3, NULL, 4
-- AVG(rating) = 4.0 (divides by 3, not 4)
-- This might not be what you want!
-- If NULLs should count as zero:
SELECT AVG(IFNULL(rating, 0)) AS avg_rating FROM books;
Mistake 4: Wrong Clause Order
-- โ WRONG โ HAVING before GROUP BY
SELECT city, COUNT(*) FROM customers
HAVING COUNT(*) > 1
GROUP BY city;
-- โ
CORRECT
SELECT city, COUNT(*) FROM customers
GROUP BY city
HAVING COUNT(*) > 1;
Exercises
๐๏ธ Exercise 1: Basic Aggregates
Write queries to find:
- The total number of books in the database
- The average book price (rounded to 2 decimal places)
- The total number of pages across all books
- The cheapest and most expensive book prices in a single query
- How many books are currently in stock
โ Solution
-- 1
SELECT COUNT(*) AS total_books FROM books;
-- 2
SELECT ROUND(AVG(price), 2) AS avg_price FROM books;
-- 3
SELECT SUM(pages) AS total_pages FROM books;
-- 4
SELECT
MIN(price) AS cheapest,
MAX(price) AS most_expensive
FROM books;
-- 5
SELECT COUNT(*) AS in_stock_count FROM books
WHERE in_stock = TRUE;
๐๏ธ Exercise 2: GROUP BY
Write queries to:
- Count the number of customers in each city (exclude NULL cities)
- Find the average book price for in-stock vs. out-of-stock books
- List each city and its customer count, sorted by count (highest first)
โ Solution
-- 1
SELECT city, COUNT(*) AS customer_count
FROM customers
WHERE city IS NOT NULL
GROUP BY city;
-- 2
SELECT
in_stock,
COUNT(*) AS book_count,
ROUND(AVG(price), 2) AS avg_price
FROM books
GROUP BY in_stock;
-- 3
SELECT city, COUNT(*) AS customer_count
FROM customers
WHERE city IS NOT NULL
GROUP BY city
ORDER BY customer_count DESC;
๐๏ธ Exercise 3: HAVING
Write queries to:
- Show only cities with more than 1 customer
- Among in-stock books, find which price ranges (under $10, $10-$15, $15+) have at least 2 books
- List authors whose books have an average page count above 250
โ Solution
-- 1
SELECT city, COUNT(*) AS customer_count
FROM customers
WHERE city IS NOT NULL
GROUP BY city
HAVING customer_count > 1;
-- 2
SELECT
CASE
WHEN price < 10 THEN 'Under $10'
WHEN price < 15 THEN '$10-$15'
ELSE '$15+'
END AS price_range,
COUNT(*) AS book_count
FROM books
WHERE in_stock = TRUE
GROUP BY price_range
HAVING book_count >= 2;
-- 3
SELECT
author,
ROUND(AVG(pages)) AS avg_pages
FROM books
GROUP BY author
HAVING avg_pages > 250;
๐๏ธ Exercise 4: Combined Challenge
Write a single query that:
- Only considers in-stock books
- Groups them by author
- Calculates the count, average price, and total pages per author
- Only shows authors with an average price above $10
- Sorts by average price, highest first
- Returns the top 5 results
โ Solution
SELECT
author,
COUNT(*) AS book_count,
ROUND(AVG(price), 2) AS avg_price,
SUM(pages) AS total_pages
FROM books
WHERE in_stock = TRUE
GROUP BY author
HAVING avg_price > 10
ORDER BY avg_price DESC
LIMIT 5;
๐ฏ Quick Quiz
Question 1: What does COUNT(*) count?
Question 2: When should you use HAVING instead of WHERE?
Question 3: How does AVG() handle NULL values?
Question 4: What is the correct clause order?
Summary
๐ Key Takeaways
- Aggregate functions collapse multiple rows into a single value
- COUNT(*) counts all rows; COUNT(column) counts non-NULL values
- SUM and AVG work with numeric columns; use
ROUNDto clean up decimals - MIN and MAX work with numbers, dates, and text
- All aggregates ignore NULL except
COUNT(*) - GROUP BY splits rows into groups for per-category aggregation
- HAVING filters groups after aggregation (WHERE filters rows before)
- Every non-aggregated column in SELECT must be in
GROUP BY - Full clause order: SELECT โ FROM โ WHERE โ GROUP BY โ HAVING โ ORDER BY โ LIMIT
- Use
COUNT(DISTINCT column)to count unique values
Quick Reference
| Pattern | Syntax |
|---|---|
| Count all rows | SELECT COUNT(*) FROM table |
| Count non-NULL | SELECT COUNT(column) FROM table |
| Count unique | SELECT COUNT(DISTINCT col) FROM table |
| Sum values | SELECT SUM(col) FROM table |
| Average (rounded) | SELECT ROUND(AVG(col), 2) FROM table |
| Min / Max | SELECT MIN(col), MAX(col) FROM table |
| Group summary | SELECT col, COUNT(*) FROM t GROUP BY col |
| Filter groups | GROUP BY col HAVING COUNT(*) > N |
| NULL โ 0 in AVG | AVG(IFNULL(col, 0)) |
๐ Additional Resources
- MySQL Docs โ Aggregate Functions
- MySQL Docs โ GROUP BY Handling
- MySQL Docs โ GROUP BY Modifiers
๐ What's Next?
You can now summarize, group, and filter aggregate data โ powerful tools for turning raw rows into meaningful insights. But what if you need a query inside another query? "Show me all books that cost more than the average price" โ that requires knowing the average first, then using it as a filter. In the next lesson, you'll learn subqueries โ queries nested inside other queries โ to solve exactly these kinds of multi-step problems.
๐ Congratulations!
You've unlocked the analytical power of SQL. COUNT, SUM, AVG, MIN, MAX, GROUP BY, HAVING โ these are the tools that transform raw data into business intelligence!