Aggregate Functions & GROUP BY

What Are Aggregate Functions?

A regular SELECT returns one row per matching row in the table. An aggregate function takes multiple rows and collapses them into a single value — a count, a sum, an average, a minimum, or a maximum.

graph LR subgraph Input["10 rows from books table"] R1["$7.99"] R2["$8.99"] R3["$8.99"] R4["$9.49"] R5["$9.99"] R6["$10.49"] R7["$10.99"] R8["$11.99"] R9["$14.99"] R10["$15.99"] end Input -->|"AVG(price)"| Result["$10.99"]

MySQL provides five core aggregate functions:

Function	What It Does	Example
`COUNT()`	Counts rows	How many books do we have?
`SUM()`	Adds up values	What's the total value of all books?
`AVG()`	Calculates the average	What's the average book price?
`MIN()`	Finds the smallest value	What's the cheapest book?
`MAX()`	Finds the largest value	What's the most expensive book?

COUNT — Counting Rows

COUNT is the most commonly used aggregate. It comes in two forms:

COUNT(*) — Count All Rows

COUNT(*) counts every row, including rows with NULL values:


-- How many books do we have?
SELECT COUNT(*) AS total_books FROM books;

Output:


+-------------+
| total_books |
+-------------+
|          10 |
+-------------+


-- How many customers?
SELECT COUNT(*) AS total_customers FROM customers;

Output:


+-----------------+
| total_customers |
+-----------------+
|               8 |
+-----------------+

COUNT(column) — Count Non-NULL Values

COUNT(column) only counts rows where that column is not NULL:


-- How many customers have a city listed?
SELECT COUNT(city) AS customers_with_city FROM customers;

Output:


+---------------------+
| customers_with_city |
+---------------------+
|                   6 |
+---------------------+

We have 8 customers total, but only 6 have a city — 2 have NULL cities (Eve and Hank from our earlier lessons).

COUNT(DISTINCT column) — Count Unique Values

Add DISTINCT to count only unique (non-duplicate) values:


-- How many different authors do we have?
SELECT COUNT(DISTINCT author) AS unique_authors FROM books;

Output:


+----------------+
| unique_authors |
+----------------+
|             10 |
+----------------+


-- How many different cities are represented?
SELECT COUNT(DISTINCT city) AS unique_cities FROM customers;

Output:


+---------------+
| unique_cities |
+---------------+
|             4 |
+---------------+

COUNT with WHERE

Combine COUNT with WHERE to count a subset:


-- How many books are in stock?
SELECT COUNT(*) AS in_stock_count FROM books
WHERE in_stock = TRUE;

-- How many books cost more than $10?
SELECT COUNT(*) AS expensive_books FROM books
WHERE price > 10;

-- How many books were published after 1950?
SELECT COUNT(*) AS modern_books FROM books
WHERE published > '1950-01-01';

💡 Always Use an Alias

Without an alias, MySQL labels the result column COUNT(*) — which is ugly and hard to reference. Always give your aggregates a meaningful alias with AS.

SUM & AVG — Totals and Averages

SUM — Adding Up Values

SUM adds all values in a column. It works with numeric columns:


-- Total value of all books (if you bought one of each)
SELECT SUM(price) AS total_value FROM books;

Output:


+-------------+
| total_value |
+-------------+
|      109.90 |
+-------------+


-- Total pages across all books
SELECT SUM(pages) AS total_pages FROM books;

-- Total value of in-stock books only
SELECT SUM(price) AS in_stock_value FROM books
WHERE in_stock = TRUE;

AVG — Calculating Averages

AVG returns the arithmetic mean (sum ÷ count):


-- Average book price
SELECT AVG(price) AS avg_price FROM books;

Output:


+-----------+
| avg_price |
+-----------+
| 10.990000 |
+-----------+

That's a lot of decimal places! Use ROUND to clean it up:


-- Average price, rounded to 2 decimal places
SELECT ROUND(AVG(price), 2) AS avg_price FROM books;

Output:


+-----------+
| avg_price |
+-----------+
|     10.99 |
+-----------+


-- Average page count
SELECT ROUND(AVG(pages)) AS avg_pages FROM books;

-- Average price of in-stock books
SELECT ROUND(AVG(price), 2) AS avg_in_stock_price FROM books
WHERE in_stock = TRUE;

Multiple Aggregates in One Query

You can use multiple aggregate functions in a single SELECT:


-- Get a complete price summary
SELECT
    COUNT(*) AS total_books,
    ROUND(SUM(price), 2) AS total_value,
    ROUND(AVG(price), 2) AS avg_price,
    MIN(price) AS cheapest,
    MAX(price) AS most_expensive
FROM books;

Output:


+-------------+-------------+-----------+----------+----------------+
| total_books | total_value | avg_price | cheapest | most_expensive |
+-------------+-------------+-----------+----------+----------------+
|          10 |      109.90 |     10.99 |     7.99 |          15.99 |
+-------------+-------------+-----------+----------+----------------+

MIN & MAX — Extremes

MIN and MAX find the smallest and largest values. They work with numbers, text, and dates:

With Numbers


-- Cheapest and most expensive books
SELECT
    MIN(price) AS cheapest,
    MAX(price) AS most_expensive
FROM books;

Output:


+----------+----------------+
| cheapest | most_expensive |
+----------+----------------+
|     7.99 |          15.99 |
+----------+----------------+

With Dates


-- Oldest and newest books
SELECT
    MIN(published) AS oldest_published,
    MAX(published) AS newest_published
FROM books;

Output:


+------------------+------------------+
| oldest_published | newest_published |
+------------------+------------------+
| 1813-01-28       | 1965-08-01       |
+------------------+------------------+

With Text

For text, MIN returns the first alphabetically and MAX returns the last:


-- First and last book titles alphabetically
SELECT
    MIN(title) AS first_alpha,
    MAX(title) AS last_alpha
FROM books;

Output:


+-------------+---------------------+
| first_alpha | last_alpha          |
+-------------+---------------------+
| 1984        | To Kill a Mockingbird |
+-------------+---------------------+

⚠️ MIN/MAX Don't Tell You Which Row

MIN(price) tells you the cheapest price is $7.99, but it doesn't tell you which book has that price. To get the full row, use ORDER BY + LIMIT 1 instead:


-- Which book is cheapest? (full row info)
SELECT title, price FROM books
ORDER BY price ASC LIMIT 1;

In Lesson 10, you'll learn subqueries which let you combine both approaches.

Aggregates and NULL

This is one of the most important (and most misunderstood) behaviors in SQL. Here's the rule:

🔑 The NULL Rule

All aggregate functions ignore NULL values — except COUNT(*), which counts all rows regardless.

Let's see what this means in practice:


-- COUNT(*) vs COUNT(city) on the customers table
SELECT
    COUNT(*) AS total_rows,
    COUNT(city) AS rows_with_city
FROM customers;

Output:


+------------+----------------+
| total_rows | rows_with_city |
+------------+----------------+
|          8 |              6 |
+------------+----------------+

COUNT(*) says 8 because it counts all rows. COUNT(city) says 6 because it skips the 2 rows where city is NULL.

AVG and NULL — A Subtle Gotcha

Because AVG ignores NULLs, the average only considers non-NULL values:


-- Imagine a ratings column where some books haven't been rated yet:
-- Book A: 5, Book B: 3, Book C: NULL, Book D: 4

-- AVG(rating) = (5 + 3 + 4) / 3 = 4.0  (NOT divided by 4!)
-- The NULL is skipped entirely — both from the sum AND the count

💡 If You Want NULLs Treated as Zero

Use COALESCE or IFNULL to replace NULLs before aggregating:


-- Treat NULL ratings as 0
SELECT AVG(IFNULL(rating, 0)) AS avg_with_zeros FROM books;

-- COALESCE does the same thing
SELECT AVG(COALESCE(rating, 0)) AS avg_with_zeros FROM books;

GROUP BY — Per-Category Summaries

Without GROUP BY, aggregate functions summarize the entire table. With GROUP BY, they summarize each group separately.

Basic GROUP BY


-- How many books does each author have?
SELECT author, COUNT(*) AS book_count
FROM books
GROUP BY author;

Output:


+----------------------+------------+
| author               | book_count |
+----------------------+------------+
| Aldous Huxley        |          1 |
| F. Scott Fitzgerald  |          1 |
| Frank Herbert        |          1 |
| George Orwell        |          1 |
| Harper Lee           |          1 |
| J.D. Salinger        |          1 |
| J.R.R. Tolkien       |          1 |
| Jane Austen          |          1 |
| Ray Bradbury         |          1 |
| J.R.R. Tolkien       |          1 |
+----------------------+------------+


-- How many customers are in each city?
SELECT city, COUNT(*) AS customer_count
FROM customers
WHERE city IS NOT NULL
GROUP BY city;

Output:


+-------------+----------------+
| city        | customer_count |
+-------------+----------------+
| Chicago     |              2 |
| Houston     |              1 |
| Los Angeles |              1 |
| New York    |              2 |
+-------------+----------------+

GROUP BY with Multiple Aggregates


-- Price stats by in_stock status
SELECT
    in_stock,
    COUNT(*) AS book_count,
    ROUND(AVG(price), 2) AS avg_price,
    MIN(price) AS cheapest,
    MAX(price) AS most_expensive
FROM books
GROUP BY in_stock;

Output:


+----------+------------+-----------+----------+----------------+
| in_stock | book_count | avg_price | cheapest | most_expensive |
+----------+------------+-----------+----------+----------------+
|        0 |          3 |     10.49 |     8.99 |          11.99 |
|        1 |          7 |     11.13 |     7.99 |          15.99 |
+----------+------------+-----------+----------+----------------+

GROUP BY with ORDER BY

You can sort the grouped results — often you'll sort by the aggregate value:


-- Cities with the most customers first
SELECT city, COUNT(*) AS customer_count
FROM customers
WHERE city IS NOT NULL
GROUP BY city
ORDER BY customer_count DESC;


-- Authors ranked by average book price
SELECT
    author,
    COUNT(*) AS books,
    ROUND(AVG(price), 2) AS avg_price
FROM books
GROUP BY author
ORDER BY avg_price DESC;

GROUP BY with LIMIT


-- Top 3 cities by customer count
SELECT city, COUNT(*) AS customer_count
FROM customers
WHERE city IS NOT NULL
GROUP BY city
ORDER BY customer_count DESC
LIMIT 3;

Grouping by Multiple Columns

You can group by more than one column for finer-grained categories:


-- Count books by in_stock status AND price range
SELECT
    in_stock,
    CASE
        WHEN price < 10 THEN 'Under $10'
        WHEN price < 15 THEN '$10-$15'
        ELSE '$15+'
    END AS price_range,
    COUNT(*) AS book_count
FROM books
GROUP BY in_stock, price_range
ORDER BY in_stock DESC, price_range;

💡 The CASE Expression

Don't worry about CASE too much right now — it's like an if/else in SQL. We used it here to create price range labels. The important thing is that GROUP BY can group by any expression, not just raw column names.

HAVING — Filtering Groups

WHERE filters individual rows before grouping. HAVING filters groups after grouping. This is one of the most important distinctions in SQL.

Basic HAVING


-- Show only cities with MORE than 1 customer
SELECT city, COUNT(*) AS customer_count
FROM customers
WHERE city IS NOT NULL
GROUP BY city
HAVING customer_count > 1;

Output:


+----------+----------------+
| city     | customer_count |
+----------+----------------+
| Chicago  |              2 |
| New York |              2 |
+----------+----------------+

Houston (1 customer) and Los Angeles (1 customer) are excluded because they don't satisfy HAVING customer_count > 1.

WHERE vs HAVING

This is a critical distinction:

Feature	WHERE	HAVING
Filters	Individual rows	Groups (after GROUP BY)
When it runs	Before grouping	After grouping
Can use aggregates?	❌ No	✅ Yes
Can use column values?	✅ Yes	✅ Yes (but only grouped columns)


-- ⛔ WRONG — you can't use an aggregate in WHERE
SELECT city, COUNT(*) AS customer_count
FROM customers
WHERE COUNT(*) > 1   -- ERROR! Aggregates not allowed in WHERE
GROUP BY city;

-- ✅ CORRECT — use HAVING for aggregate conditions
SELECT city, COUNT(*) AS customer_count
FROM customers
GROUP BY city
HAVING COUNT(*) > 1;

Using WHERE and HAVING Together

Often you need both — WHERE to filter rows before grouping, and HAVING to filter groups after:


-- Among in-stock books, find price ranges with more than 2 books
SELECT
    CASE
        WHEN price < 10 THEN 'Under $10'
        WHEN price < 13 THEN '$10-$13'
        ELSE '$13+'
    END AS price_range,
    COUNT(*) AS book_count,
    ROUND(AVG(price), 2) AS avg_price
FROM books
WHERE in_stock = TRUE          -- filter rows FIRST
GROUP BY price_range
HAVING book_count > 2          -- then filter groups
ORDER BY avg_price;

The execution order is: WHERE removes out-of-stock books → GROUP BY groups the remaining books by price range → HAVING removes groups with 2 or fewer books.

More HAVING Examples


-- Authors whose average book price is above $12
SELECT
    author,
    ROUND(AVG(price), 2) AS avg_price
FROM books
GROUP BY author
HAVING avg_price > 12
ORDER BY avg_price DESC;

-- Cities where total customer spending would exceed $100
-- (hypothetical example with an orders table)
-- SELECT city, SUM(order_total) AS total_spent
-- FROM customers JOIN orders ON customers.id = orders.customer_id
-- GROUP BY city
-- HAVING total_spent > 100;

✅ Rule of Thumb

If your condition involves a column value (like price > 10), use WHERE.
If your condition involves an aggregate (like COUNT(*) > 1 or AVG(price) > 12), use HAVING.

Updated Clause Order

With GROUP BY and HAVING added, the full clause order becomes:

graph TD A["SELECT columns / aggregates"] --> B["FROM table"] B --> C["WHERE row conditions"] C --> D["GROUP BY columns"] D --> E["HAVING group conditions"] E --> F["ORDER BY columns"] F --> G["LIMIT count"] G --> H["OFFSET skip"]


-- The full pattern in action:
SELECT city, COUNT(*) AS customer_count
FROM customers
WHERE city IS NOT NULL         -- 1. Filter rows
GROUP BY city                  -- 2. Group remaining rows
HAVING customer_count > 1      -- 3. Filter groups
ORDER BY customer_count DESC   -- 4. Sort results
LIMIT 5;                       -- 5. Limit output

✅ Updated Memory Aid

The full clause order: S-F-W-G-H-O-L

Select → From → Where → Group By → Having → Order By → Limit

Or remember: "Silly Frogs Will Gobble Hamburgers On Lily-pads."

Common Mistakes

Mistake 1: Non-Aggregated Columns Without GROUP BY


-- ⛔ WRONG — title is not aggregated or grouped
SELECT title, COUNT(*) FROM books;

-- ✅ CORRECT — group by the non-aggregated column
SELECT title, COUNT(*) AS count FROM books
GROUP BY title;

-- ✅ Also correct — aggregate everything, no group needed
SELECT COUNT(*) AS total FROM books;

⚠️ The GROUP BY Rule

Every column in SELECT must either be: (a) inside an aggregate function, or (b) listed in GROUP BY. MySQL's default mode may allow violations of this rule without an error, but the results will be unpredictable. Always follow this rule.

Mistake 2: Using Aggregates in WHERE


-- ⛔ WRONG — WHERE can't use aggregates
SELECT author FROM books
WHERE COUNT(*) > 1
GROUP BY author;

-- ✅ CORRECT — use HAVING
SELECT author, COUNT(*) AS book_count FROM books
GROUP BY author
HAVING book_count > 1;

Mistake 3: Forgetting That AVG Ignores NULL


-- If you have ratings: 5, 3, NULL, 4
-- AVG(rating) = 4.0 (divides by 3, not 4)
-- This might not be what you want!

-- If NULLs should count as zero:
SELECT AVG(IFNULL(rating, 0)) AS avg_rating FROM books;

Mistake 4: Wrong Clause Order


-- ⛔ WRONG — HAVING before GROUP BY
SELECT city, COUNT(*) FROM customers
HAVING COUNT(*) > 1
GROUP BY city;

-- ✅ CORRECT
SELECT city, COUNT(*) FROM customers
GROUP BY city
HAVING COUNT(*) > 1;

Exercises

🏋️ Exercise 1: Basic Aggregates

Write queries to find:

The total number of books in the database
The average book price (rounded to 2 decimal places)
The total number of pages across all books
The cheapest and most expensive book prices in a single query
How many books are currently in stock

✅ Solution


-- 1
SELECT COUNT(*) AS total_books FROM books;

-- 2
SELECT ROUND(AVG(price), 2) AS avg_price FROM books;

-- 3
SELECT SUM(pages) AS total_pages FROM books;

-- 4
SELECT
    MIN(price) AS cheapest,
    MAX(price) AS most_expensive
FROM books;

-- 5
SELECT COUNT(*) AS in_stock_count FROM books
WHERE in_stock = TRUE;

🏋️ Exercise 2: GROUP BY

Write queries to:

Count the number of customers in each city (exclude NULL cities)
Find the average book price for in-stock vs. out-of-stock books
List each city and its customer count, sorted by count (highest first)

✅ Solution


-- 1
SELECT city, COUNT(*) AS customer_count
FROM customers
WHERE city IS NOT NULL
GROUP BY city;

-- 2
SELECT
    in_stock,
    COUNT(*) AS book_count,
    ROUND(AVG(price), 2) AS avg_price
FROM books
GROUP BY in_stock;

-- 3
SELECT city, COUNT(*) AS customer_count
FROM customers
WHERE city IS NOT NULL
GROUP BY city
ORDER BY customer_count DESC;

🏋️ Exercise 3: HAVING

Write queries to:

Show only cities with more than 1 customer
Among in-stock books, find which price ranges (under $10, $10-$15, $15+) have at least 2 books
List authors whose books have an average page count above 250

✅ Solution


-- 1
SELECT city, COUNT(*) AS customer_count
FROM customers
WHERE city IS NOT NULL
GROUP BY city
HAVING customer_count > 1;

-- 2
SELECT
    CASE
        WHEN price < 10 THEN 'Under $10'
        WHEN price < 15 THEN '$10-$15'
        ELSE '$15+'
    END AS price_range,
    COUNT(*) AS book_count
FROM books
WHERE in_stock = TRUE
GROUP BY price_range
HAVING book_count >= 2;

-- 3
SELECT
    author,
    ROUND(AVG(pages)) AS avg_pages
FROM books
GROUP BY author
HAVING avg_pages > 250;

🏋️ Exercise 4: Combined Challenge

Write a single query that:

Only considers in-stock books
Groups them by author
Calculates the count, average price, and total pages per author
Only shows authors with an average price above $10
Sorts by average price, highest first
Returns the top 5 results

✅ Solution


SELECT
    author,
    COUNT(*) AS book_count,
    ROUND(AVG(price), 2) AS avg_price,
    SUM(pages) AS total_pages
FROM books
WHERE in_stock = TRUE
GROUP BY author
HAVING avg_price > 10
ORDER BY avg_price DESC
LIMIT 5;

🎯 Quick Quiz

Question 1: What does COUNT(*) count?

Question 2: When should you use HAVING instead of WHERE?

Question 3: How does AVG() handle NULL values?

Question 4: What is the correct clause order?

Summary

🎉 Key Takeaways

Aggregate functions collapse multiple rows into a single value
COUNT(*) counts all rows; COUNT(column) counts non-NULL values
SUM and AVG work with numeric columns; use ROUND to clean up decimals
MIN and MAX work with numbers, dates, and text
All aggregates ignore NULL except COUNT(*)
GROUP BY splits rows into groups for per-category aggregation
HAVING filters groups after aggregation (WHERE filters rows before)
Every non-aggregated column in SELECT must be in GROUP BY
Full clause order: SELECT → FROM → WHERE → GROUP BY → HAVING → ORDER BY → LIMIT
Use COUNT(DISTINCT column) to count unique values

Quick Reference

Pattern	Syntax
Count all rows	`SELECT COUNT(*) FROM table`
Count non-NULL	`SELECT COUNT(column) FROM table`
Count unique	`SELECT COUNT(DISTINCT col) FROM table`
Sum values	`SELECT SUM(col) FROM table`
Average (rounded)	`SELECT ROUND(AVG(col), 2) FROM table`
Min / Max	`SELECT MIN(col), MAX(col) FROM table`
Group summary	`SELECT col, COUNT(*) FROM t GROUP BY col`
Filter groups	`GROUP BY col HAVING COUNT(*) > N`
NULL → 0 in AVG	`AVG(IFNULL(col, 0))`

📚 Additional Resources

🚀 What's Next?

You can now summarize, group, and filter aggregate data — powerful tools for turning raw rows into meaningful insights. But what if you need a query inside another query? "Show me all books that cost more than the average price" — that requires knowing the average first, then using it as a filter. In the next lesson, you'll learn subqueries — queries nested inside other queries — to solve exactly these kinds of multi-step problems.

🎉 Congratulations!

You've unlocked the analytical power of SQL. COUNT, SUM, AVG, MIN, MAX, GROUP BY, HAVING — these are the tools that transform raw data into business intelligence!

📑 In This Lesson

What Are Aggregate Functions?

COUNT — Counting Rows

COUNT(*) — Count All Rows

Output:

Output:

COUNT(column) — Count Non-NULL Values

Output:

COUNT(DISTINCT column) — Count Unique Values

Output:

Output:

COUNT with WHERE

💡 Always Use an Alias

SUM & AVG — Totals and Averages

SUM — Adding Up Values

Output:

AVG — Calculating Averages

Output:

Output:

Multiple Aggregates in One Query

Output:

MIN & MAX — Extremes

With Numbers

Output:

With Dates

Output:

With Text

Output:

⚠️ MIN/MAX Don't Tell You Which Row

Aggregates and NULL

🔑 The NULL Rule

Output:

AVG and NULL — A Subtle Gotcha

💡 If You Want NULLs Treated as Zero

GROUP BY — Per-Category Summaries

Basic GROUP BY

Output:

Output:

GROUP BY with Multiple Aggregates

Output:

GROUP BY with ORDER BY

GROUP BY with LIMIT

Grouping by Multiple Columns

💡 The CASE Expression

HAVING — Filtering Groups

Basic HAVING

Output:

WHERE vs HAVING

Using WHERE and HAVING Together

More HAVING Examples

✅ Rule of Thumb

Updated Clause Order

✅ Updated Memory Aid

Common Mistakes

Mistake 1: Non-Aggregated Columns Without GROUP BY

⚠️ The GROUP BY Rule

Mistake 2: Using Aggregates in WHERE

Mistake 3: Forgetting That AVG Ignores NULL

Mistake 4: Wrong Clause Order

Exercises

🏋️ Exercise 1: Basic Aggregates

🏋️ Exercise 2: GROUP BY

🏋️ Exercise 3: HAVING

🏋️ Exercise 4: Combined Challenge

🎯 Quick Quiz

Summary

🎉 Key Takeaways

Quick Reference

📚 Additional Resources

🚀 What's Next?

🎉 Congratulations!