Skip to main content

๐Ÿ“ˆ Lesson 9: Aggregate Functions & GROUP BY

So far you've been retrieving individual rows โ€” "show me this book" or "list those customers." But what about questions like "how many books do we have?" or "what's the average price?" Aggregate functions collapse many rows into a single answer. Add GROUP BY and you can answer those questions per category โ€” average price per author, total sales per city, count per genre.

๐ŸŽฏ Learning Objectives

By the end of this lesson, you will be able to:

  • Use COUNT, SUM, AVG, MIN, and MAX to summarize data
  • Understand how aggregate functions handle NULL values
  • Group rows with GROUP BY to get per-category summaries
  • Filter groups with HAVING (and understand how it differs from WHERE)
  • Combine aggregates with WHERE, ORDER BY, and LIMIT
  • Avoid common aggregate mistakes

Estimated Time: 50 minutes

Prerequisites: Sorting & limiting results (Lesson 8)

๐Ÿ“‘ In This Lesson

What Are Aggregate Functions?

A regular SELECT returns one row per matching row in the table. An aggregate function takes multiple rows and collapses them into a single value โ€” a count, a sum, an average, a minimum, or a maximum.

graph LR subgraph Input["10 rows from books table"] R1["$7.99"] R2["$8.99"] R3["$8.99"] R4["$9.49"] R5["$9.99"] R6["$10.49"] R7["$10.99"] R8["$11.99"] R9["$14.99"] R10["$15.99"] end Input -->|"AVG(price)"| Result["$10.99"]

MySQL provides five core aggregate functions:

Function What It Does Example
COUNT() Counts rows How many books do we have?
SUM() Adds up values What's the total value of all books?
AVG() Calculates the average What's the average book price?
MIN() Finds the smallest value What's the cheapest book?
MAX() Finds the largest value What's the most expensive book?

COUNT โ€” Counting Rows

COUNT is the most commonly used aggregate. It comes in two forms:

COUNT(*) โ€” Count All Rows

COUNT(*) counts every row, including rows with NULL values:


-- How many books do we have?
SELECT COUNT(*) AS total_books FROM books;
                

Output:


+-------------+
| total_books |
+-------------+
|          10 |
+-------------+
                    

-- How many customers?
SELECT COUNT(*) AS total_customers FROM customers;
                

Output:


+-----------------+
| total_customers |
+-----------------+
|               8 |
+-----------------+
                    

COUNT(column) โ€” Count Non-NULL Values

COUNT(column) only counts rows where that column is not NULL:


-- How many customers have a city listed?
SELECT COUNT(city) AS customers_with_city FROM customers;
                

Output:


+---------------------+
| customers_with_city |
+---------------------+
|                   6 |
+---------------------+
                    

We have 8 customers total, but only 6 have a city โ€” 2 have NULL cities (Eve and Hank from our earlier lessons).

COUNT(DISTINCT column) โ€” Count Unique Values

Add DISTINCT to count only unique (non-duplicate) values:


-- How many different authors do we have?
SELECT COUNT(DISTINCT author) AS unique_authors FROM books;
                

Output:


+----------------+
| unique_authors |
+----------------+
|             10 |
+----------------+
                    

-- How many different cities are represented?
SELECT COUNT(DISTINCT city) AS unique_cities FROM customers;
                

Output:


+---------------+
| unique_cities |
+---------------+
|             4 |
+---------------+
                    

COUNT with WHERE

Combine COUNT with WHERE to count a subset:


-- How many books are in stock?
SELECT COUNT(*) AS in_stock_count FROM books
WHERE in_stock = TRUE;

-- How many books cost more than $10?
SELECT COUNT(*) AS expensive_books FROM books
WHERE price > 10;

-- How many books were published after 1950?
SELECT COUNT(*) AS modern_books FROM books
WHERE published > '1950-01-01';
                

๐Ÿ’ก Always Use an Alias

Without an alias, MySQL labels the result column COUNT(*) โ€” which is ugly and hard to reference. Always give your aggregates a meaningful alias with AS.

SUM & AVG โ€” Totals and Averages

SUM โ€” Adding Up Values

SUM adds all values in a column. It works with numeric columns:


-- Total value of all books (if you bought one of each)
SELECT SUM(price) AS total_value FROM books;
                

Output:


+-------------+
| total_value |
+-------------+
|      109.90 |
+-------------+
                    

-- Total pages across all books
SELECT SUM(pages) AS total_pages FROM books;

-- Total value of in-stock books only
SELECT SUM(price) AS in_stock_value FROM books
WHERE in_stock = TRUE;
                

AVG โ€” Calculating Averages

AVG returns the arithmetic mean (sum รท count):


-- Average book price
SELECT AVG(price) AS avg_price FROM books;
                

Output:


+-----------+
| avg_price |
+-----------+
| 10.990000 |
+-----------+
                    

That's a lot of decimal places! Use ROUND to clean it up:


-- Average price, rounded to 2 decimal places
SELECT ROUND(AVG(price), 2) AS avg_price FROM books;
                

Output:


+-----------+
| avg_price |
+-----------+
|     10.99 |
+-----------+
                    

-- Average page count
SELECT ROUND(AVG(pages)) AS avg_pages FROM books;

-- Average price of in-stock books
SELECT ROUND(AVG(price), 2) AS avg_in_stock_price FROM books
WHERE in_stock = TRUE;
                

Multiple Aggregates in One Query

You can use multiple aggregate functions in a single SELECT:


-- Get a complete price summary
SELECT
    COUNT(*) AS total_books,
    ROUND(SUM(price), 2) AS total_value,
    ROUND(AVG(price), 2) AS avg_price,
    MIN(price) AS cheapest,
    MAX(price) AS most_expensive
FROM books;
                

Output:


+-------------+-------------+-----------+----------+----------------+
| total_books | total_value | avg_price | cheapest | most_expensive |
+-------------+-------------+-----------+----------+----------------+
|          10 |      109.90 |     10.99 |     7.99 |          15.99 |
+-------------+-------------+-----------+----------+----------------+
                    

MIN & MAX โ€” Extremes

MIN and MAX find the smallest and largest values. They work with numbers, text, and dates:

With Numbers


-- Cheapest and most expensive books
SELECT
    MIN(price) AS cheapest,
    MAX(price) AS most_expensive
FROM books;
                

Output:


+----------+----------------+
| cheapest | most_expensive |
+----------+----------------+
|     7.99 |          15.99 |
+----------+----------------+
                    

With Dates


-- Oldest and newest books
SELECT
    MIN(published) AS oldest_published,
    MAX(published) AS newest_published
FROM books;
                

Output:


+------------------+------------------+
| oldest_published | newest_published |
+------------------+------------------+
| 1813-01-28       | 1965-08-01       |
+------------------+------------------+
                    

With Text

For text, MIN returns the first alphabetically and MAX returns the last:


-- First and last book titles alphabetically
SELECT
    MIN(title) AS first_alpha,
    MAX(title) AS last_alpha
FROM books;
                

Output:


+-------------+---------------------+
| first_alpha | last_alpha          |
+-------------+---------------------+
| 1984        | To Kill a Mockingbird |
+-------------+---------------------+
                    

โš ๏ธ MIN/MAX Don't Tell You Which Row

MIN(price) tells you the cheapest price is $7.99, but it doesn't tell you which book has that price. To get the full row, use ORDER BY + LIMIT 1 instead:


-- Which book is cheapest? (full row info)
SELECT title, price FROM books
ORDER BY price ASC LIMIT 1;
                    

In Lesson 10, you'll learn subqueries which let you combine both approaches.

Aggregates and NULL

This is one of the most important (and most misunderstood) behaviors in SQL. Here's the rule:

๐Ÿ”‘ The NULL Rule

All aggregate functions ignore NULL values โ€” except COUNT(*), which counts all rows regardless.

Let's see what this means in practice:


-- COUNT(*) vs COUNT(city) on the customers table
SELECT
    COUNT(*) AS total_rows,
    COUNT(city) AS rows_with_city
FROM customers;
                

Output:


+------------+----------------+
| total_rows | rows_with_city |
+------------+----------------+
|          8 |              6 |
+------------+----------------+
                    

COUNT(*) says 8 because it counts all rows. COUNT(city) says 6 because it skips the 2 rows where city is NULL.

AVG and NULL โ€” A Subtle Gotcha

Because AVG ignores NULLs, the average only considers non-NULL values:


-- Imagine a ratings column where some books haven't been rated yet:
-- Book A: 5, Book B: 3, Book C: NULL, Book D: 4

-- AVG(rating) = (5 + 3 + 4) / 3 = 4.0  (NOT divided by 4!)
-- The NULL is skipped entirely โ€” both from the sum AND the count
                

๐Ÿ’ก If You Want NULLs Treated as Zero

Use COALESCE or IFNULL to replace NULLs before aggregating:


-- Treat NULL ratings as 0
SELECT AVG(IFNULL(rating, 0)) AS avg_with_zeros FROM books;

-- COALESCE does the same thing
SELECT AVG(COALESCE(rating, 0)) AS avg_with_zeros FROM books;
                    

GROUP BY โ€” Per-Category Summaries

Without GROUP BY, aggregate functions summarize the entire table. With GROUP BY, they summarize each group separately.

graph TD subgraph Before["Without GROUP BY"] A1["All 10 books"] -->|"AVG(price)"| A2["$10.99"] end subgraph After["With GROUP BY author"] B1["Austen's books"] -->|"AVG(price)"| B2["$7.99"] B3["Tolkien's books"] -->|"AVG(price)"| B4["$13.49"] B5["Orwell's books"] -->|"AVG(price)"| B6["$9.99"] B7["...etc..."] -->|"AVG(price)"| B8["..."] end

Basic GROUP BY


-- How many books does each author have?
SELECT author, COUNT(*) AS book_count
FROM books
GROUP BY author;
                

Output:


+----------------------+------------+
| author               | book_count |
+----------------------+------------+
| Aldous Huxley        |          1 |
| F. Scott Fitzgerald  |          1 |
| Frank Herbert        |          1 |
| George Orwell        |          1 |
| Harper Lee           |          1 |
| J.D. Salinger        |          1 |
| J.R.R. Tolkien       |          1 |
| Jane Austen          |          1 |
| Ray Bradbury         |          1 |
| J.R.R. Tolkien       |          1 |
+----------------------+------------+
                    

-- How many customers are in each city?
SELECT city, COUNT(*) AS customer_count
FROM customers
WHERE city IS NOT NULL
GROUP BY city;
                

Output:


+-------------+----------------+
| city        | customer_count |
+-------------+----------------+
| Chicago     |              2 |
| Houston     |              1 |
| Los Angeles |              1 |
| New York    |              2 |
+-------------+----------------+
                    

GROUP BY with Multiple Aggregates


-- Price stats by in_stock status
SELECT
    in_stock,
    COUNT(*) AS book_count,
    ROUND(AVG(price), 2) AS avg_price,
    MIN(price) AS cheapest,
    MAX(price) AS most_expensive
FROM books
GROUP BY in_stock;
                

Output:


+----------+------------+-----------+----------+----------------+
| in_stock | book_count | avg_price | cheapest | most_expensive |
+----------+------------+-----------+----------+----------------+
|        0 |          3 |     10.49 |     8.99 |          11.99 |
|        1 |          7 |     11.13 |     7.99 |          15.99 |
+----------+------------+-----------+----------+----------------+
                    

GROUP BY with ORDER BY

You can sort the grouped results โ€” often you'll sort by the aggregate value:


-- Cities with the most customers first
SELECT city, COUNT(*) AS customer_count
FROM customers
WHERE city IS NOT NULL
GROUP BY city
ORDER BY customer_count DESC;
                

-- Authors ranked by average book price
SELECT
    author,
    COUNT(*) AS books,
    ROUND(AVG(price), 2) AS avg_price
FROM books
GROUP BY author
ORDER BY avg_price DESC;
                

GROUP BY with LIMIT


-- Top 3 cities by customer count
SELECT city, COUNT(*) AS customer_count
FROM customers
WHERE city IS NOT NULL
GROUP BY city
ORDER BY customer_count DESC
LIMIT 3;
                

Grouping by Multiple Columns

You can group by more than one column for finer-grained categories:


-- Count books by in_stock status AND price range
SELECT
    in_stock,
    CASE
        WHEN price < 10 THEN 'Under $10'
        WHEN price < 15 THEN '$10-$15'
        ELSE '$15+'
    END AS price_range,
    COUNT(*) AS book_count
FROM books
GROUP BY in_stock, price_range
ORDER BY in_stock DESC, price_range;
                

๐Ÿ’ก The CASE Expression

Don't worry about CASE too much right now โ€” it's like an if/else in SQL. We used it here to create price range labels. The important thing is that GROUP BY can group by any expression, not just raw column names.

HAVING โ€” Filtering Groups

WHERE filters individual rows before grouping. HAVING filters groups after grouping. This is one of the most important distinctions in SQL.

graph LR A["All rows"] -->|"WHERE filters rows"| B["Matching rows"] B -->|"GROUP BY groups"| C["Groups"] C -->|"HAVING filters groups"| D["Matching groups"]

Basic HAVING


-- Show only cities with MORE than 1 customer
SELECT city, COUNT(*) AS customer_count
FROM customers
WHERE city IS NOT NULL
GROUP BY city
HAVING customer_count > 1;
                

Output:


+----------+----------------+
| city     | customer_count |
+----------+----------------+
| Chicago  |              2 |
| New York |              2 |
+----------+----------------+
                    

Houston (1 customer) and Los Angeles (1 customer) are excluded because they don't satisfy HAVING customer_count > 1.

WHERE vs HAVING

This is a critical distinction:

Feature WHERE HAVING
Filters Individual rows Groups (after GROUP BY)
When it runs Before grouping After grouping
Can use aggregates? โŒ No โœ… Yes
Can use column values? โœ… Yes โœ… Yes (but only grouped columns)

-- โ›” WRONG โ€” you can't use an aggregate in WHERE
SELECT city, COUNT(*) AS customer_count
FROM customers
WHERE COUNT(*) > 1   -- ERROR! Aggregates not allowed in WHERE
GROUP BY city;

-- โœ… CORRECT โ€” use HAVING for aggregate conditions
SELECT city, COUNT(*) AS customer_count
FROM customers
GROUP BY city
HAVING COUNT(*) > 1;
                

Using WHERE and HAVING Together

Often you need both โ€” WHERE to filter rows before grouping, and HAVING to filter groups after:


-- Among in-stock books, find price ranges with more than 2 books
SELECT
    CASE
        WHEN price < 10 THEN 'Under $10'
        WHEN price < 13 THEN '$10-$13'
        ELSE '$13+'
    END AS price_range,
    COUNT(*) AS book_count,
    ROUND(AVG(price), 2) AS avg_price
FROM books
WHERE in_stock = TRUE          -- filter rows FIRST
GROUP BY price_range
HAVING book_count > 2          -- then filter groups
ORDER BY avg_price;
                

The execution order is: WHERE removes out-of-stock books โ†’ GROUP BY groups the remaining books by price range โ†’ HAVING removes groups with 2 or fewer books.

More HAVING Examples


-- Authors whose average book price is above $12
SELECT
    author,
    ROUND(AVG(price), 2) AS avg_price
FROM books
GROUP BY author
HAVING avg_price > 12
ORDER BY avg_price DESC;

-- Cities where total customer spending would exceed $100
-- (hypothetical example with an orders table)
-- SELECT city, SUM(order_total) AS total_spent
-- FROM customers JOIN orders ON customers.id = orders.customer_id
-- GROUP BY city
-- HAVING total_spent > 100;
                

โœ… Rule of Thumb

If your condition involves a column value (like price > 10), use WHERE.
If your condition involves an aggregate (like COUNT(*) > 1 or AVG(price) > 12), use HAVING.

Updated Clause Order

With GROUP BY and HAVING added, the full clause order becomes:

graph TD A["SELECT columns / aggregates"] --> B["FROM table"] B --> C["WHERE row conditions"] C --> D["GROUP BY columns"] D --> E["HAVING group conditions"] E --> F["ORDER BY columns"] F --> G["LIMIT count"] G --> H["OFFSET skip"]

-- The full pattern in action:
SELECT city, COUNT(*) AS customer_count
FROM customers
WHERE city IS NOT NULL         -- 1. Filter rows
GROUP BY city                  -- 2. Group remaining rows
HAVING customer_count > 1      -- 3. Filter groups
ORDER BY customer_count DESC   -- 4. Sort results
LIMIT 5;                       -- 5. Limit output
                

โœ… Updated Memory Aid

The full clause order: S-F-W-G-H-O-L

Select โ†’ From โ†’ Where โ†’ Group By โ†’ Having โ†’ Order By โ†’ Limit

Or remember: "Silly Frogs Will Gobble Hamburgers On Lily-pads."

Common Mistakes

Mistake 1: Non-Aggregated Columns Without GROUP BY


-- โ›” WRONG โ€” title is not aggregated or grouped
SELECT title, COUNT(*) FROM books;

-- โœ… CORRECT โ€” group by the non-aggregated column
SELECT title, COUNT(*) AS count FROM books
GROUP BY title;

-- โœ… Also correct โ€” aggregate everything, no group needed
SELECT COUNT(*) AS total FROM books;
                

โš ๏ธ The GROUP BY Rule

Every column in SELECT must either be: (a) inside an aggregate function, or (b) listed in GROUP BY. MySQL's default mode may allow violations of this rule without an error, but the results will be unpredictable. Always follow this rule.

Mistake 2: Using Aggregates in WHERE


-- โ›” WRONG โ€” WHERE can't use aggregates
SELECT author FROM books
WHERE COUNT(*) > 1
GROUP BY author;

-- โœ… CORRECT โ€” use HAVING
SELECT author, COUNT(*) AS book_count FROM books
GROUP BY author
HAVING book_count > 1;
                

Mistake 3: Forgetting That AVG Ignores NULL


-- If you have ratings: 5, 3, NULL, 4
-- AVG(rating) = 4.0 (divides by 3, not 4)
-- This might not be what you want!

-- If NULLs should count as zero:
SELECT AVG(IFNULL(rating, 0)) AS avg_rating FROM books;
                

Mistake 4: Wrong Clause Order


-- โ›” WRONG โ€” HAVING before GROUP BY
SELECT city, COUNT(*) FROM customers
HAVING COUNT(*) > 1
GROUP BY city;

-- โœ… CORRECT
SELECT city, COUNT(*) FROM customers
GROUP BY city
HAVING COUNT(*) > 1;
                

Exercises

๐Ÿ‹๏ธ Exercise 1: Basic Aggregates

Write queries to find:

  1. The total number of books in the database
  2. The average book price (rounded to 2 decimal places)
  3. The total number of pages across all books
  4. The cheapest and most expensive book prices in a single query
  5. How many books are currently in stock
โœ… Solution

-- 1
SELECT COUNT(*) AS total_books FROM books;

-- 2
SELECT ROUND(AVG(price), 2) AS avg_price FROM books;

-- 3
SELECT SUM(pages) AS total_pages FROM books;

-- 4
SELECT
    MIN(price) AS cheapest,
    MAX(price) AS most_expensive
FROM books;

-- 5
SELECT COUNT(*) AS in_stock_count FROM books
WHERE in_stock = TRUE;
                        

๐Ÿ‹๏ธ Exercise 2: GROUP BY

Write queries to:

  1. Count the number of customers in each city (exclude NULL cities)
  2. Find the average book price for in-stock vs. out-of-stock books
  3. List each city and its customer count, sorted by count (highest first)
โœ… Solution

-- 1
SELECT city, COUNT(*) AS customer_count
FROM customers
WHERE city IS NOT NULL
GROUP BY city;

-- 2
SELECT
    in_stock,
    COUNT(*) AS book_count,
    ROUND(AVG(price), 2) AS avg_price
FROM books
GROUP BY in_stock;

-- 3
SELECT city, COUNT(*) AS customer_count
FROM customers
WHERE city IS NOT NULL
GROUP BY city
ORDER BY customer_count DESC;
                        

๐Ÿ‹๏ธ Exercise 3: HAVING

Write queries to:

  1. Show only cities with more than 1 customer
  2. Among in-stock books, find which price ranges (under $10, $10-$15, $15+) have at least 2 books
  3. List authors whose books have an average page count above 250
โœ… Solution

-- 1
SELECT city, COUNT(*) AS customer_count
FROM customers
WHERE city IS NOT NULL
GROUP BY city
HAVING customer_count > 1;

-- 2
SELECT
    CASE
        WHEN price < 10 THEN 'Under $10'
        WHEN price < 15 THEN '$10-$15'
        ELSE '$15+'
    END AS price_range,
    COUNT(*) AS book_count
FROM books
WHERE in_stock = TRUE
GROUP BY price_range
HAVING book_count >= 2;

-- 3
SELECT
    author,
    ROUND(AVG(pages)) AS avg_pages
FROM books
GROUP BY author
HAVING avg_pages > 250;
                        

๐Ÿ‹๏ธ Exercise 4: Combined Challenge

Write a single query that:

  • Only considers in-stock books
  • Groups them by author
  • Calculates the count, average price, and total pages per author
  • Only shows authors with an average price above $10
  • Sorts by average price, highest first
  • Returns the top 5 results
โœ… Solution

SELECT
    author,
    COUNT(*) AS book_count,
    ROUND(AVG(price), 2) AS avg_price,
    SUM(pages) AS total_pages
FROM books
WHERE in_stock = TRUE
GROUP BY author
HAVING avg_price > 10
ORDER BY avg_price DESC
LIMIT 5;
                        

๐ŸŽฏ Quick Quiz

Question 1: What does COUNT(*) count?

Question 2: When should you use HAVING instead of WHERE?

Question 3: How does AVG() handle NULL values?

Question 4: What is the correct clause order?

Summary

๐ŸŽ‰ Key Takeaways

  • Aggregate functions collapse multiple rows into a single value
  • COUNT(*) counts all rows; COUNT(column) counts non-NULL values
  • SUM and AVG work with numeric columns; use ROUND to clean up decimals
  • MIN and MAX work with numbers, dates, and text
  • All aggregates ignore NULL except COUNT(*)
  • GROUP BY splits rows into groups for per-category aggregation
  • HAVING filters groups after aggregation (WHERE filters rows before)
  • Every non-aggregated column in SELECT must be in GROUP BY
  • Full clause order: SELECT โ†’ FROM โ†’ WHERE โ†’ GROUP BY โ†’ HAVING โ†’ ORDER BY โ†’ LIMIT
  • Use COUNT(DISTINCT column) to count unique values

Quick Reference

Pattern Syntax
Count all rows SELECT COUNT(*) FROM table
Count non-NULL SELECT COUNT(column) FROM table
Count unique SELECT COUNT(DISTINCT col) FROM table
Sum values SELECT SUM(col) FROM table
Average (rounded) SELECT ROUND(AVG(col), 2) FROM table
Min / Max SELECT MIN(col), MAX(col) FROM table
Group summary SELECT col, COUNT(*) FROM t GROUP BY col
Filter groups GROUP BY col HAVING COUNT(*) > N
NULL โ†’ 0 in AVG AVG(IFNULL(col, 0))

๐Ÿ“š Additional Resources

๐Ÿš€ What's Next?

You can now summarize, group, and filter aggregate data โ€” powerful tools for turning raw rows into meaningful insights. But what if you need a query inside another query? "Show me all books that cost more than the average price" โ€” that requires knowing the average first, then using it as a filter. In the next lesson, you'll learn subqueries โ€” queries nested inside other queries โ€” to solve exactly these kinds of multi-step problems.

๐ŸŽ‰ Congratulations!

You've unlocked the analytical power of SQL. COUNT, SUM, AVG, MIN, MAX, GROUP BY, HAVING โ€” these are the tools that transform raw data into business intelligence!