Indexes & Query Optimization - MySQL Foundations

The Problem: Full Table Scans

Without an index, MySQL has one strategy for finding data: read every single row in the table and check if it matches your condition. This is called a full table scan.


-- Find a customer by email
SELECT * FROM customers WHERE email = 'alice@example.com';

With 10 rows, this is fine — MySQL checks all 10 and responds instantly. But with 5 million rows, MySQL reads all 5 million to find one match. That's like reading every page of an encyclopedia to find one topic instead of using the index in the back.

graph LR subgraph scan["Full Table Scan (no index)"] R1["Row 1 ❌"] --> R2["Row 2 ❌"] --> R3["Row 3 ❌"] --> R4["...millions more..."] --> RN["Row N ✅ Found!"] end subgraph lookup["Index Lookup"] I["Index"] -->|"Jump directly"| RN2["Row N ✅ Found!"] end

What Is an Index?

A database index is a separate data structure (usually a B-tree) that stores a sorted copy of one or more columns along with pointers back to the full rows. Think of it like the index at the back of a textbook — instead of flipping through every page, you look up "JOINs" in the index, find "page 247," and go straight there.

graph TD subgraph btree["B-Tree Index on email"] Root["d..."] Left["a... – c..."] Right["d... – z..."] L1["alice@..."] L2["bob@..."] L3["carol@..."] L4["dave@..."] L5["zara@..."] Root --> Left Root --> Right Left --> L1 Left --> L2 Left --> L3 Right --> L4 Right --> L5 end

How It Speeds Things Up

Instead of scanning every row, MySQL navigates the B-tree in O(log n) time. For a table with 1 million rows, a full scan checks 1,000,000 rows. A B-tree lookup checks roughly 20. That's the difference between waiting 5 seconds and waiting 5 milliseconds.

💡 Primary Keys Are Already Indexed

Every table's PRIMARY KEY is automatically indexed. That's why SELECT * FROM customers WHERE id = 42 is always fast — MySQL uses the primary key index (called the clustered index) to jump directly to row 42. You never need to create an index on a primary key.

The Trade-Off

Indexes aren't free. Every index:

Uses disk space — the sorted copy takes up storage
Slows down writes — every INSERT, UPDATE, or DELETE must also update each relevant index
Needs maintenance — MySQL keeps indexes in sync automatically, but more indexes = more work per write

The goal is to index columns you search on frequently without indexing everything blindly.

Creating Indexes

Syntax


-- Create an index on a single column
CREATE INDEX index_name ON table_name (column_name);

-- Example: index the email column on customers
CREATE INDEX idx_customers_email ON customers (email);

Naming Convention

A widely used pattern is idx_tablename_columnname:

idx_customers_email
idx_orders_customer_id
idx_books_author_id

The name is for your reference — MySQL doesn't care what you call it, but consistent naming makes maintenance much easier.

Index on a Foreign Key

Foreign key columns are prime candidates for indexes because you JOIN and filter on them constantly:


-- Index the foreign key columns we use in JOINs
CREATE INDEX idx_orders_customer_id ON orders (customer_id);
CREATE INDEX idx_books_author_id ON books (author_id);
CREATE INDEX idx_order_items_order_id ON order_items (order_id);
CREATE INDEX idx_order_items_book_id ON order_items (book_id);

💡 MySQL May Auto-Index Foreign Keys

When you define a FOREIGN KEY constraint, MySQL (InnoDB) automatically creates an index on the foreign key column if one doesn't already exist. So you may already have these indexes! Use SHOW INDEX FROM table_name to check.

Creating an Index During Table Creation

You can also define indexes inside a CREATE TABLE statement:


CREATE TABLE customers (
    id INT AUTO_INCREMENT PRIMARY KEY,
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL,
    email VARCHAR(100) NOT NULL,
    city VARCHAR(100),
    INDEX idx_customers_email (email),
    INDEX idx_customers_city (city)
);

Composite (Multi-Column) Indexes

A composite index covers two or more columns. It's useful when you frequently filter or sort by a specific combination of columns.


-- Index on last_name + first_name (together)
CREATE INDEX idx_customers_name ON customers (last_name, first_name);

Column Order Matters — The "Left Prefix" Rule

A composite index on (last_name, first_name) can satisfy queries that filter on:

Query Filters On	Uses Index?	Why?
`WHERE last_name = 'Smith'`	✅ Yes	Matches the leftmost column
`WHERE last_name = 'Smith' AND first_name = 'Bob'`	✅ Yes	Matches both columns (full index)
`WHERE first_name = 'Bob'`	❌ No	Skips the leftmost column — can't use index

Think of it like a phone book sorted by last name, then first name. You can quickly find all the "Smiths" (left column). You can quickly find "Bob Smith" (both columns). But you can't efficiently find all the "Bobs" without scanning every entry — the book isn't sorted by first name alone.

⚠️ The Left Prefix Rule

A composite index on (A, B, C) can serve queries filtering on: A, (A, B), or (A, B, C). It cannot serve queries that skip A entirely (like filtering only on B or C). Always put the most-filtered column first.

Real-World Example


-- If you frequently run queries like:
SELECT * FROM orders
WHERE customer_id = 5 AND order_date >= '2026-01-01';

-- Create a composite index matching those columns:
CREATE INDEX idx_orders_customer_date
ON orders (customer_id, order_date);

This index helps MySQL quickly find all orders for customer 5, then narrow down by date — without scanning the whole table.

Unique Indexes

A unique index enforces uniqueness on a column (or combination of columns) while also providing the performance benefits of a regular index. It's essentially a constraint and an index in one.


-- Ensure no two customers share the same email
CREATE UNIQUE INDEX idx_customers_email_unique
ON customers (email);

-- Alternative: defined inside CREATE TABLE
CREATE TABLE customers (
    id INT AUTO_INCREMENT PRIMARY KEY,
    email VARCHAR(100) NOT NULL,
    UNIQUE INDEX idx_customers_email_unique (email)
);

If you try to INSERT a duplicate email, MySQL rejects it with an error — just like a UNIQUE constraint. The difference is that you're also getting index performance for lookups on that column.

💡 UNIQUE Constraint vs UNIQUE INDEX

In MySQL, UNIQUE constraints are implemented as unique indexes internally. Writing UNIQUE (email) or UNIQUE INDEX idx_email (email) in a CREATE TABLE statement produces the same result. They're interchangeable.

EXPLAIN — Your Query Microscope

How do you know if MySQL is actually using your index? Put EXPLAIN in front of any SELECT statement and MySQL shows you its execution plan — exactly how it intends to find your data.


EXPLAIN SELECT * FROM customers WHERE email = 'alice@example.com';

Example Output (simplified):


+----+------+----------------------+----------------------+---------+-------+------+-------+
| id | type | possible_keys        | key                  | key_len | ref   | rows | Extra |
+----+------+----------------------+----------------------+---------+-------+------+-------+
|  1 | ref  | idx_customers_email  | idx_customers_email  | 402     | const |    1 | NULL  |
+----+------+----------------------+----------------------+---------+-------+------+-------+

Key Columns in EXPLAIN Output

Column	What It Tells You	What to Look For
`type`	How MySQL accesses the table	`const`, `ref`, `range` = good. `ALL` = full table scan (bad)
`possible_keys`	Which indexes MySQL could use	Should list relevant indexes. NULL = no usable indexes
`key`	Which index MySQL actually chose	Should show your index name. NULL = no index used
`rows`	Estimated number of rows MySQL will examine	Lower is better. Compare to total table rows
`Extra`	Additional info about the strategy	`Using index` = great. `Using filesort` = extra work for sorting

The `type` Column — From Best to Worst

Type	Meaning	Speed
`system` / `const`	At most one matching row (primary key or unique lookup)	⚡ Fastest
`eq_ref`	One row per join (unique index in a JOIN)	⚡ Excellent
`ref`	Multiple matching rows via non-unique index	✅ Good
`range`	Index range scan (BETWEEN, <, >, IN)	✅ Good
`index`	Full index scan (reads entire index, not table)	⚠️ Okay
`ALL`	Full table scan — reads every row	🐌 Slowest

Before and After: Seeing the Impact


-- BEFORE index: full table scan
EXPLAIN SELECT * FROM customers WHERE city = 'Las Vegas';
-- type: ALL, rows: 50000 (checks every row)

-- Create the index
CREATE INDEX idx_customers_city ON customers (city);

-- AFTER index: index lookup
EXPLAIN SELECT * FROM customers WHERE city = 'Las Vegas';
-- type: ref, key: idx_customers_city, rows: 342 (checks only matches)

✅ The EXPLAIN Habit

Get in the habit of running EXPLAIN on any query you think might be slow. Look for: type: ALL (full table scan), key: NULL (no index used), or a high rows count. These are your optimization targets.

When to Index (and When Not To)

✅ DO Create Indexes On:

Scenario	Why
Columns in `WHERE` clauses	These are your filter conditions — the most common use
Foreign key columns	Used in JOINs constantly (MySQL may auto-create these)
Columns in `ORDER BY`	Avoids expensive "filesort" operations
Columns in `GROUP BY`	Speeds up grouping and aggregation
Columns used in JOIN conditions	Helps MySQL match rows between tables
High-cardinality columns (many unique values)	Email, username, phone — index narrows the search dramatically

❌ DON'T Create Indexes On:

Scenario	Why Not
Tiny tables (under ~1,000 rows)	Full scan is already fast — index overhead isn't worth it
Low-cardinality columns (few unique values)	A boolean `is_active` column with only TRUE/FALSE — index can't narrow much
Columns you rarely filter or sort on	No queries use them = wasted space and write overhead
Tables with heavy INSERT/UPDATE/DELETE	Every write updates every index — too many indexes slow writes
Columns with lots of NULLs	Indexes on mostly-NULL columns provide little benefit

💡 The Sweet Spot

Most tables do well with indexes on: the primary key (automatic), all foreign keys (often automatic), and 1–3 additional columns that appear in your most frequent WHERE, ORDER BY, or JOIN conditions. Start there and add more only when EXPLAIN shows a problem.

Index Killers — What Prevents Index Use

You created an index, but MySQL isn't using it? These common patterns prevent MySQL from using indexes:

1. Functions on Indexed Columns


-- ❌ BAD: Function wraps the indexed column — index can't be used
SELECT * FROM orders WHERE YEAR(order_date) = 2026;

-- ✅ GOOD: Rewrite as a range — index works
SELECT * FROM orders
WHERE order_date >= '2026-01-01' AND order_date < '2027-01-01';

2. Leading Wildcards in LIKE


-- ❌ BAD: Leading wildcard — full scan required
SELECT * FROM customers WHERE email LIKE '%@gmail.com';

-- ✅ GOOD: Trailing wildcard — index works
SELECT * FROM customers WHERE email LIKE 'alice%';

Leading % means "starts with anything" — MySQL can't use the sorted index because it doesn't know where to start looking. Trailing % is fine because the index is sorted by the beginning of the string.

3. Implicit Type Conversion


-- ❌ BAD: phone is VARCHAR, but you're comparing to an integer
SELECT * FROM customers WHERE phone = 5551234;

-- ✅ GOOD: Compare with matching type
SELECT * FROM customers WHERE phone = '5551234';

When MySQL has to convert types, it may wrap the column in a conversion function internally, breaking index usage.

4. OR Conditions on Different Columns


-- ❌ Potentially slow: OR across different columns
SELECT * FROM customers WHERE city = 'Las Vegas' OR last_name = 'Smith';

-- ✅ Better: Use UNION if both columns are indexed
SELECT * FROM customers WHERE city = 'Las Vegas'
UNION
SELECT * FROM customers WHERE last_name = 'Smith';

5. Negation


-- ❌ Often can't use index:
SELECT * FROM customers WHERE city != 'Las Vegas';
SELECT * FROM customers WHERE city NOT IN ('Las Vegas', 'Reno');

-- MySQL must scan to find all the NON-matches

⚠️ When in Doubt, EXPLAIN

Not sure if your query uses an index? Don't guess — run EXPLAIN and check the key and type columns. That's the definitive answer.

Managing Indexes

View Indexes on a Table


-- See all indexes on a table
SHOW INDEX FROM customers;

-- Shorter alternative
SHOW INDEXES FROM orders;

Drop an Index


-- Remove an index you no longer need
DROP INDEX idx_customers_city ON customers;

-- Alternative syntax
ALTER TABLE customers DROP INDEX idx_customers_city;

Rename / Rebuild

MySQL doesn't have a RENAME INDEX command (prior to 5.7). In older versions, drop and recreate. In 5.7+, you can use:


-- MySQL 5.7+
ALTER TABLE customers RENAME INDEX idx_old_name TO idx_new_name;

💡 Check Before You Create

Before creating a new index, run SHOW INDEX FROM table_name to see what already exists. Duplicate indexes waste space and slow down writes for no benefit.

General Query Optimization Tips

Indexes are the biggest lever, but these habits also help:

1. Select Only What You Need


-- ❌ BAD: Fetches all columns (even ones you don't use)
SELECT * FROM orders;

-- ✅ GOOD: Fetch only what you need
SELECT id, order_date, customer_id FROM orders;

Less data to read, transfer, and process. This matters at scale.

2. Use LIMIT for Large Results


-- Don't pull 100,000 rows if you need the top 10
SELECT title, price FROM books ORDER BY price DESC LIMIT 10;

3. Avoid SELECT DISTINCT When Unnecessary


-- DISTINCT forces MySQL to sort/deduplicate — expensive on big tables
-- Only use it when you truly have duplicates to remove
SELECT DISTINCT city FROM customers;  -- Fine: you want unique cities
SELECT DISTINCT * FROM orders;        -- Usually a design smell

4. Use EXISTS Instead of IN for Subqueries


-- ❌ IN can be slow with large subquery results
SELECT * FROM customers
WHERE id IN (SELECT customer_id FROM orders);

-- ✅ EXISTS short-circuits — stops as soon as it finds a match
SELECT * FROM customers c
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id);

5. Keep JOINs Efficient


-- Ensure JOIN columns are indexed and use matching types
-- Index on orders.customer_id helps this JOIN enormously
SELECT c.first_name, o.order_date
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id;

✅ Performance Mindset

Don't optimize prematurely — write correct queries first, then use EXPLAIN to find actual bottlenecks. Index the columns that EXPLAIN tells you need it, not every column "just in case."

Exercises

🏋️ Exercise 1: Create Indexes

Write SQL statements to:

Create an index on the city column of the customers table
Create a composite index on (customer_id, order_date) on the orders table
Create a unique index on the email column of the customers table

✅ Solution


-- 1
CREATE INDEX idx_customers_city ON customers (city);

-- 2
CREATE INDEX idx_orders_customer_date ON orders (customer_id, order_date);

-- 3
CREATE UNIQUE INDEX idx_customers_email_unique ON customers (email);

🏋️ Exercise 2: EXPLAIN Analysis

For each query below, predict whether MySQL will use an index or perform a full table scan. Then verify by running EXPLAIN.

SELECT * FROM customers WHERE id = 5;
SELECT * FROM customers WHERE city = 'Denver'; (assume no index on city)
SELECT * FROM orders WHERE customer_id = 3; (assume foreign key exists)
SELECT * FROM customers WHERE UPPER(email) = 'ALICE@EXAMPLE.COM';

✅ Solution


-- 1: Uses PRIMARY key index → type: const
EXPLAIN SELECT * FROM customers WHERE id = 5;

-- 2: No index on city → type: ALL (full table scan)
EXPLAIN SELECT * FROM customers WHERE city = 'Denver';

-- 3: Foreign key auto-index → type: ref, key: customer_id index
EXPLAIN SELECT * FROM orders WHERE customer_id = 3;

-- 4: UPPER() wraps the column → index on email CAN'T be used → type: ALL
--    Fix: WHERE email = 'alice@example.com' (MySQL collation is
--    case-insensitive by default, so UPPER is unnecessary!)
EXPLAIN SELECT * FROM customers WHERE UPPER(email) = 'ALICE@EXAMPLE.COM';

🏋️ Exercise 3: Fix the Slow Queries

Each query below has a performance problem. Rewrite it to be index-friendly:

SELECT * FROM orders WHERE YEAR(order_date) = 2026;
SELECT * FROM customers WHERE email LIKE '%smith%';
SELECT * FROM customers WHERE phone = 5551234; (phone is VARCHAR)

✅ Solution


-- 1: Replace YEAR() with a range
SELECT * FROM orders
WHERE order_date >= '2026-01-01' AND order_date < '2027-01-01';

-- 2: Leading wildcard can't use index — if you must search
-- mid-string, consider a FULLTEXT index instead:
-- CREATE FULLTEXT INDEX idx_email_ft ON customers (email);
-- SELECT * FROM customers WHERE MATCH(email) AGAINST('smith');
-- Otherwise, restructure the search (e.g., add a domain column)

-- 3: Match the column type — compare VARCHAR to a string
SELECT * FROM customers WHERE phone = '5551234';

🏋️ Exercise 4: Index Design Challenge

You have an employees table with columns: id, first_name, last_name, department, hire_date, salary, is_active.

The most common queries are:

WHERE department = 'Engineering' AND is_active = TRUE
WHERE last_name = 'Smith'
WHERE hire_date BETWEEN '2025-01-01' AND '2025-12-31'
ORDER BY salary DESC LIMIT 10

What indexes would you create? Justify each choice.

✅ Solution


-- For query 1: composite index — department first (higher cardinality)
CREATE INDEX idx_emp_dept_active ON employees (department, is_active);

-- For query 2: single column index
CREATE INDEX idx_emp_last_name ON employees (last_name);

-- For query 3: index on hire_date for range scan
CREATE INDEX idx_emp_hire_date ON employees (hire_date);

-- For query 4: index on salary for sorted access
CREATE INDEX idx_emp_salary ON employees (salary);

-- Note: is_active alone (boolean) wouldn't be a good index —
-- too few unique values. But as the second column in a
-- composite index with department, it helps narrow results
-- after department is matched.

🎯 Quick Quiz

Question 1: What happens without an index on a WHERE column?

Question 2: For a composite index on (A, B, C), which WHERE clause can use it?

Question 3: In EXPLAIN output, which type value indicates the worst performance?

Question 4: Why does WHERE YEAR(order_date) = 2026 prevent index usage?

Summary

🎉 Key Takeaways

An index is a sorted data structure (B-tree) that lets MySQL find rows without scanning the entire table
Primary keys are automatically indexed; foreign keys usually are too
CREATE INDEX idx_name ON table (column) adds a single-column index
Composite indexes cover multiple columns — column order matters (left prefix rule)
Unique indexes enforce uniqueness while providing index performance
EXPLAIN shows the query execution plan — use it to verify index usage
Look for type: ALL (bad) and key: NULL (no index used) in EXPLAIN output
Index killers: functions on columns, leading wildcards, type mismatches, OR across columns
Index columns in WHERE, JOIN ON, ORDER BY, and GROUP BY clauses
Don't over-index — each index slows down writes and uses disk space
Use SHOW INDEX FROM table to see existing indexes

Quick Reference

Command	Purpose
`CREATE INDEX idx ON tbl (col)`	Create a single-column index
`CREATE INDEX idx ON tbl (a, b)`	Create a composite index
`CREATE UNIQUE INDEX idx ON tbl (col)`	Create a unique index
`DROP INDEX idx ON tbl`	Remove an index
`SHOW INDEX FROM tbl`	View all indexes on a table
`EXPLAIN SELECT ...`	Show query execution plan

📚 Additional Resources

🚀 What's Next?

Now that you can make queries fast, it's time to make complex queries simple. In Lesson 15: Views, you'll learn to save complex multi-table JOINs as reusable virtual tables — giving your colleagues (and your future self) clean, readable access to data without rewriting 10-line queries every time.

📑 In This Lesson

The Problem: Full Table Scans

What Is an Index?

How It Speeds Things Up

💡 Primary Keys Are Already Indexed

The Trade-Off

Creating Indexes

Syntax

Naming Convention

Index on a Foreign Key

💡 MySQL May Auto-Index Foreign Keys

Creating an Index During Table Creation

Composite (Multi-Column) Indexes

Column Order Matters — The "Left Prefix" Rule

⚠️ The Left Prefix Rule

Real-World Example

Unique Indexes

💡 UNIQUE Constraint vs UNIQUE INDEX

EXPLAIN — Your Query Microscope

Example Output (simplified):

Key Columns in EXPLAIN Output

The type Column — From Best to Worst

Before and After: Seeing the Impact

✅ The EXPLAIN Habit

When to Index (and When Not To)

✅ DO Create Indexes On:

❌ DON'T Create Indexes On:

💡 The Sweet Spot

Index Killers — What Prevents Index Use

1. Functions on Indexed Columns

2. Leading Wildcards in LIKE

3. Implicit Type Conversion

4. OR Conditions on Different Columns

5. Negation

⚠️ When in Doubt, EXPLAIN

Managing Indexes

View Indexes on a Table

Drop an Index

Rename / Rebuild

💡 Check Before You Create

General Query Optimization Tips

1. Select Only What You Need

2. Use LIMIT for Large Results

3. Avoid SELECT DISTINCT When Unnecessary

4. Use EXISTS Instead of IN for Subqueries

5. Keep JOINs Efficient

✅ Performance Mindset

Exercises

🏋️ Exercise 1: Create Indexes

🏋️ Exercise 2: EXPLAIN Analysis

🏋️ Exercise 3: Fix the Slow Queries

🏋️ Exercise 4: Index Design Challenge

🎯 Quick Quiz

Summary

🎉 Key Takeaways

Quick Reference

📚 Additional Resources

🚀 What's Next?

The `type` Column — From Best to Worst