Skip to main content

🧱 Lesson 5: Data Types & Constraints

Every column in a table needs a data type — it tells MySQL what kind of information goes there and how much space to reserve. Constraints add rules that keep your data clean. Choosing wisely here means fewer bugs, faster queries, and happier databases.

🎯 Learning Objectives

By the end of this lesson, you will be able to:

  • Choose the right numeric type for whole numbers, decimals, and money
  • Choose the right string type for short text, long text, and fixed-length codes
  • Store dates and times correctly with DATE, DATETIME, and TIMESTAMP
  • Understand how MySQL handles BOOLEAN values
  • Apply constraints: PRIMARY KEY, NOT NULL, UNIQUE, DEFAULT, AUTO_INCREMENT, and CHECK
  • Know when to use UNSIGNED, ENUM, and SET

Estimated Time: 45 minutes

Prerequisites: Creating databases and tables (Lesson 4)

📑 In This Lesson

Why Data Types Matter

You might wonder: why not just store everything as text? Three reasons:

graph LR A["Data Types"] --> B["🛡️ Validation"] A --> C["⚡ Performance"] A --> D["💾 Storage"] B --> E["MySQL rejects bad data
e.g., 'hello' into an INT column"] C --> F["Numbers are compared faster
than text; dates can be sorted"] D --> G["INT uses 4 bytes
VARCHAR(10) uses up to 11 bytes
TEXT uses up to 65KB"]

Think of data types like containers: you wouldn't pour soup into a paper bag or store a feather in a shipping container. The right type holds the right data efficiently.

💡 Rule of Thumb: Always pick the smallest type that can comfortably hold your data. Smaller types = less disk space, faster queries, and more rows in memory.

Numeric Types

MySQL has several numeric types, divided into integers (whole numbers) and decimals (numbers with fractional parts).

Integer Types

Type Bytes Signed Range Unsigned Range Use Case
TINYINT 1 -128 to 127 0 to 255 Age, rating (1-5), status codes
SMALLINT 2 -32,768 to 32,767 0 to 65,535 Year, small counts
MEDIUMINT 3 -8.3M to 8.3M 0 to 16.7M Medium-sized IDs
INT 4 -2.1B to 2.1B 0 to 4.2B Most IDs, counts, general use
BIGINT 8 -9.2 quintillion to 9.2 quintillion 0 to 18.4 quintillion Huge IDs, social media post IDs

-- Integer examples
CREATE TABLE integer_demo (
    id          INT AUTO_INCREMENT PRIMARY KEY,
    age         TINYINT UNSIGNED,         -- 0-255, perfect for ages
    year_born   SMALLINT,                 -- up to 32,767
    page_count  INT UNSIGNED,             -- no negative pages!
    tweet_id    BIGINT UNSIGNED           -- Twitter IDs are huge numbers
);
                

UNSIGNED — Positive Only

Adding UNSIGNED after an integer type means it only stores positive numbers (and zero), but doubles the maximum value. Use it when negative values don't make sense — ages, counts, page numbers, IDs.


-- SIGNED (default): -128 to 127
-- UNSIGNED:          0 to 255
age TINYINT UNSIGNED   -- Nobody has a negative age!
                

Decimal / Fractional Types

Type Precision Use Case
DECIMAL(M, D) Exact — M total digits, D after decimal Money, financial calculations, anything requiring exact values
FLOAT ~7 significant digits (approximate) Scientific measurements, coordinates (when slight imprecision is OK)
DOUBLE ~15 significant digits (approximate) Higher-precision scientific calculations

-- DECIMAL examples
price       DECIMAL(8, 2),    -- Up to 999,999.99
tax_rate    DECIMAL(5, 4),    -- Up to 9.9999 (e.g., 0.0825)
gpa         DECIMAL(3, 2),    -- Up to 9.99 (e.g., 3.85)
                

⚠️ Never Use FLOAT for Money

FLOAT and DOUBLE are approximate types — they can produce rounding errors. For example, 0.1 + 0.2 might equal 0.30000000000000004 in floating point. This is catastrophic for financial calculations. Always use DECIMAL for money.

💡 Reading DECIMAL(M, D)

M = total number of digits (both sides of the decimal point). D = digits after the decimal point. So DECIMAL(8, 2) means up to 6 digits before the decimal and 2 after it: 999999.99. The maximum value of M is 65, and the maximum of D is 30.

String Types

MySQL has many string types, but you'll use three most of the time: CHAR, VARCHAR, and TEXT.

Type Max Length Storage Use Case
CHAR(N) 255 characters Always N bytes (padded with spaces) Fixed-length: country codes, state abbreviations, hashes
VARCHAR(N) 65,535 characters Actual length + 1-2 bytes Variable-length: names, emails, titles, URLs
TINYTEXT 255 characters Actual length + 1 byte Short text (rarely used — VARCHAR is better)
TEXT ~65,000 characters Actual length + 2 bytes Blog posts, descriptions, long-form content
MEDIUMTEXT ~16 million characters Actual length + 3 bytes Very long documents, articles
LONGTEXT ~4 billion characters Actual length + 4 bytes Huge data (rarely needed)

CHAR vs. VARCHAR

The key difference is how they store data:

graph LR subgraph CHAR_10["CHAR(10) — Always 10 bytes"] A["'hello '"] --- B["Padded to 10"] end subgraph VARCHAR_10["VARCHAR(10) — Only what's needed"] C["'hello'"] --- D["5 bytes + 1"] end

-- CHAR — fixed length, padded with spaces
country_code  CHAR(2),          -- 'US', 'PH', 'JP' — always 2 chars
state_code    CHAR(2),          -- 'NV', 'CA', 'TX'
md5_hash      CHAR(32),         -- MD5 hashes are always 32 hex characters

-- VARCHAR — variable length, stores only what's needed
first_name    VARCHAR(50),      -- 'Ray' stores as 3 + 1 bytes
email         VARCHAR(100),     -- 'ray@email.com' stores as 13 + 1 bytes
title         VARCHAR(200),     -- Book titles vary widely in length
                

✅ When to Use CHAR vs. VARCHAR

Use CHAR when every value has the exact same length (country codes, state abbreviations, UUIDs, hashes). Use VARCHAR for everything else — it's the most common string type by far.

TEXT Types

Use TEXT for long-form content that doesn't need a length limit:


-- TEXT for long content
bio           TEXT,               -- User biography
description   TEXT,               -- Product description
article_body  MEDIUMTEXT,         -- Full blog article
                

💡 VARCHAR vs. TEXT

Both can store text, but they differ in important ways:

  • VARCHAR can have a DEFAULT value; TEXT cannot (in most cases)
  • VARCHAR can be part of an index fully; TEXT requires a prefix length
  • VARCHAR is stored inline with the row; large TEXT values are stored separately

Rule: If you know the maximum length and it's under ~1000 characters, use VARCHAR. If the content is truly unbounded (blog posts, comments, descriptions), use TEXT.

Date & Time Types

MySQL has dedicated types for dates and times — never store them as strings!

Type Format Range Use Case
DATE YYYY-MM-DD 1000-01-01 to 9999-12-31 Birthdays, publication dates, deadlines
TIME HH:MM:SS -838:59:59 to 838:59:59 Duration, time of day
DATETIME YYYY-MM-DD HH:MM:SS 1000-01-01 to 9999-12-31 Order timestamps, event start times
TIMESTAMP YYYY-MM-DD HH:MM:SS 1970-01-01 to 2038-01-19 Created/updated timestamps, audit trails
YEAR YYYY 1901 to 2155 Publication year, model year

-- Date and time examples
CREATE TABLE events (
    id           INT AUTO_INCREMENT PRIMARY KEY,
    event_name   VARCHAR(100) NOT NULL,
    event_date   DATE NOT NULL,              -- '2026-04-16'
    start_time   TIME,                       -- '14:30:00'
    created_at   DATETIME DEFAULT CURRENT_TIMESTAMP,
    updated_at   TIMESTAMP DEFAULT CURRENT_TIMESTAMP
                         ON UPDATE CURRENT_TIMESTAMP
);
                

DATETIME vs. TIMESTAMP

They look the same but behave differently:

Feature DATETIME TIMESTAMP
Storage 8 bytes 4 bytes
Range 1000–9999 1970–2038
Time zone Stores as-is (no conversion) Converts to UTC on storage, back to local on retrieval
Best for Event dates, birthdates, historical dates Record creation/modification tracking

✅ Auto-Tracking Timestamps

A very common pattern is automatically recording when a row was created and last updated:


created_at  DATETIME DEFAULT CURRENT_TIMESTAMP,
updated_at  DATETIME DEFAULT CURRENT_TIMESTAMP
                    ON UPDATE CURRENT_TIMESTAMP
                    

created_at is set once when the row is inserted. updated_at is automatically refreshed every time the row changes. You'll see this pattern in almost every production database.

⚠️ Never Store Dates as Strings

It's tempting to use VARCHAR(10) for dates like '04/16/2026', but this breaks sorting, comparisons, and date arithmetic. MySQL can't calculate "30 days from now" on a string. Always use DATE, DATETIME, or TIMESTAMP.

BOOLEAN, ENUM & SET

BOOLEAN

MySQL doesn't have a true boolean type — BOOLEAN is an alias for TINYINT(1):


-- These are equivalent:
is_active  BOOLEAN DEFAULT TRUE,
is_active  TINYINT(1) DEFAULT 1,
                

MySQL stores TRUE as 1 and FALSE as 0. You can use either in queries:


-- Both work:
SELECT * FROM books WHERE in_stock = TRUE;
SELECT * FROM books WHERE in_stock = 1;
                

ENUM — Pick One from a List

ENUM restricts a column to a predefined set of values. Only one value can be chosen:


CREATE TABLE shirts (
    id     INT AUTO_INCREMENT PRIMARY KEY,
    size   ENUM('XS', 'S', 'M', 'L', 'XL', 'XXL') NOT NULL,
    color  ENUM('red', 'blue', 'green', 'black', 'white') NOT NULL
);

-- Valid:
INSERT INTO shirts (size, color) VALUES ('M', 'blue');

-- Invalid — MySQL will reject or warn:
INSERT INTO shirts (size, color) VALUES ('XXXL', 'purple');
                

💡 ENUM Pros and Cons

Pros: Enforces valid values, very compact storage (1-2 bytes regardless of string length), self-documenting.

Cons: Adding or removing values requires ALTER TABLE, which can be slow on large tables. If the list changes frequently, a lookup table (a separate table with the valid values) is often a better choice.

SET — Pick Multiple from a List

SET is like ENUM but allows multiple values per row:


CREATE TABLE user_preferences (
    id          INT AUTO_INCREMENT PRIMARY KEY,
    username    VARCHAR(50) NOT NULL,
    interests   SET('sports', 'music', 'tech', 'gaming', 'cooking')
);

-- One interest:
INSERT INTO user_preferences (username, interests)
VALUES ('alice', 'music');

-- Multiple interests (comma-separated, no spaces):
INSERT INTO user_preferences (username, interests)
VALUES ('bob', 'tech,gaming,music');
                

📖 In Practice

SET is rarely used in modern databases. For many-to-many relationships like "a user has multiple interests," a junction table (covered in Lesson 11) is the standard and more flexible approach.

Constraints

Constraints are rules enforced by MySQL at the database level. They prevent invalid data from ever entering your tables — acting as a safety net regardless of which application or user is inserting data.

graph TD A["Constraints"] --> B["PRIMARY KEY
Unique identifier for each row"] A --> C["NOT NULL
Column must have a value"] A --> D["UNIQUE
No duplicate values"] A --> E["DEFAULT
Fallback value if none given"] A --> F["AUTO_INCREMENT
Auto-generates sequential IDs"] A --> G["CHECK
Custom validation rule"] A --> H["FOREIGN KEY
Links to another table"]

PRIMARY KEY

Every table should have a primary key — it uniquely identifies each row. A primary key is automatically NOT NULL and UNIQUE.


-- Column-level (most common for single-column keys)
CREATE TABLE products (
    id  INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);

-- Table-level (required for composite keys)
CREATE TABLE order_items (
    order_id   INT,
    product_id INT,
    quantity   INT NOT NULL,
    PRIMARY KEY (order_id, product_id)
);
                

💡 Composite Primary Keys

A composite primary key uses two or more columns together to identify a row. In the order_items example, neither order_id nor product_id alone is unique — but the combination of both is. This is common in junction tables for many-to-many relationships.

NOT NULL

Prevents a column from accepting NULL (empty/unknown) values. MySQL will reject any insert or update that tries to set a NOT NULL column to NULL.


-- first_name and email are required; city is optional
first_name  VARCHAR(50) NOT NULL,
email       VARCHAR(100) NOT NULL,
city        VARCHAR(50),              -- allows NULL (default behavior)
                
💡 Tip: Ask yourself: "Does this column always need a value for the row to make sense?" If yes, add NOT NULL. A customer without a name doesn't make sense — so first_name should be NOT NULL. A customer without a phone number might be fine — so phone can allow NULL.

UNIQUE

Ensures no two rows have the same value in a column. Unlike PRIMARY KEY, a table can have multiple UNIQUE columns, and UNIQUE columns can contain NULL values (but only one NULL per column).


-- Each customer must have a unique email
email       VARCHAR(100) NOT NULL UNIQUE,

-- ISBN numbers are unique across all books
isbn        VARCHAR(13) UNIQUE,

-- You can also name the constraint
CONSTRAINT uq_email UNIQUE (email)
                

DEFAULT

Provides a value when one isn't specified during an INSERT:


in_stock    BOOLEAN DEFAULT TRUE,
role        VARCHAR(20) DEFAULT 'customer',
created_at  DATETIME DEFAULT CURRENT_TIMESTAMP,
quantity    INT DEFAULT 0,
                

-- If we insert without specifying 'in_stock':
INSERT INTO books (title, author, price)
VALUES ('Dune', 'Frank Herbert', 12.99);

-- in_stock will automatically be TRUE (1)
                

AUTO_INCREMENT

Automatically generates sequential integer values (1, 2, 3, ...). Used almost exclusively for primary keys:


id  INT AUTO_INCREMENT PRIMARY KEY
                

Key behaviors:

  • You don't include id in your INSERT — MySQL assigns it automatically
  • If you delete row 5, the number 5 is not reused — the next insert gets 6
  • You can check the current value: SELECT LAST_INSERT_ID();
  • Only one AUTO_INCREMENT column per table, and it must be indexed (usually as PRIMARY KEY)

CHECK (MySQL 8.0.16+)

Enforces a custom condition on column values:


CREATE TABLE products (
    id       INT AUTO_INCREMENT PRIMARY KEY,
    name     VARCHAR(100) NOT NULL,
    price    DECIMAL(8, 2) NOT NULL,
    quantity INT NOT NULL DEFAULT 0,

    -- Prices must be positive
    CONSTRAINT chk_price CHECK (price > 0),

    -- Quantity can't be negative
    CONSTRAINT chk_quantity CHECK (quantity >= 0)
);

-- This will FAIL:
INSERT INTO products (name, price, quantity)
VALUES ('Widget', -5.00, 10);
-- ERROR: Check constraint 'chk_price' is violated.
                

-- More CHECK examples
age     TINYINT UNSIGNED CHECK (age >= 0 AND age <= 150),
rating  DECIMAL(3,2) CHECK (rating >= 0 AND rating <= 5),
status  VARCHAR(20) CHECK (status IN ('active', 'inactive', 'pending')),
                

⚠️ CHECK Constraint History

Before MySQL 8.0.16, CHECK constraints were accepted in SQL syntax but silently ignored. If you're using MySQL 8.0.16 or later (which you should be), they work as expected. Always verify your MySQL version: SELECT VERSION();

FOREIGN KEY (Preview)

Foreign keys link a column in one table to the primary key of another table. This enforces referential integrity — you can't add an order for a customer that doesn't exist. We'll cover foreign keys in depth in Lesson 12.


-- Quick preview — full explanation in Lesson 12
CREATE TABLE orders (
    id           INT AUTO_INCREMENT PRIMARY KEY,
    customer_id  INT NOT NULL,
    order_date   DATETIME DEFAULT CURRENT_TIMESTAMP,

    FOREIGN KEY (customer_id) REFERENCES customers(id)
);
                

Constraints Quick Reference

Constraint What It Enforces Can a Table Have Multiple?
PRIMARY KEY Uniquely identifies each row (NOT NULL + UNIQUE) No — one per table
NOT NULL Column must always have a value Yes
UNIQUE No duplicate values in the column Yes
DEFAULT Provides a fallback value on INSERT Yes
AUTO_INCREMENT Auto-generates sequential integer values No — one per table
CHECK Validates against a custom condition Yes
FOREIGN KEY References a row in another table Yes

Choosing the Right Type — A Practical Guide

When designing a table, use this flowchart to pick the right data type:

graph TD A["What kind of data?"] --> B{"Is it a number?"} A --> C{"Is it text?"} A --> D{"Is it a date/time?"} A --> E{"Is it true/false?"} B -->|"Whole number"| F["INT or TINYINT/SMALLINT/BIGINT
based on range needed"] B -->|"Decimal / Money"| G["DECIMAL(M,D)
NEVER use FLOAT for money"] C -->|"Fixed length ≤255"| H["CHAR(N)
Codes, hashes, abbreviations"] C -->|"Variable length ≤65K"| I["VARCHAR(N)
Names, emails, titles"] C -->|"Long text"| J["TEXT or MEDIUMTEXT
Articles, descriptions"] D -->|"Date only"| K["DATE"] D -->|"Date + time"| L["DATETIME or TIMESTAMP"] D -->|"Time only"| M["TIME"] E --> N["BOOLEAN
(stored as TINYINT)"]

Common Column Patterns

Here are types you'll use again and again in real projects:

Column Purpose Recommended Type Example
Primary key INT AUTO_INCREMENT PRIMARY KEY id INT AUTO_INCREMENT PRIMARY KEY
Person's name VARCHAR(50-100) first_name VARCHAR(50) NOT NULL
Email address VARCHAR(100) NOT NULL UNIQUE email VARCHAR(100) NOT NULL UNIQUE
Phone number VARCHAR(20) phone VARCHAR(20)
Money / price DECIMAL(10, 2) price DECIMAL(10, 2) NOT NULL
Age TINYINT UNSIGNED age TINYINT UNSIGNED
Country code CHAR(2) country_code CHAR(2)
URL VARCHAR(2083) website VARCHAR(2083)
Blog post body TEXT or MEDIUMTEXT body TEXT NOT NULL
Active/inactive flag BOOLEAN DEFAULT TRUE is_active BOOLEAN DEFAULT TRUE
Creation timestamp DATETIME DEFAULT CURRENT_TIMESTAMP created_at DATETIME DEFAULT CURRENT_TIMESTAMP
Last-modified timestamp DATETIME ... ON UPDATE updated_at DATETIME DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP

💡 Why VARCHAR(20) for Phone Numbers?

Phone numbers look like numbers, but they're actually text! They can start with a 0 or +, contain dashes and spaces, and you never do math on them. '+1 (702) 555-0123' is a string, not a number. Always store phone numbers as VARCHAR.

Exercises

🏋️ Exercise 1: Pick the Right Type

For each piece of data below, choose the most appropriate MySQL data type and explain why:

  1. A customer's ZIP code (US format: "89002" or "89002-1234")
  2. The price of a product
  3. The number of items in stock
  4. A user's biography
  5. Whether an account is active or suspended
  6. A country's two-letter ISO code (like "US" or "PH")
  7. A student's GPA (0.00 to 4.00)
  8. The date a book was published
✅ Solution
  1. VARCHAR(10) — ZIP codes can have leading zeros and hyphens; they're text, not numbers
  2. DECIMAL(10, 2) NOT NULL — exact precision for money; never FLOAT
  3. INT UNSIGNED DEFAULT 0 — whole number, can't be negative
  4. TEXT — biographies can be long and have no fixed upper limit
  5. BOOLEAN DEFAULT TRUE — simple true/false toggle
  6. CHAR(2) NOT NULL — always exactly 2 characters, CHAR is perfect
  7. DECIMAL(3, 2) — holds 0.00 to 9.99, perfect for GPA
  8. DATE — date only, no time needed

🏋️ Exercise 2: Build a Table with Constraints

Create an employees table inside the bookstore database with the following requirements:

  • id — auto-incrementing primary key
  • first_name — up to 50 characters, required
  • last_name — up to 50 characters, required
  • email — up to 100 characters, required, must be unique
  • hire_date — date only, required
  • salary — up to 999,999.99, required, must be positive
  • department — one of: 'sales', 'inventory', 'management', 'support'
  • is_active — boolean, defaults to true
  • created_at — auto-set to current timestamp
✅ Solution

USE bookstore;

CREATE TABLE IF NOT EXISTS employees (
    id          INT AUTO_INCREMENT PRIMARY KEY,
    first_name  VARCHAR(50) NOT NULL,
    last_name   VARCHAR(50) NOT NULL,
    email       VARCHAR(100) NOT NULL UNIQUE,
    hire_date   DATE NOT NULL,
    salary      DECIMAL(8, 2) NOT NULL,
    department  ENUM('sales', 'inventory', 'management', 'support'),
    is_active   BOOLEAN DEFAULT TRUE,
    created_at  DATETIME DEFAULT CURRENT_TIMESTAMP,

    CONSTRAINT chk_salary CHECK (salary > 0)
);

DESCRIBE employees;
                        

🏋️ Exercise 3: Fix the Bad Table

This table has several design problems. Identify and fix them:


CREATE TABLE Orders (
    OrderID VARCHAR(10),
    CustomerName VARCHAR(500),
    Price FLOAT,
    OrderDate VARCHAR(20),
    phone INT,
    Status VARCHAR(100)
);
                    
💡 Hint

Look for: naming conventions, missing primary key, wrong data types (money, dates, phone numbers), missing constraints, oversized columns.

✅ Solution

CREATE TABLE orders (
    id             INT AUTO_INCREMENT PRIMARY KEY,
    customer_name  VARCHAR(100) NOT NULL,
    price          DECIMAL(10, 2) NOT NULL,
    order_date     DATETIME DEFAULT CURRENT_TIMESTAMP,
    phone          VARCHAR(20),
    status         ENUM('pending', 'processing', 'shipped', 'delivered', 'cancelled')
                       DEFAULT 'pending',

    CONSTRAINT chk_price CHECK (price > 0)
);
                        

Issues fixed:

  • Table and column names → lowercase snake_case
  • Added INT AUTO_INCREMENT PRIMARY KEY instead of VARCHAR(10)
  • Reduced customer_name from 500 to a reasonable 100 chars
  • FLOATDECIMAL(10, 2) for money
  • VARCHAR(20)DATETIME for dates
  • INTVARCHAR(20) for phone numbers
  • VARCHAR(100)ENUM for status (or VARCHAR with CHECK)
  • Added NOT NULL and CHECK constraints

🎯 Quick Quiz

Question 1: Which data type should you use for storing prices?

Question 2: What does UNSIGNED do to an integer type?

Question 3: Why should phone numbers be stored as VARCHAR instead of INT?

Question 4: When should you use CHAR instead of VARCHAR?

Question 5: What constraint automatically sets a column to the next sequential number?

Summary

🎉 Key Takeaways

  • Numeric types: Use INT for most whole numbers, TINYINT/SMALLINT/BIGINT when you need a specific range, and DECIMAL for exact decimals (especially money)
  • Never use FLOAT for money — it produces rounding errors
  • String types: VARCHAR for most text, CHAR for fixed-length values, TEXT for long-form content
  • Date types: DATE for dates, DATETIME for timestamps, TIMESTAMP for auto-tracking. Never store dates as strings
  • BOOLEAN is really TINYINT(1)TRUE = 1, FALSE = 0
  • ENUM restricts a column to specific values; good for stable, small lists
  • Constraints enforce data integrity: PRIMARY KEY, NOT NULL, UNIQUE, DEFAULT, AUTO_INCREMENT, CHECK
  • UNSIGNED removes negative numbers and doubles the positive range
  • Always pick the smallest type that comfortably holds your data
  • Phone numbers, ZIP codes, and other "numeric-looking" identifiers should be stored as VARCHAR

📚 Additional Resources

🚀 What's Next?

Now that you know what types of data go into tables and how to protect them with constraints, it's time to put data in and get it back out. In the next lesson, you'll learn the four fundamental operations every database developer uses daily — INSERT, SELECT, UPDATE, and DELETE — collectively known as CRUD.

🎉 Congratulations!

You now know how to choose the right data type for any column and enforce rules with constraints. This knowledge will guide every table you design from here on out!