Skip to content
DevOps AI ToolKit
Newsletter
All guides
AI for MySQL By James Joyner IV · · 11 min read

Partitioning Large MySQL Tables With AI

RANGE partitioning makes huge MySQL tables manageable, but pruning and the PK rules trip people up. Here's how I design the scheme and use AI to verify partition pruning.

  • #mysql
  • #ai
  • #partitioning
  • #performance
  • #maintenance

I’ve watched plenty of teams reach for partitioning the moment a table crosses some scary row count, and most of the time they didn’t need it — a good index would have done the job. But there’s one scenario where partitioning is genuinely the right tool: a large, append-mostly table you query by date and prune by date. Event logs, time-series metrics, order history with a retention policy. For those, RANGE partitioning by date turns a brutal multi-hour DELETE of old rows into an instant metadata operation, and lets the optimizer skip entire date ranges it knows can’t match.

The trouble is that partitioning has sharp edges that don’t show up until production: a primary key constraint you didn’t anticipate, foreign keys that flat-out won’t work, and a pruning behavior that silently doesn’t kick in if your WHERE clause is shaped wrong. I design the scheme by hand and use AI to pressure-test the design and, critically, to help me read EXPLAIN output and confirm pruning is actually happening. I verify all of it on a copy of the table on a replica before it goes near the live schema, because converting a large table to partitioned is a full rebuild you don’t want to discover problems halfway through.

RANGE Partitioning by Date

The pattern that earns its keep is RANGE partitioning on a date or datetime column, with one partition per month (or week, for very high volume). Each partition is a separate physical chunk that MySQL can target or drop independently.

CREATE TABLE events (
    id          BIGINT      NOT NULL AUTO_INCREMENT,
    occurred_at DATETIME    NOT NULL,
    user_id     BIGINT      NOT NULL,
    payload     JSON,
    PRIMARY KEY (id, occurred_at)        -- note: partition key is in the PK
)
PARTITION BY RANGE (TO_DAYS(occurred_at)) (
    PARTITION p2026_04 VALUES LESS THAN (TO_DAYS('2026-05-01')),
    PARTITION p2026_05 VALUES LESS THAN (TO_DAYS('2026-06-01')),
    PARTITION p2026_06 VALUES LESS THAN (TO_DAYS('2026-07-01')),
    PARTITION pmax     VALUES LESS THAN MAXVALUE
);

I always keep a pmax catch-all partition at the end so a row with an unexpected future date never errors out — it lands in pmax until I add a proper partition for it. TO_DAYS() is one of the functions the optimizer understands for pruning; not every function qualifies, which matters in a moment.

The Primary Key Rule That Surprises Everyone

Here’s the constraint that breaks the most designs: every column in a partitioned table’s primary key (and every unique key) must include all columns of the partitioning expression. I partitioned on occurred_at, so occurred_at has to be part of the primary key — which is why the PK above is (id, occurred_at) and not just id.

This isn’t a style choice; MySQL rejects the table otherwise. The consequence is real: your formerly simple id primary key now has two columns, and any unique constraint elsewhere on the table must also include occurred_at. If your application assumed id alone was unique, it still is in practice (because id is auto-increment), but the enforced uniqueness is on the composite. I make sure that’s acceptable before committing, because retrofitting it later is another full rebuild.

And foreign keys: partitioned InnoDB tables can’t have foreign keys, in either direction. No FK from a partitioned table to another, and no FK pointing into a partitioned table. For an append-only event log that’s usually fine — you weren’t going to FK-constrain a billion-row log table anyway — but it’s a hard stop you need to know going in, not discover during the migration.

Partition Pruning: Prove It With EXPLAIN

Pruning is the entire payoff: when your WHERE clause filters on the partition key, the optimizer reads only the relevant partitions and ignores the rest. But it only works if the predicate is shaped so the optimizer can map it to partitions. EXPLAIN shows you exactly which partitions will be touched.

-- Pruning WORKS: predicate is directly on the partition column
EXPLAIN
SELECT COUNT(*) FROM events
WHERE occurred_at >= '2026-06-01' AND occurred_at < '2026-07-01';
-- partitions column shows: p2026_06
-- Pruning FAILS: wrapping the column in a function the optimizer can't invert
EXPLAIN
SELECT COUNT(*) FROM events
WHERE DATE(occurred_at) = '2026-06-15';
-- partitions column shows: p2026_04,p2026_05,p2026_06,pmax  (all of them!)

The partitions column in the EXPLAIN output is the whole truth. If it lists every partition, you’re getting zero benefit and may be slower than an unpartitioned table with a good index. The fix is almost always to rewrite the predicate as a half-open range on the bare column, exactly as in the first query.

Dropping Old Partitions vs DELETE

This is the operation that makes partitioning worth the constraints. Deleting a month of old rows the normal way scans and removes potentially millions of rows, generates a mountain of undo and binlog, and can block for a long time:

-- The slow, heavy way on a huge table
DELETE FROM events WHERE occurred_at < '2026-04-01';

With partitions, you drop the whole partition — a near-instant metadata operation that reclaims the space immediately:

-- Instant: drops the entire April partition as one operation
ALTER TABLE events DROP PARTITION p2026_04;

No row-by-row deletion, no undo log buildup, no long lock. This single capability is the reason most people who should partition end up doing it.

Automating Maintenance

A date-partitioned table needs a rolling window: add next month’s partition before data arrives, drop the oldest beyond your retention. I script it and run it from a scheduled job rather than relying on the old (and now deprecated) approach of partition-management events.

-- Add next month's partition by splitting it off the MAXVALUE catch-all
ALTER TABLE events REORGANIZE PARTITION pmax INTO (
    PARTITION p2026_07 VALUES LESS THAN (TO_DAYS('2026-08-01')),
    PARTITION pmax     VALUES LESS THAN MAXVALUE
);
# Driven from cron on the first of each month
mysql --defaults-file=/etc/mysql/maint.cnf events < /opt/maint/add_next_partition.sql
mysql --defaults-file=/etc/mysql/maint.cnf events < /opt/maint/drop_old_partition.sql

I always test the add and drop on a replica’s copy first, because REORGANIZE PARTITION on a partition that already contains rows is expensive — splitting pmax is cheap only when pmax is empty, which it is if you stay ahead of the data.

Where AI Fits

Designing the partition scheme is mostly judgment — interval size, retention window, whether the PK change is acceptable — and AI is a good thinking partner for it. But the highest-value use is reading EXPLAIN output to confirm pruning, because that’s where designs silently fail. My prompt:

You are a senior MySQL DBA. Below is the EXPLAIN for a query against a table RANGE-partitioned by month on occurred_at. Tell me whether partition pruning is occurring, which partitions are scanned, and if all partitions are scanned, identify the exact part of the WHERE clause defeating pruning and how to rewrite it. Here is the EXPLAIN: partitions: p2026_04,p2026_05,p2026_06,pmax for WHERE DATE(occurred_at) = '2026-06-15'.

A representative response:

Pruning is NOT occurring — all four partitions are scanned. The cause is DATE(occurred_at): wrapping the partition column in DATE() prevents the optimizer from mapping the predicate to specific partitions, so it conservatively reads them all. Rewrite as a half-open range on the bare column: WHERE occurred_at >= '2026-06-15' AND occurred_at < '2026-06-16'. That should prune to p2026_06 only. Verify by re-running EXPLAIN and confirming the partitions column shows a single partition.

That’s the right diagnosis and, importantly, it tells me to re-run EXPLAIN and confirm rather than trust the suggestion. I do exactly that on the replica — paste the query, read the partitions column, confirm it collapsed to one partition — before the rewrite ships. AI proposes; the optimizer’s own output decides.

Closing

Partitioning is a precise tool for a specific shape of problem: large, date-queried, date-pruned tables where dropping old data instantly is worth living with a composite primary key and no foreign keys. The two things that go wrong — the PK constraint and pruning that silently doesn’t fire — are both catchable before production if you read EXPLAIN carefully, and AI is genuinely useful for that reading. Design deliberately, verify pruning on a replica, and never assume the optimizer is doing what you intended.

For more on tuning MySQL for scale, browse the MySQL guides, and grab the EXPLAIN-reading and partition-design prompts from the prompt library.

Free download · 368-page PDF

Download the Free 500-Prompt DevOps AI Toolkit

500 battle-tested, copy-paste AI prompts engineered by a senior systems engineer — every one with fill-in placeholders and safety/back-out notes. Drop your email and it's yours.

  • 500 prompts: Linux · Kubernetes · Terraform · OpenStack · GitLab · Docker · Monitoring · Incident Response
  • Instant PDF download — yours free, forever
  • Plus one practical AI-workflow email a week (no spam)

Single opt-in · unsubscribe anytime · no spam.