AI-Assisted Composite and Covering Index Design for MySQL

I’ve inherited a lot of databases where someone’s answer to every slow query was “add an index.” Five years later the table has fourteen indexes, half of them redundant, writes are slow because every INSERT has to maintain all of them, and the queries are still slow because none of the indexes actually match the access pattern. Indexing is a design problem, not a reflex, and getting it right means thinking about column order, selectivity, and whether the index can cover the query outright.

This is exactly the kind of reasoning AI is good at drafting and bad at being trusted with. It can look at a query and a schema and propose a well-ordered composite index with a clear rationale. It cannot know your data distribution or measure the write cost. So I let it design and I do the measuring.

Column order is the whole game

A composite index on (a, b, c) is useful for queries filtering on a, on a and b, or on all three — but useless for a query filtering only on b. The leftmost-prefix rule governs everything, and the order should follow your WHERE, then ORDER BY, then the columns you select. Get this wrong and the index just sits there consuming write throughput.

Take a query like this:

SELECT id, total, created_at
FROM orders
WHERE customer_id = 8842
  AND status = 'shipped'
ORDER BY created_at DESC
LIMIT 20;

The prompt I hand to an AI assistant:

Given this query and the SHOW CREATE TABLE orders below, design the single best composite index. Explain the column order with respect to the equality predicates, the range/sort column, and any covering opportunity. List which columns are equality vs. range. Only use columns that exist in the schema I pasted.

A solid answer comes back as:

Use (customer_id, status, created_at). Both customer_id and status are equality predicates, so they go first; their order between each other doesn’t affect this query but customer_id is more selective, so lead with it. created_at comes last so MySQL can read the matching rows already sorted, avoiding a filesort for the ORDER BY ... LIMIT. To make it covering, append total and id (id is already in the index as the PK), so the query is served entirely from the index without touching the clustered row.

That reasoning is correct and worth understanding, not just copying. The covering trick — adding the selected columns so InnoDB never has to do a secondary lookup into the clustered index — is where the big wins hide.

Build it and prove it covers

I never apply this to production from a chat window. I add it on a replica:

ALTER TABLE orders
  ADD INDEX idx_cust_status_created (customer_id, status, created_at, total);

EXPLAIN
SELECT id, total, created_at
FROM orders
WHERE customer_id = 8842 AND status = 'shipped'
ORDER BY created_at DESC
LIMIT 20;

The proof is in the Extra column. Using index means the query is fully covered — no row lookups at all. If you see Using index condition plus a row fetch, it’s partially covered. If you see Using filesort, the column order is wrong for your ORDER BY. The AI’s claim is only validated when the plan says what it predicted.

Find and kill redundant indexes

Before adding anything, check what’s already there. A new composite often makes an old single-column index redundant — an index on (customer_id) is fully contained in (customer_id, status, created_at). pt-duplicate-key-checker finds these:

pt-duplicate-key-checker --host localhost --user dba --ask-pass

I feed its output to AI too, with the schema, and ask which indexes are safe to drop given the actual query patterns. But I’m careful here: an index that looks redundant might be enforcing a UNIQUE constraint or might be the only thing supporting a foreign key. Always confirm before dropping. The MySQL guides cover the foreign-key gotcha in more depth.

The cost AI can’t see

Here’s what no model knows about your system:

Write amplification. Every index slows down INSERT, UPDATE, and DELETE on the indexed columns. On a write-heavy table, a covering index with five columns can measurably hurt ingestion. You measure this with your own write benchmark, not with a guess.
Cardinality. An index on status where 95% of rows are 'completed' barely helps. The model doesn’t know your distribution unless you tell it. Run SELECT status, COUNT(*) FROM orders GROUP BY status; and include the result in your prompt.
Index size and buffer pool. A wide covering index competes for buffer pool memory with everything else. On a memory-constrained server, that trade-off can backfire.

So my prompt for the final review explicitly asks the model to weigh these:

Given this distribution of status values and that this table takes ~3,000 inserts per minute, is the covering index worth the write cost, or should I use a narrower index and accept the row lookups?

It gives a reasoned answer, and then I benchmark both options on the replica under realistic write load. Whichever wins on real numbers ships.

Keep a prompt for this

Index design comes up constantly, so I templatized it. A reusable “design the best index for this query and schema” prompt, plus a “find redundant indexes” prompt, save real time — and they enforce the discipline of always pasting the real schema and distribution. There’s a starting set at /prompts/.

The pattern is the same as everywhere else in database work: AI is a fast, knowledgeable drafter of index designs and a clear explainer of column-order reasoning. It is not a substitute for running EXPLAIN on a replica with your real data and measuring the write cost. Let it propose the index, let Using index in the plan confirm the covering claim, and let your own benchmark decide whether the trade-off is worth it. That division of labor gets you the right index instead of the fourteenth wrong one.

Column order is the whole game

Build it and prove it covers

Find and kill redundant indexes

The cost AI can’t see

Keep a prompt for this

Download the Free 500-Prompt DevOps AI Toolkit