Debugging Slow MySQL Queries With AI

A page that used to load in 80ms starts taking three seconds, and the on-call channel lights up. The app team swears nothing changed. The database is “fine” by CPU and memory. But somewhere a query that used to hit an index is now scanning a few million rows, and it took a traffic spike to make it visible. This is the most common production MySQL problem I deal with, and the slow query log plus EXPLAIN is how I find it every time.

What’s changed in the last couple of years is that I no longer read raw EXPLAIN output cold. I paste it into an AI assistant along with the query and the table definitions, and let it draft the diagnosis. That doesn’t mean I trust it blindly — I verify everything on a replica — but it turns a slow, error-prone reading task into a fast review task.

Turn on the slow query log first

You can’t fix what you can’t see. On a running server, enable the slow log without a restart:

SET GLOBAL slow_query_log = 'ON';
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';
SET GLOBAL long_query_time = 0.5;
SET GLOBAL log_queries_not_using_indexes = 'ON';

A half-second threshold catches the queries that matter without drowning you. Logging queries that skip indexes is noisy but valuable on a server you’re actively investigating. Let it run through a representative window of traffic, then aggregate with pt-query-digest:

pt-query-digest /var/log/mysql/slow.log > digest.txt

The digest ranks queries by total time consumed, which is what you want — a query that runs 50,000 times at 40ms each hurts more than one that runs once at two seconds. Start at the top of that list.

Read EXPLAIN with help, not from scratch

Take the worst offender and get its plan. Use EXPLAIN ANALYZE (MySQL 8.0.18+ or MariaDB 10.1+) so you see actual rows and timing, not just estimates:

EXPLAIN ANALYZE
SELECT o.id, o.total, c.email
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.status = 'pending'
  AND o.created_at >= '2026-06-01'
ORDER BY o.created_at DESC
LIMIT 50;

When the plan shows type: ALL, a large rows estimate, or Using filesort and Using temporary together, you’ve found the cost. Rather than mentally walking the join order, I hand the whole picture to the model. The prompt I use:

Here is a slow MySQL query, its EXPLAIN ANALYZE output, and the SHOW CREATE TABLE for both tables. Tell me which step is the bottleneck, why the optimizer chose a full scan instead of an index, and what change (index, query rewrite, or both) would fix it. Do not assume columns or indexes that aren’t in the schema I gave you.

That last sentence matters. Without it, models happily invent an index that would be perfect — and that doesn’t exist. Giving it the real SHOW CREATE TABLE keeps it honest.

A typical answer, and how I check it

For the query above, a good AI response reads like this:

The bottleneck is the full scan on orders (type: ALL, ~2.1M rows). There’s no index supporting the status + created_at filter, so the optimizer scans everything, then sorts. A composite index on (status, created_at) lets MySQL seek directly to pending rows already ordered by created_at, eliminating both the scan and the filesort. The customers join is already keyed on the primary key and is fine.

That’s a plausible, specific answer. I don’t apply it to production. I clone the schema to a replica or a throwaway instance, add the index there, and re-run EXPLAIN ANALYZE:

ALTER TABLE orders ADD INDEX idx_status_created (status, created_at);

If the plan flips to ref on the new index and the filesort disappears, the AI was right. If it didn’t help — maybe status has low cardinality and the optimizer still prefers the scan — I’ve learned that on a replica instead of in an outage. More on index design lives in the MySQL category.

Watch for the traps AI misses

A few failure modes I’ve seen repeatedly:

Implicit type conversion. A WHERE phone = 1234567890 against a VARCHAR column silently kills index usage. The model won’t catch this unless you show it the column type, which is another reason to always include SHOW CREATE TABLE.
Leading wildcards. LIKE '%term' can’t use a B-tree index, and no amount of indexing fixes it. AI sometimes suggests an index anyway.
Stale statistics. Sometimes the plan is bad because the optimizer’s row estimates are wrong. ANALYZE TABLE orders; refreshes them, and the slow query fixes itself with no schema change. Always rule this out before adding indexes.

When the model’s suggestion is a query rewrite rather than an index, I’m extra careful. Rewriting WHERE created_at >= NOW() - INTERVAL 30 DAY differently can change results subtly. I diff row counts between the old and new query on the same dataset before trusting it.

Make it repeatable

Once you’ve done this a few times, save the prompts. I keep a small library of database-debugging prompts — one for EXPLAIN analysis, one for index review, one for “rewrite this query but prove the result set is identical.” You can browse the shared collection over at /prompts/ and adapt them to your schema.

The workflow that’s stuck for me: slow log to find the target, pt-query-digest to rank, EXPLAIN ANALYZE to get the plan, AI to draft the diagnosis, and a replica to verify before anything touches production. The AI compresses the slowest part — reading the plan and reasoning about it — but the verification step is non-negotiable. An index that helps one query can slow down writes and other queries, and only your own data on a real instance tells you the truth. Keep the human in the loop, and you get the speed of AI analysis with none of the risk of acting on a confident guess.

Turn on the slow query log first

Read EXPLAIN with help, not from scratch

A typical answer, and how I check it

Watch for the traps AI misses

Make it repeatable

Download the Free 500-Prompt DevOps AI Toolkit