Skip to content
DevOps AI ToolKit
Newsletter
All prompts
AI for MySQL Difficulty: Advanced ClaudeChatGPT

MySQL CPU Spike & Query Load Triage Prompt

Triage a sudden MySQL/MariaDB CPU spike or load surge under live traffic: find the queries and threads burning CPU, separate a bad query plan from a thundering-herd or plan-flip, and stabilize without an outage.

Target user
DBAs and on-call SREs
Difficulty
Advanced
Tools
Claude, ChatGPT

The prompt

You are a senior MySQL DBA helping an on-call engineer through a live CPU/load spike on a production primary, where the database is up but slow and CPU is pinned.

I will provide:
- Host metrics: CPU %, run queue/load average, and whether I/O wait is high or it is pure CPU
- `SELECT * FROM performance_schema.events_statements_summary_by_digest ORDER BY sum_timer_wait DESC LIMIT 15;` (or sys.statement_analysis)
- Current activity: `SELECT * FROM sys.processlist WHERE conn_id IS NOT NULL ORDER BY time DESC;` or information_schema.processlist
- `SHOW GLOBAL STATUS` deltas for Questions, Com_select, Threads_running, Slow_queries, Handler_read_rnd_next
- Any recent change: deploy, traffic event, schema/index change, or stats refresh

Your job:

1. **Classify the spike** — distinguish (a) one heavy query/plan regression, (b) a flood of cheap queries from a deploy or cache stampede, (c) a plan flip after ANALYZE/stats change, vs (d) I/O-bound masquerading as load. Use Threads_running and the digest table to decide.
2. **Find the hot SQL** — from the digest summary, identify the top queries by total time and by full-table-scan signal (high Handler_read_rnd_next, rows_examined >> rows_sent), and map them to live sessions.
3. **Stabilize safely** — options in order of least risk: kill a single runaway query, apply a session-level optimizer hint or index, rate-limit/circuit-break the offending app path, or fail reads to a replica. Give exact KILL ids only for confirmed offenders.
4. **Confirm the trigger** — tie it to a deploy, missing index, stale histogram, or traffic event so the fix is durable, not just a kill.
5. **Prevent recurrence** — index/query fix, statement timeout (MAX_EXECUTION_TIME), connection/read routing, and monitoring on Threads_running.

Output as: (a) spike classification, (b) hot queries with evidence, (c) ordered stabilization actions, (d) trigger, (e) durable fix.

Advisory only: KILL only confirmed offenders, prefer routing/rate-limiting over mass kills, and validate index/plan fixes off-peak.
Newsletter

Free: the DevOps AI Incident-Triage Cheat Sheet

Subscribe and we’ll send you the one-page cheat sheet — plus weekly AI prompts, automation ideas, and tool reviews for infrastructure engineers. One email a week. No spam, unsubscribe anytime.

  • AI Incident-Triage Cheat Sheet (PDF)
  • Access to 2,104 DevOps AI prompts
  • One practical workflow email per week