Skip to content
CloudOps
Newsletter
All prompts
AI for MySQL Difficulty: Advanced ClaudeChatGPTCursor

MySQL Replication Lag & GTID Debug Prompt

Diagnose replica lag and binlog/GTID replication errors and produce a safe recovery plan.

Target user
DBAs and SREs operating MySQL/MariaDB primary-replica topologies
Difficulty
Advanced
Tools
Claude, ChatGPT, Cursor

The prompt

You are a senior MySQL/MariaDB DBA who runs replication topologies. You understand binary logging formats (ROW/STATEMENT/MIXED), GTID-based replication, the IO and SQL replica threads, Seconds_Behind_Master, parallel/multi-threaded replication, relay logs, and how to recover from a broken or drifting replica.

I will provide:
- `SHOW REPLICA STATUS` / `SHOW SLAVE STATUS` from the lagging replica (Seconds_Behind_Master, IO/SQL thread state, Retrieved/Executed_Gtid_Set, Last_Error): [PASTE]
- The primary's `SHOW MASTER STATUS` / GTID position and binlog_format: [PASTE]
- Replica hardware vs primary, and the workload (bulk loads, long transactions, hotspot tables): [DESCRIBE]
- Any replication error message and error code: [PASTE]

Work through this:

1. **Classify the problem:** pure lag (SQL thread running but behind), a stopped IO thread (can't fetch binlog), a stopped SQL thread (apply error), or GTID divergence.
2. **For lag:** check whether the SQL thread is single-threaded and serializing on a hotspot, whether large transactions or bulk DML are stalling apply, and whether replica IO/CPU is saturated. Recommend multi-threaded replication settings where applicable and identify the throughput bottleneck.
3. **For a stopped IO thread:** check connectivity, credentials, binlog availability on the primary (was it purged?), and disk space for relay logs.
4. **For a stopped SQL thread:** read the Last_Error, find the conflicting row/statement, and decide between fixing data, re-syncing, or — only with explicit justification — skipping a transaction via GTID.
5. **For GTID divergence:** compare Executed_Gtid_Set against the primary, identify errant transactions, and choose between re-cloning the replica or a carefully scoped GTID repair.

Output: (a) Classification, (b) Root cause with the evidence, (c) Step-by-step recovery commands, (d) How to confirm caught up (lag at 0, GTID sets aligned), (e) Risks.

Guardrails: never blindly inject empty transactions or SET GTID_NEXT to skip errors on prod — that hides data drift; prefer re-cloning a diverged replica from a consistent backup; take a backup before any repair; verify binlog_format and GTID mode match across the topology; never run skip/reset commands on a production replica without review.

Why this prompt works

Replication problems get misdiagnosed because “the replica is behind” can mean four completely different things — pure apply lag, a dead IO thread, a stuck SQL thread, or GTID divergence — each with a different fix. This prompt’s first move is classification, which prevents the common error of throwing re-clone or skip commands at a problem that just needs multi-threaded replication or a connectivity fix. It reads SHOW REPLICA STATUS the way a DBA does, separating the IO thread (fetching binlog) from the SQL thread (applying it).

It treats GTID with the respect it deserves. Comparing Executed_Gtid_Set against the primary is how you actually detect errant transactions, and the prompt steers toward re-cloning a diverged replica rather than the tempting but dangerous habit of injecting empty transactions to make errors disappear. That habit silently corrupts data and is one of the most common ways a replica quietly drifts out of sync.

The guardrails keep the human firmly in charge of the irreversible decisions. Checking whether the primary purged the binlogs the replica needs, taking a backup before any repair, and confirming GTID mode and binlog format match across the topology are exactly the checks that distinguish a clean recovery from a data-loss incident.

Related prompts

Newsletter

Free: the DevOps AI Incident-Triage Cheat Sheet

Subscribe and we’ll send you the one-page cheat sheet — plus weekly AI prompts, automation ideas, and tool reviews for infrastructure engineers. One email a week. No spam, unsubscribe anytime.

  • AI Incident-Triage Cheat Sheet (PDF)
  • Access to 1,603 DevOps AI prompts
  • One practical workflow email per week