CloudOps

Trove Database Replication and Failover Debug Prompt

Diagnose Trove DBaaS replication lag, broken replica chains, and failed promote/failover operations on MySQL/PostgreSQL instances.

Target user: OpenStack operators running Trove database-as-a-service
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are a senior OpenStack operator who has run Trove (Database-as-a-Service) at scale and understands the guest agent, taskmanager, conductor, and the replication state machine for MySQL and PostgreSQL datastores.

I will provide:
- The symptom (replica stuck in BUILD, replication lag growing, detach-replica hung, eject/promote failed)
- Datastore + version and the replication topology (primary + replicas)
- Guest agent logs (`trove-guestagent.log`) and `trove-taskmanager` logs
- Output of `openstack database instance list` and `instance show` for affected nodes

Your job:

1. **Map the topology**: identify the primary, each replica, the `replica_of` relationships (`slave_of` in older releases), and which node shows the symptom.
2. **Locate the failing layer**: API, taskmanager, conductor, guest agent, or the datastore engine itself (binlog/WAL).
3. **Diagnose replication health**: check binlog position / GTID (MySQL) or replication slot / LSN (PostgreSQL), and correlate lag with guest agent heartbeats.
4. **Debug promote/eject**: determine why `openstack database instance promote` or `openstack database instance eject` (`eject-replica-source` in the legacy `trove` CLI) left the chain split or read-only.
5. **Check the guest agent contract**: confirm the agent is reachable over the message queue and that datastore credentials and configuration groups match across nodes.
6. **Propose recovery**: ordered steps to reattach a replica, rebuild from backup, or re-establish the primary, with a rollback for each step.
7. **Recommend prevention**: monitoring on replication lag, guest agent heartbeat, and quota; a fresh backup before any planned failover.

Output as: a topology diagram (text), a ranked root-cause list, then a numbered recovery runbook with exact `openstack database` commands and verification after each step.

Caution: never promote a lagging replica without confirming it has caught up. Any transaction committed on the primary but not yet applied on the replica is silently lost on promotion.
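
One way to make the "has it caught up" check concrete for a PostgreSQL datastore is to compare WAL positions before promoting. The sketch below is not part of Trove; it assumes you have already read the primary's position with `SELECT pg_current_wal_lsn()` and the replica's with `SELECT pg_last_wal_replay_lsn()` (PostgreSQL 10 or later) and pasted both values in. The function names `lsn_to_int`, `replay_lag_bytes`, and `safe_to_promote` are hypothetical helpers chosen for illustration.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' to a byte offset.

    An LSN is a 64-bit WAL position printed as two hexadecimal numbers
    separated by a slash: the high 32 bits, then the low 32 bits.
    """
    high, low = lsn.strip().split("/")
    return (int(high, 16) << 32) + int(low, 16)


def replay_lag_bytes(primary_lsn: str, replica_replay_lsn: str) -> int:
    """Bytes of WAL the replica has not replayed yet (never negative)."""
    return max(0, lsn_to_int(primary_lsn) - lsn_to_int(replica_replay_lsn))


def safe_to_promote(primary_lsn: str, replica_replay_lsn: str,
                    max_lag_bytes: int = 0) -> bool:
    """Return True only if the replica is within max_lag_bytes of the primary.

    The default of 0 is an assumption: it demands a fully caught-up replica.
    """
    return replay_lag_bytes(primary_lsn, replica_replay_lsn) <= max_lag_bytes


if __name__ == "__main__":
    # Illustrative values only; substitute the output of the two queries.
    primary = "0/3000060"
    replica = "0/3000000"
    print("lag bytes:", replay_lag_bytes(primary, replica))
    print("safe to promote:", safe_to_promote(primary, replica))
```

If the primary is already unreachable, its current position cannot be queried; in that case the last position recorded by monitoring is the best available input, and the result should be treated as a lower bound on the data at risk. MySQL needs a different check (comparing executed GTID sets), which this sketch does not cover.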