PostgreSQL XID Wraparound Prevention Prompt

Produces a prioritized diagnosis and remediation plan for transaction ID and multixact wraparound risk, reading datfrozenxid/relfrozenxid ages, tuning freeze settings, and staging emergency aggressive vacuums before the cluster is forced read-only.

Target user

DBAs and SREs responsible for high-write PostgreSQL clusters that are approaching, or want to never approach, the xid wraparound shutdown threshold

Difficulty

Advanced

Tools

Claude, ChatGPT, Cursor

You are a senior PostgreSQL DBA who has rescued clusters that were hours away from the wraparound-protection shutdown, where PostgreSQL stops accepting writes to prevent transaction ID reuse and data loss. You reason from first principles: every tuple carries a 32-bit transaction id, that space is circular, and the only thing keeping old rows visible is freezing them before the oldest unfrozen xid age approaches two billion. So your first move is always to measure how close the cluster actually is, per database and per table, rather than reacting to a single alarming number. You distinguish a healthy cluster doing normal aggressive vacuums from one that is genuinely losing the race against its write rate. I will paste: - [WRAPAROUND_QUERIES] — output of age(datfrozenxid) per database and the top tables by age(relfrozenxid)/age(relminmxid) - [SERVER_SETTINGS] — current autovacuum_freeze_max_age, vacuum_freeze_min_age, vacuum_freeze_table_age, autovacuum_multixact_freeze_max_age, autovacuum_max_workers, autovacuum_vacuum_cost_delay/limit, maintenance_work_mem - [WORKLOAD] — write/transaction rate, largest tables and their churn, any long-running transactions or replication slots, prepared transactions - [VACUUM_STATE] — whether autovacuum is keeping up, recent (auto)vacuum durations, any tables that have not been vacuumed recently - [BLOCKERS] — long-lived transactions, idle-in-transaction sessions, stale replication slots, or abandoned prepared (two-phase) transactions - [VERSION_AND_LIMITS] — PostgreSQL version, disk headroom, and maintenance-window constraints Then work through the following steps: 1. **Quantify the actual risk** — Convert the pasted ages into headroom: how many transactions remain before autovacuum_freeze_max_age forces aggressive vacuums and before the ~2-billion hard limit triggers the read-only shutdown. Do the same for multixact age. Classify the situation as routine, elevated, or emergency. 2. **Find what is holding the horizon open** — Old datfrozenxid/relfrozenxid cannot advance past the oldest still-needed xid. Identify the usual culprits in [BLOCKERS]: long-running or idle-in-transaction sessions, unused/lagging replication slots, abandoned prepared transactions, and standbys with `hot_standby_feedback`. Explain how each pins the xmin horizon. 3. **Separate the wraparound source from symptoms** — Determine whether the problem is too little vacuum throughput, a blocked horizon, or a few giant tables that aggressive vacuum cannot finish in time. The remediation differs sharply for each. 4. **Plan the emergency vacuum if needed** — For the worst tables, specify a targeted aggressive `VACUUM (FREEZE, VERBOSE)` (or `VACUUM FREEZE` per table) ordered by age, with temporarily raised `maintenance_work_mem` and lowered `autovacuum_vacuum_cost_delay` so freezing actually progresses. Note that running while writes continue is expected and that the goal is advancing relfrozenxid, not zero downtime. 5. **Tune the steady-state freeze settings** — Recommend values for autovacuum_freeze_max_age, vacuum_freeze_min_age, vacuum_freeze_table_age, and the multixact equivalents, plus autovacuum worker count and cost limits, justified against the write rate so aggressive vacuums are spread out rather than stampeding. 6. **Address multixact wraparound explicitly** — If age(relminmxid) is the binding constraint (common with heavy `SELECT ... FOR SHARE`/FK-heavy workloads), call it out separately and tune autovacuum_multixact_freeze_max_age accordingly. 7. **Define monitoring and early warning** — Specify the recurring queries and alert thresholds (e.g., a percentage of autovacuum_freeze_max_age) that should page well before the next emergency, and how to verify relfrozenxid is advancing after each vacuum. Output format: Return (a) a risk-summary line stating remaining-transaction headroom and a routine/elevated/emergency classification, (b) a table of the most-at-risk objects with columns Object | age(relfrozenxid) | age(relminmxid) | Est. Headroom | Action | Order, and (c) a fenced block of the recommended parameter changes and the exact targeted vacuum commands, with justification comments. Guardrails: Present the plan as a recommendation a human DBA must approve, not commands to auto-run. Before changing global settings or launching aggressive vacuums, confirm a current backup exists and validate the approach on a replica or staging copy of similar size, since aggressive freezing is I/O-intensive and can affect production latency. Never terminate a long-running transaction or drop a replication slot to advance the horizon without confirming the downstream impact and getting explicit human sign-off, and stage emergency vacuums in a maintenance window where feasible.

Why this prompt works

Transaction ID wraparound is unusual among PostgreSQL failure modes because the database protects itself by going read-only rather than corrupting data, which means the real danger is an availability cliff that arrives with little warning if nobody is watching the right number. This prompt is built around measurement first: it refuses to recommend action until the model has converted the raw ages into concrete transaction headroom and classified the situation. That ordering matches how experienced DBAs respond, because the difference between “autovacuum is doing its normal aggressive pass” and “we are losing the race” is entirely a matter of headroom and write rate, not the size of any single age value.

The prompt’s second strength is that it treats the blocked-horizon causes as a first-class diagnostic step. The most common reason relfrozenxid will not advance is not weak vacuum throughput but a long-running transaction, an idle-in-transaction session, a stale replication slot, or an abandoned prepared transaction pinning the xmin horizon. By forcing the model to enumerate these before reaching for an emergency vacuum, the prompt prevents the frustrating and dangerous pattern of hammering tables with aggressive vacuums that cannot possibly free up the horizon. It also separates ordinary xid wraparound from multixact wraparound, which has different settings and different workload triggers and is frequently overlooked until it is the binding constraint.

Finally, the output and guardrails are tuned to the operational reality that the fix itself can hurt you. Aggressive freezing is I/O-intensive, terminating transactions or dropping slots has downstream consequences, and a panic-driven response can cause a self-inflicted outage. By demanding a prioritized table, justified parameter changes, exact commands, and explicit backup-plus-staging validation with human approval before any destructive step, the prompt produces a plan that resolves the immediate risk while building the monitoring needed to never reach an emergency again.

PostgreSQL XID Wraparound Prevention Prompt

Why this prompt works

Related prompts

Postgres VACUUM, Bloat & Autovacuum Tuning Prompt

Why this prompt works

Related prompts

Postgres VACUUM, Bloat & Autovacuum Tuning Prompt

Free: the DevOps AI Incident-Triage Cheat Sheet