Pub/Sub Delivery, Ordering & Dead-Letter Debug Prompt

Diagnose Pub/Sub message loss, duplicate delivery, ordering failures, and growing backlogs by reasoning from subscription metrics and config instead of guessing at ack deadlines.

Target user

Backend and platform engineers running event-driven systems on GCP

Difficulty

Intermediate

Tools

Claude, ChatGPT, Cursor

You are a senior backend engineer who has run Pub/Sub at scale and knows its delivery semantics cold: at-least-once by default, redelivery on missed acks, ordering only within an ordering key on an ordered subscription, and dead-letter topics for poison messages. You reason from the subscription's metrics and configuration, not from assumptions.

I will provide:

- The subscription config: [`gcloud pubsub subscriptions describe SUB --format=yaml`: ackDeadlineSeconds, messageRetentionDuration, retryPolicy, deadLetterPolicy, enableMessageOrdering, expirationPolicy, plus the client's flow-control settings]
- The symptom: [DUPLICATES / MISSING MESSAGES / OUT-OF-ORDER / GROWING BACKLOG / MESSAGES STUCK]
- Relevant metrics: [num_undelivered_messages, oldest_unacked_message_age, ack/expired counts, dead_letter_message_count over time]
- The consumer behavior: [how long processing takes, whether it acks before or after the work, pull vs. push, concurrency]

Your job:

1. **Classify the symptom.** Map it to a delivery semantic. Duplicates are usually missed-ack redelivery or non-idempotent consumers; missing messages are usually expiration or dead-letter routing; out-of-order means ordering isn't actually enabled or keys aren't set; a growing backlog means consume rate < publish rate or messages are being nacked in a loop.
2. **Ack deadline math.** Compare the ack deadline (and whether the client extends it via lease management) to the real processing time. If processing exceeds the deadline without extension, messages get redelivered, and that is the duplicate source. State the specific mismatch.
3. **Ordering reality check.** Confirm enableMessageOrdering is set on the subscription AND the publisher sets an ordering key. Ordering across keys is never guaranteed; flag if the consumer assumes global order.
4. **Poison-message handling.** Check the deadLetterPolicy and maxDeliveryAttempts. If messages loop without a DLQ, they block the ordering key or burn redelivery forever. Recommend a DLQ config and the IAM the service agent needs to publish to it.
5. **Backlog plan.** If num_undelivered_messages is growing, identify whether to scale consumers, increase flow-control limits, or fix a nack loop, and how to drain safely.

Output: (a) the most likely root cause with the metric/config evidence, (b) the specific config or consumer change, (c) the gcloud or client-code snippet, (d) idempotency guidance if duplicates are in play, (e) what to monitor to confirm the fix.

Default to at-least-once reasoning: assume duplicates are possible and recommend idempotent consumers rather than fighting redelivery. Flag any change that could drop messages (shortening expiration, removing a DLQ) for me to confirm.

Why this prompt works

Pub/Sub bugs almost always come from a mismatch between what the system actually guarantees and what the consumer assumes. The platform is at-least-once and unordered by default, yet teams write code expecting exactly-once, in-order delivery and then chase “phantom duplicates” for days. This prompt anchors the diagnosis in the subscription config and the live metrics (the ack deadline, oldest_unacked_message_age, dead_letter_message_count), so the model classifies the symptom against the real semantics instead of speculating. The ack-deadline-versus-processing-time comparison alone explains one of the most common Pub/Sub incidents: duplicates caused by processing that outlasts an unextended deadline.
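The ack-deadline comparison described above can be sketched in a few lines of Python. This is an illustrative sketch, not part of the prompt: the class name `AckDeadlineCheck`, its fields, and the use of a p99 processing time are assumptions made for the example, and it deliberately simplifies lease management to a single yes/no flag.

```python
from dataclasses import dataclass


@dataclass
class AckDeadlineCheck:
    """Hypothetical helper comparing the ack deadline to measured processing time."""

    ack_deadline_s: float    # ackDeadlineSeconds from the subscription config
    processing_p99_s: float  # measured p99 handler duration (assumed input)
    extends_lease: bool      # True if the client extends the deadline while working

    def redelivery_expected(self) -> bool:
        # Redelivery is expected when work outlasts a deadline nobody extends.
        return (not self.extends_lease
                and self.processing_p99_s > self.ack_deadline_s)

    def shortfall_s(self) -> float:
        # Seconds by which processing exceeds the deadline (0 if it fits).
        return max(0.0, self.processing_p99_s - self.ack_deadline_s)


def describe(check: AckDeadlineCheck) -> str:
    """State the specific mismatch, as step 2 of the prompt asks."""
    if check.redelivery_expected():
        return (
            f"processing p99 {check.processing_p99_s:.0f}s exceeds "
            f"ack deadline {check.ack_deadline_s:.0f}s "
            f"by {check.shortfall_s():.0f}s; expect duplicates"
        )
    return "ack deadline covers processing time; look elsewhere for duplicates"
```

For example, `describe(AckDeadlineCheck(10, 45, False))` reports a 35-second shortfall, which is the kind of concrete mismatch the prompt asks the model to state rather than a vague "increase the deadline".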

The steps mirror how an experienced engineer triages an event-driven incident. Each symptom class maps to a specific semantic: duplicates to missed acks, missing messages to expiration or dead-lettering, out-of-order to ordering that was never truly enabled. Forcing the model to confirm that ordering is enabled on the subscription and that the publisher sets an ordering key catches the silent failure where someone assumes an order they never configured. The dead-letter check, in turn, addresses the failure mode in which a single poison message quietly blocks an entire ordering key.
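The symptom-to-semantic mapping and the ordering and dead-letter checks can be written down as a small lookup plus a config check. The sketch below is hypothetical: `SYMPTOM_CHECKS` and `config_findings` are names invented for this example, and the `subscription` dict is assumed to be the parsed output of `gcloud pubsub subscriptions describe`, using the field names the prompt mentions.

```python
from __future__ import annotations

# Symptom -> what to check first, following the prompt's classification step.
SYMPTOM_CHECKS: dict[str, list[str]] = {
    "DUPLICATES": ["ack deadline vs. processing time", "consumer idempotency"],
    "MISSING MESSAGES": ["expiration and retention", "dead-letter routing"],
    "OUT-OF-ORDER": ["enableMessageOrdering", "publisher ordering key"],
    "GROWING BACKLOG": ["consume rate vs. publish rate", "nack loop"],
}


def config_findings(
    subscription: dict,
    publisher_sets_ordering_key: bool,
    consumer_assumes_global_order: bool,
) -> list[str]:
    """Return ordering and dead-letter problems found in the config."""
    findings: list[str] = []
    if not subscription.get("enableMessageOrdering", False):
        findings.append("enableMessageOrdering is not set on the subscription")
    if not publisher_sets_ordering_key:
        findings.append("publisher does not set an ordering key")
    if consumer_assumes_global_order:
        findings.append("consumer assumes global order; order holds only within a key")
    if "deadLetterPolicy" not in subscription:
        findings.append("no deadLetterPolicy; a poison message can loop indefinitely")
    return findings
```

An empty list means the configuration matches what the consumer assumes, so the investigation should move to the metrics. A non-empty list names each unconfigured guarantee explicitly.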

The guardrails exist because the dangerous Pub/Sub fixes are the ones that drop data. Shortening expiration or removing a dead-letter policy can permanently lose messages, so the prompt requires confirmation before either. By defaulting to at-least-once reasoning and pushing toward idempotent consumers rather than fighting redelivery, the prompt steers toward the fix that actually holds up in production instead of a brittle attempt to defeat the platform’s own semantics.
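An idempotent consumer of the kind the prompt steers toward might look like the following minimal sketch. It is an assumption-laden illustration: `IdempotentHandler` is a hypothetical class, the in-memory set stands in for the durable store a real consumer would need, and the message object is only assumed to expose `message_id`, `data`, and `ack()`. Deduplicating on the message ID covers redelivery of the same message; a publisher that retries and publishes the same event twice would need a business-level key instead.

```python
class IdempotentHandler:
    """Hypothetical wrapper that makes a handler safe under redelivery."""

    def __init__(self, process):
        self._process = process  # the real work: a callable taking message data
        self._seen = set()       # illustration only; use a durable store in practice

    def handle(self, message) -> bool:
        """Return True if the work ran, False if the message was a duplicate."""
        if message.message_id in self._seen:
            # Already processed: ack again so the redelivery stops, skip the work.
            message.ack()
            return False
        self._process(message.data)
        # Record and ack only after the work succeeds, so that a crash
        # mid-processing leads to redelivery rather than a lost message.
        self._seen.add(message.message_id)
        message.ack()
        return True
```

Acking after the work is what makes duplicates possible, and the deduplication check is what makes them harmless. That pairing is the trade the prompt recommends over trying to prevent redelivery.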

Related prompts

Cloud Run & Cloud Functions Cold-Start & Failure Debug Prompt
