Zero-Downtime Rollout Plan Prompt
Plan a zero-downtime rollout of a Kubernetes service by combining rollout strategy, readiness gating, connection draining, PDBs, and a rollback trigger into a step-by-step runbook.
- Target user: SREs and release engineers
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a senior release engineer designing a zero-downtime rollout for a stateless HTTP service on Kubernetes. Produce a runbook a teammate can execute, with the failure modes called out.

I will provide:
- The Deployment and Service manifests (plus HPA, PDB, and Ingress manifests if present)
- Replica count, request rate, and whether the service sits behind an Ingress/load balancer or a service mesh
- The nature of the change (image bump, config change, or schema-coupled change)
- The acceptable error budget and rollback expectations

Your job:
1. **Pick the strategy**: use RollingUpdate with maxSurge/maxUnavailable tuned for the replica count, or recommend blue-green/canary if the change is schema-coupled or high-risk. Justify the choice.
2. **Gate on readiness**: confirm the readiness probe actually reflects "can serve traffic", so the Service routes only to ready pods. Set minReadySeconds so the rollout advances only after each new pod has stayed Ready that long; it does not hold traffic back, so warm-up itself must be covered by the readiness probe.
3. **Drain connections**: specify terminationGracePeriodSeconds, a preStop sleep that lets endpoints deregister, and SIGTERM handling so in-flight requests finish (avoiding the terminating-endpoints race). Size the grace period to cover the preStop sleep plus the longest in-flight request, since the countdown starts before preStop runs.
4. **Protect availability**: ensure a PDB keeps enough replicas up during voluntary disruptions such as node drains (PDBs limit evictions, not the Deployment's own rolling update), and that maxUnavailable never lets ready capacity drop below what peak traffic requires.
5. **Handle coupling**: if the change touches a shared database or contract, sequence it (expand/contract migration, backward-compatible API) so old and new pods can coexist.
6. **Define rollback**: state the exact signal (error rate, latency, readiness failures), the `kubectl rollout undo` or `helm rollback` command, and how long to watch.

Output: a numbered runbook (pre-checks, execute, observe, rollback), the tuned manifest fields, and the top two race conditions that cause dropped requests, each with its mitigation.
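The capacity arithmetic behind steps 1 and 4 is easy to get wrong at small replica counts, so it can help to check it before the rollout starts. The sketch below is a minimal Python helper, not part of any Kubernetes tooling. It assumes you know a per-pod throughput from load testing (`rps_per_pod` is a hypothetical input) and models only a PDB with an integer `minAvailable`. It follows the Deployment controller's rounding: percentages in maxSurge round up, percentages in maxUnavailable round down, and if both resolve to zero the controller treats maxUnavailable as 1.

```python
import math


def resolve(value, replicas, round_up):
    """Resolve an int-or-percent field ("25%" or 2) against the replica count."""
    if isinstance(value, str) and value.endswith("%"):
        scaled = int(value[:-1]) * replicas
        # Integer ceil/floor of scaled / 100, avoiding float rounding surprises.
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)


def rollout_bounds(replicas, max_surge="25%", max_unavailable="25%"):
    """Return (min_available, max_total) pods during a RollingUpdate.

    maxSurge rounds up and maxUnavailable rounds down; if both resolve to 0,
    the Deployment controller falls back to maxUnavailable = 1.
    """
    surge = resolve(max_surge, replicas, round_up=True)
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    if surge == 0 and unavailable == 0:
        unavailable = 1
    return replicas - unavailable, replicas + surge


def capacity_ok(replicas, peak_rps, rps_per_pod, **strategy):
    """True if the rollout's available-pod floor still covers peak traffic.

    rps_per_pod is an assumed, load-tested per-pod capacity.
    """
    min_available, _ = rollout_bounds(replicas, **strategy)
    return min_available >= math.ceil(peak_rps / rps_per_pod)


def pdb_disruptions_allowed(healthy, min_available):
    """Evictions a PDB with an integer minAvailable permits right now."""
    return max(0, healthy - min_available)


if __name__ == "__main__":
    print(rollout_bounds(6))                                    # (5, 8)
    print(capacity_ok(6, 900, 200))                             # True
    print(capacity_ok(6, 900, 200, max_unavailable="50%"))      # False
    print(pdb_disruptions_allowed(healthy=6, min_available=5))  # 1
```

With 6 replicas and the default 25%/25% settings, the rollout never drops below 5 available pods; raising maxUnavailable to 50% would leave only 3, which this check flags as too few for a hypothetical 900 rps peak. The PDB figure only governs evictions such as node drains; the Deployment's own rollout is bounded by maxUnavailable.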
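Step 3 is a timing budget: the kubelet runs the preStop hook first and sends SIGTERM only after it returns, while the terminationGracePeriodSeconds countdown (30 seconds by default) covers both phases. The sketch below is a hypothetical Python helper that turns two measured inputs into settings and flags existing settings that would drop requests. `endpoint_propagation_s` (how long your kube-proxy, Ingress controller, or load balancer takes to stop sending new requests) and `longest_request_s` (an upper bound on request duration, such as the server timeout) are assumptions you would need to measure; the 5-second margin is arbitrary.

```python
import math


def recommend_shutdown(endpoint_propagation_s, longest_request_s, margin_s=5):
    """Return (preStop sleep, terminationGracePeriodSeconds) in whole seconds.

    The grace period counts down from pod deletion and includes the preStop
    hook, so it must cover the sleep, the longest in-flight request, and a margin.
    """
    pre_stop = math.ceil(endpoint_propagation_s)
    grace = pre_stop + math.ceil(longest_request_s) + margin_s
    return pre_stop, grace


def check_shutdown(pre_stop_s, grace_s, endpoint_propagation_s, longest_request_s):
    """List the ways existing settings could drop requests (empty list = OK)."""
    problems = []
    if pre_stop_s < endpoint_propagation_s:
        # The app receives SIGTERM and stops serving while proxies still route to it.
        problems.append("preStop sleep shorter than endpoint propagation")
    if grace_s < pre_stop_s + longest_request_s:
        # The kubelet sends SIGKILL before in-flight requests complete.
        problems.append("grace period too short for preStop plus in-flight requests")
    return problems


if __name__ == "__main__":
    print(recommend_shutdown(endpoint_propagation_s=8, longest_request_s=20))  # (8, 33)
    # Default 30s grace period with no preStop hook at all:
    print(check_shutdown(0, 30, endpoint_propagation_s=8, longest_request_s=20))
```

The preStop sleep only buys time for deregistration; the application still has to handle SIGTERM by refusing new connections and letting in-flight requests finish within the remaining budget.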
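Step 6 asks for an exact rollback signal. One way to make that concrete is a streak rule: roll back once several consecutive observation intervals breach an error-rate or latency threshold. The Python sketch below assumes per-interval samples of `(requests, errors, p99_ms)` that you would pull from your own metrics system (it does not query one), covers only error rate and p99 latency (not readiness failures), and uses hypothetical names and thresholds (`checkout`, `prod`, 1% errors, 500 ms). The commands it builds use the real `kubectl rollout undo` and `helm rollback` forms.

```python
def breached(window, max_error_rate, max_p99_ms):
    """window = (requests, errors, p99_ms) for one observation interval."""
    requests, errors, p99_ms = window
    error_rate = errors / requests if requests else 0.0
    return error_rate > max_error_rate or p99_ms > max_p99_ms


def should_roll_back(windows, max_error_rate, max_p99_ms, consecutive=3):
    """True once `consecutive` intervals in a row breach either threshold.

    Requiring a streak filters out one noisy interval, at the cost of
    reacting `consecutive` intervals later.
    """
    streak = 0
    for window in windows:
        streak = streak + 1 if breached(window, max_error_rate, max_p99_ms) else 0
        if streak >= consecutive:
            return True
    return False


def rollback_command(namespace, deployment=None, helm_release=None):
    """Build the rollback argv; Helm-managed releases are rolled back via Helm."""
    if helm_release:
        # With no revision argument, helm rollback returns to the previous release.
        return ["helm", "rollback", helm_release, "--namespace", namespace]
    if not deployment:
        raise ValueError("need a deployment name or a Helm release")
    return ["kubectl", "rollout", "undo", f"deployment/{deployment}", "--namespace", namespace]


if __name__ == "__main__":
    # Hypothetical per-minute samples after the new ReplicaSet starts taking traffic.
    samples = [(1000, 2, 120), (1000, 30, 130), (1000, 25, 140), (1000, 40, 150)]
    if should_roll_back(samples, max_error_rate=0.01, max_p99_ms=500):
        print(" ".join(rollback_command("prod", deployment="checkout")))
```

The watch duration follows from the same parameters: with one-minute intervals and `consecutive=3`, a bad rollout is caught at the earliest three minutes after it starts failing, so the observation window should run well past that.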
Related prompts
- Kubernetes Pod Lifecycle & Graceful Shutdown Prompt
  Design and debug pod lifecycle: preStop hooks, terminationGracePeriodSeconds, SIGTERM handling, connection draining, and readiness probe behavior on shutdown.
- Kubernetes ProxyTerminatingEndpoints Zero-Drop Rollout Prompt
  Diagnose connection drops during rollouts and node drains caused by traffic routed away from terminating pods, and fix them with terminating-endpoint routing and preStop drains.