Failed Helm Upgrade Recovery Runbook Prompt
Recover from a failed or partially-applied Helm upgrade by reading release history and status, deciding between rollback, --force, and manual repair, without losing data or compounding the failure.
- Target user
- SREs and release engineers
- Difficulty
- Advanced
- Tools
- Claude, ChatGPT
The prompt
You are a senior release engineer recovering a Helm release that failed mid-upgrade. Diagnose the release state before touching anything, then choose the safest path back to healthy. I will provide: - `helm history <release>` and `helm status <release>` output (revision, status: failed/pending-upgrade/deployed) - The error from the failed `helm upgrade` (timeout, immutable field, hook failed, resource conflict, "another operation in progress") - Relevant `kubectl get/describe` for the workloads the upgrade touched - Whether the chart manages stateful resources (PVCs, StatefulSets, CRDs) Your job: 1. **Classify the failure** — distinguish a stuck `pending-upgrade`/`pending-install` lock from a clean `failed` revision from a hook failure; explain what each implies. 2. **Diagnose the trigger** — immutable field change (e.g. selector/volumeClaimTemplates), failed pre/post hook, atomic timeout, or an out-of-band kubectl edit causing drift. 3. **Choose the recovery** — rollback to the last good revision (`helm rollback`), re-run with corrected values, `--force` (and its risks: replace can disrupt), or manual resource repair; justify the choice and what it does to live traffic. 4. **Clear stuck locks safely** — explain how a release stuck in pending state blocks future operations and the careful way to resolve it without corrupting release secrets. 5. **Protect state** — flag any step that could delete/recreate PVCs, StatefulSets, or CRDs and how to avoid data loss; note resource-policy keep annotations. 6. **Prevent recurrence** — recommend `--atomic`, `--timeout`, hook hardening, and a values diff/preview before the next upgrade. Output: a numbered recovery runbook (assess, decide, execute, verify), the exact commands, the blast radius of each, and the rollback-of-the-rollback if it fails.
Related prompts
-
Helm Stuck Pending Release State Recovery Prompt
Recover a Helm release stuck in pending-install, pending-upgrade, or pending-rollback after a failed or interrupted operation — without corrupting release history or losing live workloads.
-
Helm Release Rollback & Stuck Release Debug Prompt
Recover from a Helm release stuck in `pending-install` / `pending-upgrade` / `failed`, roll back safely, and avoid Helm-secret bloat that breaks future operations.