You are a senior Kubernetes platform engineer with deep experience operating Helm releases in production. You know that "just delete the secret" is sometimes the right answer and sometimes catastrophic, and you can tell which is which.

I will provide:
- The release name and namespace, and what `helm status <release> -n <ns>` shows
- `helm history <release> -n <ns>`
- `helm get values <release> -n <ns>`
- The chart version + repo + chart name
- What the user was trying to do when the release got stuck (install, upgrade, rollback, uninstall)
- The error from `helm install/upgrade` if any
- Whether `--atomic`, `--wait`, `--timeout`, or `--cleanup-on-fail` was used
- `kubectl get secrets -n <ns> -l owner=helm` (Helm stores release state as secrets here)

Your job:

1. **Decode the release state** from `helm status`:
   - `deployed`: successful current revision
   - `failed`: last operation failed; chart may be partially applied
   - `pending-install`: install started but didn't finish (e.g., timed out, user Ctrl-C'd)
   - `pending-upgrade`: same for upgrade
   - `pending-rollback`: rollback started, didn't finish
   - `superseded`: an older revision that's been replaced; OK as history
   - `uninstalling`: uninstall in progress
   - `uninstalled`: removed (the record survives only if `--keep-history` was used)

2. **For `pending-*` states**, identify the cause:
   - Helm hung waiting on resources (`--wait` with a slow rollout that didn't finish)
   - Helm was interrupted (Ctrl-C, CI timeout, network blip)
   - A webhook timed out
   - User ran `helm upgrade` while a previous one was still pending → "another operation in progress" lock

3. **For `failed` state**:
   - What did Helm install/modify before failing? `kubectl get all -n <ns> -l app.kubernetes.io/instance=<release>`
   - Is the cluster in a partial state (some new things created, some old things deleted)?
   - Can the chart be re-run safely (idempotent)?

4. **Recommend the recovery path in safest-first order**:
   - **`helm rollback <release> <revision>`** → safest if a known-good revision exists
   - **`helm upgrade <release> <chart> --force`** → re-applies using a replacement strategy; useful for a failed upgrade, but it can recreate resources
   - **`helm upgrade <release> <chart> --reuse-values`** → re-runs the upgrade with the last release's values (do not combine with `--reset-values`; when both are given, `--reuse-values` is ignored)
   - **`helm uninstall <release> --no-hooks`** → removes everything; useful when chart is unrecoverable
   - **DELETE the release secret directly** (`kubectl delete secret -n <ns> sh.helm.release.v1.<release>.v<revision>`) → DESTRUCTIVE: Helm "forgets" that revision (or the whole release, if it was the only revision); resources remain orphaned in the cluster
   - **EDIT the release secret status field** (advanced) → change `pending-upgrade` to `deployed` to unblock; the status is recorded both as a label and inside the encoded release data

5. **For "another operation in progress" lock**: identify whether a real operation is actually in flight (rare) vs. a stale lock (common). Then unstick.

6. **For Helm-secret bloat** (e.g., a critical chart sitting at the default history limit of 10 revisions):
   - Helm stores each revision as a separate Secret in the namespace
   - A Secret is capped at roughly 1MB, so a very large release can fail to save, and many large secrets add load on etcd
   - Mitigation: `helm upgrade --history-max 10` (prunes older revisions on each upgrade) + occasional cleanup of older revisions
   - For very large charts: set a lower `--history-max` on `helm upgrade` / `helm upgrade --install`

7. **For `--atomic` operations that failed mid-flight**:
   - Helm tries to roll back automatically
   - If that rollback fails (e.g., the chart in the previous revision is now incompatible), you can end up in `pending-rollback`
   - Recovery typically: identify whether the partial state is closer to the "old" or "new" version, then converge with `helm upgrade --force`

8. Mark every DESTRUCTIVE action explicitly.

---

Release name + namespace: [DESCRIBE]
Chart + version: [e.g., bitnami/postgresql 14.3.0]

`helm status <release>`:
```
[PASTE]
```

`helm history <release>`:
```
[PASTE]
```

What the user was doing when it got stuck: [DESCRIBE]

Error from helm install/upgrade:
```
[PASTE]
```

Live resources matching the release:
```
[PASTE kubectl get all -l app.kubernetes.io/instance=<release> -n <ns>]
```

Why this prompt works

Helm errors are confusing because Helm stores state in Kubernetes (release secrets) AND modifies cluster resources, and those two views can diverge. “Stuck pending-upgrade” doesn’t mean Helm is busy — it usually means Helm thinks it’s busy because nobody told it the previous operation gave up. This prompt forces an inventory: what does Helm think, what’s actually in the cluster, and what’s the safest reconciliation?

How to use it

Always run helm get values <release> > backup.yaml before any destructive recovery. Values are often un-version-controlled state.
Always run helm history <release> to see if there’s a known-good revision to roll back to.
Check live cluster state separately: kubectl get all,pvc,configmap,secret -l app.kubernetes.io/instance=<release>. Confirm what’s actually there before letting Helm “reconcile.”
Distinguish “stuck” from “slow”: a release pending-upgrade 10 seconds after an upgrade is normal; 10 minutes is stuck.
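The first three steps above can be bundled into one pre-recovery snapshot. The following is a minimal sketch, not a definitive tool: the function name snapshot_release and the output file names are hypothetical, and it assumes the chart applies the conventional app.kubernetes.io/instance label to its resources.

```shell
#!/usr/bin/env bash
# Sketch: capture Helm's view and the cluster's view of a release before recovery.
# snapshot_release and the output file names are hypothetical, not part of Helm.
snapshot_release() {
  local release="$1" ns="$2" dir="${3:-./helm-snapshot}"
  mkdir -p "$dir" || return 1

  # What Helm thinks: user-supplied values, revision history, rendered manifest.
  helm get values "$release" -n "$ns" > "$dir/values.yaml" || return 1
  helm history "$release" -n "$ns" > "$dir/history.txt" || return 1
  helm get manifest "$release" -n "$ns" > "$dir/manifest.yaml" || return 1

  # What is actually in the cluster, selected by the conventional instance label
  # (assumption: the chart sets this label on its resources).
  kubectl get all,pvc,configmap,secret -n "$ns" \
    -l "app.kubernetes.io/instance=$release" > "$dir/live-resources.txt" || return 1

  echo "snapshot written to $dir"
}

# Usage (placeholders):
#   snapshot_release <release> <ns> ./snapshot-before-recovery
```

Keeping both views side by side makes it easier to spot where Helm's record and the live cluster have diverged before any command changes either one.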

Useful commands

# Inventory
helm list -A
helm list -A --pending           # only stuck releases
helm status <release> -n <ns>
helm history <release> -n <ns>

# Get state
helm get values <release> -n <ns> > current-values.yaml
helm get values <release> -n <ns> --all > all-values.yaml   # including defaults
helm get manifest <release> -n <ns> > current-manifest.yaml
helm get notes <release> -n <ns>
helm get hooks <release> -n <ns>

# Helm release secrets (state storage)
kubectl get secrets -n <ns> -l owner=helm
# A release with 5 revisions has 5 secrets, named sh.helm.release.v1.<release>.v1 through .v5
kubectl describe secret -n <ns> sh.helm.release.v1.<release>.v3 | head

# Live resources tied to release
kubectl get all,pvc,configmap,secret -n <ns> -l app.kubernetes.io/instance=<release>

# Recovery options (safe → less safe)
helm rollback <release> <revision> -n <ns>
helm rollback <release> -n <ns>                           # to previous

helm upgrade <release> <chart> -n <ns> --reuse-values --force
helm upgrade <release> <chart> -n <ns> --atomic --timeout 5m

# Last resort
helm uninstall <release> -n <ns>                          # destroys release
helm uninstall <release> -n <ns> --no-hooks               # skip pre/post hooks
kubectl delete secret -n <ns> sh.helm.release.v1.<release>.v<N>   # surgical "Helm forget"

# "Another operation in progress" recovery (stale lock)
# 1. Confirm no real operation is in flight (no Helm CLI running)
# 2. Find the pending release secret:
kubectl get secrets -n <ns> -l owner=helm,status=pending-upgrade -o json | \
  jq -r '.items[] | .metadata.name'
# 3. Either edit status field (advanced) or delete that one secret to release the lock
# WARNING: deleting loses the upgrade attempt's recorded values

# Cleanup old revisions (after stability)
helm history <release> -n <ns> --max 5                    # only limits how many revisions are listed; deletes nothing
# Helm keeps 10 revisions by default (--history-max on helm upgrade); to remove an old one by hand:
kubectl delete secret -n <ns> sh.helm.release.v1.<release>.v<old-revision>
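The stale-lock steps can be wrapped so the destructive part is printed rather than executed. This is a sketch under stated assumptions: list_pending_secrets and print_unlock_command are hypothetical helper names, and the selector relies on the owner, name, and status labels that Helm 3 sets on its release secrets.

```shell
#!/usr/bin/env bash
# Sketch: find release secrets stuck in a pending-* state for one release.
# list_pending_secrets is a hypothetical helper; the label keys (owner, name, status)
# are the ones Helm 3 puts on release secrets.
list_pending_secrets() {
  local release="$1" ns="$2"
  kubectl get secrets -n "$ns" \
    -l "owner=helm,name=$release,status in (pending-install,pending-upgrade,pending-rollback)" \
    --sort-by=.metadata.creationTimestamp \
    -o custom-columns=NAME:.metadata.name,STATUS:.metadata.labels.status,CREATED:.metadata.creationTimestamp
}

# Print, but do not run, the delete command for one pending secret.
# DESTRUCTIVE if executed: Helm loses that revision's record.
print_unlock_command() {
  local secret="$1" ns="$2"
  printf 'kubectl delete secret -n %s %s\n' "$ns" "$secret"
}

# Usage (placeholders):
#   list_pending_secrets <release> <ns>
#   print_unlock_command sh.helm.release.v1.<release>.v<N> <ns>
```

Printing the command instead of running it leaves a review step between finding the lock and removing it, which matters when a real operation might still be in flight.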

Recovery decision tree

helm status <release> shows:
│
├── deployed      → no recovery needed; you have a working release
│
├── failed        → recoverable
│   ├── Cluster state matches old version: helm rollback <release>
│   ├── Cluster state mostly matches new version: helm upgrade --force
│   └── Both directions look ugly: helm uninstall + reinstall (data loss risk)
│
├── pending-install      → install never finished
│   ├── Real operation in flight: WAIT
│   ├── Stale lock: helm uninstall (or delete the pending release secret), then retry install
│   └── If install partially created resources: clean them OR include in next install
│
├── pending-upgrade     → upgrade never finished
│   ├── Real operation in flight: WAIT
│   └── Stale lock: identify last good revision; rollback OR delete pending secret + retry
│
├── pending-rollback   → previous rollback didn't finish
│   ├── Identify intended target revision; resume manually if possible
│   └── Worst case: delete pending-rollback secret + helm rollback to a known good
│
└── uninstalling       → uninstall in progress; usually just slow
    └── If stuck > 30 min: check for hooks blocking; --no-hooks retry
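The top level of the tree above can be expressed as a small lookup for use in runbooks or CI output. This is only a sketch: recommend_recovery is a hypothetical helper that prints advice and runs nothing, and the wording of each hint is a condensed paraphrase of the tree.

```shell
#!/usr/bin/env bash
# Sketch: map a Helm release status to the first recovery step from the decision tree.
# recommend_recovery is a hypothetical helper; it only prints advice.
recommend_recovery() {
  local status="$1"
  case "$status" in
    deployed)
      echo "no recovery needed" ;;
    failed)
      echo "compare cluster state to old and new revisions; rollback or upgrade --force" ;;
    pending-install)
      echo "confirm nothing is running; then remove the pending record and retry install" ;;
    pending-upgrade)
      echo "confirm nothing is running; then rollback to the last good revision" ;;
    pending-rollback)
      echo "identify the intended target revision; then rollback to a known-good revision" ;;
    uninstalling)
      echo "wait; if stuck, check hooks and retry uninstall with --no-hooks" ;;
    superseded|uninstalled)
      echo "history only; nothing to recover" ;;
    *)
      echo "unknown status: $status" >&2
      return 1 ;;
  esac
}

# Usage (placeholders; assumes jq is installed):
#   recommend_recovery "$(helm status <release> -n <ns> -o json | jq -r '.info.status')"
```

The function deliberately stops at a hint; choosing between rollback and a forced upgrade still depends on comparing the live cluster state by hand.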

Common findings this catches

“Another operation in progress” but nothing is actually running → stale pending-upgrade lock from a CI job that was killed. Delete that revision’s secret (and possibly that revision’s resources if partially applied).
helm rollback succeeds but resources don’t change → the chart’s helm.sh/resource-policy: keep annotation kept old resources around; kubectl delete them manually before rollback.
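Release failed because a CRD wasn’t installed before its CR → install CRDs separately before the main chart, or ship them in the chart’s crds/ directory (Helm installs those first on helm install unless --skip-crds is set, but does not upgrade them afterwards).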
PVC deleted on helm uninstall because chart didn’t have helm.sh/resource-policy: keep on the PVC template. Data loss. Restore from backup.
Helm release secret at or near the 1MB Secret size limit, so Helm cannot save new revisions → chart renders too many large resources; consider splitting it into smaller charts deployed as separate releases.
helm upgrade --atomic failed and auto-rollback also failed → ended in pending-rollback. Manual rebuild required.
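The PVC finding above can be checked before an uninstall by scanning the rendered manifest. The sketch below is an assumption-heavy illustration: pvcs_without_keep is a hypothetical helper, it does simple line matching rather than real YAML parsing, and it only sees standalone PersistentVolumeClaim documents (not volumeClaimTemplates inside a StatefulSet).

```shell
#!/usr/bin/env bash
# Sketch: list PVCs in a rendered manifest that lack helm.sh/resource-policy: keep.
# pvcs_without_keep is a hypothetical helper. It reads a multi-document manifest on
# stdin, prints the names of unprotected PVCs, and returns 1 if any were found.
pvcs_without_keep() {
  awk '
    # Called at each document boundary: report the document if it is an unprotected PVC.
    function flush() {
      if (is_pvc && !has_keep) { print (name != "" ? name : "(unnamed)"); missing = 1 }
      is_pvc = 0; has_keep = 0; name = ""; in_meta = 0
    }
    /^---/ { flush(); next }
    /^kind:[ \t]*PersistentVolumeClaim[ \t]*$/ { is_pvc = 1 }
    /^metadata:/ { in_meta = 1; next }
    /^[^ \t]/ { in_meta = 0 }                       # any other top-level key ends metadata
    in_meta && name == "" && /^  name:/ { name = $2 }
    /helm\.sh\/resource-policy:[ \t]*"?keep"?/ { has_keep = 1 }
    END { flush(); exit missing }
  '
}

# Usage (placeholders):
#   helm get manifest <release> -n <ns> | pvcs_without_keep
```

Any name it prints is a PVC that Helm would delete along with the release, so it is worth confirming a backup exists before proceeding.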

Helm release secret status values

pending-install, pending-upgrade, pending-rollback, deployed, failed, superseded, uninstalling, uninstalled.

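To edit (advanced):

SECRET=sh.helm.release.v1.<release>.v<N>
# This changes only the status label on the secret
kubectl get secret $SECRET -n <ns> -o json | \
  jq '.metadata.labels.status="deployed"' | \
  kubectl apply -f -
# Helm 3 also encodes status inside the binary release data (.data.release), so a
# label-only edit may not clear the lock; the payload has to be edited consistently too.
# CLI tools like `helm-mapkubeapis` and the secret-edit approach require care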
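Before editing anything, it helps to read the status recorded inside the release data itself. The sketch below assumes the usual Helm 3 storage encoding (JSON, gzipped, base64-encoded by Helm, then base64-encoded again by Kubernetes); decode_release is a hypothetical helper name, and base64 -d assumes GNU coreutils.

```shell
#!/usr/bin/env bash
# Sketch: decode a Helm 3 release payload so the embedded status can be inspected.
# Assumed encoding: JSON -> gzip -> base64 (Helm) -> base64 (Kubernetes .data field).
# decode_release is a hypothetical helper name.
decode_release() {
  # stdin: the raw .data.release value; stdout: the release JSON
  base64 -d | base64 -d | gzip -dc
}

# Usage (placeholders; assumes jq is installed):
#   kubectl get secret sh.helm.release.v1.<release>.v<N> -n <ns> \
#     -o jsonpath='{.data.release}' | decode_release | jq -r '.info.status'
```

If the decoded status and the secret's status label disagree, treat the secret as already hand-edited and proceed with extra caution.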

Preventive practices

Pin chart versions in CI (--version); never helm upgrade to “latest” implicitly.
Use --atomic --timeout 10m in CI to get clean failure modes.
Set --history-max (e.g., 20) on helm upgrade / helm upgrade --install so Helm prunes older revisions automatically; clean up anything left over periodically.
Use helm.sh/resource-policy: keep annotation on PVCs and irreplaceable resources.
Commit values.yaml for every environment to git; never rely on --set flags that exist only in an operator’s memory or shell history.
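Several of these practices can be enforced in one CI wrapper. The following is a sketch rather than a recommended standard: ci_upgrade is a hypothetical function name, the values-file argument is an illustrative convention, and the timeout and history numbers simply mirror the ones mentioned above.

```shell
#!/usr/bin/env bash
# Sketch: a CI wrapper that refuses unpinned upgrades and applies the flags listed above.
# ci_upgrade and its argument order are illustrative, not part of Helm.
ci_upgrade() {
  local release="$1" chart="$2" ns="$3" version="$4" values_file="$5"

  # Never upgrade to an implicit "latest" chart version.
  if [ -z "$version" ]; then
    echo "refusing to upgrade $release without a pinned chart version" >&2
    return 1
  fi

  helm upgrade "$release" "$chart" \
    --install \
    --namespace "$ns" \
    --version "$version" \
    --values "$values_file" \
    --atomic \
    --timeout 10m \
    --history-max 20
}

# Usage (placeholders):
#   ci_upgrade <release> <repo>/<chart> <ns> <chart-version> values-<env>.yaml
```

Passing values only through a committed file keeps the release reproducible from git, which is what makes a later rollback or reinstall predictable.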

When to escalate

Production release stuck and rollback target’s resources are gone (deleted out-of-band) — engage chart owner; manual reconstruction.
A Helm chart whose --cleanup-on-fail left orphan PVCs with production data — escalate; data recovery is the priority over cleanup.
Multi-chart releases where one chart’s resources depend on another’s — coordinate the recovery; piecemeal fixes can compound.


