Kubernetes Stuck Terminating & Finalizer Debug Prompt
Diagnose and safely unstick namespaces, pods, and CRs frozen in Terminating — find the responsible finalizer or controller before reaching for the dangerous force-delete.
- Target user: Cluster operators clearing stuck deletions
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a Kubernetes operator who knows that a stuck Terminating object is almost always a finalizer waiting on a controller — and that blindly stripping finalizers leaks real resources.
I will provide:
- `kubectl get <obj> -o yaml` showing `metadata.deletionTimestamp` and `metadata.finalizers` (for a namespace, also `spec.finalizers` and `status.conditions`)
- What is stuck (namespace, pod, PVC, CR, CRD)
- How long it has been Terminating and what was deleted around then
- Whether the owning operator/controller is still installed and healthy
Your job:
1. **Read the evidence** — confirm `metadata.deletionTimestamp` is set and list the `metadata.finalizers`. The object is not "hung"; some controller hasn't removed its finalizer yet. Identify which controller owns each finalizer by its name (e.g. `kubernetes.io/pv-protection`, `foregroundDeletion`, operator-specific domains).
2. **Find the real blocker** — for each finalizer, determine what real-world cleanup it represents (detaching a volume, deleting a cloud LB, deprovisioning external state) and check whether that controller is running. A stuck namespace usually means one API resource inside it can't be deleted; find it via `kubectl api-resources --verbs=list --namespaced -o name | xargs -n 1 kubectl get --show-kind --ignore-not-found -n NS` (`--verbs=list` skips resource types that cannot be listed and would otherwise make `kubectl get` error).
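3. **Namespace-specific path** — explain that a namespace's own finalizer is the `kubernetes` entry in `spec.finalizers` (not `metadata.finalizers`), which the namespace controller in `kube-controller-manager` removes only after it has deleted everything inside. Show how to read `status.conditions` (`NamespaceContentRemaining`, `NamespaceFinalizersRemaining`, `NamespaceDeletionContentFailure`, `NamespaceDeletionDiscoveryFailure`) to see exactly which resource or finalizer is blocking, rather than nuking the whole namespace; a discovery failure usually points to an unavailable aggregated API (check `kubectl get apiservice`).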
4. **Fix the cause, not the symptom** — preferred order: (a) restart/repair the stuck controller, (b) manually complete the external cleanup it was doing, (c) only then consider removing the finalizer.
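5. **Force-removal — last resort** — show the exact `kubectl patch <kind> <name> -n NS --type=merge -p '{"metadata":{"finalizers":[]}}'` (`--type=merge` is needed for custom resources, which reject kubectl's default strategic merge patch) and, for a namespace, the `/finalize` subresource (`kubectl get ns NS -o json | jq '.spec.finalizers = []' | kubectl replace --raw /api/v1/namespaces/NS/finalize -f -`). State loudly what leaks if you skip the controller: orphaned cloud volumes, dangling load balancers, untracked external records.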
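6. **Force-delete a pod** — explain that `kubectl delete pod ... --grace-period=0 --force` removes the API object immediately without waiting for the kubelet to confirm the container stopped, so the container may keep running on the node. This is dangerous for StatefulSets: the controller can start a replacement with the same identity while the old one is still running (split-brain).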
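7. **Prevent recurrence** — uninstall in order: delete the CRs and wait for them to disappear while the operator is still running, then remove the operator, then the CRDs. Never delete a CRD while CRs with finalizers still exist: the CRD waits on its own `customresourcecleanup.apiextensions.k8s.io` finalizer until every CR is gone, and if the operator is already gone nothing clears those CR finalizers, so both hang in Terminating.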
Output as: (a) root-cause identification of the blocking finalizer/resource, (b) the safe ordered remediation, (c) the exact force-removal commands clearly labeled LAST RESORT with their leak consequences, (d) a prevention note for teardown order.
Bias toward: identifying the owning controller first; force-stripping finalizers only after the leak is understood or cleaned up manually.
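Step 1 of the prompt (read `deletionTimestamp`, list finalizers, name the owning controller) can be sketched in Python. This is a minimal sketch, assuming the object was fetched as JSON (`kubectl get <obj> -o json`) rather than YAML so only the standard library is needed. `summarize_terminating`, `guess_owner`, and the `KNOWN_FINALIZERS` map are hypothetical names, and the map is deliberately partial; anything not in it falls back to the domain-prefix heuristic.

```python
from typing import Dict

# Partial, illustrative map of well-known finalizers to the component expected
# to remove them; extend it for the operators installed in your cluster.
KNOWN_FINALIZERS: Dict[str, str] = {
    "kubernetes.io/pv-protection": "pv-protection controller (kube-controller-manager)",
    "kubernetes.io/pvc-protection": "pvc-protection controller (kube-controller-manager)",
    "foregroundDeletion": "garbage collector (kube-controller-manager)",
    "orphan": "garbage collector (kube-controller-manager)",
    "service.kubernetes.io/load-balancer-cleanup": "service controller (cloud provider)",
    "kubernetes": "namespace controller (kube-controller-manager)",
}


def guess_owner(finalizer: str) -> str:
    """Best-effort guess at which controller is responsible for a finalizer."""
    if finalizer in KNOWN_FINALIZERS:
        return KNOWN_FINALIZERS[finalizer]
    if "/" in finalizer:
        # Operator finalizers are conventionally prefixed with the operator's domain.
        domain = finalizer.split("/", 1)[0]
        return f"operator for '{domain}': check it is installed and healthy"
    return "unknown: search installed controllers for this name"


def summarize_terminating(obj: dict) -> dict:
    """Summarize deletion state from parsed `kubectl get <obj> -o json` output."""
    meta = obj.get("metadata", {})
    finalizers = list(meta.get("finalizers", []))
    if obj.get("kind") == "Namespace":
        # A namespace's own finalizer lives in spec.finalizers, not metadata.
        finalizers += obj.get("spec", {}).get("finalizers", [])
    return {
        "terminating": "deletionTimestamp" in meta,
        "deletionTimestamp": meta.get("deletionTimestamp"),
        "owners": {f: guess_owner(f) for f in finalizers},
    }
```

For example, `summarize_terminating(json.loads(subprocess.check_output(["kubectl", "get", "pvc", "data-0", "-n", "app", "-o", "json"])))` would report whether the PVC is terminating and which controller each finalizer is waiting on (the PVC name and namespace are placeholders).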
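The namespace inventory from step 2 of the prompt can be sketched as a Python loop over `kubectl api-resources --verbs=list --namespaced -o name`, calling `kubectl get <type> -n NS -o name --ignore-not-found` for each type. This is a sketch under stated assumptions: `remaining_objects` and the injectable `run` parameter are hypothetical, added so the loop can be exercised without a cluster; in real use the default runner shells out to `kubectl` on your `PATH`.

```python
import subprocess
from typing import Callable, Dict, List

Runner = Callable[[List[str]], str]


def kubectl(args: List[str]) -> str:
    """Run kubectl with the given arguments and return stdout (raises on failure)."""
    return subprocess.run(
        ["kubectl", *args], check=True, capture_output=True, text=True
    ).stdout


def remaining_objects(namespace: str, run: Runner = kubectl) -> Dict[str, List[str]]:
    """Map each listable namespaced resource type to the objects still present."""
    # --verbs=list skips types such as bindings that cannot be listed.
    types = run(["api-resources", "--verbs=list", "--namespaced", "-o", "name"]).split()
    found: Dict[str, List[str]] = {}
    for rtype in types:
        try:
            out = run(["get", rtype, "-n", namespace, "-o", "name", "--ignore-not-found"])
        except subprocess.CalledProcessError:
            # Frequently an unavailable aggregated API; record it, because a type
            # that cannot be listed can itself block namespace deletion.
            found[rtype] = ["<error listing this type>"]
            continue
        names = out.split()
        if names:
            found[rtype] = names
    return found
```

Any type left in the result is a candidate blocker; feed its objects back into the finalizer check from step 1.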
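Step 3 of the prompt asks to read `status.conditions` instead of guessing. A minimal Python sketch, assuming the input is the parsed output of `kubectl get ns NS -o json`: it keeps only the deletion-related condition types whose `status` is `"True"` and returns their messages, which name the remaining resource types or finalizers. `namespace_blockers` and `BLOCKING_TYPES` are hypothetical names.

```python
from typing import List, Tuple

# Condition types the namespace controller reports while deleting a namespace.
BLOCKING_TYPES = (
    "NamespaceDeletionDiscoveryFailure",
    "NamespaceDeletionGroupVersionParsingFailure",
    "NamespaceDeletionContentFailure",
    "NamespaceContentRemaining",
    "NamespaceFinalizersRemaining",
)


def namespace_blockers(ns: dict) -> List[Tuple[str, str]]:
    """Return (type, message) for each deletion condition currently True."""
    conditions = ns.get("status", {}).get("conditions", [])
    return [
        (c["type"], c.get("message", ""))
        for c in conditions
        if c.get("type") in BLOCKING_TYPES and c.get("status") == "True"
    ]
```

A `NamespaceFinalizersRemaining` entry points back at step 1 for the named finalizer, while a `NamespaceDeletionDiscoveryFailure` entry points at an aggregated API rather than at any object in the namespace.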
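The LAST RESORT commands from step 5 of the prompt can be sketched as Python helpers that only build the argv (and, for a namespace, the request body), so they can be reviewed before anything runs. The function names are hypothetical; the flags and the `/api/v1/namespaces/<name>/finalize` path are the ones quoted in step 5. Running them does none of the controller's cleanup, so the leaks listed in step 5 still apply.

```python
import json
from typing import List, Tuple


def strip_finalizers_cmd(kind: str, name: str, namespace: str = "") -> List[str]:
    """LAST RESORT: argv that clears metadata.finalizers with a JSON merge patch."""
    # --type=merge because custom resources reject strategic merge patches.
    cmd = ["kubectl", "patch", kind, name, "--type=merge",
           "-p", json.dumps({"metadata": {"finalizers": []}})]
    if namespace:
        cmd += ["-n", namespace]
    return cmd


def finalize_namespace_request(ns_obj: dict) -> Tuple[List[str], str]:
    """LAST RESORT: (argv, stdin body) that empties spec.finalizers via /finalize."""
    name = ns_obj["metadata"]["name"]
    body = json.loads(json.dumps(ns_obj))  # deep copy; leave the caller's dict intact
    body.setdefault("spec", {})["finalizers"] = []
    argv = ["kubectl", "replace", "--raw",
            f"/api/v1/namespaces/{name}/finalize", "-f", "-"]
    return argv, json.dumps(body)
```

Once the leak is understood, the namespace request would be sent with `subprocess.run(argv, input=body, text=True, check=True)`.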
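Step 6's StatefulSet warning can be turned into a guard. This Python sketch assumes the pod was fetched with `kubectl get pod NAME -n NS -o json` and refuses to build the force-delete command for a StatefulSet-owned pod unless the caller asserts the node is really down. `force_delete_pod_cmd` and its `node_confirmed_down` flag are hypothetical, an illustration of the policy rather than a kubectl feature.

```python
from typing import List


def owned_by_statefulset(pod: dict) -> bool:
    """True if the pod's controlling ownerReference is a StatefulSet."""
    refs = pod.get("metadata", {}).get("ownerReferences", [])
    return any(r.get("kind") == "StatefulSet" and r.get("controller") for r in refs)


def force_delete_pod_cmd(pod: dict, node_confirmed_down: bool = False) -> List[str]:
    """LAST RESORT: argv for a force delete, refused for StatefulSet pods unless
    the caller has confirmed the old container can no longer be running."""
    meta = pod["metadata"]
    if owned_by_statefulset(pod) and not node_confirmed_down:
        raise RuntimeError(
            f"{meta['name']} belongs to a StatefulSet: a replacement with the same "
            "identity may start while the old container still runs (split-brain)"
        )
    return ["kubectl", "delete", "pod", meta["name"], "-n", meta["namespace"],
            "--grace-period=0", "--force"]
```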
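The teardown rule in step 7 of the prompt reduces to a pre-flight check before deleting a CRD. A minimal Python sketch, assuming `items` is the `.items` list from `kubectl get <plural>.<group> -A -o json` for that CRD; `crd_blockers` is a hypothetical name.

```python
from typing import List


def crd_blockers(items: List[dict]) -> List[str]:
    """CRs that still carry finalizers, as namespace/name (name if cluster-scoped).

    If the result is non-empty, delete those CRs and let the still-running
    operator clear them before uninstalling the operator or deleting the CRD.
    """
    blockers = []
    for cr in items:
        meta = cr.get("metadata", {})
        if meta.get("finalizers"):
            ns = meta.get("namespace")
            blockers.append(f"{ns}/{meta['name']}" if ns else meta["name"])
    return blockers
```

An empty list means the CRD can go without leaving CRs (and the CRD itself) stuck in Terminating.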