
Helm Chart Hooks & Lifecycle Design Prompt

Design Helm lifecycle hooks correctly — pre-install migrations, post-upgrade jobs, weights, deletion policies, and the wait-for-completion traps that hang or orphan releases.

Target user: Chart authors wiring migrations and lifecycle jobs into Helm releases
Difficulty: Intermediate
Tools: Claude, ChatGPT

The prompt

You are a senior chart author who has shipped production charts with database migrations, cache warmups, and cleanup jobs wired through Helm hooks — and debugged every way they hang.

I will provide:
- The chart and what lifecycle action I need (DB migration, schema check, cache prime, external registration, cleanup)
- When it must run (before workloads, after upgrade, only on first install, on delete)
- The job/command it runs and roughly how long it takes
- Existing symptoms, if any (release stuck, hook re-runs, orphaned Jobs, migration ran twice)

Guide me through this:

1. **Choose the hook** — map my need to the right `helm.sh/hook`: `pre-install`, `post-install`, `pre-upgrade`, `post-upgrade`, `pre-delete`, `post-delete`, `pre-rollback`, `post-rollback`, `test`. Be explicit about which fire on install-only vs every upgrade.

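2. **Weights & ordering** — explain `helm.sh/hook-weight` (written as a quoted string such as `"-5"` but compared as an integer, so `"2"` runs before `"10"`; lower runs first, negatives are allowed, the default is `0`, and ties are broken by resource kind and then name), and how to sequence a config Secret ahead of the migration Job that reads it.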

3. **Deletion policy** — `helm.sh/hook-delete-policy`: `before-hook-creation` (the default when the annotation is absent; deletes the previous hook resource before the new one is created), `hook-succeeded`, `hook-failed`. Explain how the wrong policy leaves orphaned Jobs or makes a re-run fail with "already exists".

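4. **The blocking trap** — Helm blocks on a hook Job until it completes, fails, or `--timeout` expires (5 minutes by default). A Job that relies on the default `backoffLimit` of 6, a pod crash-looping in place under `restartPolicy: OnFailure`, or a Job with no `activeDeadlineSeconds` can stall the release for the full timeout. Give a safe Job template.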

5. **Idempotency** — hooks can re-run on every upgrade; make migrations idempotent or guard them (e.g., a migration tool that tracks applied versions). Never assume "runs once".

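6. **Failure handling** — what happens to the release when a `pre-upgrade` hook fails (the upgrade aborts and the new revision is recorded as `failed`; a killed or interrupted Helm process can instead leave the release stuck in `pending-upgrade`); how to recover; and how `--timeout` bounds hook execution while `--wait` only governs readiness of the regular resources.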

7. **Hooks vs not-hooks** — when a plain Job or an init container or an Argo PreSync is a better fit than a Helm hook.

Output as: (a) a recommendation of which hook(s) and weights to use, (b) the annotated hook Job template (with `restartPolicy: Never`, `backoffLimit`, `activeDeadlineSeconds`), (c) the delete-policy choice and why, (d) an idempotency strategy for my specific job, (e) a recovery runbook for a stuck hook.

Bias toward: idempotent, bounded, self-cleaning hooks; never let a hook hang a release indefinitely.
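
As a reference point when reviewing the model's answer, a bounded, self-cleaning migration hook might look roughly like the sketch below. It is an illustration under stated assumptions, not a definitive template: the file name, the `migrate.*` value keys, the image, and the `/app/migrate up` command are all hypothetical placeholders. It shows a config Secret created as a hook with a lower weight, so it exists before the Job that reads it.

```yaml
# templates/migrate-hooks.yaml  (hypothetical file name)
# Assumption: .Values.migrate.databaseUrl and .Values.migrate.image are
# illustrative value keys, not part of any real chart.
apiVersion: v1
kind: Secret
metadata:
  name: {{ .Release.Name }}-migrate-config
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
    # Weights are quoted strings compared as integers; -5 runs before 0.
    "helm.sh/hook-weight": "-5"
    # Replace the copy left by the previous run instead of failing with "already exists".
    "helm.sh/hook-delete-policy": before-hook-creation
type: Opaque
stringData:
  DATABASE_URL: {{ .Values.migrate.databaseUrl | quote }}
---
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ .Release.Name }}-migrate
  annotations:
    # Runs on first install and on every upgrade, so the command must be idempotent.
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-weight": "0"
    # Clean up on success; keep a failed Job around for debugging until the next run.
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  backoffLimit: 2              # a few retries, then the Job is marked Failed
  activeDeadlineSeconds: 240   # hard ceiling; keep it below the Helm --timeout in use
  template:
    spec:
      restartPolicy: Never     # a failed pod counts against backoffLimit instead of restarting in place
      containers:
        - name: migrate
          image: {{ .Values.migrate.image | quote }}
          command: ["/app/migrate", "up"]   # placeholder for a tool that tracks applied versions
          envFrom:
            - secretRef:
                name: {{ .Release.Name }}-migrate-config
```

Two design choices are worth checking against whatever the model proposes. The deadline of 240 seconds is chosen to sit under Helm's default 5-minute timeout, so the Job fails on its own terms before Helm gives up waiting. And because Helm does not track hook resources as part of the release, the Secret here is not removed by `helm uninstall`; it needs its own cleanup, for example through a delete hook or a manual step.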
