
Kubernetes Liveness, Readiness & Startup Probe Design Prompt

Design probes that fail fast on real problems but never restart-loop a healthy-but-slow app — separating readiness from liveness, sizing startup probes for slow boots, and avoiding cascading restarts.

Target user: App and platform engineers tuning pod health checks
Difficulty: Intermediate
Tools: Claude, ChatGPT

The prompt

You are an SRE who has debugged dozens of outages caused by misconfigured probes — restart storms, traffic to dead pods, and rolling deploys that never complete.

I will provide:
- The app's startup behavior (cold-start time, warm-up, dependency checks)
- Current `livenessProbe` / `readinessProbe` / `startupProbe` config
- Symptoms (restart loops, 503s during deploy, slow rollouts, flapping endpoints)
- Whether the app exposes a health endpoint and what it checks

Your job:

1. **Three probes, three jobs** — drill the distinction: readiness gates traffic, liveness restarts the container, startup protects slow boots. Most teams conflate readiness and liveness and pay for it.

2. **The cardinal rule** — liveness must NOT check downstream dependencies (DB, cache, external API). If the DB blips, a dependency-checking liveness probe restarts every pod simultaneously and turns a blip into an outage. Dependencies belong in readiness only.

3. **Startup probe sizing** — compute `failureThreshold * periodSeconds` so it comfortably exceeds worst-case cold start, then explain why this lets you keep liveness aggressive without killing slow-booting pods (liveness and readiness checks do not run until the startup probe succeeds).

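4. **Timing math** — for each probe set `initialDelaySeconds` (prefer a `startupProbe` instead), `periodSeconds`, `timeoutSeconds`, and `failureThreshold`; tune `successThreshold` only on readiness, since Kubernetes requires it to be 1 for liveness and startup probes. Show how long until a truly dead pod is restarted (roughly `periodSeconds * failureThreshold`) versus how long a transient blip is tolerated, and tune both to the app's SLO.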

5. **Endpoint design** — recommend a lightweight `/healthz` (process alive) for liveness and a richer `/readyz` (deps OK, warmed up) for readiness; warn against heavy health endpoints that themselves cause timeouts under load.

6. **Rollout safety** — show how readiness + `maxUnavailable`/`maxSurge` + `minReadySeconds` interact so a bad rollout halts instead of replacing every pod.

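7. **HTTP / gRPC / exec / TCP** — pick the right probe type (`httpGet`, `grpc`, `exec`, or `tcpSocket`) and call out that exec probes are the most expensive, since each check spawns a process inside the container, and can pile up.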

8. **Anti-patterns** — liveness == readiness, dependency checks in liveness, timeouts shorter than realistic latency, missing startup probe on slow apps.

Output as: (a) corrected probe blocks with every field justified, (b) the recommended health-endpoint contract, (c) the timing math table, (d) a one-line summary of what was wrong and why it caused the symptom.

Bias toward: readiness for deps, dumb liveness, generous startup probes.
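
As a worked illustration of the timing math in steps 3 and 4 (an aid for readers, not part of the prompt text), the following is a minimal Python sketch. The `Probe` class and the helper names `failure_window`, `startup_budget`, and `check_startup` are hypothetical, as are the 90-second cold start and the 1.5x headroom factor. The field defaults mirror the Kubernetes probe defaults. The arithmetic is an approximation: it ignores `timeoutSeconds` and scheduling jitter, so real windows can run slightly longer.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    """Timing fields of a Kubernetes probe (hypothetical helper class)."""
    period_seconds: int = 10        # Kubernetes default periodSeconds
    failure_threshold: int = 3      # Kubernetes default failureThreshold
    timeout_seconds: int = 1        # Kubernetes default timeoutSeconds (unused below)
    initial_delay_seconds: int = 0  # Kubernetes default initialDelaySeconds


def failure_window(probe: Probe) -> int:
    """Approximate seconds of consecutive failures before the kubelet acts."""
    return probe.period_seconds * probe.failure_threshold


def startup_budget(probe: Probe) -> int:
    """Approximate seconds a container has to boot under a startup probe."""
    return probe.initial_delay_seconds + failure_window(probe)


def check_startup(probe: Probe, worst_cold_start: int, headroom: float = 1.5) -> bool:
    """True if the startup budget covers the worst cold start with headroom.

    The 1.5x headroom factor is an assumption, not a Kubernetes rule.
    """
    return startup_budget(probe) >= worst_cold_start * headroom


# Assumed example values: a generous startup probe and an aggressive liveness probe.
startup = Probe(period_seconds=5, failure_threshold=30)
liveness = Probe(period_seconds=10, failure_threshold=3)

print(startup_budget(startup))                      # 150
print(failure_window(liveness))                     # 30
print(check_startup(startup, worst_cold_start=90))  # True (150 >= 135)
```

With these assumed values, the app gets about 150 seconds to boot, while a pod that hangs after startup is restarted after roughly 30 seconds of failed liveness checks. That split is what lets liveness stay aggressive without punishing slow boots.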



