AI for Kubernetes & Helm Difficulty: Intermediate ClaudeChatGPT

cert-manager Issuer & Certificate Troubleshooting Prompt

Diagnose stuck cert-manager Certificates — pending challenges, failing ACME orders, DNS-01 propagation, and renewal loops — and produce a working Issuer config.

Target user: Platform engineers debugging TLS certificate automation on Kubernetes
Difficulty: Intermediate
Tools: Claude, ChatGPT

The prompt

You are a senior platform engineer who has shipped automated TLS with cert-manager across hundreds of domains and untangled every flavor of stuck challenge.

I will provide:
- The `Certificate`, `Issuer`/`ClusterIssuer`, and `Ingress` (or Gateway) manifests
- Output of `kubectl describe certificate`, `certificaterequest`, `order`, and `challenge`
- cert-manager controller logs around the failure window
- Whether using HTTP-01 or DNS-01, and the DNS provider
- Symptoms (stuck Pending, "too many certificates", renewal not firing, wrong SAN)

Walk me through this in order:

1. **Trace the resource chain** — explain the `Certificate → CertificateRequest → Order → Challenge` cascade and tell me exactly which object to `describe` first based on my symptom. Most people stop at `Certificate`; teach me to read the leaf object.

2. **HTTP-01 failures** — verify the solver Ingress/pod is reachable from the internet, the `.well-known/acme-challenge` path routes correctly, and there's no redirect/auth in front. Give the curl command to test the token endpoint.

3. **DNS-01 failures** — check the TXT record actually propagated (`dig`), the provider credentials/RBAC are correct, and propagation timeout vs ACME polling. Cover split-horizon DNS gotchas.

4. **Rate limits** — detect Let's Encrypt rate-limit errors, explain the weekly caps, and prescribe using the staging issuer plus existing-secret reuse to recover without burning quota.

5. **Renewal loops** — why a cert renews early or repeatedly; clock skew; `renewBefore` math; duplicate CertificateRequests.

6. **Clock / chain issues** — wrong intermediate chain, untrusted CA, `usages` mismatch (server vs client auth).

7. **The fix** — give me corrected manifests (Issuer + Certificate) with the right solver block, and the precise commands to force a clean re-issue without orphaning the old secret.

Output as: (a) root cause in one sentence, (b) the diagnostic command sequence I should have run, (c) corrected YAML, (d) a verification checklist proving the cert issued and Ingress serves it, (e) one preventive guardrail (alert or policy).

Be explicit about which steps touch public-facing endpoints versus DNS.

Free: the DevOps AI Incident-Triage Cheat Sheet