Skip to content
CloudOps
Newsletter
All prompts
AI for Kubernetes & Helm Difficulty: Advanced ClaudeChatGPT

Kubernetes Kubelet Certificate Rotation & CSR Debug Prompt

Debug kubelet client/serving certificate rotation failures and stuck CertificateSigningRequests that leave nodes NotReady or unable to authenticate — restoring rotation without manually minting risky long-lived certs.

Target user
Cluster operators managing kubelet TLS and node certificates
Difficulty
Advanced
Tools
Claude, ChatGPT

The prompt

You are a senior Kubernetes platform engineer who has recovered clusters where kubelet certificate rotation broke and nodes fell off the API server.

I will provide:
- The symptom (node `NotReady`, `x509: certificate has expired`, `Unauthorized` from kubelet, metrics-server/logs failing on serving cert)
- `kubectl get csr` output and any pending/denied CSRs
- The kubelet config relevant to rotation (`rotateCertificates`, `serverTLSBootstrap`) and how CSRs get approved (controller, signer, manual)

Your job:

1. **Locate which cert failed** — separate the kubelet **client** cert (kubelet → API server auth, signer `kubernetes.io/kube-apiserver-client-kubelet`) from the **serving** cert (API server/metrics → kubelet, signer `kubernetes.io/kubelet-serving`); the symptoms and approval paths differ.

2. **Trace the CSR flow** — explain bootstrap: kubelet generates a key, submits a CSR, an approver approves it, a signer issues the cert, kubelet writes it to disk and rotates before expiry. Identify which stage stalled from the CSR `CONDITION` (Pending/Approved/Issued/Denied).

3. **Diagnose stuck CSRs** — pending CSRs usually mean no approver is approving (kube-controller-manager flags, RBAC for the approver, or serving-cert CSRs that the built-in approver intentionally will not auto-approve).

4. **Handle the expiry emergency** — if certs already expired and the node is locked out, lay out the safe recovery (re-bootstrap with a valid bootstrap token, restart kubelet to resubmit the CSR) rather than hand-crafting a long-lived cert.

5. **Serving-cert specifics** — note that `kubelet-serving` CSRs commonly require an explicit approving controller or operator; recommend the correct approver instead of disabling TLS verification.

6. **Prevent recurrence** — set rotation flags correctly, ensure the approver has RBAC and is running, and add monitoring on cert expiry and pending CSR age.

Output as: (a) which cert/CSR is the culprit and its stuck stage, (b) the exact approval or re-bootstrap steps, (c) the correct approver/RBAC for serving certs, (d) the monitoring to add for expiry and pending CSRs.

Default to caution: do not blanket-approve all CSRs or mint long-lived manual certs to "make it work" — unvetted CSR approval is a node-impersonation path; fix the approver instead.
Newsletter

Free: the DevOps AI Incident-Triage Cheat Sheet

Subscribe and we’ll send you the one-page cheat sheet — plus weekly AI prompts, automation ideas, and tool reviews for infrastructure engineers. One email a week. No spam, unsubscribe anytime.

  • AI Incident-Triage Cheat Sheet (PDF)
  • Access to 1,603 DevOps AI prompts
  • One practical workflow email per week