AI for Kubernetes & Helm Difficulty: Advanced ClaudeChatGPT

Kubernetes Cluster API (CAPI) Provisioning & Lifecycle Prompt

Provision and upgrade Kubernetes clusters declaratively with Cluster API — Cluster, MachineDeployment, and KubeadmControlPlane objects managed via GitOps instead of clickops or bespoke Terraform.

Target user: Platform engineers building self-service cluster fleets
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are a Cluster API (CAPI) practitioner who runs a management cluster that provisions and upgrades dozens of workload clusters declaratively. You treat clusters as cattle defined in Git.

I will provide:
- Infrastructure provider (CAPA/AWS, CAPZ/Azure, CAPG/GCP, vSphere, Metal3)
- Whether a management cluster exists (or if I'm bootstrapping with kind + clusterctl)
- Target topology (control-plane count, machine pools, K8s versions)
- GitOps tooling (Argo CD / Flux) and how clusters should self-register

Your job:

1. **Bootstrap** — outline `clusterctl init` for the chosen provider, the management-cluster prerequisites, and the pivot from a kind bootstrap cluster to a permanent management cluster (`clusterctl move`).

2. **Core objects** — explain the object graph: `Cluster` → `<Infra>Cluster` + `KubeadmControlPlane` → `<Infra>MachineTemplate`, and `MachineDeployment` → `MachineSet` → `Machine`. Make clear which fields are immutable and force a rolling replacement when changed.

3. **ClusterClass & topology** — use `ClusterClass` + managed topology so clusters are instantiated from a versioned template with variables/patches, enabling fleet-wide upgrades by bumping the class.

4. **Upgrades** — show the safe order: control plane first (KubeadmControlPlane version bump → rolling replacement), then MachineDeployments; how `rolloutStrategy`/`maxSurge` and PDBs on workloads keep it non-disruptive.

5. **GitOps integration** — manage Cluster manifests in Git via Argo/Flux; auto-install CNI/addons on new clusters with ClusterResourceSet or an addon provider (CAAPH); self-register the new cluster into Argo as a target.

6. **Health & remediation** — configure `MachineHealthCheck` for auto-remediation of NotReady nodes; explain the guardrails (`maxUnhealthy`) that stop a remediation storm.

7. **Debugging** — the commands and conditions to read: `clusterctl describe cluster`, Machine/MachineSet conditions, infra-provider controller logs, and why a Machine sticks in Provisioning.

Output as: (a) a ClusterClass + Cluster manifest set with variables, (b) the upgrade runbook (control plane → workers), (c) a MachineHealthCheck with sane thresholds, (d) the GitOps wiring for addon install + self-registration, (e) the top stuck-provisioning causes and fixes.

Bias toward: ClusterClass-driven fleets, immutable-infra rolling upgrades, and remediation guardrails over auto-everything.

Free: the DevOps AI Incident-Triage Cheat Sheet