GitOps for Infrastructure Prompt
Design a GitOps workflow that reconciles cloud and platform infrastructure (not just app manifests) from Git using Flux/Argo plus an IaC operator, with safe drift handling and promotion.
- Target user: Platform engineers extending GitOps from apps to infrastructure
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a delivery architect who has run GitOps for both applications and the underlying cloud infrastructure at scale.

I will provide:
- My current IaC (Terraform/Pulumi/Crossplane/CloudFormation) and where its state lives
- My GitOps controller (Argo CD or Flux) and cluster topology
- Repo layout and branching model
- Environments and how changes flow between them
- Pain points (manual applies, drift, slow promotions, unclear ownership)

Your job:
1. **Decide what "GitOps for infra" means here**: pure Kubernetes-native (Crossplane/Cluster API resources reconciled directly) vs operator-wrapped IaC (tofu-controller, formerly tf-controller, with Flux; Argo CD plus a runner) vs PR-driven apply (Atlantis), which automates plan/apply on pull requests but does not run a continuous reconcile loop. State the tradeoffs honestly; not all infra belongs in a reconcile loop.
2. **Repo topology**: config repo vs source repo split, per-environment directories vs branches, and where the rendered manifests live. Recommend one and justify it. Avoid environment branches if they are likely to drift apart and rot.
3. **The reconcile loop for stateful infra**: how plan/apply maps onto continuous reconciliation, where approval gates live, and how to prevent the controller from auto-destroying expensive resources. Define the `prune` and deletion-protection policy explicitly.
4. **Drift handling**: detect drift continuously, but decide per resource whether to auto-correct, alert only, or freeze. Show how to classify resources into these buckets.
5. **Secrets**: SOPS, Sealed Secrets, or External Secrets Operator. Keep plaintext out of Git; show which component decrypts the secrets (SOPS, Sealed Secrets) or fetches them from an external store (External Secrets Operator) at apply time.
6. **Promotion**: promote images, versions, and modules from dev → stage → prod via PRs (image automation or a promotion bot). Show the exact PR-driven flow and the required approvals.
7. **Safety rails**: health checks, sync waves/dependency ordering, automated rollback, and a "break glass" path for when GitOps must be bypassed during an incident, plus how you reconcile Git afterward.
8. **Observability**: surface reconcile status, last-applied revision, and drift in a dashboard the on-call engineer can read.

Output as:
(a) recommended repo tree
(b) controller config (Flux Kustomization, or Argo CD Application/ApplicationSet)
(c) drift-policy table per resource class
(d) promotion PR workflow
(e) break-glass runbook

Bias toward Git as the single source of truth, explicit deletion protection, and slow-and-reversible over fast-and-irreversible changes for stateful infra.
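The snippet below is not part of the prompt. It illustrates the kind of drift-policy classification that output (c) asks for. It is a minimal Python sketch. The resource class names, their bucket assignments, and the fallback rule are all hypothetical. Unknown stateful resources fall back to "freeze" and unknown stateless ones to "alert-only", so nothing is auto-corrected by accident.

```python
from enum import Enum


class DriftAction(Enum):
    AUTO_CORRECT = "auto-correct"  # controller reverts drift on the next reconcile
    ALERT_ONLY = "alert-only"      # drift is reported; a human decides what to do
    FREEZE = "freeze"              # no automatic changes; only via a reviewed PR


# Hypothetical resource classes and buckets; replace with your own taxonomy.
DRIFT_POLICY = {
    "iam-policy": DriftAction.AUTO_CORRECT,      # out-of-band edits are a security risk
    "security-group": DriftAction.AUTO_CORRECT,  # cheap and safe to revert
    "dns-record": DriftAction.ALERT_ONLY,
    "autoscaling-group": DriftAction.ALERT_ONLY,
    "database": DriftAction.FREEZE,              # auto-revert could force a replacement
    "object-storage": DriftAction.FREEZE,
}


def drift_action(resource_class: str, stateful: bool) -> DriftAction:
    """Return the drift action for a resource class.

    Classes missing from DRIFT_POLICY get a conservative default:
    stateful resources are frozen, stateless ones are alert-only.
    """
    if resource_class in DRIFT_POLICY:
        return DRIFT_POLICY[resource_class]
    return DriftAction.FREEZE if stateful else DriftAction.ALERT_ONLY
```

For example, `drift_action("database", stateful=True)` yields `DriftAction.FREEZE`. An unlisted stateless class such as `"queue"` yields `DriftAction.ALERT_ONLY`.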
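As an illustration of the explicit deletion protection that item 3 asks for, the sketch below gates a plan before apply. It is outside the prompt itself. It assumes Terraform-style plans in the `terraform show -json` format, where each entry in `resource_changes` has `address`, `type`, and `change.actions`, and a replacement shows up as `["delete", "create"]` or `["create", "delete"]`. The `PROTECTED_TYPES` set and the `approved` allowlist (for example, populated from a PR label) are hypothetical.

```python
# Hypothetical resource types whose destruction must never happen automatically.
PROTECTED_TYPES = {"aws_db_instance", "aws_rds_cluster", "aws_s3_bucket"}


def blocked_destroys(plan: dict, approved: set[str]) -> list[str]:
    """Return addresses of protected resources that the plan would delete or
    replace without an explicit approval.

    `plan` is assumed to follow the `terraform show -json` plan format.
    Any action list containing "delete" counts, which covers both plain
    deletes and delete/create replacements.
    """
    blocked = []
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        if "delete" in actions and rc["type"] in PROTECTED_TYPES:
            if rc["address"] not in approved:
                blocked.append(rc["address"])
    return blocked
```

In a pipeline, you would parse the JSON plan and refuse to apply whenever the returned list is non-empty. A destroy then requires a deliberate, reviewed approval rather than relying on the controller's default pruning behavior.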
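The dependency ordering in item 7 is what Flux's `dependsOn` and Argo CD's `argocd.argoproj.io/sync-wave` annotation express declaratively. To show the underlying idea (again, not part of the prompt), here is a small Python sketch that groups components into waves with the standard-library `graphlib.TopologicalSorter`. The component names in the example are hypothetical.

```python
from graphlib import TopologicalSorter


def sync_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group components into ordered waves.

    `deps` maps each component to the set of components it depends on.
    Every component in a wave depends only on components in earlier waves,
    so a wave can be applied in parallel once the previous wave is healthy.
    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

With `{"network": set(), "cluster": {"network"}, "database": {"network"}, "addons": {"cluster"}, "apps": {"addons", "database"}}`, the waves are `[["network"], ["cluster", "database"], ["addons"], ["apps"]]`.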
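For the promotion flow in item 6, a promotion bot needs a rule for which PRs to open. It is not part of the prompt. The sketch below is a minimal Python version of one possible rule: promote one hop at a time, so prod only ever receives a version that stage already runs. The environment list, the approval counts, and the `PromotionPR` shape are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical promotion order and per-target approval requirements.
PIPELINE = ["dev", "stage", "prod"]
REQUIRED_APPROVALS = {"stage": 1, "prod": 2}


@dataclass
class PromotionPR:
    source: str
    target: str
    version: str
    required_approvals: int


def next_promotions(deployed: dict[str, str]) -> list[PromotionPR]:
    """Propose one PR for each environment whose version differs from the
    environment immediately before it in PIPELINE.

    Each PR moves a version exactly one hop, so a version must be running
    in stage before it can be proposed for prod.
    """
    prs = []
    for source, target in zip(PIPELINE, PIPELINE[1:]):
        version = deployed.get(source)
        if version is not None and deployed.get(target) != version:
            prs.append(PromotionPR(source, target, version, REQUIRED_APPROVALS[target]))
    return prs
```

For example, with dev on `1.4.0` and stage and prod on `1.3.0`, only a dev → stage PR for `1.4.0` needing one approval is proposed. The stage → prod PR appears once stage runs `1.4.0`.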