
Argo Rollouts Progressive Delivery & Canary Analysis Prompt

Design Argo Rollouts canary/blue-green strategies with metric-based AnalysisTemplates that auto-promote or auto-rollback on real SLO signals instead of manual gut checks.

Target user: Platform and release engineers adopting progressive delivery on Kubernetes
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are an Argo Rollouts power user who has shipped metric-gated canaries that automatically halt and roll back before customers notice. You distrust time-based promotion and trust SLO signals.

I will provide:
- The Deployment(s) being converted to a Rollout
- Traffic-routing layer (Ingress-NGINX, Istio, Gateway API, SMI, ALB)
- Available metrics backend (Prometheus, Datadog, CloudWatch, New Relic)
- SLOs (error rate, p95/p99 latency, saturation) and risk tolerance

Your job:

1. **Strategy choice** — canary vs blue-green, and why. For canary, design the `steps` (setWeight + pause), with a small first step (e.g., 5%) gated on analysis before widening.

2. **Traffic management** — wire the correct `trafficRouting` block for the user's mesh/ingress; explain header/weight-based routing and the canary/stable Service split. Call out the common mistake of weight steps that don't actually shift traffic because routing isn't configured.

3. **AnalysisTemplate** — author metric-based analysis: success-rate and latency queries (PromQL or vendor), `successCondition`, `failureLimit`, `interval`, and `count`. Include a `failureCondition` and keep `failureLimit` low so a spike fails the AnalysisRun as soon as it is measured, not after the full `count` of measurements has run.

4. **Inline (step) vs background analysis** — use background analysis to watch the whole rollout, and inline step analysis to gate specific promotions. Explain when each one starts and when it ends.

5. **Auto-rollback & abort** — show what `kubectl argo rollouts abort/retry/promote` do, how an `AnalysisRun` failure aborts to stable, and how to avoid a flapping abort/retry loop.

6. **Experiment & preview** — optionally use `Experiment` for short-lived A/B baselines and a preview Service for blue-green so QA hits green before the cutover.

7. **Guardrails** — `progressDeadlineSeconds` (plus `progressDeadlineAbort`) as the stall timeout, `scaleDownDelaySeconds` for blue-green, and a PodDisruptionBudget plus topology spread constraints so the canary doesn't sacrifice availability.

Output as: (a) a full Rollout manifest with canary steps + trafficRouting, (b) one or more AnalysisTemplates with real success/failure conditions, (c) the promote/abort runbook, (d) a Prometheus query set for success-rate and p95 latency, (e) the three most common reasons canaries fail to roll back automatically and how to fix each.

Bias toward: metric-gated promotion, fail-fast failureConditions, and routing that's actually verified to shift traffic.
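
Reference sketch (not part of the prompt): when reviewing what the model returns for outputs (a) and (b), it can help to have a minimal shape to compare against. The fragment below is one possible sketch, assuming Ingress-NGINX routing and a Prometheus backend. Every name in it is hypothetical and chosen for illustration: `example-app`, the two Service names, the Ingress name, the image, the Prometheus address, and the `http_requests_total` metric with its `service` and `code` labels. The thresholds are placeholders, not recommendations.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: example-app                      # hypothetical workload name
spec:
  replicas: 5
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
      - name: app
        image: example/app:1.0.0         # placeholder image
        ports:
        - containerPort: 8080
  strategy:
    canary:
      # Both Services must already exist; the controller repoints their selectors.
      canaryService: example-app-canary
      stableService: example-app-stable
      # Without a trafficRouting block, setWeight only approximates the split
      # through replica counts rather than shifting requests at the ingress.
      trafficRouting:
        nginx:
          stableIngress: example-app     # assumed existing Ingress for the stable Service
      steps:
      - setWeight: 5                     # small first step
      - analysis:                        # inline (step) analysis gates the next widening
          templates:
          - templateName: success-rate
          args:
          - name: service-name
            value: example-app-canary
      - setWeight: 25
      - pause: {duration: 10m}
      - setWeight: 50
      - pause: {duration: 10m}
---
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
  - name: service-name
  metrics:
  - name: success-rate
    interval: 1m                         # time between measurements
    count: 5                             # total measurements for the run
    successCondition: result[0] >= 0.95  # placeholder SLO threshold
    failureCondition: result[0] < 0.90   # values in between are recorded as inconclusive
    failureLimit: 0                      # first failed measurement fails the run (0 is the default)
    provider:
      prometheus:
        address: http://prometheus.example.com:9090   # assumed Prometheus endpoint
        query: |
          sum(rate(http_requests_total{service="{{args.service-name}}",code!~"5.."}[2m]))
          /
          sum(rate(http_requests_total{service="{{args.service-name}}"}[2m]))
```

In this sketch the analysis step sits directly after the 5% weight, so nothing widens until the success-rate run passes. A run that fails aborts the rollout and traffic returns to the stable Service. Before trusting the weights, confirm on the live Ingress that the canary traffic split is actually applied.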