Skip to content
CloudOps
Newsletter
All prompts
AI for Kubernetes & Helm Difficulty: Intermediate ClaudeChatGPT

Argo Workflows DAG Pipeline Design Prompt

Design a production Argo Workflows DAG — templated steps, artifact passing, retries and exit handlers, resource limits, and pod cleanup — for batch and CI-style pipelines on Kubernetes.

Target user
Platform and data engineers building Kubernetes-native pipelines
Difficulty
Intermediate
Tools
Claude, ChatGPT

The prompt

You are a workflow engineer who has built reliable Argo Workflows DAGs that run thousands of times a day without leaking pods or losing artifacts.

I will provide:
- The pipeline I want to model (steps, dependencies, fan-out/fan-in)
- Inputs/outputs between steps (parameters, artifacts, sizes)
- Artifact backend (S3/GCS/MinIO) and any secrets needed
- Scale (concurrency, frequency) and the failure-handling I need

Your job:

1. **DAG vs steps** — choose `dag` over `steps` and justify it for this pipeline. Map the dependency graph with `depends` (boolean expressions) rather than just `dependencies`, and show fan-out via `withItems`/`withParam`.

2. **Templating** — structure reusable `templates` (container, script, resource) and a `WorkflowTemplate`/`ClusterWorkflowTemplate` so the DAG references shared steps. Parameterize inputs so the same DAG runs across environments.

3. **Artifact passing** — wire `outputs.artifacts` → `inputs.artifacts` between steps using the configured repository, with explicit paths and compression. Note size limits and when to pass a reference instead of the blob.

4. **Retries + idempotency** — set `retryStrategy` (limit, backoff, retryPolicy) per template, and explain why each step must be idempotent because retries re-run it.

5. **Exit handlers + lifecycle** — add an `onExit` template for cleanup/notifications that runs regardless of success, and use `templateDefaults` for shared retry/timeout settings.

6. **Resource hygiene** — set `activeDeadlineSeconds`, `podGC` strategy (e.g., `OnWorkflowSuccess`), `ttlStrategy` for completed workflows, and per-step resource requests/limits so a runaway DAG can't starve the cluster.

7. **Concurrency control** — use `synchronization` (mutex/semaphore) to cap parallel runs and prevent thundering-herd against downstream systems.

8. **Observability** — surface step status, artifact links, and failures; define alerts for stuck or failed workflows.

Output as: (a) the DAG WorkflowTemplate YAML, (b) artifact wiring example, (c) retry/exit-handler config, (d) GC + TTL + deadline settings, (e) a concurrency + observability plan.

Bias toward idempotent steps and aggressive cleanup over convenience.
Newsletter

Free: the DevOps AI Incident-Triage Cheat Sheet

Subscribe and we’ll send you the one-page cheat sheet — plus weekly AI prompts, automation ideas, and tool reviews for infrastructure engineers. One email a week. No spam, unsubscribe anytime.

  • AI Incident-Triage Cheat Sheet (PDF)
  • Access to 1,603 DevOps AI prompts
  • One practical workflow email per week