
n8n Error-Handling and Retry Workflow Design Prompt

Design the error-handling layer for an n8n ops automation — error workflows, retries with backoff, partial-failure handling, and dead-letter routing — so a flaky third-party node does not silently drop work or fire duplicate side effects.

Target user: Engineers building production n8n ops automations
Difficulty: Beginner
Tools: Claude, ChatGPT

The prompt

You are a senior automation engineer who has shipped n8n flows into production ops use and learned that the happy path is the easy 20%; the error handling is what keeps a flow trustworthy.

I will provide:
- The n8n workflow and what it does (nodes, external services it calls)
- Which nodes have side effects and whether those side effects are idempotent
- The trigger (webhook, schedule, queue) and expected volume
- The failure modes seen (timeouts, rate limits, malformed input)

Your job:

1. **Failure classification** — for each node, classify failures as retryable (timeout, 429, 5xx) vs terminal (validation, auth), since they need different handling.
2. **Retry configuration** — set per-node retry count, delay, and backoff for retryable failures, and explain why non-idempotent side-effect nodes need guards before any retry.
3. **Error workflow** — design a dedicated error-trigger workflow that captures the failed execution, the input data, and the error, then routes it to alert and dead-letter storage.
4. **Idempotency guards** — add dedup/idempotency keys before side-effect nodes so a retry or replay does not create a duplicate record, message, or charge.
5. **Partial-failure handling** — for batch/loop nodes, decide continue-on-fail vs stop, and how failed items are collected rather than lost.
6. **Dead-letter and replay** — define where failed items land and a safe manual replay path that re-checks the guard before re-running.
7. **Observability** — list what to log/alert on (failure rate, dead-letter depth, retries exhausted) so a silently failing flow gets noticed.

Output as: a per-node failure/retry table, the error-workflow design, the idempotency-guard placement, and the dead-letter/replay procedure.

Require that every non-idempotent side-effect node sit behind an idempotency guard before retries are enabled, and that failed items go to a dead-letter store with a human-reviewed replay path rather than being auto-retried forever.
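
To make the requirements above concrete, here is a minimal sketch in plain Python of the logic the prompt asks for: classify a failure as retryable or terminal, retry with capped exponential backoff, guard the side effect with an idempotency key, and dead-letter what cannot be completed. It is not n8n's API. Every name in it (`CallFailed`, `GuardedRunner`, `backoff_delay`, the in-memory `done` and `dead_letter` stores) is hypothetical, and the status-code set and delay values are assumptions chosen for illustration. In a real flow these pieces would map onto node settings, a Code node, and a durable store.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Assumption: these statuses are treated as retryable; everything else is terminal.
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}


class CallFailed(Exception):
    """Hypothetical error raised by a side-effect call. status=None means a timeout."""

    def __init__(self, status: Optional[int]):
        super().__init__(f"call failed with status {status}")
        self.status = status


def is_retryable(err: CallFailed) -> bool:
    """Timeouts, 429 and 5xx are retryable; validation and auth errors are terminal."""
    return err.status is None or err.status in RETRYABLE_STATUSES


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential delay in seconds for a 1-based attempt number, capped at `cap`."""
    return min(cap, base * 2 ** (attempt - 1))


@dataclass
class GuardedRunner:
    side_effect: Callable[[dict], str]
    max_tries: int = 3
    sleep: Callable[[float], None] = time.sleep
    # In-memory stand-ins for a durable dedup store and a dead-letter store.
    done: Dict[str, str] = field(default_factory=dict)
    dead_letter: List[dict] = field(default_factory=list)

    def run(self, key: str, item: dict) -> Optional[str]:
        # Idempotency guard: a key that already succeeded is never executed again.
        if key in self.done:
            return self.done[key]
        for attempt in range(1, self.max_tries + 1):
            try:
                result = self.side_effect(item)
            except CallFailed as err:
                # Terminal error or retries exhausted: dead-letter instead of dropping.
                if not is_retryable(err) or attempt == self.max_tries:
                    self.dead_letter.append(
                        {"key": key, "item": item, "error": str(err), "attempts": attempt}
                    )
                    return None
                self.sleep(backoff_delay(attempt))
            else:
                self.done[key] = result
                return result
        return None

    def replay(self, key: str) -> Optional[str]:
        """Manual replay of one dead-lettered entry; run() re-checks the guard first."""
        for i, entry in enumerate(self.dead_letter):
            if entry["key"] == key:
                del self.dead_letter[i]
                return self.run(key, entry["item"])
        return None
```

A caller would construct `GuardedRunner(side_effect=send_invoice)` (with `send_invoice` standing in for whatever the node does), call `run(key, item)` once per item with a key derived from the item's business identity, and have a human trigger `replay(key)` after reviewing the dead-letter entry. One limitation is worth stating plainly: a local guard like this only covers calls that were recorded as successful. If a request times out after the downstream service has already committed it, a retry can still duplicate the effect unless that service also accepts and honors the same idempotency key.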
