Automation Error Guide: 'Workflow Task Timed Out' Temporal Deadline Exceeded
Fix Temporal workflow task timed out / deadline exceeded errors: diagnose no available workers, sticky cache eviction, blocking code, large histories, and task queue mismatch.
Overview
A Temporal workflow task timeout means a worker did not pick up and complete a workflow task within the configured WorkflowTaskTimeout (default 10s). The Temporal service hands a task off to a worker to advance the workflow’s state machine; if no worker reports back in time, the service times out the attempt, increments the attempt counter, and re-schedules it. Repeated timeouts stall the workflow even though it never “fails” outright.
You will see this in the workflow’s event history:
WorkflowTaskTimedOut timeoutType=StartToClose attempt=4 scheduledEventId=12 startedEventId=13
And in the worker log when it can’t keep up:
WARN Workflow task processing took longer than the timeout taskQueue=order-tq WorkflowType=OrderWorkflow
ERROR Failed to poll workflow task service=temporal taskQueue=order-tq error="context deadline exceeded"
It occurs whenever a workflow task is dispatched — on start, after an activity completes, after a timer fires, or on a signal. A workflow that ran fine can start timing out the instant its worker fleet becomes overloaded, gets evicted from sticky cache, or stops polling the right task queue.
Symptoms
- Workflow history shows repeating WorkflowTaskTimedOut events with rising attempt.
- Workflows sit in Running but make no progress; activities never start.
- Worker logs show context deadline exceeded when polling the task queue, or “task processing took longer than timeout”.
- tctl/temporal shows a growing backlog on the task queue.
temporal workflow show --workflow-id order-9921 \
--output json | jq '.events[] | select(.eventType=="WorkflowTaskTimedOut")'
{ "eventType": "WorkflowTaskTimedOut", "workflowTaskTimedOutEventAttributes":
{ "timeoutType": "StartToClose", "scheduledEventId": "12" } }
temporal task-queue describe --task-queue order-tq --task-queue-type workflow
BuildID Pollers LastAccessTime
(none) 0 -
Common Root Causes
1. No worker polling the task queue
Zero pollers means no one will ever pick up the task; it times out every attempt until a worker appears.
temporal task-queue describe --task-queue order-tq --task-queue-type workflow
Pollers: 0
A poller count of 0 with a backlog is the clearest signal — the worker fleet is down, crashed, or never started for this queue.
2. Task queue name mismatch
The worker polls one queue; workflows are started on another (a typo or env-specific name). The service has tasks no worker is listening for.
grep -RniE "taskQueue|task_queue" ./worker | head
temporal workflow show --workflow-id order-9921 -o json | jq '.events[0].workflowExecutionStartedEventAttributes.taskQueue.name'
worker/main.ts:18: taskQueue: 'orders-tq'
"order-tq"
orders-tq vs order-tq — the workflow’s tasks are stranded.
3. Blocking / non-deterministic code in the workflow
Doing real I/O, sleeping with the language sleep, or running CPU-heavy work directly in workflow code blocks the worker’s task processing past the timeout.
# Workflow files should not import network/fs/time-sleep directly
grep -RniE "fetch\(|axios|fs\.|new Date\(\)|setTimeout|requests\.|time\.sleep" ./workflows | head
workflows/order.ts:33: const rate = await fetch('https://fx.example.com/rate') // blocks the task
Network calls belong in activities; in workflow code they block the deterministic task and blow the timeout.
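The same scan can run as a pre-merge check instead of an ad-hoc grep. The following is a minimal sketch, not part of any Temporal SDK: findBlockingCalls and BLOCKING_PATTERNS are hypothetical names, and the patterns are a subset of the grep above that you would tune for your own codebase.

```typescript
// Hypothetical pre-merge check: flag lines in workflow source that look like
// direct I/O or a language-level sleep. Patterns are a subset of the grep above.
const BLOCKING_PATTERNS: RegExp[] = [
  /fetch\(/,
  /axios/,
  /\bfs\./,
  /requests\./,
  /time\.sleep/,
];

interface Finding {
  line: number; // 1-based line number within the scanned source
  text: string; // the offending line, trimmed
}

function findBlockingCalls(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, index) => {
    if (BLOCKING_PATTERNS.some((pattern) => pattern.test(text))) {
      findings.push({ line: index + 1, text: text.trim() });
    }
  });
  return findings;
}
```

A build step could read each file under ./workflows, pass its contents to findBlockingCalls, and fail when the result is non-empty. A text match is only a heuristic, so expect occasional false positives in comments or strings.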
4. Sticky cache eviction forces full history replay
When a worker loses its sticky cache (restart, eviction, cache too small), the next task replays the entire history. A large history can exceed the task timeout during replay.
# Cache size settings in the worker source, then the workflow's history length
grep -RniE "WorkerCacheSize|maxCachedWorkflows|sticky|evict" ./worker | head
temporal workflow show --workflow-id order-9921 -o json | jq '.events | length'
worker/main.ts:7: maxCachedWorkflowExecutions: 50
1843
A tiny cache plus an 1800-event history means frequent full replays that can’t finish in 10s.
5. Worker fleet overloaded / under-provisioned
Too few concurrent task slots for the load: tasks queue up behind slow ones and time out waiting to be processed.
grep -RniE "maxConcurrentWorkflowTask|maxConcurrentActivity" ./worker | head
worker/main.ts:9: maxConcurrentWorkflowTaskExecutions: 2
With only 2 concurrent slots under bursty load, tasks wait long enough to time out. Scale slots or workers.
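To judge whether a slot count is plausible for a given load, a back-of-envelope estimate helps. The sketch below applies Little's law (tasks in flight is roughly arrival rate times mean processing time). The function name, the traffic figures, and the 1.5x headroom factor are all assumptions for illustration, not measured values or Temporal defaults.

```typescript
// Back-of-envelope slot estimate using Little's law:
// tasks in flight ~= arrival rate * mean processing time.
// All inputs are hypothetical; measure your own before relying on the result.
function requiredTaskSlots(
  tasksPerSecond: number,
  meanTaskSeconds: number,
  headroom = 1.5, // assumed safety factor for bursts
): number {
  if (tasksPerSecond < 0 || meanTaskSeconds < 0 || headroom < 1) {
    throw new RangeError("rates and durations must be >= 0 and headroom >= 1");
  }
  return Math.ceil(tasksPerSecond * meanTaskSeconds * headroom);
}

// Assumed load: 40 workflow tasks/s at 0.25 s each, with 1.5x headroom -> 15 slots
const neededSlots = requiredTaskSlots(40, 0.25);
const configuredSlots = 2; // the value found in worker/main.ts above
console.log(neededSlots > configuredSlots ? "under-provisioned" : "ok");
```

If the estimate is far above the configured value, raise the slot count per worker or add workers, then confirm against the task queue backlog rather than trusting the arithmetic alone.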
6. WorkflowTaskTimeout set too low
A short WorkflowTaskTimeout leaves no headroom for replay or a busy worker, so normal variance trips the timeout.
grep -RniE "workflowTaskTimeout|WorkflowTaskTimeout" ./worker ./workflows | head
worker/start.ts:21: workflowTaskTimeout: '2s'
2s is aggressive for any workflow with non-trivial history; raise it toward the 10s default.
Diagnostic Workflow
Step 1: Confirm the timeout type from history
temporal workflow show --workflow-id <WID> -o json \
| jq '.events[] | select(.eventType=="WorkflowTaskTimedOut")'
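StartToClose means a worker started the task but did not finish within WorkflowTaskTimeout (look at replay, blocking code, or overload). ScheduleToStart means no worker claimed the task in time, which Temporal records for tasks dispatched to a worker's sticky queue (look at the queue, its pollers, and recently restarted workers).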
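When a history holds many timeout events, a tally by timeoutType is easier to read than raw jq output. This is a minimal sketch that assumes the JSON shape shown in the jq output earlier (eventType "WorkflowTaskTimedOut", timeoutType "StartToClose"). Your CLI version may spell these enums differently, so adjust the string literals to match your export. WorkflowHistoryEvent and countTimeoutsByType are hypothetical names.

```typescript
// Shape assumed from the jq output shown earlier in this guide; real exports
// may use different enum spellings depending on the CLI version.
interface WorkflowHistoryEvent {
  eventType: string;
  workflowTaskTimedOutEventAttributes?: { timeoutType?: string };
}

// Count WorkflowTaskTimedOut events per timeoutType.
function countTimeoutsByType(events: WorkflowHistoryEvent[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const ev of events) {
    if (ev.eventType !== "WorkflowTaskTimedOut") continue;
    const kind = ev.workflowTaskTimedOutEventAttributes?.timeoutType ?? "Unknown";
    counts[kind] = (counts[kind] ?? 0) + 1;
  }
  return counts;
}
```

Feed it the events array from the exported history, for example countTimeoutsByType(JSON.parse(raw).events). A tally dominated by one type tells you which branch of the diagnosis to follow first.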
Step 2: Check pollers on the task queue
temporal task-queue describe --task-queue <TQ> --task-queue-type workflow
Pollers: 0 → worker fleet/queue-name problem. Pollers present but timing out → replay/blocking/overload.
Step 3: Verify the queue names match
grep -RniE "taskQueue|task_queue" ./worker
temporal workflow show --workflow-id <WID> -o json \
| jq -r '.events[0].workflowExecutionStartedEventAttributes.taskQueue.name'
The worker’s queue must exactly equal the workflow’s start queue.
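The comparison in this step can be scripted so that a one-character difference is not missed by eye. The sketch below assumes the first history event has the field path used in the jq command above. StartedEvent and queueMismatch are hypothetical names, not SDK types.

```typescript
// Compare the queue a worker polls with the queue recorded in the workflow's
// first history event. Field names follow the jq path used in this step.
interface StartedEvent {
  workflowExecutionStartedEventAttributes?: { taskQueue?: { name?: string } };
}

// Returns null when the names match, otherwise a human-readable explanation.
function queueMismatch(workerQueue: string, firstEvent: StartedEvent): string | null {
  const startQueue = firstEvent.workflowExecutionStartedEventAttributes?.taskQueue?.name;
  if (startQueue === undefined) return "start event has no task queue name";
  return startQueue === workerQueue
    ? null
    : `worker polls '${workerQueue}' but workflow started on '${startQueue}'`;
}
```

The comparison is exact and case-sensitive on purpose. Defining the queue name once in a shared module that both the worker and the starter import removes the need for this check at runtime.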
Step 4: Inspect history size and worker cache
temporal workflow show --workflow-id <WID> -o json | jq '.events | length'
grep -RniE "maxCachedWorkflow|WorkerCacheSize" ./worker
A large history with a small cache points to replay-time timeouts; raise cache or use Continue-As-New.
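A rough calculation shows why history length matters for replay. In the sketch below, msPerEvent is a hypothetical per-event replay cost that you would have to measure on your own workers, and the 0.5 safety fraction is an assumed budget. Neither is a Temporal constant, and replayFitsTimeout is an illustrative name.

```typescript
// Rough replay-time check. msPerEvent is a hypothetical figure measured on
// your own workers; safetyFraction is an assumed budget, not a Temporal rule.
function replayFitsTimeout(
  historyLength: number,
  msPerEvent: number,
  taskTimeoutMs = 10000, // the 10s default mentioned above
  safetyFraction = 0.5,  // assumed: keep replay under half the timeout
): boolean {
  return historyLength * msPerEvent <= taskTimeoutMs * safetyFraction;
}

// 1843 events at an assumed 4 ms each is about 7.4 s, over the 5 s budget
console.log(replayFitsTimeout(1843, 4)); // false
// 300 events at the same cost is 1.2 s, comfortably inside it
console.log(replayFitsTimeout(300, 4)); // true
```

When the check fails for typical histories, capping history with Continue-As-New shortens every replay, while a larger cache only makes replays less frequent.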
Step 5: Audit workflow code for blocking calls and the timeout setting
grep -RniE "fetch\(|axios|requests\.|time\.sleep|fs\.|new Date\(\)" ./workflows
grep -RniE "workflowTaskTimeout" ./worker ./workflows
Move I/O into activities and raise an overly tight workflowTaskTimeout.
Example Root Cause Analysis
OrderWorkflow executions all stall in Running. History shows WorkflowTaskTimedOut with timeoutType=ScheduleToStart repeating every few seconds.
ScheduleToStart means no worker ever started the task, so this is a queue/poller problem, not slow code. Checking pollers:
temporal task-queue describe --task-queue order-tq --task-queue-type workflow
Pollers: 0
No pollers. The workers are running, though — so they must be polling a different queue. Comparing the worker config to the workflow’s start queue:
grep -Rni taskQueue ./worker
temporal workflow show --workflow-id order-9921 -o json \
| jq -r '.events[0].workflowExecutionStartedEventAttributes.taskQueue.name'
worker/main.ts:18: taskQueue: 'orders-tq'
order-tq
The workflows are started on order-tq, but a recent rename moved the worker to orders-tq. The tasks have no listener and time out forever.
Fix: align the worker to the queue the workflows actually use and restart:
# set taskQueue: 'order-tq' in worker/main.ts
sudo systemctl restart temporal-order-worker
temporal task-queue describe --task-queue order-tq --task-queue-type workflow
Pollers: 4
Pollers appear, the pending tasks are picked up, and the stalled workflows resume.
Prevention Best Practices
- Alert on WorkflowTaskTimedOut events and on Pollers: 0 for any active task queue; a stalled fleet is invisible otherwise.
- Treat the task-queue name as a contract: define it once in shared config so the worker and the workflow-starter can never drift apart.
- Keep workflow code deterministic and non-blocking: all I/O and CPU-heavy work go in activities, and waits use the SDK's durable timers rather than the language sleep.
- Size the sticky workflow cache for your history depth, and use Continue-As-New to cap history so replay stays well under the task timeout.
- Provision enough concurrent task slots/workers for peak load, and leave WorkflowTaskTimeout at or near the 10s default rather than tightening it.
- For ad-hoc triage, the free incident assistant can classify a timeout’s timeoutType into the likely queue-vs-replay cause.
Quick Command Reference
# Find the timeout events and their type
temporal workflow show --workflow-id <WID> -o json \
| jq '.events[] | select(.eventType=="WorkflowTaskTimedOut")'
# Are workers polling this queue?
temporal task-queue describe --task-queue <TQ> --task-queue-type workflow
# Compare worker queue vs workflow start queue
grep -RniE "taskQueue|task_queue" ./worker
temporal workflow show --workflow-id <WID> -o json \
| jq -r '.events[0].workflowExecutionStartedEventAttributes.taskQueue.name'
# History size (replay risk) and cache config
temporal workflow show --workflow-id <WID> -o json | jq '.events | length'
grep -RniE "maxCachedWorkflow|maxConcurrentWorkflowTask|workflowTaskTimeout" ./worker ./workflows
# Blocking calls that shouldn't be in workflow code
grep -RniE "fetch\(|axios|requests\.|time\.sleep|fs\.|new Date\(\)" ./workflows
Conclusion
A Temporal workflow task timeout means no worker completed a workflow task within WorkflowTaskTimeout. The usual root causes:
- No worker is polling the task queue (Pollers: 0).
- The worker’s task-queue name doesn’t match where workflows are started.
- Blocking or non-deterministic code in the workflow stalls task processing.
- Sticky-cache eviction forces a full history replay that overruns the timeout.
- The worker fleet is under-provisioned for the load.
- WorkflowTaskTimeout is set too low for the workflow’s history.
Read the timeoutType from history first — ScheduleToStart points at queues/pollers, StartToClose at replay/blocking/overload — and the fix follows from there.