Automation Error Guide: 'Workflow Task Timed Out' Temporal

Overview

A Temporal workflow task timeout means a worker did not pick up and complete a workflow task within the configured WorkflowTaskTimeout (default 10s). The Temporal service hands a task off to a worker to advance the workflow’s state machine; if no worker reports back in time, the service times out the attempt, increments the attempt counter, and re-schedules it. Repeated timeouts stall the workflow even though it never “fails” outright.

You will see this in the workflow’s event history:

WorkflowTaskTimedOut  timeoutType=StartToClose  attempt=4  scheduledEventId=12 startedEventId=0

And in the worker log when it can’t keep up:

WARN  Workflow task processing took longer than the timeout taskQueue=order-tq WorkflowType=OrderWorkflow
ERROR Failed to poll workflow task  service=temporal taskQueue=order-tq error="context deadline exceeded"

It can occur whenever a workflow task is dispatched: on start, after an activity completes, after a timer fires, or on a signal. A workflow that ran fine can start timing out as soon as its worker fleet becomes overloaded, a worker evicts the workflow from its sticky cache, or the workers stop polling the right task queue.

Symptoms

Workflow history shows repeating WorkflowTaskTimedOut events with a rising attempt count.
Workflows sit in Running but make no progress; activities never start.
Worker logs show context deadline exceeded while polling the task queue, or “task processing took longer than timeout”.
tctl/temporal shows a growing backlog on the task queue.

temporal workflow show --workflow-id order-9921 \
  --output json | jq '.events[] | select(.eventType=="WorkflowTaskTimedOut")'

{ "eventType": "WorkflowTaskTimedOut", "workflowTaskTimedOutEventAttributes":
  { "timeoutType": "StartToClose", "scheduledEventId": "12" } }

temporal task-queue describe --task-queue order-tq --task-queue-type workflow

BuildID  Pollers  LastAccessTime
(none)   0        -
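
The same filter can be applied in a script, for example to feed an alert. The following is a minimal TypeScript sketch, assuming the history JSON has the shape shown in the sample output above. HistoryEvent and summarizeTimeouts are illustrative names, not part of any Temporal SDK, and your CLI version may print eventType and timeoutType values in a different form, so adjust the string constants to match.

```typescript
// Shape assumed from the sample output in this guide; real exports carry more fields.
interface HistoryEvent {
  eventType: string;
  workflowTaskTimedOutEventAttributes?: { timeoutType?: string };
}

interface TimeoutSummary {
  total: number;
  byType: Record<string, number>;
}

// Counts WorkflowTaskTimedOut events, grouped by their timeoutType.
function summarizeTimeouts(events: HistoryEvent[]): TimeoutSummary {
  const summary: TimeoutSummary = { total: 0, byType: {} };
  for (const event of events) {
    if (event.eventType !== "WorkflowTaskTimedOut") continue;
    const attrs = event.workflowTaskTimedOutEventAttributes;
    const type = attrs && attrs.timeoutType ? attrs.timeoutType : "Unknown";
    summary.total += 1;
    summary.byType[type] = (summary.byType[type] || 0) + 1;
  }
  return summary;
}

// A trimmed, made-up history used only to exercise the function.
const sampleEvents: HistoryEvent[] = [
  { eventType: "WorkflowExecutionStarted" },
  { eventType: "WorkflowTaskScheduled" },
  { eventType: "WorkflowTaskTimedOut", workflowTaskTimedOutEventAttributes: { timeoutType: "StartToClose" } },
  { eventType: "WorkflowTaskScheduled" },
  { eventType: "WorkflowTaskTimedOut", workflowTaskTimedOutEventAttributes: { timeoutType: "StartToClose" } },
];

console.log(summarizeTimeouts(sampleEvents));
```

A rising total across successive checks of the same workflow is the signal to act on; a single timeout after a deploy is often harmless.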

Common Root Causes

1. No worker polling the task queue

Zero pollers means no one will ever pick up the task; it times out every attempt until a worker appears.

temporal task-queue describe --task-queue order-tq --task-queue-type workflow

Pollers: 0

A poller count of 0 with a backlog is the clearest signal — the worker fleet is down, crashed, or never started for this queue.

2. Task queue name mismatch

The worker polls one queue; workflows are started on another (a typo or env-specific name). The service has tasks no worker is listening for.

grep -RniE "taskQueue|task_queue" ./worker | head
temporal workflow show --workflow-id order-9921 -o json | jq '.events[0].workflowExecutionStartedEventAttributes.taskQueue.name'

worker/main.ts:18: taskQueue: 'orders-tq'
"order-tq"

orders-tq vs order-tq — the workflow’s tasks are stranded.
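
A one-character slip like this is easy to miss by eye. As a rough sketch, a small check can compare the two names and flag likely typos separately from unrelated queues. The function names and the distance threshold of 2 are assumptions made for illustration; Temporal itself matches task-queue names exactly.

```typescript
// Classic Levenshtein edit distance between two strings.
function editDistance(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

type QueueCheck = "match" | "near-miss" | "mismatch";

// Any difference strands the tasks; "near-miss" only hints that a typo is the likely cause.
function compareQueues(workerQueue: string, startQueue: string): QueueCheck {
  if (workerQueue === startQueue) return "match";
  return editDistance(workerQueue, startQueue) <= 2 ? "near-miss" : "mismatch";
}

console.log(compareQueues("orders-tq", "order-tq")); // prints "near-miss"
```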

3. Blocking / non-deterministic code in the workflow

Doing real I/O, calling the language’s native sleep, or running CPU-heavy work directly in workflow code blocks the worker’s task processing past the timeout.

# Workflow files should not import network/fs/time-sleep directly
grep -RniE "fetch\(|axios|fs\.|new Date\(\)|setTimeout|requests\.|time\.sleep" ./workflows | head

workflows/order.ts:33: const rate = await fetch('https://fx.example.com/rate')  // blocks the task

Network calls belong in activities; in workflow code they hold the workflow task open while the call is in flight, which can push it past the timeout, and they break determinism on replay.
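
The grep can also run as an automated check before deploy. The sketch below is one possible shape for that, not an official lint: BLOCKING_PATTERNS is a hypothetical deny-list that mirrors part of the grep pattern, and findBlockingCalls is an illustrative name. Expect false positives and tune the list for your SDK and codebase.

```typescript
interface Finding {
  line: number;    // 1-based line number in the scanned source
  pattern: string; // which deny-list entry matched
  text: string;    // the offending line, trimmed
}

// Hypothetical deny-list; extend or trim it to fit your workflows.
const BLOCKING_PATTERNS: { name: string; regex: RegExp }[] = [
  { name: "fetch", regex: /\bfetch\(/ },
  { name: "axios", regex: /\baxios\b/ },
  { name: "fs", regex: /\bfs\./ },
  { name: "time.sleep", regex: /\btime\.sleep\b/ },
];

// Scans workflow source text line by line and reports every deny-list hit.
function findBlockingCalls(source: string): Finding[] {
  const findings: Finding[] = [];
  const lines = source.split("\n");
  for (let i = 0; i < lines.length; i++) {
    for (const p of BLOCKING_PATTERNS) {
      if (p.regex.test(lines[i])) {
        findings.push({ line: i + 1, pattern: p.name, text: lines[i].trim() });
      }
    }
  }
  return findings;
}

// Made-up workflow source containing the call from the example above.
const workflowSource = [
  "export async function OrderWorkflow(): Promise<void> {",
  "  const rate = await fetch('https://fx.example.com/rate');",
  "}",
].join("\n");

console.log(findBlockingCalls(workflowSource));
```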

4. Sticky cache eviction forces full history replay

When a worker loses its sticky cache (restart, eviction, cache too small), the next task replays the entire history. A large history can exceed the task timeout during replay.

# Cache size and sticky-cache settings in the worker source
grep -RniE "WorkerCacheSize|maxCachedWorkflows|sticky|evict" ./worker | head
temporal workflow show --workflow-id order-9921 -o json | jq '.events | length'

worker/main.ts:7: maxCachedWorkflows: 50
1843

A tiny cache plus an 1,800-event history means frequent full replays, any of which may fail to finish within the 10s task timeout.
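
The two signals can be combined into a quick check. This is only a sketch under stated assumptions: openWorkflows (how many workflows one worker is expected to serve at once) and historyBudget (the event count at which your team wants to cap history) are hypothetical inputs you would have to supply, and the sample values for them are made up.

```typescript
interface ReplayInputs {
  historyLength: number;      // events in the workflow history
  maxCachedWorkflows: number; // sticky cache size configured on the worker
  openWorkflows: number;      // assumption: workflows one worker serves concurrently
  historyBudget: number;      // hypothetical team threshold for history size
}

// Returns human-readable notes; an empty array means neither signal fired.
function assessReplayRisk(input: ReplayInputs): string[] {
  const notes: string[] = [];
  if (input.openWorkflows > input.maxCachedWorkflows) {
    notes.push("working set exceeds cache: expect evictions and full replays");
  }
  if (input.historyLength > input.historyBudget) {
    notes.push("history over budget: consider Continue-As-New");
  }
  return notes;
}

// 1843 and 50 come from the output above; 400 and 1000 are illustrative.
console.log(
  assessReplayRisk({ historyLength: 1843, maxCachedWorkflows: 50, openWorkflows: 400, historyBudget: 1000 })
);
```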

5. Worker fleet overloaded / under-provisioned

Too few concurrent task slots for the load: tasks queue up behind slow ones and time out waiting to be processed.

grep -RniE "maxConcurrentWorkflowTask|maxConcurrentActivity" ./worker | head

worker/main.ts:9: maxConcurrentWorkflowTaskExecutions: 2

With only 2 concurrent slots under bursty load, tasks wait long enough to time out. Scale slots or workers.
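
For a first estimate of how many slots the load needs, Little's law is a reasonable starting point: the average number of tasks in flight equals the arrival rate times the average time per task. The sketch below applies that with a headroom multiplier. The function name, the load figures, and the 1.5 headroom factor are all assumptions for illustration; measure your own rates before sizing anything.

```typescript
// Little's law: average tasks in flight = arrival rate x average time per task.
function requiredSlots(tasksPerSecond: number, meanTaskMs: number, headroom: number): number {
  const inFlight = (tasksPerSecond * meanTaskMs) / 1000;
  return Math.ceil(inFlight * headroom);
}

// Made-up load figures for illustration only.
const needed = requiredSlots(40, 250, 1.5);
const configured = 2; // the slot count from the grep output above

console.log(needed, needed > configured ? "under-provisioned" : "ok"); // 15 under-provisioned
```

The result is a fleet-wide figure, so it can be met by raising the per-worker slot count, adding workers, or both.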

6. WorkflowTaskTimeout set too low

A short WorkflowTaskTimeout leaves no headroom for replay or a busy worker, so normal variance trips the timeout.

grep -RniE "workflowTaskTimeout|WorkflowTaskTimeout" ./worker ./workflows | head

worker/start.ts:21: workflowTaskTimeout: '2s'

2s is aggressive for any workflow with non-trivial history; raise it toward the 10s default.
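
A configuration check can catch a tightened timeout before it ships. The following sketch assumes the value is written as a simple string such as '2s', '500ms', or '1m'; parseDurationMs is a deliberately small hypothetical parser, not the SDK's own duration handling, and it rejects anything outside those three units.

```typescript
const DEFAULT_WORKFLOW_TASK_TIMEOUT_MS = 10000; // the 10s default described in this guide

// Parses "<integer><unit>" where unit is ms, s, or m. Throws on anything else.
function parseDurationMs(value: string): number {
  const match = /^(\d+)\s*(ms|s|m)$/.exec(value.trim());
  if (!match) throw new Error("unsupported duration: " + value);
  const amount = parseInt(match[1], 10);
  const unit = match[2];
  if (unit === "ms") return amount;
  if (unit === "s") return amount * 1000;
  return amount * 60000;
}

// True when the configured timeout is tighter than the default.
function isBelowDefault(value: string): boolean {
  return parseDurationMs(value) < DEFAULT_WORKFLOW_TASK_TIMEOUT_MS;
}

console.log(isBelowDefault("2s")); // true
```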

Diagnostic Workflow

Step 1: Confirm the timeout type from history

temporal workflow show --workflow-id <WID> -o json \
  | jq '.events[] | select(.eventType=="WorkflowTaskTimedOut")'

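StartToClose means a worker started the task but didn’t finish it in time; ScheduleToStart means no worker picked it up in time, which points at the queue and its pollers. Temporal applies ScheduleToStart to workflow tasks on a worker’s sticky queue, so also check whether the worker that last cached the workflow is still running.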

Step 2: Check pollers on the task queue

temporal task-queue describe --task-queue <TQ> --task-queue-type workflow

Pollers: 0 → worker fleet/queue-name problem. Pollers present but timing out → replay/blocking/overload.
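
Steps 1 and 2 together form a small decision tree, which can be written down as follows. This is a sketch of the reasoning in this guide rather than an exhaustive classifier; the triage function and its message strings are illustrative.

```typescript
type TimeoutType = "StartToClose" | "ScheduleToStart";

// Maps the two observations from Steps 1 and 2 to the next thing to check.
function triage(timeoutType: TimeoutType, pollers: number): string {
  if (pollers === 0) {
    return "no pollers: check that workers are running and polling this exact queue name";
  }
  if (timeoutType === "ScheduleToStart") {
    return "pollers exist but the task was not picked up: check worker restarts and fleet capacity";
  }
  return "a worker started the task but did not finish: check replay size, blocking code, and slot limits";
}

console.log(triage("ScheduleToStart", 0));
console.log(triage("StartToClose", 4));
```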

Step 3: Verify the queue names match

grep -RniE "taskQueue|task_queue" ./worker
temporal workflow show --workflow-id <WID> -o json \
  | jq -r '.events[0].workflowExecutionStartedEventAttributes.taskQueue.name'

The worker’s queue must exactly equal the workflow’s start queue.

Step 4: Inspect history size and worker cache

temporal workflow show --workflow-id <WID> -o json | jq '.events | length'
grep -RniE "maxCachedWorkflow|WorkerCacheSize" ./worker

A large history with a small cache points to replay-time timeouts; raise the cache size or cap the history with Continue-As-New.

Step 5: Audit workflow code for blocking calls and the timeout setting

grep -RniE "fetch\(|axios|requests\.|time\.sleep|fs\.|new Date\(\)" ./workflows
grep -RniE "workflowTaskTimeout" ./worker ./workflows

Move I/O into activities and raise an overly tight workflowTaskTimeout.

Example Root Cause Analysis

OrderWorkflow executions all stall in Running. History shows WorkflowTaskTimedOut with timeoutType=ScheduleToStart repeating every few seconds.

ScheduleToStart means no worker ever started the task, so this is a queue/poller problem, not slow code. Checking pollers:

temporal task-queue describe --task-queue order-tq --task-queue-type workflow

Pollers: 0

No pollers. The workers are running, though — so they must be polling a different queue. Comparing the worker config to the workflow’s start queue:

grep -Rni taskQueue ./worker
temporal workflow show --workflow-id order-9921 -o json \
  | jq -r '.events[0].workflowExecutionStartedEventAttributes.taskQueue.name'

worker/main.ts:18: taskQueue: 'orders-tq'
order-tq

The workflows are started on order-tq, but a recent rename moved the worker to orders-tq. The tasks have no listener and time out forever.

Fix: align the worker to the queue the workflows actually use and restart:

# set taskQueue: 'order-tq' in worker/main.ts
sudo systemctl restart temporal-order-worker
temporal task-queue describe --task-queue order-tq --task-queue-type workflow

Pollers: 4

Pollers appear, the pending tasks are picked up, and the stalled workflows resume.

Prevention Best Practices

Alert on WorkflowTaskTimedOut events and on Pollers: 0 for any active task queue — a stalled fleet is invisible otherwise.
Treat the task-queue name as a contract: define it once in shared config so the worker and the workflow-starter can never drift apart.
Keep workflow code deterministic and non-blocking: all I/O and CPU-heavy work go in activities, and delays use the SDK’s durable timers instead of the language’s native sleep.
Size the sticky workflow cache for your history depth, and use Continue-As-New to cap history so replay stays well under the task timeout.
Provision enough concurrent task slots/workers for peak load, and leave WorkflowTaskTimeout at or near the 10s default rather than tightening it.
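
One way to make the task-queue name a contract is to define it in a single module that both the worker and the workflow starter import. The sketch below shows the idea with plain objects. ORDER_TASK_QUEUE, workerOptions, and startOptions are hypothetical names, the slot count of 20 is arbitrary, and in a real project the constant would be exported from a shared file and the objects passed to your SDK's worker and client calls.

```typescript
// Single source of truth for the queue name (would live in a shared module).
const ORDER_TASK_QUEUE = "order-tq";

// Options the worker process would use; only taskQueue matters for this example.
function workerOptions(): { taskQueue: string; maxConcurrentWorkflowTaskExecutions: number } {
  return { taskQueue: ORDER_TASK_QUEUE, maxConcurrentWorkflowTaskExecutions: 20 };
}

// Options the workflow starter would use for a given workflow ID.
function startOptions(workflowId: string): { taskQueue: string; workflowId: string } {
  return { taskQueue: ORDER_TASK_QUEUE, workflowId: workflowId };
}

// Both sides read the same constant, so a rename changes them together.
console.log(workerOptions().taskQueue === startOptions("order-9921").taskQueue); // true
```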

Quick Command Reference

# Find the timeout events and their type
temporal workflow show --workflow-id <WID> -o json \
  | jq '.events[] | select(.eventType=="WorkflowTaskTimedOut")'

# Are workers polling this queue?
temporal task-queue describe --task-queue <TQ> --task-queue-type workflow

# Compare worker queue vs workflow start queue
grep -RniE "taskQueue|task_queue" ./worker
temporal workflow show --workflow-id <WID> -o json \
  | jq -r '.events[0].workflowExecutionStartedEventAttributes.taskQueue.name'

# History size (replay risk) and cache config
temporal workflow show --workflow-id <WID> -o json | jq '.events | length'
grep -RniE "maxCachedWorkflow|maxConcurrentWorkflowTask|workflowTaskTimeout" ./worker ./workflows

# Blocking calls that shouldn't be in workflow code
grep -RniE "fetch\(|axios|requests\.|time\.sleep|fs\.|new Date\(\)" ./workflows

Conclusion

A Temporal workflow task timeout means no worker completed a workflow task within WorkflowTaskTimeout. The usual root causes:

No worker is polling the task queue (Pollers: 0).
The worker’s task-queue name doesn’t match where workflows are started.
Blocking or non-deterministic code in the workflow stalls task processing.
Sticky-cache eviction forces a full history replay that overruns the timeout.
The worker fleet is under-provisioned for the load.
WorkflowTaskTimeout is set too low for the workflow’s history.

Read the timeoutType from history first — ScheduleToStart points at queues/pollers, StartToClose at replay/blocking/overload — and the fix follows from there.
