AI for Automation · By James Joyner IV · 10 min read

Automation Error Guide: 'Workflow Task Timed Out' Temporal Deadline Exceeded

Fix Temporal workflow task timed out / deadline exceeded errors: diagnose no available workers, sticky cache eviction, blocking code, large histories, and task queue mismatch.

  • #automation
  • #troubleshooting
  • #errors
  • #temporal

Overview

A Temporal workflow task timeout means a worker did not pick up and complete a workflow task within the configured WorkflowTaskTimeout (default 10s). The Temporal service hands a task off to a worker to advance the workflow’s state machine; if no worker reports back in time, the service times out the attempt, increments the attempt counter, and re-schedules it. Repeated timeouts stall the workflow even though it never “fails” outright.

You will see this in the workflow’s event history:

WorkflowTaskTimedOut  timeoutType=StartToClose  attempt=4  scheduledEventId=12 startedEventId=0

And in the worker log when it can’t keep up:

WARN  Workflow task processing took longer than the timeout taskQueue=order-tq WorkflowType=OrderWorkflow
ERROR Failed to poll workflow task  service=temporal taskQueue=order-tq error="context deadline exceeded"

It can occur whenever a workflow task is dispatched: on start, after an activity completes, after a timer fires, or on a signal. A workflow that ran fine can start timing out the instant its worker fleet becomes overloaded, its state is evicted from the sticky cache, or its workers stop polling the right task queue.

Symptoms

  • Workflow history shows repeating WorkflowTaskTimedOut events with rising attempt.
  • Workflows sit in Running but make no progress; activities never start.
  • Worker logs show context deadline exceeded while polling the task queue, or “Workflow task processing took longer than the timeout”.
  • tctl/temporal shows a growing backlog on the task queue.

To confirm from the CLI (each command is followed by its output):

temporal workflow show --workflow-id order-9921 \
  --output json | jq '.events[] | select(.eventType=="WorkflowTaskTimedOut")'
{ "eventType": "WorkflowTaskTimedOut", "workflowTaskTimedOutEventAttributes":
  { "timeoutType": "StartToClose", "scheduledEventId": "12" } }

temporal task-queue describe --task-queue order-tq --task-queue-type workflow
BuildID  Pollers  LastAccessTime
(none)   0        -

Common Root Causes

1. No worker polling the task queue

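Zero pollers means no one will ever pick up the task. On the normal task queue a scheduled workflow task has no ScheduleToStart timeout, so it sits unstarted until a worker appears; a ScheduleToStart timeout is recorded only when the task was first routed to a sticky queue whose worker has gone away.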

temporal task-queue describe --task-queue order-tq --task-queue-type workflow
Pollers: 0

A poller count of 0 with a backlog is the clearest signal — the worker fleet is down, crashed, or never started for this queue.

2. Task queue name mismatch

The worker polls one queue; workflows are started on another (a typo or env-specific name). The service has tasks no worker is listening for.

grep -RniE "taskQueue|task_queue" ./worker | head
temporal workflow show --workflow-id order-9921 -o json | jq '.events[0].workflowExecutionStartedEventAttributes.taskQueue.name'
worker/main.ts:18: taskQueue: 'orders-tq'
"order-tq"

orders-tq vs order-tq — the workflow’s tasks are stranded.

3. Blocking / non-deterministic code in the workflow

Doing real I/O, sleeping with the language sleep, or running CPU-heavy work directly in workflow code blocks the worker’s task processing past the timeout.

# Workflow files should not import network/fs/time-sleep directly
grep -RniE "fetch\(|axios|fs\.|new Date\(\)|setTimeout|requests\.|time\.sleep" ./workflows | head
workflows/order.ts:33: const rate = await fetch('https://fx.example.com/rate')  // blocks the task

Network calls belong in activities; in workflow code they block the deterministic task and blow the timeout.
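
The same audit can run as a small script, for example in CI. This is a minimal TypeScript sketch under stated assumptions: findBlockingCalls and BLOCKING_PATTERNS are hypothetical names, the patterns mirror the grep above, and the sample workflow source is made up for illustration. A text scan only flags candidates; it will not catch blocking calls hidden behind helper functions.

```typescript
// Patterns mirror the grep above; extend them for your own codebase.
const BLOCKING_PATTERNS: RegExp[] = [
  /fetch\(/,
  /axios/,
  /\bfs\./,
  /new Date\(\)/,
  /setTimeout/,
  /requests\./,
  /time\.sleep/,
];

interface Finding {
  line: number; // 1-based line number in the scanned source
  text: string; // trimmed source line
}

// Return every line of workflow source that matches a blocking-call pattern.
function findBlockingCalls(source: string): Finding[] {
  return source
    .split("\n")
    .map((text, i) => ({ line: i + 1, text: text.trim() }))
    .filter(({ text }) => BLOCKING_PATTERNS.some((p) => p.test(text)));
}

// Illustrative workflow source (hypothetical), with one offending call.
const sample = [
  "export async function orderWorkflow(): Promise<void> {",
  "  const rate = await fetch('https://fx.example.com/rate');",
  "  await chargeCustomer(rate);",
  "}",
].join("\n");

console.log(findBlockingCalls(sample)); // one finding, on line 2
```

Anything the scan reports is a candidate to move into an activity.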

4. Sticky cache eviction forces full history replay

When a worker loses its sticky cache (restart, eviction, cache too small), the next task replays the entire history. A large history can exceed the task timeout during replay.

# Cache-size settings in the worker source, then the history length
grep -RniE "WorkerCacheSize|maxCachedWorkflows|sticky|evict" ./worker | head
temporal workflow show --workflow-id order-9921 -o json | jq '.events | length'
worker/main.ts:7: maxCachedWorkflows: 50
1843

A tiny cache plus an 1800-event history means frequent full replays that can’t finish in 10s.
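
To make the replay arithmetic concrete, here is a rough TypeScript sketch. The per-event replay cost (6 ms here) is a purely hypothetical figure you would have to measure on your own workers; estimateReplayMs and replayFitsTimeout are illustrative names, and the 50% budget is an assumption chosen to leave headroom.

```typescript
// Estimate how long a full history replay takes.
// msPerEvent is a hypothetical figure you measure yourself, not a Temporal constant.
function estimateReplayMs(eventCount: number, msPerEvent: number): number {
  return eventCount * msPerEvent;
}

// True when the estimated replay uses no more than `budget` (0..1) of the timeout.
function replayFitsTimeout(
  eventCount: number,
  msPerEvent: number,
  timeoutMs: number,
  budget = 0.5,
): boolean {
  return estimateReplayMs(eventCount, msPerEvent) <= timeoutMs * budget;
}

const replayMs = estimateReplayMs(1843, 6); // 1843 events x 6 ms = 11058 ms
console.log(replayMs, replayFitsTimeout(1843, 6, 10000)); // 11058 false
```

At that assumed cost, the 1843-event history overruns a 10s timeout on every full replay, while a history capped at a few hundred events stays comfortably inside it.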

5. Worker fleet overloaded / under-provisioned

Too few concurrent task slots for the load: tasks queue up behind slow ones and time out waiting to be processed.

grep -RniE "maxConcurrentWorkflowTask|maxConcurrentActivity" ./worker | head
worker/main.ts:9: maxConcurrentWorkflowTaskExecutions: 2

With only 2 concurrent slots under bursty load, tasks wait long enough to time out. Scale slots or workers.
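
A back-of-the-envelope way to size slots is Little's law: tasks in flight equal the arrival rate times the time each task holds a slot. The sketch below is an illustration, not a tuning formula from Temporal; requiredSlots is a hypothetical helper, and the 40 tasks/s, 200 ms, and 1.5x headroom figures are assumed example inputs.

```typescript
// Little's law: tasks in flight = arrival rate x time each task holds a slot.
// headroom is an assumed multiplier to absorb bursts.
function requiredSlots(tasksPerSecond: number, avgTaskMs: number, headroom = 1.5): number {
  const inFlight = (tasksPerSecond * avgTaskMs) / 1000;
  return Math.ceil(inFlight * headroom);
}

// 40 tasks/s x 200 ms = 8 in flight; with 1.5x headroom that is 12 slots.
console.log(requiredSlots(40, 200)); // 12
```

Against an estimate like that, a worker configured with 2 slots is short by an order of magnitude, so either the slot count or the number of workers has to grow.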

6. WorkflowTaskTimeout set too low

A short WorkflowTaskTimeout leaves no headroom for replay or a busy worker, so normal variance trips the timeout.

grep -RniE "workflowTaskTimeout|WorkflowTaskTimeout" ./worker ./workflows | head
worker/start.ts:21: workflowTaskTimeout: '2s'

2s is aggressive for any workflow with non-trivial history; raise it toward the 10s default.
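
A configuration check can catch this before deploy. The following TypeScript sketch assumes timeouts are written as short strings such as '2s'; parseDurationMs is a deliberately minimal, hypothetical parser that handles only ms, s, and m, and it is not the SDK's own duration handling.

```typescript
const DEFAULT_WORKFLOW_TASK_TIMEOUT_MS = 10000; // the 10s default cited above

// Minimal parser for strings such as '2s', '500ms', or '1m'.
// Only these three units are supported in this sketch.
function parseDurationMs(value: string): number {
  const match = /^(\d+)\s*(ms|s|m)$/.exec(value.trim());
  if (!match) throw new Error(`Unsupported duration: ${value}`);
  const unitMs: Record<string, number> = { ms: 1, s: 1000, m: 60000 };
  return Number(match[1]) * unitMs[match[2]];
}

// True when a configured timeout is tighter than the default.
function isBelowDefault(value: string): boolean {
  return parseDurationMs(value) < DEFAULT_WORKFLOW_TASK_TIMEOUT_MS;
}

console.log(isBelowDefault("2s"), isBelowDefault("10s")); // true false
```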

Diagnostic Workflow

Step 1: Confirm the timeout type from history

temporal workflow show --workflow-id <WID> -o json \
  | jq '.events[] | select(.eventType=="WorkflowTaskTimedOut")'

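StartToClose means a worker started the task but did not finish it in time. ScheduleToStart means no worker picked it up; for workflow tasks this timeout applies to a sticky queue, so it usually indicates that the worker holding the cached workflow stopped polling (check the queue and its pollers).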
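
When a history has many events, tallying the timeouts by type is quicker than reading them one by one. This TypeScript sketch assumes the event JSON uses the field names shown in the jq output earlier (eventType, workflowTaskTimedOutEventAttributes.timeoutType); other CLI versions may print enum-style names, in which case the string comparison needs adjusting. countTimeoutsByType and the sample events are illustrative.

```typescript
interface HistoryEvent {
  eventType: string;
  workflowTaskTimedOutEventAttributes?: { timeoutType?: string };
}

// Count WorkflowTaskTimedOut events, grouped by their timeoutType.
function countTimeoutsByType(events: HistoryEvent[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const event of events) {
    if (event.eventType !== "WorkflowTaskTimedOut") continue;
    const type = event.workflowTaskTimedOutEventAttributes?.timeoutType ?? "Unknown";
    counts[type] = (counts[type] ?? 0) + 1;
  }
  return counts;
}

// Hypothetical history excerpt for illustration.
const events: HistoryEvent[] = [
  { eventType: "WorkflowExecutionStarted" },
  { eventType: "WorkflowTaskTimedOut", workflowTaskTimedOutEventAttributes: { timeoutType: "StartToClose" } },
  { eventType: "WorkflowTaskTimedOut", workflowTaskTimedOutEventAttributes: { timeoutType: "StartToClose" } },
  { eventType: "WorkflowTaskTimedOut", workflowTaskTimedOutEventAttributes: { timeoutType: "ScheduleToStart" } },
];

console.log(countTimeoutsByType(events));
```

In practice you would feed it the parsed output of temporal workflow show -o json instead of the inline sample.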

Step 2: Check pollers on the task queue

temporal task-queue describe --task-queue <TQ> --task-queue-type workflow

Pollers: 0 → worker fleet/queue-name problem. Pollers present but timing out → replay/blocking/overload.

Step 3: Verify the queue names match

grep -RniE "taskQueue|task_queue" ./worker
temporal workflow show --workflow-id <WID> -o json \
  | jq -r '.events[0].workflowExecutionStartedEventAttributes.taskQueue.name'

The worker’s queue must exactly equal the workflow’s start queue.
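
Because the match is exact, near-misses such as a stray space or a case difference are worth calling out separately from a plain mismatch. A minimal TypeScript sketch, with compareQueues as a hypothetical helper fed with the two values the commands above print:

```typescript
// Compare the queue a worker polls with the queue a workflow was started on.
function compareQueues(workerQueue: string, startQueue: string): string {
  if (workerQueue === startQueue) return "match";
  if (workerQueue.trim().toLowerCase() === startQueue.trim().toLowerCase()) {
    return "mismatch: differs only by case or whitespace";
  }
  return `mismatch: worker polls '${workerQueue}', workflow started on '${startQueue}'`;
}

console.log(compareQueues("orders-tq", "order-tq"));
// mismatch: worker polls 'orders-tq', workflow started on 'order-tq'
```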

Step 4: Inspect history size and worker cache

temporal workflow show --workflow-id <WID> -o json | jq '.events | length'
grep -RniE "maxCachedWorkflow|WorkerCacheSize" ./worker

A large history with a small cache points to replay-time timeouts; raise cache or use Continue-As-New.

Step 5: Audit workflow code for blocking calls and the timeout setting

grep -RniE "fetch\(|axios|requests\.|time\.sleep|fs\.|new Date\(\)" ./workflows
grep -RniE "workflowTaskTimeout" ./worker ./workflows

Move I/O into activities and raise an overly tight workflowTaskTimeout.

Example Root Cause Analysis

OrderWorkflow executions all stall in Running. History shows WorkflowTaskTimedOut with timeoutType=ScheduleToStart, followed by a rescheduled workflow task that never starts.

ScheduleToStart means no worker ever started the task, so this is a queue/poller problem, not slow code. Checking pollers:

temporal task-queue describe --task-queue order-tq --task-queue-type workflow
Pollers: 0

No pollers. The workers are running, though — so they must be polling a different queue. Comparing the worker config to the workflow’s start queue:

grep -Rni taskQueue ./worker
temporal workflow show --workflow-id order-9921 -o json \
  | jq -r '.events[0].workflowExecutionStartedEventAttributes.taskQueue.name'
worker/main.ts:18: taskQueue: 'orders-tq'
order-tq

The workflows are started on order-tq, but a recent rename moved the worker to orders-tq. The tasks have no listener, so they never start.

Fix: align the worker to the queue the workflows actually use and restart:

# set taskQueue: 'order-tq' in worker/main.ts
sudo systemctl restart temporal-order-worker
temporal task-queue describe --task-queue order-tq --task-queue-type workflow
Pollers: 4

Pollers appear, the pending tasks are picked up, and the stalled workflows resume.

Prevention Best Practices

  • Alert on WorkflowTaskTimedOut events and on Pollers: 0 for any active task queue — a stalled fleet is invisible otherwise.
  • Treat the task-queue name as a contract: define it once in shared config so the worker and the workflow-starter can never drift apart.
  • Keep workflow code deterministic and non-blocking; all I/O and CPU-heavy work go in activities, and waits use workflow timers rather than the language sleep.
  • Size the sticky workflow cache for your history depth, and use Continue-As-New to cap history so replay stays well under the task timeout.
  • Provision enough concurrent task slots/workers for peak load, and leave WorkflowTaskTimeout at or near the 10s default rather than tightening it.
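
The task-queue-as-contract practice above can be as small as one shared constant. In this TypeScript sketch, TASK_QUEUES, workerOptions, and startOptions are hypothetical names, and the returned values are plain objects rather than real SDK calls; the point is only that the worker side and the starter side read the queue name from the same place.

```typescript
// Single source of truth for task-queue names (hypothetical shared module).
const TASK_QUEUES = {
  orders: "order-tq",
} as const;

// Options the worker process would pass when it starts polling.
function workerOptions() {
  return { taskQueue: TASK_QUEUES.orders };
}

// Options the workflow starter would pass when it starts an execution.
function startOptions(workflowId: string) {
  return { taskQueue: TASK_QUEUES.orders, workflowId };
}

console.log(workerOptions().taskQueue === startOptions("order-9921").taskQueue); // true
```

With this layout a rename happens in one place, so the orders-tq versus order-tq drift from the example cannot occur between the two sides.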

Quick Command Reference

# Find the timeout events and their type
temporal workflow show --workflow-id <WID> -o json \
  | jq '.events[] | select(.eventType=="WorkflowTaskTimedOut")'

# Are workers polling this queue?
temporal task-queue describe --task-queue <TQ> --task-queue-type workflow

# Compare worker queue vs workflow start queue
grep -RniE "taskQueue|task_queue" ./worker
temporal workflow show --workflow-id <WID> -o json \
  | jq -r '.events[0].workflowExecutionStartedEventAttributes.taskQueue.name'

# History size (replay risk) and cache config
temporal workflow show --workflow-id <WID> -o json | jq '.events | length'
grep -RniE "maxCachedWorkflow|maxConcurrentWorkflowTask|workflowTaskTimeout" ./worker ./workflows

# Blocking calls that shouldn't be in workflow code
grep -RniE "fetch\(|axios|requests\.|time\.sleep|fs\.|new Date\(\)" ./workflows

Conclusion

A Temporal workflow task timeout means no worker completed a workflow task within WorkflowTaskTimeout. The usual root causes:

  1. No worker is polling the task queue (Pollers: 0).
  2. The worker’s task-queue name doesn’t match where workflows are started.
  3. Blocking or non-deterministic code in the workflow stalls task processing.
  4. Sticky-cache eviction forces a full history replay that overruns the timeout.
  5. The worker fleet is under-provisioned for the load.
  6. WorkflowTaskTimeout is set too low for the workflow’s history.

Read the timeoutType from history first — ScheduleToStart points at queues/pollers, StartToClose at replay/blocking/overload — and the fix follows from there.
