Temporal Signals and Human-in-the-Loop Automation Workflows
Durable workflows that wait days for an approval without burning a thread. How Temporal signals, queries, and timers build safe human-in-the-loop automation.
- #automation
- #temporal
- #orchestration
- #approval-gates
- #sre
The automation that finally sold me on Temporal was a deprovisioning flow that had to wait up to three business days for a manager to approve before deleting a departing employee’s resources. My first version was a cron job polling a “pending approvals” table every five minutes, with all the awkward state-machine bookkeeping that implies. It worked, barely, and it was miserable to reason about.
The Temporal version was a single function that ran the first deprovisioning steps, then literally waited for an approval signal, for three days if it had to. It held no thread while waiting, survived worker restarts, and remembered exactly where it was. That’s the superpower of durable execution for human-in-the-loop automation: the workflow code reads like a synchronous script, but the runtime handles the waiting, the crashes, and the days-long timeouts for you.
Why human-in-the-loop is hard without durability
Any automation worth gating on a human has to wait for that human — and humans are slow and unpredictable. Without a durable engine you end up with one of two bad options: hold a process/thread open for hours (wasteful, dies on restart) or shred the workflow into a polling state machine across a database (correct but unreadable).
Temporal gives you a third option. A workflow can await a condition that won’t be true for days, and the engine persists the wait as part of the workflow’s event history. If the worker process crashes and restarts, the workflow is replayed from that history and resumes from exactly where it paused. While it waits, it occupies no worker thread.
Signals: sending data into a running workflow
A signal is how the outside world pushes a message into a running workflow — an approval, a cancellation, a new input. The workflow declares a signal handler and then waits for it:
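import asyncio
from datetime import timedelta

from temporalio import workflow

# Assumed per-activity timeout; tune it for each activity.
ACTIVITY_TIMEOUT = timedelta(minutes=5)


async def run_activity(name: str, user_id: str) -> None:
    # Side effects run as activities, referenced here by their registered name.
    await workflow.execute_activity(
        name, user_id, start_to_close_timeout=ACTIVITY_TIMEOUT
    )


@workflow.defn
class DeprovisionWorkflow:
    def __init__(self) -> None:
        self._approved = None  # becomes (decision, approver) once signaled

    @workflow.signal
    def approve(self, decision: bool, approver: str) -> None:
        # Record the human decision; run() is waiting on this field.
        self._approved = (decision, approver)

    @workflow.run
    async def run(self, user_id: str) -> str:
        await run_activity("disable_accounts", user_id)  # safe, reversible first step

        # Wait up to 3 days for a human decision; no thread is held
        try:
            await workflow.wait_condition(
                lambda: self._approved is not None,
                timeout=timedelta(days=3),
            )
        except asyncio.TimeoutError:
            await run_activity("escalate", user_id)  # nobody approved in time
            return "escalated"

        decision, approver = self._approved
        if not decision:
            await run_activity("re_enable_accounts", user_id)  # back-out path
            return f"denied by {approver}"

        await run_activity("delete_resources", user_id)  # destructive step, now authorized
        return f"approved by {approver}"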
Read that top to bottom: disable, wait for a human, then either undo or proceed. The three-day wait is one line. That readability is the entire point.
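From the approver’s side, delivering the decision is a small client call. The sketch below assumes a connected temporalio.client.Client is passed in and that the signal is registered under its default name, approve. The deprovision-<user_id> workflow ID scheme and the send_approval helper are hypothetical names used only for illustration.

```python
from typing import Any


async def send_approval(client: Any, user_id: str, decision: bool, approver: str) -> str:
    """Signal the running deprovisioning workflow with a human decision.

    `client` is expected to be a connected temporalio.client.Client.
    The "deprovision-<user_id>" workflow ID scheme is an assumption.
    """
    workflow_id = f"deprovision-{user_id}"
    handle = client.get_workflow_handle(workflow_id)
    # Multiple signal arguments are passed via `args`, in handler order.
    await handle.signal("approve", args=[decision, approver])
    return workflow_id
```

The approver string is whatever the caller supplies, so the service that sends the signal should authenticate the person before calling this.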
Queries: reading state without disturbing it
A query lets an external caller read a workflow’s current state without sending it anything. This is how your approval UI shows “what is this workflow waiting on right now?”
    @workflow.query
    def status(self) -> str:
        # Read-only view of the gate; a method on DeprovisionWorkflow.
        if self._approved is None:
            return "awaiting_approval"
        return "decided"
Your dashboard polls the query to render the pending-approval list. No database of duplicated state to keep in sync — the workflow is the state. This pairs naturally with the kind of operational surface I use for incident response, where on-call needs to see what’s pending and act on it.
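A dashboard backend might build its pending list roughly like this. It is a minimal sketch: it assumes a connected temporalio.client.Client, the default query name status, and that the caller already knows which workflow IDs to check. The pending_approvals helper is a hypothetical name.

```python
from typing import Any, Iterable


async def pending_approvals(client: Any, workflow_ids: Iterable[str]) -> list[str]:
    """Return the IDs of workflows whose `status` query reports a pending gate."""
    pending = []
    for workflow_id in workflow_ids:
        handle = client.get_workflow_handle(workflow_id)
        # Queries are read-only; the handler must not mutate workflow state.
        if await handle.query("status") == "awaiting_approval":
            pending.append(workflow_id)
    return pending
```

Querying one workflow at a time is fine for a handful of approvals; for a large fleet you would likely want a cheaper way to enumerate candidates first.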
The approval gate is the whole design
Notice the structure: the workflow does the reversible setup, pauses at the gate, and only runs the destructive delete_resources after an explicit human signal. This is the human-in-the-loop pattern done right — the automation handles everything up to the consequential action, then stops and waits for a person to own that decision.
Three safety properties fall out of this for free:
- A human owns the destructive decision. The delete cannot happen without a signal carrying an approver’s identity.
- Blast radius is scoped. The workflow only touches user_id’s resources, captured at start.
- There’s a back-out path. A denial re-enables what was disabled; a timeout escalates instead of guessing.
Pro Tip: Always pair the wait_condition with a timeout, and make the timeout do something deliberate — escalate, deny, or notify. A human-in-the-loop step with no timeout is a workflow that can wait forever. The most dangerous approval gate is the one that quietly hangs because the approver went on vacation.
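One way to make the timeout deliberate is to split it into two stages: remind the approver partway through, then give up at the deadline. The helper below is a sketch under assumptions. Inside a workflow, wait_condition would be temporalio’s workflow.wait_condition, which raises asyncio.TimeoutError on timeout, and remind would be a coroutine that runs a notification activity. The gate_with_reminder name and its parameters are hypothetical.

```python
import asyncio
from datetime import timedelta
from typing import Any, Awaitable, Callable


async def gate_with_reminder(
    wait_condition: Callable[..., Awaitable[Any]],  # e.g. workflow.wait_condition
    is_decided: Callable[[], bool],
    remind: Callable[[], Awaitable[None]],
    remind_after: timedelta,
    give_up_after: timedelta,
) -> bool:
    """Return True if a decision arrived, False if the gate timed out."""
    try:
        await wait_condition(is_decided, timeout=remind_after)
        return True
    except asyncio.TimeoutError:
        await remind()  # first stage expired: nudge the approver
    try:
        # Second stage: wait out the remainder of the overall deadline.
        await wait_condition(is_decided, timeout=give_up_after - remind_after)
        return True
    except asyncio.TimeoutError:
        return False  # caller escalates or denies; it never guesses
```

The caller still decides what a False result means, which keeps the escalation policy visible in the workflow body rather than buried in the helper.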
Where AI assists
LLMs are good at the Temporal boilerplate — the signal/query handler scaffolding, the wait-condition-with-timeout structure, the activity stubs. I treat Claude or Cursor as a fast junior engineer: describe the steps and the gate, and it drafts a clean workflow skeleton with the signals wired up.
What stays human is the placement of the gate and the definition of the destructive steps. Which step is the point of no return, what the compensation for a denial actually is, who is allowed to send the approve signal — those are judgment calls with real consequences. The model can also draft the body of an activity, but activities are where the real credentials live, so generated activity code runs against a sandbox and gets reviewed before it ever has prod access. The model never holds the credentials; the workflow’s activities do, and a human reviews them. I keep Temporal workflow prompts in my prompt workspace so they start from vetted patterns.
Activities, not workflows, do the dangerous work
A Temporal subtlety that protects you: workflow code must be deterministic, so all the side effects — API calls, deletes, credential use — live in activities, which run separately and can be retried independently. This separation is a safety feature. The workflow is pure orchestration logic you can read and trust; the activities are the small, isolated, individually-retryable units that actually touch production. Keep destructive operations in narrowly-scoped activities so the blast radius of any single step is exactly one well-defined action.
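To illustrate what “narrowly scoped and individually retryable” can look like, here is a sketch of the body of a delete_resources activity. ResourceApi is a hypothetical client interface, not a real library, and the temporalio @activity.defn decorator is left off so the snippet runs standalone.

```python
from typing import Protocol


class ResourceApi(Protocol):
    """Hypothetical client for the system that owns the resources."""

    def list_resources(self, user_id: str) -> list[str]: ...
    def delete(self, resource_id: str) -> None: ...


def delete_resources(api: ResourceApi, user_id: str) -> list[str]:
    """Delete only the resources owned by `user_id`; safe to retry.

    In a real worker this body would sit behind temporalio's @activity.defn
    decorator; it is omitted here so the sketch runs on its own.
    """
    deleted = []
    # Re-listing on every attempt lets a retry pick up where the last one
    # failed: resources that were already deleted no longer appear.
    for resource_id in api.list_resources(user_id):
        api.delete(resource_id)
        deleted.append(resource_id)
    return deleted
```

Because the function takes a single user_id and lists before it deletes, a retry after a partial failure cannot reach beyond that one user’s remaining resources.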
Operating long-running workflows
Because these workflows can live for days, you watch them differently than a quick job. I alert on workflows that have been waiting longer than expected (a stuck approval), workflows hitting their timeout escalation path, and activity failures that exhaust retries — all routed to the monitoring-alerts dashboard. A workflow silently stuck at a gate for a week is a process failure even when the code is working perfectly.
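The “waiting longer than expected” check is simple once you have a timestamp for when each workflow reached its gate. How you collect those timestamps is left open here; the waiting_since mapping, the stuck_approvals name, and the one-day default threshold are all assumptions for illustration.

```python
from datetime import datetime, timedelta


def stuck_approvals(
    waiting_since: dict[str, datetime],
    now: datetime,
    expected_wait: timedelta = timedelta(days=1),
) -> list[str]:
    """Return workflow IDs at the gate longer than expected, oldest first."""
    overdue = [
        (started, workflow_id)
        for workflow_id, started in waiting_since.items()
        if now - started > expected_wait
    ]
    # Sorting by start time puts the longest-waiting approval first.
    return [workflow_id for _, workflow_id in sorted(overdue)]
```

Feeding the result into an alert means a gate that has been waiting too long pages someone well before the three-day timeout fires.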
Conclusion
Temporal turns human-in-the-loop automation from a painful polling state machine into a readable, durable script that waits for people without burning resources. Use signals to bring decisions in, queries to expose state, and timeouts to keep gates from hanging forever. Put the destructive step after the gate, scope it to one entity, and give denials a real back-out path. Let AI draft the scaffolding, but keep the gate placement, the dangerous activities, and the credentials in human hands.
For the bigger picture on durable orchestration, see Temporal and Argo Workflows, and the automation category covers the surrounding safe-automation patterns. The prompt packs include reviewed Temporal workflow templates.