AI for Automation · By James Joyner IV · 11 min read

Temporal Signals and Human-in-the-Loop Automation Workflows

Durable workflows that wait days for an approval without burning a thread. How Temporal signals, queries, and timers build safe human-in-the-loop automation.

  • #automation
  • #temporal
  • #orchestration
  • #approval-gates
  • #sre

The automation that finally sold me on Temporal was a deprovisioning flow that had to wait up to three business days for a manager to approve before deleting a departing employee’s resources. My first version was a cron job polling a “pending approvals” table every five minutes, with all the awkward state-machine bookkeeping that implies. It worked, barely, and it was miserable to reason about.

The Temporal version was a single function that ran the deprovisioning steps, then literally waited for an approval signal (for three days if it had to) without holding a thread. It survived restarts and remembered exactly where it was. That’s the superpower of durable execution for human-in-the-loop automation: the workflow code reads like a synchronous script, but the runtime handles the waiting, the crashes, and the days-long timeouts for you.

Why human-in-the-loop is hard without durability

Any automation worth gating on a human has to wait for that human — and humans are slow and unpredictable. Without a durable engine you end up with one of two bad options: hold a process/thread open for hours (wasteful, dies on restart) or shred the workflow into a polling state machine across a database (correct but unreadable).

Temporal gives you a third option. A workflow can await a condition that won’t be true for days, and the engine persists the wait in the workflow’s event history. If the worker process crashes and restarts, the workflow is replayed from that history and resumes exactly where it paused. While it waits, it holds no thread or process.

Signals: sending data into a running workflow

A signal is how the outside world pushes a message into a running workflow — an approval, a cancellation, a new input. The workflow declares a signal handler and then waits for it:

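import asyncio
from datetime import timedelta

from temporalio import workflow

# The four activities are assumed to live in a separate module
# (the module name "activities" is illustrative).
with workflow.unsafe.imports_passed_through():
    from activities import (
        delete_resources, disable_accounts, escalate, re_enable_accounts,
    )

STEP_TIMEOUT = timedelta(minutes=5)  # per-activity timeout; tune to your APIs


@workflow.defn
class DeprovisionWorkflow:
    def __init__(self) -> None:
        # None until a human decides; then (decision, approver)
        self._approved: tuple[bool, str] | None = None

    @workflow.signal
    def approve(self, decision: bool, approver: str) -> None:
        self._approved = (decision, approver)

    @workflow.run
    async def run(self, user_id: str) -> str:
        # Safe, reversible first step
        await workflow.execute_activity(
            disable_accounts, user_id, start_to_close_timeout=STEP_TIMEOUT)

        # Wait up to 3 days for a human decision; no thread is held
        try:
            await workflow.wait_condition(
                lambda: self._approved is not None,
                timeout=timedelta(days=3),
            )
        except asyncio.TimeoutError:
            # Nobody decided in time
            await workflow.execute_activity(
                escalate, user_id, start_to_close_timeout=STEP_TIMEOUT)
            return "escalated"

        decision, approver = self._approved
        if not decision:
            # Back-out path
            await workflow.execute_activity(
                re_enable_accounts, user_id, start_to_close_timeout=STEP_TIMEOUT)
            return f"denied by {approver}"

        # Destructive step, now authorized
        await workflow.execute_activity(
            delete_resources, user_id, start_to_close_timeout=STEP_TIMEOUT)
        return f"approved by {approver}"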

Read that top to bottom: disable, wait for a human, then either undo or proceed. The three-day wait is one line. That readability is the entire point.

Queries: reading state without disturbing it

A query lets an external caller read a workflow’s current state without sending it anything. This is how your approval UI shows “what is this workflow waiting on right now?”

    @workflow.query
    def status(self) -> str:
        if self._approved is None:
            return "awaiting_approval"
        return "decided"

Your dashboard polls the query to render the pending-approval list. No database of duplicated state to keep in sync — the workflow is the state. This pairs naturally with the kind of operational surface I use for incident response, where on-call needs to see what’s pending and act on it.
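As a rough sketch of the dashboard side, the helper below asks a batch of workflows for their status and keeps the ones still waiting. The function name pending_approvals is made up for illustration, and each handle is assumed to be a Temporal workflow handle of the kind returned by the client’s get_workflow_handle.

```python
import asyncio


async def pending_approvals(handles) -> list[str]:
    """Return IDs of workflows whose status query reports a pending gate."""
    # Queries are read-only: they do not add events to the workflow history.
    statuses = await asyncio.gather(*(h.query("status") for h in handles))
    return [
        h.id for h, status in zip(handles, statuses)
        if status == "awaiting_approval"
    ]
```

Queries are answered by a worker, so the dashboard only gets a response while at least one worker for the workflow’s task queue is running.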

The approval gate is the whole design

Notice the structure: the workflow does the reversible setup, pauses at the gate, and only runs the destructive delete_resources after an explicit human signal. This is the human-in-the-loop pattern done right — the automation handles everything up to the consequential action, then stops and waits for a person to own that decision.

Three safety properties fall out of this for free:

  • A human owns the destructive decision. The delete cannot happen without a signal carrying an approver’s identity.
  • Blast radius is scoped. The workflow only touches user_id’s resources, captured at start.
  • There’s a back-out path. A denial re-enables what was disabled; a timeout escalates instead of guessing.

Pro Tip: Always pair the wait_condition with a timeout, and make the timeout do something deliberate — escalate, deny, or notify. A human-in-the-loop step with no timeout is a workflow that can wait forever. The most dangerous approval gate is the one that quietly hangs because the approver went on vacation.

Where AI assists

LLMs are good at the Temporal boilerplate — the signal/query handler scaffolding, the wait-condition-with-timeout structure, the activity stubs. I treat Claude or Cursor as a fast junior engineer: describe the steps and the gate, and it drafts a clean workflow skeleton with the signals wired up.

What stays human is the placement of the gate and the definition of the destructive steps. Which step is the point of no return, what the compensation for a denial actually is, who is allowed to send the approve signal — those are judgment calls with real consequences. The model can also draft the body of an activity, but activities are where the real credentials live, so generated activity code runs against a sandbox and gets reviewed before it ever has prod access. The model never holds the credentials; the workflow’s activities do, and a human reviews them. I keep Temporal workflow prompts in my prompt workspace so they start from vetted patterns.
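The “who is allowed to approve” decision can be made explicit in code rather than left implicit. The sketch below is a hypothetical helper (accept_decision and allowed_approvers are illustrative names, not part of the workflow above) that the approve signal handler could call. It is a pure function, so it is safe to use inside deterministic workflow code.

```python
from typing import Optional

Decision = tuple[bool, str]  # (approved?, approver)


def accept_decision(
    current: Optional[Decision],
    decision: bool,
    approver: str,
    allowed_approvers: frozenset[str],
) -> Optional[Decision]:
    """Return the approval state after a signal arrives."""
    if current is not None:
        return current  # first valid decision wins; later signals are ignored
    if approver not in allowed_approvers:
        return current  # unknown approver: keep waiting
    return (decision, approver)
```

This only checks the name the signal claims to come from. Authenticating the sender still has to happen in the service that sends the signal, because the signal payload is just data.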

Activities, not workflows, do the dangerous work

A Temporal subtlety that protects you: workflow code must be deterministic, so all the side effects — API calls, deletes, credential use — live in activities, which run separately and can be retried independently. This separation is a safety feature. The workflow is pure orchestration logic you can read and trust; the activities are the small, isolated, individually-retryable units that actually touch production. Keep destructive operations in narrowly-scoped activities so the blast radius of any single step is exactly one well-defined action.

Operating long-running workflows

Because these workflows can live for days, you watch them differently than a quick job. I alert on workflows that have been waiting longer than expected (a stuck approval), workflows hitting their timeout escalation path, and activity failures that exhaust retries — all routed to the monitoring-alerts dashboard. A workflow silently stuck at a gate for a week is a process failure even when the code is working perfectly.
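The “waiting longer than expected” check can be as simple as the sketch below. PendingGate and the one-day default are assumptions for illustration; where the waiting_since timestamps come from (a query, a search attribute, your own records) depends on your setup.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class PendingGate:
    workflow_id: str
    waiting_since: datetime  # when the workflow reached its approval gate


def overdue_gates(
    gates: list[PendingGate],
    now: datetime,
    expected_wait: timedelta = timedelta(days=1),
) -> list[str]:
    """Return IDs of workflows that have waited longer than expected."""
    return [g.workflow_id for g in gates if now - g.waiting_since > expected_wait]
```

Setting expected_wait well below the workflow’s own timeout means the alert fires while a human can still act, before the escalation path kicks in.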

Conclusion

Temporal turns human-in-the-loop automation from a painful polling state machine into a readable, durable script that waits for people without burning resources. Use signals to bring decisions in, queries to expose state, and timeouts to keep gates from hanging forever. Put the destructive step after the gate, scope it to one entity, and give denials a real back-out path. Let AI draft the scaffolding, but keep the gate placement, the dangerous activities, and the credentials in human hands.

For the bigger picture on durable orchestration, see Temporal and Argo Workflows. The automation category covers the surrounding safe-automation patterns, and the prompt packs include reviewed Temporal workflow templates.
