Risk-Tiered Approval Gates With Policy-as-Code for

For a while our automation had exactly one approval policy: everything needed a human click. It sounds safe. In practice it trained everyone to rubber-stamp, because 95% of the actions were trivial — restart a stuck pod, clear a cache, bump a non-prod replica count — and approval fatigue set in fast. People clicked “approve” without reading, which meant the 5% of genuinely dangerous actions got the same reflexive click as the harmless ones. A gate everyone ignores is worse than no gate, because it manufactures false confidence.

The fix was to stop treating all actions the same and start tiering by risk. Low-risk actions run unattended; medium-risk needs one approver; high-risk needs two and a change record. The decision of which tier an action falls into is encoded as policy-as-code, so it’s consistent, auditable, and version-controlled. This is how I build approval gates that people actually respect.

Risk tiers, not a single gate

The core idea is to score every proposed automated action and route it by score:

Tier 0 — auto-execute. Reversible, scoped, low blast radius. Restart one pod, clear a cache, scale a non-prod service. No human; just an audit log.
Tier 1 — single approval. Touches prod but reversible. Scale prod within bounds, rotate a non-critical credential. One approver.
Tier 2 — dual approval + change record. Destructive or wide blast radius. Delete data, modify network policy, deploy to all regions. Two approvers and a linked change ticket.

The point is proportionality. The friction of the gate matches the consequence of the action. Tier 0 staying frictionless is what preserves attention for Tier 2 — people read the dangerous approvals precisely because they’re not drowning in trivial ones.
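To make the proportionality concrete, the three tiers can be written down as data. This is a minimal sketch: the `TierRequirement` class, the `TIER_REQUIREMENTS` mapping, and `requirement_for` are illustrative names, not part of any library, and the lookup assumes that an unrecognized tier should get the strictest requirement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierRequirement:
    approvers: int       # human approvals needed before execution
    change_record: bool  # whether a linked change ticket is mandatory


# Hypothetical mapping that mirrors the three tiers described above.
TIER_REQUIREMENTS = {
    0: TierRequirement(approvers=0, change_record=False),
    1: TierRequirement(approvers=1, change_record=False),
    2: TierRequirement(approvers=2, change_record=True),
}


def requirement_for(tier):
    """Return the requirement for a tier; unknown tiers get the strictest one."""
    return TIER_REQUIREMENTS.get(tier, TIER_REQUIREMENTS[2])
```

Keeping the requirements in one table means the friction per tier is visible at a glance and can be reviewed like any other change.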

Scoring an action with policy-as-code

The tiering logic shouldn’t be buried in application code where it’s invisible and untestable. Encode it as policy. Here’s the shape in Rego (OPA):

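package automation.approval

import rego.v1   # enables the `in` and `if` keywords used below

default tier := 2   # fail safe: unknown actions are high-risk

# High-risk conditions are collected separately so they always win.
# Without this, an in-bounds prod scale with a large blast radius would
# match both tier 1 and tier 2, and OPA rejects conflicting outputs.
high_risk if input.action in {"delete_volume", "modify_network_policy"}

high_risk if {
    input.environment == "production"
    input.blast_radius > 10                 # touches many resources
}

tier := 0 if {
    not high_risk
    input.action in {"restart_pod", "clear_cache"}
    input.environment != "production"
}

tier := 1 if {
    not high_risk
    input.action == "scale"
    input.environment == "production"
    input.replicas <= input.current * 2     # within a sane bound
}

# Anything flagged high_risk matches neither rule and falls through to the default tier of 2.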

The automation calls OPA with the proposed action, gets back a tier, and enforces the matching approval requirement. The default tier := 2 is the most important line: anything the policy doesn’t recognize is treated as high-risk. Fail safe, never fail open.
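One way the call could look from Python, using OPA's REST Data API (a POST to `/v1/data/<path>` with the action wrapped in an `input` field, answered with a JSON document whose `result` field holds the value). This is a sketch under assumptions: the local URL, the timeout, and the helper names `parse_tier` and `classify` are mine, and the fail-safe rule is applied a second time on the client so that a network error or a malformed reply also lands in Tier 2.

```python
import http.client
import json
import urllib.request

# Assumed address of a local OPA server; adjust for your deployment.
OPA_URL = "http://localhost:8181/v1/data/automation/approval/tier"
HIGH_RISK = 2


def parse_tier(response_body):
    """Extract the tier from an OPA Data API response, failing safe to HIGH_RISK."""
    try:
        result = json.loads(response_body).get("result")
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object.
        return HIGH_RISK
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(result, int) and not isinstance(result, bool) and result in (0, 1, 2):
        return result
    return HIGH_RISK


def classify(action, url=OPA_URL, timeout=2.0):
    """Ask OPA for the tier of a proposed action; transport errors mean HIGH_RISK."""
    payload = json.dumps({"input": action}).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return parse_tier(response.read())
    except (OSError, http.client.HTTPException):
        # Unreachable server, HTTP error status, timeout, or broken response.
        return HIGH_RISK
```

Splitting the parsing from the transport keeps the fail-safe logic testable without a running OPA instance.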

The gate enforces; the policy decides

Keep the two concerns separate. The policy classifies risk; the orchestration enforces the requirement:

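# opa, audit, request_approvals, require_change_record, and execute are
# the orchestration's own helpers.
def gate(action):
    # Policy decides: ask OPA which tier this proposed action falls into.
    tier = opa.evaluate("automation/approval/tier", input=action)
    if tier not in (0, 1, 2):
        tier = 2  # fail safe: an unexpected or missing result is high-risk
    if tier == 0:
        audit.log(action, decision="auto")
        return execute(action)
    # Enforcement: Tier 1 needs one approver, Tier 2 needs two.
    approvers = {1: 1, 2: 2}[tier]
    approvals = request_approvals(action, count=approvers, tier=tier)
    if approvals.granted:
        if tier == 2:
            require_change_record(action, approvals)  # linked change ticket
        audit.log(action, decision="approved", by=approvals.who)
        return execute(action)
    audit.log(action, decision="denied")
    return None  # denied actions never execute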

This separation means you can change the policy — re-tier an action because it turned out riskier than you thought — without touching the enforcement code, and you can test the policy in isolation.

Pro Tip: Write policy tests for the tier boundaries, not just the obvious cases. The bug is never “delete_volume should be tier 2” — it’s the edge case where a prod scale of 2x current slips into tier 0 because of an operator precedence mistake. OPA’s opa test lets you assert tiers for crafted inputs; treat those tests as guardrails you can’t ship without.
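In a real setup these assertions would live in Rego and run under opa test. The Python below is only a stand-in: `tier_for` is a hypothetical mirror of the tiering rules (with the high-risk conditions checked first), written so the boundary cases worth asserting can be shown in a runnable form. The case list is the point, not the mirror.

```python
def tier_for(action):
    """Hypothetical Python mirror of the tiering rules, for illustration only."""
    name = action.get("action")
    env = action.get("environment")
    blast = action.get("blast_radius")

    if name in {"delete_volume", "modify_network_policy"}:
        return 2
    if env == "production" and blast is not None and blast > 10:
        return 2
    if name in {"restart_pod", "clear_cache"} and env is not None and env != "production":
        return 0
    if name == "scale" and env == "production":
        replicas, current = action.get("replicas"), action.get("current")
        if replicas is not None and current is not None and replicas <= current * 2:
            return 1
    return 2  # fail safe: anything unrecognized is high-risk


def prod_scale(replicas, current=4, **extra):
    """Build a production scale action; extra fields are merged in."""
    return {"action": "scale", "environment": "production",
            "replicas": replicas, "current": current, **extra}


# (description, input, expected tier): each case sits on or next to a boundary.
BOUNDARY_CASES = [
    ("prod scale at exactly 2x current", prod_scale(8), 1),
    ("prod scale one replica over 2x", prod_scale(9), 2),
    ("blast radius exactly at the limit", prod_scale(4, blast_radius=10), 1),
    ("blast radius one over the limit", prod_scale(4, blast_radius=11), 2),
    ("restart_pod in production", {"action": "restart_pod", "environment": "production"}, 2),
    ("restart_pod outside production", {"action": "restart_pod", "environment": "staging"}, 0),
    ("scale with no current count", {"action": "scale", "environment": "production", "replicas": 4}, 2),
    ("unrecognized action", {"action": "drop_table", "environment": "staging"}, 2),
]

for description, action, expected in BOUNDARY_CASES:
    assert tier_for(action) == expected, description
```

Each pair of adjacent cases (8 versus 9 replicas, a blast radius of 10 versus 11) pins one comparison operator, which is where off-by-one and precedence mistakes tend to hide.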

Blast radius drives the tier

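Notice the policy keys on blast_radius directly: the number of resources an action touches. This is the variable that should dominate tiering. The same verb can carry very different risk: a delete against one orphaned test volume is a different proposition from a delete whose wildcard matches forty production volumes. The sample policy above stays conservative and pins delete_volume to Tier 2 regardless, but a finer-grained policy could drop the single orphaned test volume to Tier 1 while keeping the wildcard at Tier 2. Either way, the automation must compute and pass the real blast radius (a dry-run count of affected resources) into the policy, not just the action name.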

An action that can’t report its blast radius up front shouldn’t be eligible for auto-execution at all. If you can’t tell the policy how much it will touch, you can’t classify its risk.
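A dry-run count can be as simple as matching the action's selector against a known inventory before anything executes. The sketch below assumes actions carry a glob-style `selector` field and that an inventory of resource names is available; both are illustrative assumptions, as are the function names.

```python
from fnmatch import fnmatchcase


def blast_radius(selector, inventory):
    """Dry-run count: how many known resource names the selector would match."""
    return sum(1 for name in inventory if fnmatchcase(name, selector))


def with_blast_radius(action, inventory):
    """Return a copy of the action annotated with its blast radius.

    Raises ValueError when the action has no selector, because an action
    whose reach is unknown cannot be classified.
    """
    selector = action.get("selector")
    if not selector:
        raise ValueError("no selector: blast radius unknown, refusing to classify")
    return {**action, "blast_radius": blast_radius(selector, inventory)}


# Hypothetical inventory: forty production volumes and one orphaned test volume.
inventory = [f"prod-vol-{i:02d}" for i in range(40)] + ["test-vol-orphan"]

narrow = with_blast_radius({"action": "delete_volume", "selector": "test-vol-orphan"}, inventory)
wide = with_blast_radius({"action": "delete_volume", "selector": "prod-vol-*"}, inventory)
```

Here `narrow["blast_radius"]` is 1 and `wide["blast_radius"]` is 40, so the same action name reaches the policy with very different inputs.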

Where AI fits — proposer, never approver

This is the load-bearing rule when AI is in the loop. An LLM is a fast junior engineer that proposes actions — and a proposed action enters this gate at the appropriate tier like any other. The model does not get to approve, and crucially, the model’s involvement should raise scrutiny, not lower it. An AI-generated delete is still a delete; it goes through Tier 2 exactly as a human-typed one would.

I use Claude to draft remediation actions and even to draft the Rego policy itself — it’s genuinely good at the policy boilerplate. But two hard lines: the model never holds the credentials that execute the action (it produces a proposal, the gated orchestration executes), and a human owns every approval click for Tier 1 and above. The AI proposing and the AI approving must never be the same loop, or you’ve built an unsupervised system wearing a gate as a costume. This proposer/approver split is the same discipline the code-review dashboard applies to AI-suggested code. I keep my approval-policy prompts in the prompt workspace.
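The proposer/approver split can be enforced mechanically rather than by convention. In this sketch the `Approval` record and `approvals_satisfy` check are hypothetical; the `is_human` flag is assumed to come from your identity provider rather than from the approver itself. Excluding the proposer from its own approvals is an extra separation-of-duties assumption on my part, and it applies to human proposers too.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Approval:
    approver: str   # identity of whoever clicked approve
    is_human: bool  # assumed to come from the identity provider, never self-reported


def approvals_satisfy(tier, proposed_by, approvals):
    """True only if enough distinct humans, none of them the proposer, approved."""
    if tier == 0:
        return True
    required = 1 if tier == 1 else 2  # unknown tiers get the strictest requirement
    valid = {a.approver for a in approvals if a.is_human and a.approver != proposed_by}
    return len(valid) >= required
```

With this check in the enforcement path, a model that both proposes an action and submits an approval for it contributes nothing toward the required count.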

Every tier logs; the back-out path is in the record

Tier 0 still writes an audit log — “auto-execute” is not “invisible.” That log is your back-out path: when something goes wrong, you need to know exactly what ran, with what scope, and on whose authority. For Tier 2, the linked change record carries the back-out plan explicitly, because the most dangerous actions are the ones you most need to be able to reverse.

I review the auto-executed Tier 0 stream periodically too. If an action is firing thousands of times a day, either it’s a candidate for fixing the root cause, or it’s quietly doing more than I assumed and deserves re-tiering.
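That periodic review is easy to script. The sketch below assumes each audit entry is a dict with `action` and `decision` keys, matching the fields the gate logs, and that the caller passes in one day's worth of entries. The function name and the default threshold of 1000 are illustrative.

```python
from collections import Counter


def noisy_tier0_actions(audit_entries, threshold=1000):
    """Return {action: count} for auto-executed actions seen more than `threshold` times."""
    counts = Counter(
        entry["action"]
        for entry in audit_entries
        if entry.get("decision") == "auto"
    )
    return {action: n for action, n in counts.items() if n > threshold}
```

Anything this returns is a prompt for a human decision: fix the root cause, or re-tier the action.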

Conclusion

A single approval gate for everything trains people to rubber-stamp and squanders attention on trivia. Tier actions by risk with policy-as-code, key the tier on real blast radius, default unknown actions to high-risk, and keep proposing (where AI helps) strictly separate from approving (where a human owns the call). Log every tier, attach back-out plans to the dangerous ones, and never let the model hold the credentials that execute.

The automation category covers the related guardrails — blast-radius scoping, dry-runs, and confidence-gated remediation — and the prompt packs include reviewed policy-as-code templates.