Azure Policy as Guardrails With AI: Write the Rules, Not Just the Wiki Page

Our cloud standards lived in a Confluence page. “All resources must be tagged with a cost center.” “No public IPs on production VMs.” “Storage accounts must require TLS 1.2.” It was a good page. It was also completely unenforced, which I discovered when I ran a compliance scan and found that roughly a third of resources violated rules we’d “agreed on” eighteen months earlier. A standard that lives in a wiki is a suggestion. A standard that lives in Azure Policy is a guardrail. The gap between those two is where every governance program quietly dies.

The reason teams don’t convert the wiki into policy is that Azure Policy definitions are fiddly JSON — alias paths, condition logic, effects, parameterization — and nobody enjoys writing them. That’s the exact tedium AI removes. It will turn “deny untagged resources” into a working policy definition, decode why a resource shows non-compliant, and draft the remediation. It does not decide your governance. You own the rules and the rollout; AI does the JSON authoring you’ve been avoiding.

Turn a plain-English rule into a policy definition

The friction is the alias system — the property path Policy evaluates, like Microsoft.Storage/storageAccounts/minimumTlsVersion, which you’d otherwise have to look up. Describe the rule and let AI draft, with one hard requirement: verify the alias is real.

Prompt: “Write an Azure Policy definition that DENIES creation of any resource missing a tag named costCenter. Use the field condition on tags['costCenter'] with the exists operator. Make the effect a parameter so I can switch between Audit and Deny. Output the full policy rule JSON.”

A solid draft:

{
  "properties": {
    "displayName": "Require costCenter tag on all resources",
    "mode": "Indexed",
    "parameters": {
      "effect": {
        "type": "String",
        "allowedValues": ["Audit", "Deny", "Disabled"],
        "defaultValue": "Audit"
      }
    },
    "policyRule": {
      "if": {
        "field": "tags['costCenter']",
        "exists": "false"
      },
      "then": { "effect": "[parameters('effect')]" }
    }
  }
}
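To register a draft like this with the CLI, the rule and the parameters usually go in as separate pieces, because az policy definition create takes the if/then block and the parameter block as separate arguments rather than the full properties wrapper. A minimal sketch, assuming a definition name of require-costcenter-tag and the file names below (both are illustrative, not from any standard):

```shell
# Write the if/then rule and the parameter block to separate files.
# $1 = directory to write into. File names are illustrative.
write_definition_files() {
  dir=$1
  cat > "$dir/costcenter-rule.json" <<'EOF'
{
  "if": { "field": "tags['costCenter']", "exists": "false" },
  "then": { "effect": "[parameters('effect')]" }
}
EOF
  cat > "$dir/costcenter-params.json" <<'EOF'
{
  "effect": {
    "type": "String",
    "allowedValues": ["Audit", "Deny", "Disabled"],
    "defaultValue": "Audit"
  }
}
EOF
}

# Register the definition and print its resource ID.
# Needs a logged-in az session; the definition name is an assumption.
create_definition() {
  az policy definition create --name "require-costcenter-tag" \
    --display-name "Require costCenter tag on all resources" \
    --mode Indexed \
    --rules "$1/costcenter-rule.json" \
    --params "$1/costcenter-params.json" \
    --query id -o tsv
}

# Usage (not run here):
#   write_definition_files . && POLICY_DEF_ID=$(create_definition .)
```

Capturing the printed ID in POLICY_DEF_ID gives you the value the assignment step expects.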

Now verify the alias before you trust any policy that references a resource property — AI invents alias paths that look right and don’t exist:

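# List the real aliases for the resource type you're gating, then confirm yours is among them
az provider show --namespace Microsoft.Storage --expand "resourceTypes/aliases" \
  --query "resourceTypes[?resourceType=='storageAccounts'].aliases[].name" -o tsv | grep -i tls
# Cross-check against built-in policies that already cover the same property
az policy definition list --query "[?policyType=='BuiltIn'].displayName" -o tsv | grep -i tls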

For the TLS example, check that Microsoft.Storage/storageAccounts/minimumTlsVersion is a valid alias rather than assuming. Azure usually rejects a definition whose alias doesn’t exist at all, so the quieter failure is an alias that is real but points at the wrong property or resource type: that policy deploys clean and enforces air. That verify step is the whole ballgame.
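The alias check can be made mechanical instead of eyeballed. The sketch below pulls every alias-style field value out of a drafted policy file and reports the ones missing from a list of known aliases. The function names and the regex for "looks like an alias" are my own assumptions, and the regex only catches fields written as Namespace.Provider/type/property, so treat a clean result as a first pass rather than proof.

```shell
# Print each alias-style "field" value (Namespace.Provider/type/property)
# found in a policy JSON file, one per line, de-duplicated.
policy_aliases() {
  grep -oE '"field"[[:space:]]*:[[:space:]]*"[A-Za-z]+\.[A-Za-z.]+/[^"]+"' "$1" \
    | sed -E 's/.*"([^"]+)"$/\1/' | sort -u
}

# stdin: known aliases, one per line. $1: policy JSON file.
# Prints every alias used in the file that is NOT in the known list.
# Comparison is case-insensitive.
unknown_aliases() {
  policy_file=$1
  known=$(mktemp)
  cat > "$known"
  policy_aliases "$policy_file" | while IFS= read -r a; do
    grep -Fxiq -- "$a" "$known" || printf '%s\n' "$a"
  done
  rm -f "$known"
}

# Usage (not run here; needs a logged-in az session and a draft file):
#   az provider show --namespace Microsoft.Storage --expand "resourceTypes/aliases" \
#     --query "resourceTypes[].aliases[].name" -o tsv | unknown_aliases draft-policy.json
```

Empty output means every alias the draft references exists for that provider. It does not tell you the alias is the right one for your rule, which still needs a human read.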

Always roll out in Audit before Deny

The fastest way to cause an outage with good intentions is to assign a Deny policy to a live scope and discover it blocks a legitimate deployment pattern you forgot about. Assign as Audit first, read the compliance results, then tighten:

# Assign the policy in audit mode at a resource group
az policy assignment create --name "require-costcenter" \
  --policy "$POLICY_DEF_ID" --scope "$RG_ID" \
  --params '{"effect":{"value":"Audit"}}'

# After it evaluates, what's non-compliant?
az policy state list --resource-group "$RG" \
  --query "[?complianceState=='NonCompliant'].{resource:resourceId, policy:policyDefinitionName}" -o table

Hand the non-compliance list to AI before you flip to Deny:

Prompt: “Here is the Audit-mode compliance result for a ‘require costCenter tag’ policy. Summarize how many resources would be blocked if I switch this to Deny, group the violations by resource type, and flag any resource type where denying creation would break a legitimate automated workflow (e.g. resources created by a managed service that can’t set tags). Recommend whether I’m safe to flip to Deny.”

That’s the review that prevents the self-inflicted outage. Some Azure services create child resources you can’t tag, and a naive Deny breaks them. AI spots the pattern in the audit data; you make the go/no-go call. The same audit-before-enforce discipline shows up everywhere in Azure governance and security work.
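Before pasting raw audit output into a prompt, it can help to pre-group the violations yourself so you have a number to check the AI’s summary against. A rough sketch, assuming standard resource IDs of the form /subscriptions/.../providers/Namespace/type/name; the function name is hypothetical, and types are lowercased because resource types are case-insensitive:

```shell
# stdin: Azure resource IDs, one per line.
# stdout: "count<TAB>resource type", highest count first.
# Nested resources keep their full type path (e.g. .../virtualmachines/extensions).
group_by_type() {
  awk -F/ '
    {
      # Find the last "providers" segment; the namespace follows it.
      p = 0
      for (i = 1; i <= NF; i++) if (tolower($i) == "providers") p = i
      if (p == 0 || p + 2 > NF) next   # not a provider-scoped resource ID
      type = $(p + 1)
      # After the namespace, segments alternate type/name; keep the types.
      for (i = p + 2; i <= NF; i += 2) type = type "/" $i
      count[tolower(type)]++
    }
    END { for (t in count) printf "%d\t%s\n", count[t], t }
  ' | sort -k1,1nr -k2,2
}

# Usage (not run here; needs a logged-in az session):
#   az policy state list --resource-group "$RG" \
#     --query "[?complianceState=='NonCompliant'].resourceId" -o tsv | group_by_type
```

A resource type you don’t recognize near the top of that list is a good candidate for the "created by a managed service" question in the prompt above.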

Use initiatives and remediation, not one-off policies

A pile of individual policies is unmanageable. Group them into an initiative (policy set) that maps to a real standard — “Production Baseline” — and use deployIfNotExists or modify effects to fix drift instead of just flagging it. AI is good at both the grouping and the remediation logic:

Prompt: “I have five policies: require costCenter tag, enforce TLS 1.2 on storage, deny public IPs on VMs, require diagnostic settings on Key Vaults, and audit unencrypted disks. Group them into one Azure Policy initiative called ‘Production Baseline’ with a single effect parameter per policy. Then tell me which of these can use a modify or deployIfNotExists effect to auto-remediate rather than just audit, and the risk of auto-remediation for each.”

Auto-remediation is powerful and sharp: a modify effect that adds a missing tag is safe; a deployIfNotExists that creates diagnostic settings is mostly safe; anything that changes network exposure deserves a human. AI categorizes the risk; you decide which remediations run unattended. Both effects act through a managed identity on the policy assignment, so the assignment needs one with the roles the remediation requires. Trigger a remediation for existing resources explicitly:

az policy remediation create --name "fix-missing-diag" \
  --policy-assignment "$ASSIGNMENT_ID" --resource-group "$RG"
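The initiative grouping described above comes down to two JSON documents: the member policies, each with a reference ID, and the initiative-level parameters that feed each member’s effect. A trimmed sketch with two of the five policies; the reference IDs, parameter names, file names, and the production-baseline name are all illustrative assumptions, and the definition IDs are passed in rather than guessed:

```shell
# Write the two JSON files an initiative needs.
# $1 = output dir, $2 = costCenter tag policy definition ID,
# $3 = storage TLS policy definition ID.
write_initiative_files() {
  dir=$1
  # Initiative-level parameters: one effect parameter per member policy.
  cat > "$dir/initiative-params.json" <<'EOF'
{
  "costCenterTagEffect": {
    "type": "String",
    "allowedValues": ["Audit", "Deny", "Disabled"],
    "defaultValue": "Audit"
  },
  "storageTlsEffect": {
    "type": "String",
    "allowedValues": ["Audit", "Deny", "Disabled"],
    "defaultValue": "Audit"
  }
}
EOF
  # Member policies. Unquoted EOF so $2 and $3 expand to the real IDs.
  cat > "$dir/initiative-definitions.json" <<EOF
[
  {
    "policyDefinitionReferenceId": "requireCostCenterTag",
    "policyDefinitionId": "$2",
    "parameters": { "effect": { "value": "[parameters('costCenterTagEffect')]" } }
  },
  {
    "policyDefinitionReferenceId": "enforceStorageTls",
    "policyDefinitionId": "$3",
    "parameters": { "effect": { "value": "[parameters('storageTlsEffect')]" } }
  }
]
EOF
}

# Register the initiative. Needs a logged-in az session.
create_initiative() {
  az policy set-definition create --name "production-baseline" \
    --display-name "Production Baseline" \
    --definitions "$1/initiative-definitions.json" \
    --params "$1/initiative-params.json"
}
```

The reference IDs matter later: when the assignment points at an initiative rather than a single policy, az policy remediation create also needs --definition-reference-id to say which member policy to remediate.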

Read compliance like a report, not a spreadsheet

Once policies are live, the compliance data is your governance dashboard — but the raw az policy state output is dense. Let AI turn it into a status you’d actually send to leadership:

az policy state summarize --subscription "$SUB_ID" \
  --query "policyAssignments[].{name:policyAssignmentId, nonCompliant:results.nonCompliantResources}" -o table

Prompt: “Here is a policy compliance summary across a subscription. Write a three-paragraph status: overall compliance posture, the top three risk areas by non-compliant resource count, and a prioritized remediation plan. Flag any policy at 0% compliance, since that usually means the policy is misconfigured rather than the environment being entirely non-compliant.”

That last instinct — 0% compliance usually means a broken policy, not a broken environment — is exactly the kind of pattern AI catches that a spreadsheet doesn’t surface. A policy on a bad alias reads as everything-non-compliant or everything-compliant; either extreme is a smell. You verify by re-checking the alias.
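The "either extreme is a smell" check is simple enough to script as a backstop. This sketch assumes you have already exported one line per assignment as three tab-separated columns (assignment name, non-compliant count, total evaluated); that export format and the label names are my assumptions, not something the CLI emits directly.

```shell
# stdin: "assignment<TAB>nonCompliant<TAB>total", one line per assignment.
# stdout: only the assignments sitting at an extreme, with a label.
flag_extremes() {
  awk -F'\t' '
    $3 + 0 == 0      { printf "%s\tNO_RESOURCES_EVALUATED\n", $1; next }
    $2 + 0 == $3 + 0 { printf "%s\tALL_NON_COMPLIANT\n", $1; next }
    $2 + 0 == 0      { printf "%s\tALL_COMPLIANT\n", $1; next }
  '
}

# Usage (not run here): flag_extremes < compliance.tsv
```

ALL_COMPLIANT is often legitimate in a healthy environment, so read that label as "worth a glance"; the other two labels are the ones that most often point at a rule matching the wrong thing.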

The discipline

AI authors the policy JSON, decodes compliance, and drafts remediation; you own the rules, the alias verification, and the Audit-to-Deny rollout. Policy is safe to let AI draft heavily because every definition is testable in Audit mode before it can block anything — but the human owns the flip to Deny and the choice of which remediations run unattended. The loop: describe the rule, verify the alias is real, assign as Audit, read the compliance result for legitimate workflows you’d break, then enforce. Do that and your standards stop being a wiki page nobody reads and become guardrails the platform enforces for you.

The Azure Policy authoring prompts I rely on are in the prompts library, and there’s more governance material in the Azure category. The wiki page was never the control. The policy is — and AI finally makes writing the policy faster than writing the wiki page ever was.
