Debugging NSG and VNet Connectivity on Azure With AI
Most Azure networking tickets come down to an NSG rule, a missing route, or a subnet you forgot. Here's how AI helps you read rule tables, decode Network Watcher output, and stop guessing.
- #azure
- #ai
- #networking
- #nsg
- #troubleshooting
The ticket said “the app can’t reach the database.” It always says that. The app subnet and the database subnet were in the same VNet, the database accepted the connection from my laptop over the VPN, and the security team swore nothing had changed. It took forty minutes to find. Someone had added the allow rule to the NSG on the database NIC, but the NSG associated with the database subnet had a deny rule matching the same flow. For inbound traffic Azure evaluates the subnet NSG first and the NIC NSG second, and the flow has to be allowed by both. So the subnet deny won, no matter what priority the NIC rule had. Classic.
Azure networking failures are almost never exotic. They’re an NSG priority you misread, a route table sending traffic to a firewall that drops it, a service endpoint you didn’t enable, or DNS resolving to a private endpoint that the source subnet can’t reach. The problem is that the evidence is scattered across NSGs, route tables, effective rules, and Network Watcher, and it’s all dense tabular output. AI is good at reading dense tabular output and telling you which row matters. You still run the commands and make the call.
Get the effective rules, not the rules you wrote
The single most common mistake is reading the NSG you authored instead of the rules that actually apply to the NIC. A subnet-level NSG and a NIC-level NSG can both apply. Each one is evaluated on its own, and in each one the default rules sit underneath yours. Azure will compute the effective set for you:
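# The rules that actually apply to a NIC: the effective rules of the subnet NSG and of the NIC NSG
az network nic list-effective-nsg \
--name "$NIC_NAME" --resource-group "$RG" \
--query "value[].effectiveSecurityRules[].{name:name, dir:direction, access:access, prio:priority, proto:protocol, src:sourceAddressPrefix, dst:destinationAddressPrefix, dstPort:destinationPortRange}" \
-o table
# The [] flattening hides which NSG each rule belongs to; -o json without --query keeps that grouping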
That table is the ground truth. When it’s a hundred rows long, paste it into AI with the specific flow you care about:
Prompt: “Here are the effective NSG rules for a NIC, covering both the subnet NSG and the NIC NSG. I’m trying to make an inbound TCP connection to port 5432 from source 10.20.1.0/24. For each NSG separately, walk its rules in priority order and tell me the FIRST rule that matches this flow and whether it allows or denies. The flow only gets through if every NSG allows it. Show your matching logic for source, destination, port, and protocol on each deciding rule.”
Within each NSG, Azure evaluates rules by ascending priority number and stops at the first match. AI follows that kind of deterministic logic well when you make it show its work, and it is exactly the kind of thing humans get wrong at row sixty. The answer is verifiable: you can re-read the one rule it points to.
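If you want to double-check the AI’s answer, the first-match walk is small enough to script. Below is a minimal sketch in plain shell, not an Azure tool. The function names (first_match, prefix_matches, and the helpers) are hypothetical. It assumes you have already flattened ONE NSG’s effective rules into whitespace-separated lines containing priority, access, direction, protocol, source prefix, destination prefix, destination port range, and name. It handles IPv4 CIDRs, `*` wildcards, and single ports or port ranges. Service tags such as VirtualNetwork and comma-separated lists are out of scope, so it reports UNDECIDED for those rather than guessing.

```shell
# Sketch only: evaluates ONE NSG's rules for a flow, first match by ascending
# priority. Function names and the input line format are hypothetical.

ip_to_int() {
  # Dotted-quad IPv4 address -> 32-bit integer
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

cidr_contains() {
  # cidr_contains PREFIX IP: succeeds if IP is inside PREFIX (a bare IP counts as /32)
  local net=${1%/*} bits=32
  case $1 in */*) bits=${1#*/} ;; esac
  local mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$2") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

prefix_matches() {
  # Returns 0 = match, 1 = no match, 2 = service tag (e.g. VirtualNetwork) not expanded here
  case $1 in
    '*'|0.0.0.0/0) return 0 ;;
    [0-9]*) cidr_contains "$1" "$2" ;;
    *) return 2 ;;
  esac
}

port_matches() {
  # Accepts "*", a single port like "5432", or a range like "0-65535"
  case $1 in
    '*') return 0 ;;
    *-*) [ "$2" -ge "${1%-*}" ] && [ "$2" -le "${1#*-}" ] ;;
    *) [ "$2" -eq "$1" ] ;;
  esac
}

proto_matches() {
  # "All" or "*" matches any protocol; otherwise compare case-insensitively
  case $1 in All|all|'*') return 0 ;; esac
  [ "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" = \
    "$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')" ]
}

first_match() {
  # Usage: first_match DIRECTION PROTOCOL SRC_IP DST_IP DST_PORT < rules
  # Each rule line: priority access direction protocol srcPrefix dstPrefix dstPortRange name
  sort -n | while read -r prio access dir proto src dst port name; do
    [ "$dir" = "$1" ] || continue
    proto_matches "$proto" "$2" || continue
    port_matches "$port" "$5" || continue
    s=0; prefix_matches "$src" "$3" || s=$?
    d=0; prefix_matches "$dst" "$4" || d=$?
    if [ "$s" -eq 1 ] || [ "$d" -eq 1 ]; then continue; fi
    if [ "$s" -eq 2 ] || [ "$d" -eq 2 ]; then
      echo "UNDECIDED $name"   # needs a human, or the expanded*AddressPrefix fields
    else
      echo "$access $name"     # first match decides; Azure stops evaluating here
    fi
    break
  done
}
```

Run it once for the subnet NSG and once for the NIC NSG. The flow gets through only if both runs print Allow. A projection along the lines of `--query "value[0].effectiveSecurityRules[].[priority, access, direction, protocol, sourceAddressPrefix, destinationAddressPrefix, destinationPortRange, name]" -o tsv` can produce the input for the first association. Check for blank columns before you trust the result: a rule that uses prefix lists may leave the single-value field empty, which shifts the fields.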
Let Network Watcher do the actual test
Don’t theorize about whether traffic flows — measure it. IP Flow Verify tells you allow/deny and the rule responsible, and Connection Troubleshoot runs a real reachability check end to end.
# Does the platform allow this exact flow, and which rule decides?
az network watcher test-ip-flow \
--vm "$VM_NAME" --resource-group "$RG" \
--direction Inbound --protocol TCP \
--local 10.20.2.4:5432 --remote 10.20.1.10:51000
# Full reachability check including DNS, routing, and NSGs
az network watcher test-connectivity \
--resource-group "$RG" --source-resource "$VM_NAME" \
--dest-address db.internal.example.com --dest-port 5432
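test-connectivity returns a JSON topology: an overall connection status, a list of hops, and, crucially, an issues array on the hop where things went wrong. It needs the Network Watcher Agent extension installed on the source VM. That JSON is verbose and the useful bit is buried. Hand the whole thing to AI: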
Prompt: “This is az network watcher test-connectivity JSON. The connection failed. Find the hop where it broke, name the resource at that hop, and translate the issues array into plain English: is this an NSG block, a routing problem, or DNS? Tell me which az command would confirm the root cause.”
I’ve had this point straight at a UDR that sent traffic into an Azure Firewall with no allow rule for the destination. By hand, that would have taken me twenty minutes to find. The AI reads the topology; Network Watcher provides the facts.
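To shrink that JSON before pasting it, a jq filter can keep only the overall status and the hops that report issues. This is a sketch that assumes jq is installed. It reads only the connectionStatus, hops, and issues fields, plus each hop’s type, address, and resourceId. failed_hops is a hypothetical name.

```shell
# Sketch only: condense `az network watcher test-connectivity -o json` output to
# the overall status plus the hops whose issues array is non-empty. Requires jq.
failed_hops() {
  jq -r '
    "status: \(.connectionStatus)",
    ( .hops[]?
      | select((.issues // []) | length > 0)
      # one line per failing hop: hop type, address, resource, then each issue
      | "\(.type) \(.address // "-") \(.resourceId // "-"): "
        + ([.issues[] | "\(.severity)/\(.type) (\(.origin))"] | join(", ")) )
  '
}
```

Pipe the test-connectivity output, with -o json, into failed_hops. Paste the short summary alongside the full JSON rather than instead of it, so the model still sees the whole path.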
Routing is the half people forget
When NSGs check out clean, it’s usually routing. A user-defined route with a 0.0.0.0/0 next hop pointing at a firewall or NVA quietly intercepts everything, and effective routes (system + UDR + BGP) are where the truth lives:
az network nic show-effective-route-table \
--name "$NIC_NAME" --resource-group "$RG" \
--query "value[].{prefix:addressPrefix, nextHopType:nextHopType, nextHopIp:nextHopIpAddress, source:source, state:state}" \
-o table
Paste that table to AI and ask, “For destination 10.20.1.10, which route wins by longest-prefix match, and where does the packet go next?” Azure ignores routes whose state is Invalid and picks the most specific remaining prefix. On an exact prefix tie it prefers a user-defined route, then BGP, then system. AI applies those rules consistently across a messy table where a /16 and a /24 overlap. If the winning next hop is a virtual appliance, your NSGs are probably not the problem; the appliance is likely dropping the packet.
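Longest-prefix match is also easy to check mechanically. This is a simplified sketch, not Azure’s implementation. best_route and its helpers are hypothetical, and it expects one route per line: address prefix, next hop type, next hop IP (use - when empty), source, and state. It encodes a simplified version of Azure’s selection: skip Invalid routes, prefer the longest matching prefix, and break exact ties in favor of User, then VirtualNetworkGateway (BGP), then Default (system).

```shell
# Sketch only: pick the effective route for one IPv4 destination.
# Function names and the input line format are hypothetical.

ip_to_int() {
  # Dotted-quad IPv4 address -> 32-bit integer
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

cidr_contains() {
  # cidr_contains PREFIX IP: succeeds if IP is inside PREFIX
  local net=${1%/*} bits=32
  case $1 in */*) bits=${1#*/} ;; esac
  local mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$2") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

best_route() {
  # Usage: best_route DEST_IP < routes
  # Each route line: addressPrefix nextHopType nextHopIp source state ("-" for an empty IP)
  while read -r prefix hop_type hop_ip src state; do
    if [ "$state" = "Invalid" ]; then continue; fi   # superseded routes never win
    cidr_contains "$prefix" "$1" || continue
    case $src in                         # tie-break rank for equal prefix lengths
      User) rank=3 ;;
      VirtualNetworkGateway) rank=2 ;;   # BGP
      *) rank=1 ;;                       # Default (system)
    esac
    echo "${prefix#*/} $rank $prefix $hop_type $hop_ip $src"
  done | sort -k1,1nr -k2,2nr | head -n 1 | cut -d' ' -f3-
}
```

For example, with a 10.20.0.0/16 VnetLocal system route and a 0.0.0.0/0 user route to an appliance, best_route 10.20.1.10 prints the VnetLocal route, because /16 is more specific than /0. Sorting on prefix length first and tie-break rank second keeps the whole selection rule in one line.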
Private endpoints and DNS, the modern foot-gun
Most of today’s “it worked yesterday” connectivity tickets trace back to private endpoints and Private DNS zones. The service name now resolves to a private IP. Private DNS zones are linked to VNets, not subnets. If the source VNet isn’t linked to the right Private DNS zone, or the client still hits public DNS, you get a connection to the wrong address or a timeout. Confirm what the name actually resolves to from inside the network, then check the zone link:
az network private-dns zone list -o table
az network private-dns link vnet list \
--zone-name "privatelink.postgres.database.azure.com" \
--resource-group "$RG" -o table
Prompt: “From the app subnet, nslookup mydb.postgres.database.azure.com returns a public IP, but I configured a private endpoint. Given Azure Private DNS behavior, list the three most likely misconfigurations in order, and the exact az command to verify each.”
The usual culprit is a VNet that was never linked to the privatelink.* zone, so it falls through to public resolution. AI knows the standard failure modes; you verify which one is yours.
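The first check, what the name resolves to from inside the network, can be scripted too. This is a minimal sketch. It assumes a Linux VM in the source VNet with glibc’s getent, and a VNet addressed from RFC 1918 space. A private endpoint in a VNet that uses other ranges would be misreported as public. check_resolution and is_rfc1918 are hypothetical names.

```shell
# Sketch only: does a name resolve to a private (RFC 1918) address from this VM?
# Function names are hypothetical; assumes the VNet uses RFC 1918 address space.

ip_to_int() {
  # Dotted-quad IPv4 address -> 32-bit integer
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

cidr_contains() {
  # cidr_contains PREFIX IP: succeeds if IP is inside PREFIX
  local net=${1%/*} bits=32
  case $1 in */*) bits=${1#*/} ;; esac
  local mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$2") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

is_rfc1918() {
  cidr_contains 10.0.0.0/8 "$1" || cidr_contains 172.16.0.0/12 "$1" ||
    cidr_contains 192.168.0.0/16 "$1"
}

check_resolution() {
  # Usage (on a VM in the source VNet): check_resolution mydb.postgres.database.azure.com
  # getent uses the VM's configured resolver: Azure-provided DNS (168.63.129.16)
  # unless the VNet is set to custom DNS servers.
  addr=$(getent ahostsv4 "$1" | awk 'NR == 1 { print $1 }')
  if [ -z "$addr" ]; then
    echo "$1: no IPv4 answer" >&2
    return 2
  elif is_rfc1918 "$addr"; then
    echo "$1 -> $addr (private)"
  else
    echo "$1 -> $addr (public: check the privatelink zone and its VNet links)"
    return 1
  fi
}
```

A private answer is what a working private endpoint setup should give. A public answer sends you back to the zone link commands above.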
The workflow that keeps you in control
The pattern across every one of these is the same, and it matters: you run the diagnostic command, AI interprets the output, you decide the fix. Never let AI talk you into “just open the NSG to 0.0.0.0/0 to test”; that’s how a debugging shortcut becomes a permanent exposure. The right loop is narrow and reversible: pull effective rules and routes, let AI point at the single responsible row, make one targeted change, and re-run test-ip-flow to confirm. Then write the rule’s reason into its description field so the next engineer doesn’t burn forty minutes on the same subnet-versus-NIC NSG surprise I did.
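The confirm step of that loop fits in a small wrapper. This is a sketch around the same az network watcher test-ip-flow call used earlier. confirm_flow is a hypothetical name, and the wrapper reads only the access field of the result. It exits non-zero when the verdict differs from what you expected, so it can gate a change script.

```shell
# Sketch only: re-run IP Flow Verify after a change and compare the verdict.
# confirm_flow is a hypothetical name. Exit codes: 0 = as expected,
# 1 = verdict differs, 2 = the az call itself failed.
confirm_flow() {
  # Usage: confirm_flow EXPECTED_ACCESS VM RG DIRECTION PROTOCOL LOCAL_IP:PORT REMOTE_IP:PORT
  expected=$1
  actual=$(az network watcher test-ip-flow \
    --vm "$2" --resource-group "$3" \
    --direction "$4" --protocol "$5" \
    --local "$6" --remote "$7" \
    --query access -o tsv) || return 2
  if [ "$actual" = "$expected" ]; then
    echo "OK: $4 $5 local $6 remote $7 is $actual"
  else
    echo "MISMATCH: expected $expected, IP Flow Verify says $actual" >&2
    return 1
  fi
}
```

Run it with Allow after the fix, and with Deny for a flow you meant to keep blocked, so a single change is checked in both directions.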
Azure gives you genuinely good network diagnostics (Network Watcher is underused), and AI turns their verbose output into a fast answer without you handing over the keys. Networking is unforgiving because the failure modes are layered, but they’re a small, finite set. Read the effective rules, trust the measurement over the theory, and let the model do the reading.