AI-Assisted Firewall Rule Reviews for nftables

A firewall ruleset starts clean and rots over time. Someone opens a port for a one-off test and never closes it. A 0.0.0.0/0 allow gets added to “unblock the team” and becomes permanent. A new rule shadows an older deny without anyone noticing the deny no longer fires. After a year, the ruleset says one thing and does another, and nobody can confidently tell you what’s actually exposed.

I’ve stared at enough overgrown nftables rulesets to know that reading them correctly is genuinely hard — the evaluation order, the chain priorities, and the default policies all interact. That makes it a great fit for an AI reviewer doing a defensive audit: tell me what this ruleset actually permits, find the rules that are too broad or that never fire, so I can tighten them. Here’s the approach.

What goes wrong in a firewall over time

The recurring problems in a real-world ruleset:

Overly broad source ranges — a service that only needs to talk to a handful of internal hosts is opened to 0.0.0.0/0.
Default-allow gaps — a chain whose policy is accept instead of drop, so anything not explicitly denied gets through.
Shadowed rules — an earlier accept that matches the same traffic as a later, more specific deny, so the deny never takes effect.
Forgotten temporary openings — that debug port from six months ago.
Stale management access — SSH open to the world rather than a bastion or VPN range.

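Each is invisible unless you trace the ruleset the way the kernel does: top to bottom, per chain, with the default policy as the fallthrough. When several base chains attach to the same hook, they run in priority order, and an accept in one does not stop a later chain from dropping the packet. That tracing is mechanical, order-sensitive work, which is exactly what an AI is good at and what humans rush.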

Export the live ruleset

Audit what’s loaded, not what’s in a config file that may be out of sync:

# The complete, live ruleset as the kernel sees it
sudo nft list ruleset > /tmp/nft-live.conf

# Default policies per chain: the most important single thing to check.
# Keeping the table and chain lines shows which chain each policy belongs to.
sudo nft list chains | grep -E "table|chain|policy"

The chain policy is the first thing I look at and the first thing I have the model check. A chain with policy accept; and no terminal drop is a default-allow firewall, which is to say barely a firewall at all.
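The default-policy check can also be approximated mechanically, as a cross-check on whatever the model reports. The following is a minimal sketch, not an nft feature: flag_default_allow is a hypothetical helper name, and it assumes the layout that nft list ruleset normally prints, with each table, chain, and policy declaration on its own line. It reports input and forward base chains whose policy is accept. It does not look for a terminal drop, so its output is a list of candidates to inspect rather than confirmed findings.

```shell
# flag_default_allow (hypothetical helper): print "family table chain" for
# every base chain hooked to input or forward whose policy is accept.
# Argument: path to a saved ruleset dump, e.g. /tmp/nft-live.conf.
flag_default_allow() {
  awk '
    $1 == "table" { table = $2 " " $3 }   # e.g. "inet filter"
    $1 == "chain" { chain = $2 }          # e.g. "input"
    /hook (input|forward)/ && /policy accept/ { print table " " chain }
  ' "$1"
}
```

Run against the dump from the export step, flag_default_allow /tmp/nft-live.conf prints one line per suspect chain and nothing when every input and forward chain defaults to drop.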

A firewall-audit prompt that traces evaluation order

I tell the model to reason like the packet filter — in order, per chain:

You are a network security auditor reviewing an nftables ruleset.
Reason about it the way the kernel evaluates it: per chain, top to
bottom, with the chain's default policy as the fallthrough. Find
security issues ONLY:
1. Default policy: any input/forward chain with policy accept and no
   terminal drop — i.e. an effective default-allow.
2. Overly broad allows: rules accepting from 0.0.0.0/0 or ::/0 that
   likely should be scoped to specific sources.
3. Management exposure: SSH (22) or admin ports open to the internet
   rather than a bastion/VPN range.
4. Shadowed rules: any rule that can never match because an earlier
   rule already handles that traffic.
5. Stale/overly-permissive port openings.
For each finding, name the rule, explain what it exposes, and rank by
severity. Do not rewrite the ruleset.

<paste nft list ruleset output>

The shadowed-rule detection is where the model genuinely outperforms a tired human. Consider:

# An earlier broad accept shadows the later, intended deny
tcp dport 22 accept
ip saddr 203.0.113.0/24 tcp dport 22 drop   # never fires

The intent was clearly to block SSH from that subnet, but the accept above it has no source restriction and already lets all SSH traffic through, so the drop is dead. A line-by-line human read misses this constantly; a model tracing evaluation order catches it.
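The simplest form of this pattern can be caught without a model at all. The sketch below is a heuristic under narrow assumptions, not a real shadowing analysis: find_shadowed_drops is a hypothetical helper, it only recognizes an accept written exactly as tcp dport PORT accept, and it ignores sets, counters, comments, and jumps. It is useful as a sanity check on the model's findings, not as a replacement for them.

```shell
# find_shadowed_drops (hypothetical helper): report drop/reject rules whose
# tcp dport was already accepted, with no source restriction, earlier in
# the same chain. Argument: path to a saved ruleset dump.
find_shadowed_drops() {
  awk '
    # A new chain starts: forget ports accepted in the previous one.
    $1 == "chain" { split("", open); next }
    # Exactly "tcp dport <port> accept": an accept with no other match.
    NF == 4 && $1 == "tcp" && $2 == "dport" && $4 == "accept" {
      open[$3] = NR; next
    }
    # A later drop or reject that names an already-accepted port.
    $NF == "drop" || $NF == "reject" {
      for (i = 1; i < NF; i++)
        if ($i == "dport" && ($(i + 1) in open))
          printf "line %d is shadowed by line %d\n", NR, open[$(i + 1)]
    }
  ' "$1"
}
```

The line numbers refer to the dump file, so they can be looked up directly in /tmp/nft-live.conf.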

Pro Tip: Ask the model to produce a plain-English summary of “what is reachable from the public internet” after evaluating the whole ruleset. That single sentence, the effective external attack surface, is the answer you actually care about, and it’s the thing a pile of individual rules obscures.
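A rough mechanical counterpart to that summary is to list every accept rule that names a destination port but no source address. This is a sketch with an assumed helper name (list_unscoped_accepts), and it is deliberately crude: rules scoped by interface rather than by saddr are still listed, and it says nothing about chain order or policy. Anything the model calls publicly reachable should at least appear in this list.

```shell
# list_unscoped_accepts (hypothetical helper): print accept rules that match
# a destination port but carry no saddr restriction.
# Argument: path to a saved ruleset dump.
list_unscoped_accepts() {
  awk '/dport/ && /accept/ && !/saddr/ { sub(/^[ \t]+/, ""); print }' "$1"
}
```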

Tighten toward default-deny

The fixes follow a pattern: establish default-deny, then explicitly allow only what’s needed, scoped tightly. A hardened input chain looks like:

table inet filter {
  chain input {
    type filter hook input priority 0; policy drop;   # default deny

    ct state established,related accept
    iif "lo" accept

    # SSH only from the bastion, not the world
    ip saddr 10.0.1.10 tcp dport 22 accept

    # Public service, intentionally open
    tcp dport { 80, 443 } accept
  }
}

The policy drop; is the whole game: everything not explicitly accepted is denied. One consequence in an inet table is that IPv6 neighbor discovery is dropped too, so a host that uses IPv6 also needs an explicit ICMPv6 accept. The AI is good at proposing this structure, but I always load a tightened ruleset with care, because a firewall change can lock you out of the box. Use a rollback timer when testing remotely so a mistake auto-reverts.
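Before loading a proposed ruleset, it can be validated without touching the live one. The sketch below assumes a hypothetical helper, precheck_ruleset, and an NFT variable introduced here only so the binary can be substituted in a test. It first refuses a file that never declares a drop policy, then runs nft with -c (check), which parses and validates the file but applies nothing.

```shell
# precheck_ruleset (hypothetical helper): sanity-check a candidate ruleset
# file before it is loaded. NFT may be set to substitute the binary; it
# defaults to nft on the PATH.
precheck_ruleset() {
  file="$1"
  # Refuse a file that never declares a drop policy.
  if ! grep -q 'policy drop' "$file"; then
    echo "no policy drop found in $file" >&2
    return 1
  fi
  # -c validates the file without changing the live ruleset.
  "${NFT:-nft}" -c -f "$file"
}
```

nft typically needs root even in check mode, so this is easiest to run from a root shell. A passing check only means the syntax is valid; it does not prove the rules keep your own SSH path open.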

Don’t lock yourself out — and keep auditing

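I never apply a new ruleset over SSH without a safety net. The backup needs a leading flush ruleset, otherwise reloading it appends to the active rules instead of replacing them, and the new file should start the same way:

# Snapshot the live ruleset, prefixed with a flush so a reload replaces
# the active rules rather than appending to them.
sudo sh -c '{ echo "flush ruleset"; nft list ruleset; } > /etc/nftables-backup.conf'
# Schedule a revert before applying: if the new rules lock you out,
# the old ruleset comes back in 2 minutes and you can reconnect.
sudo nohup sh -c 'sleep 120 && nft -f /etc/nftables-backup.conf' >/dev/null 2>&1 &
REVERT_PID=$!
sudo nft -f /etc/nftables-new.conf
# If you still have your session, cancel the revert and persist.
sudo kill "$REVERT_PID"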

Firewall rules drift continuously, so I run this audit on a schedule and on every ruleset change. The review pairs naturally with the iptables-to-nftables migration guide if you’re still moving over. I drive the audit with Claude and keep the prompts reusable in the prompt library.
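Running the audit on every ruleset change requires noticing that the ruleset changed. One way to do that, sketched here with hypothetical names and paths, is to hash a dump of the ruleset and compare it with the hash stored at the last audit. The dump should be taken with nft -s list ruleset, because the -s (stateless) option omits counters that would otherwise make every dump look different.

```shell
# ruleset_changed (hypothetical helper): succeed, and record the new hash,
# when the dump differs from the one seen at the last audit; fail when
# nothing changed.
# Arguments: path to a ruleset dump, path to a file holding the last hash.
ruleset_changed() {
  dump="$1"; state="$2"
  new=$(sha256sum < "$dump" | cut -d' ' -f1)
  old=$(cat "$state" 2>/dev/null || true)   # empty on the first run
  if [ "$new" = "$old" ]; then
    return 1
  fi
  printf '%s\n' "$new" > "$state"
}
```

A scheduled job can dump the ruleset, call ruleset_changed on it, and trigger the audit prompt only when the function succeeds.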

Defensive, verified, and lockout-safe

The model is a fast junior network auditor: it traces evaluation order tirelessly and catches shadowed rules and broad allows that humans skim past. But it doesn’t know which of your 0.0.0.0/0 rules are intentional public services versus mistakes, so I verify every “tighten this” recommendation against what the service actually needs. And firewall edits can lock you out, so I always test remote changes behind a rollback timer. A ruleset contains no credentials, but it does reveal network layout, so I redact internal IP ranges I’d rather not disclose before pasting. The structure is enough.
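The redaction step can be scripted so it is applied consistently. The following is a minimal sketch, assuming that "internal" means the RFC 1918 IPv4 ranges; redact_private_ips is a hypothetical helper, and IPv6 or other internal ranges would need their own patterns. Each distinct address or prefix maps to the same placeholder every time it appears, so the model can still tell when two rules refer to the same source.

```shell
# redact_private_ips (hypothetical helper): read a ruleset on stdin and
# replace each distinct RFC 1918 address or prefix with a stable
# placeholder (PRIVATE_1, PRIVATE_2, ...). Public addresses are untouched.
redact_private_ips() {
  awk '
    BEGIN {
      o = "\\.[0-9]+"   # one dotted octet
      re = "(10" o o o "|192\\.168" o o "|172\\.(1[6-9]|2[0-9]|3[01])" o o ")(/[0-9]+)?"
    }
    {
      out = ""; rest = $0
      while (match(rest, re)) {
        seen = out substr(rest, 1, RSTART - 1)
        addr = substr(rest, RSTART, RLENGTH)
        rest = substr(rest, RSTART + RLENGTH)
        if (seen ~ /[0-9.]$/) {   # tail of a longer number, e.g. 110.x
          out = seen addr
          continue
        }
        if (!(addr in alias)) alias[addr] = "PRIVATE_" (++n)
        out = seen alias[addr]
      }
      print out rest
    }
  '
}
```

Because every address becomes an opaque token, containment between ranges (a host inside a /24, for example) is no longer visible to the model, so any shadowing finding that depends on it should be rechecked against the unredacted ruleset.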

Conclusion

A firewall is only as trustworthy as your ability to read it, and rulesets become unreadable with age. An AI auditor that traces the ruleset the way the kernel does turns “I think we’re fine” into a concrete statement of what’s actually reachable — while you keep the rollback timer, verify intent, and own the apply. Make it routine alongside the rest of your security and hardening work.