AI for DevOps Security & Hardening · By James Joyner IV · 11 min read

Hardening JWT Validation: An AI-Assisted Review of the Footguns

JWTs fail open in quiet ways. Here's how I use AI as a fast junior reviewer to catch alg confusion, skipped signature checks, and missing claim validation before they ship.

  • #security
  • #hardening
  • #jwt
  • #authentication
  • #ai

The first time I traced a production auth bypass back to a JWT library, the root cause was a single setting in a config: verify_signature defaulted to off in a code path nobody read carefully. The token was malformed, the check was skipped, and the request sailed through as an admin. Nobody attacked us cleverly. We attacked ourselves by trusting a default.

JSON Web Tokens fail in quiet, boring ways, and the failures almost never throw an error. That makes them perfect work for an AI reviewer: I treat the model like a fast junior engineer who has read every CVE write-up about JWTs and never gets bored re-checking the same five things. It reads the code, I verify the findings, and nothing gets applied until a human signs off. This is strictly a defensive workflow, and I never paste real signing keys or live tokens into a prompt.

Start with the validation path, not the issuance path

Most JWT bugs live on the verify side. When I review an authentication module, I ask the AI to map every place a token is decoded and answer one question per call site: is the signature actually verified, or just parsed?

A prompt I reuse:

Here is our token verification code. For each decode call, tell me whether the signature is cryptographically verified or only decoded. Flag any path that reads claims before verification succeeds. Defensive review only.

The model is good at spotting the difference between jwt.decode(token, key, algorithms=["RS256"]) and jwt.decode(token, options={"verify_signature": False}). That second form exists for debugging and shows up in production more often than anyone admits. I grep to confirm every hit it reports:

grep -rn "verify_signature" --include="*.py" . | grep -i "false"
grep -rn "decode(" --include="*.js" src/ | grep -iv "verify"

If the AI claims a path skips verification, I read those exact lines myself before believing it.

Pin the algorithm, and make the model prove it

Algorithm confusion is the classic. If your verifier accepts a list that includes both RS256 and HS256, an attacker can sign a token with HS256 using your public key as the HMAC secret, because the public key is, by definition, public. The fix is to pin a single expected algorithm.

I ask the AI to extract the algorithm allowlist from every verify call and compare it against the issuer config:

List the algorithms argument passed to each verification call. Flag any that accept more than one algorithm family, and any that allow none. Tell me which is the issuing algorithm.

Pro Tip: never let alg: none be acceptable, and never derive the verification algorithm from the token’s own header. The token is attacker-controlled. The expected algorithm must come from your server config, hardcoded next to the key.

Validate claims, not just the signature

A correctly signed token can still be the wrong token. A valid signature only proves the issuer minted it, not that it was minted for you, for this audience, or recently. I have the model check that every verify path enforces:

  • exp expiry, with a sane clock-skew leeway (60 seconds, not 24 hours)
  • nbf not-before, where the issuer sets it
  • iss issuer matches the expected value
  • aud audience matches this service, not a sibling service that shares a signing key
  • sub is present and non-empty before it’s used as an identity

The audience check is the one teams forget. If three internal services share one signing authority and none check aud, a token for the billing API works fine against the admin API. I ask the AI to diff the claim checks across services and tell me which ones disagree.
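
The claim checks listed above can be sketched as one function that runs after signature verification. This is an illustration under assumptions, not any library's API: validate_claims and its parameters are hypothetical names, and the 60-second leeway mirrors the figure in the list.

```python
import time
from typing import Optional


def _is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_claims(claims: dict, *, issuer: str, audience: str,
                    leeway: int = 60, now: Optional[float] = None) -> None:
    """Raise ValueError unless exp, nbf, iss, aud, and sub all pass."""
    now = time.time() if now is None else now

    exp = claims.get("exp")
    if not _is_number(exp) or now > exp + leeway:
        raise ValueError("token expired or exp missing")

    nbf = claims.get("nbf")  # optional: only enforced when the issuer sets it
    if nbf is not None and (not _is_number(nbf) or now < nbf - leeway):
        raise ValueError("token not yet valid")

    if claims.get("iss") != issuer:
        raise ValueError("unexpected issuer")

    # aud may be a single string or a list of strings.
    aud = claims.get("aud")
    audiences = [aud] if isinstance(aud, str) else (aud if isinstance(aud, list) else [])
    if audience not in audiences:
        raise ValueError("token not intended for this audience")

    sub = claims.get("sub")
    if not isinstance(sub, str) or not sub.strip():
        raise ValueError("missing subject")
```

Each service passes its own audience value, so a token minted for the billing API fails here when presented to the admin API even though the signature is valid.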

Get the key handling right

A signed token is only as trustworthy as the key that verifies it. For asymmetric setups I have the model confirm we fetch the public key from a trusted JWKS endpoint over TLS, cache it sanely, and respect the kid header to select the right key during rotation, without letting the token dictate an arbitrary key URL. For symmetric HS256 setups I check that the secret is loaded from the environment or a secrets manager, never committed, and long enough to resist offline brute force: at least 256 bits of random key material, which is the minimum RFC 7518 requires for HS256.
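
A minimal sketch of kid-based key selection against a JWKS that was already fetched from our own configured endpoint. The function name and the strict policy of refusing every header that points at key material (jku, x5u, jwk, x5c) are assumptions for illustration; a team that relies on x5c would need a different rule.

```python
def select_verification_key(header: dict, trusted_jwks: dict) -> dict:
    """Pick exactly one key from a trusted, locally cached JWKS using only the kid."""
    # These header parameters reference or embed key material and are attacker-controlled.
    for key_ref in ("jku", "x5u", "jwk", "x5c"):
        if key_ref in header:
            raise ValueError(f"token header carries {key_ref}; refusing to use it")

    kid = header.get("kid")
    if not isinstance(kid, str) or not kid:
        raise ValueError("missing kid")

    matches = [key for key in trusted_jwks.get("keys", []) if key.get("kid") == kid]
    if len(matches) != 1:
        # An unknown kid may mean rotation happened; refresh the cache from the
        # configured endpoint and retry, but never from a URL named in the token.
        raise ValueError("unknown or ambiguous kid")
    return matches[0]
```

The kid only ever indexes into keys we already trust, so the worst a forged kid can do is fail the lookup.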

I do not paste the actual key into the prompt. I describe the shape of the key material and let the AI reason about the handling code. When I need to sanity-check structure, I decode the header locally and redact aggressively before anything leaves my machine:

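# Inspect a token's header locally WITHOUT pasting it into any AI prompt.
# JWT segments are unpadded base64url, so restore the alphabet and padding first.
seg=$(printf '%s' "$TOKEN" | cut -d. -f1 | tr '_-' '/+'); while [ $(( ${#seg} % 4 )) -ne 0 ]; do seg="$seg="; done
printf '%s' "$seg" | base64 -d | jq .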

Reduce the blast radius with short lifetimes

Even perfect validation can’t undo a stolen long-lived token. I ask the AI to report the configured exp window for access tokens and whether there’s a refresh-and-revoke story. Fifteen-minute access tokens with a revocable refresh token beat a twelve-hour access token with no revocation path. The model can also flag whether logout actually invalidates anything server-side, or just deletes a cookie and hopes.

For stateless designs where you can’t revoke individual tokens, I have it check for a token version or jti denylist hook, so a compromised account can be cut off without rotating the global key and logging out the whole world.
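
A minimal sketch of such a hook, run after signature and claim validation. The names are hypothetical: is_revoked is not a library function, "ver" stands in for whatever custom claim carries the account's token version, and the in-memory set and dict stand in for a shared store.

```python
def is_revoked(claims: dict, denied_jtis: set[str], current_versions: dict[str, int]) -> bool:
    """Return True when a validly signed token should still be refused."""
    jti = claims.get("jti")
    if not isinstance(jti, str) or jti in denied_jtis:
        return True  # fail closed: without a jti the denylist cannot be checked

    sub = claims.get("sub")
    if not isinstance(sub, str):
        return True

    # "ver" is a hypothetical custom claim: the account's token version at issuance.
    version = claims.get("ver")
    current = current_versions.get(sub)
    if not isinstance(version, int) or current is None:
        return True
    # Bumping the stored version cuts off every older token for that account.
    return version < current
```

Denylisting a jti kills one token; bumping the account's version kills all of that account's tokens, and neither touches the global signing key.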

Make it a repeatable gate, not a one-off

Once the patterns are clear, this becomes a checklist I run on every auth change. I keep the prompts in a shared workspace so the whole team reviews the same way, and I pair the AI pass with deterministic tooling so nothing rides on the model alone. A static lint pass catches the obvious verify_signature=False; the AI catches the subtler “this path validates exp but not aud” reasoning that grep can’t express.

If you want a starting library, the audit and review prompts I lean on are collected in our prompts library, and the security-focused bundle in the DevOps security prompt pack packages the JWT and auth-review prompts together. For larger diffs I route the change through the code review dashboard so the AI findings land as inline comments a human approves. Tools like Claude and GitHub Copilot are both serviceable for this kind of read-and-explain review.

For the broader hardening playbook this fits into, the rest of the security hardening category covers the surrounding surface, from headers to secrets.

The takeaway

JWT validation goes wrong in silence, which is exactly why a tireless reviewer pays off. Let the AI read every decode call, list every algorithm, and enumerate every missing claim check, then verify each finding against the real code yourself before you change a line. The model is your fast junior engineer for the audit; you are the senior who signs off. Keep it defensive, keep the keys out of the prompt, and treat every default as guilty until proven safe.
