By James Joyner IV · 11 min read

AI-Assisted Threat Modeling With STRIDE That Teams Actually Finish

Use STRIDE and an LLM to threat model systems fast, turning enumerated threats into mitigations and tickets without the design review process stalling out.

  • #security
  • #hardening
  • #threat-modeling
  • #stride
  • #design-review

The first threat model I ever ran took three weeks, produced a 40-page document, and was obsolete before we shipped. The threats never turned into tickets, and the one mitigation we implemented was the obvious one we’d have built anyway. That taught me the real failure mode of threat modeling: it’s not that teams do it badly, it’s that they don’t do it at all because it feels like a tax. So when I started pairing STRIDE with an LLM, my goal was never “let the AI find the bugs.” It was “remove enough friction that we actually finish the exercise on every meaningful design change.” This is a strictly defensive practice — we’re modeling our own systems to harden them, not probing anyone else’s.

What STRIDE Actually Is

STRIDE is a mnemonic from Microsoft for six categories of threats, and its value is that it forces structured coverage. For every component in your system, you walk all six:

  • Spoofing — pretending to be someone or something you’re not (forged tokens, impersonated services).
  • Tampering — unauthorized modification of data or code (request body manipulation, poisoned cache, modified config).
  • Repudiation — performing an action and credibly denying it (missing or forgeable audit logs).
  • Information disclosure — exposing data to people who shouldn’t see it (verbose errors, unencrypted transit, over-broad API responses).
  • Denial of service — making something unavailable (resource exhaustion, unbounded queries, amplification).
  • Elevation of privilege — gaining capabilities you weren’t granted (IDOR, missing authz checks, container escape).

Each category maps to a property you want: authentication, integrity, non-repudiation, confidentiality, availability, and authorization. The point of the mnemonic is exhaustiveness. A reviewer under time pressure might skip repudiation entirely; walking the full list catches that gap, and this kind of mechanical, checklist-driven enumeration is exactly what an LLM is good at.
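The coverage argument can be made mechanical. The following is a minimal sketch, not a prescribed tool: the category-to-property mapping comes from the list above, while the function name `coverage_checklist` and the component names are hypothetical and used only for illustration.

```python
from itertools import product

# STRIDE category -> the security property it threatens (from the list above).
STRIDE = {
    "Spoofing": "authentication",
    "Tampering": "integrity",
    "Repudiation": "non-repudiation",
    "Information disclosure": "confidentiality",
    "Denial of service": "availability",
    "Elevation of privilege": "authorization",
}


def coverage_checklist(components):
    """Return one (component, category, property) row per pairing,
    so no category can be skipped silently."""
    return [
        (component, category, STRIDE[category])
        for component, category in product(components, STRIDE)
    ]


# Hypothetical component names, for illustration only.
rows = coverage_checklist(["API Gateway", "Orders Service"])
for component, category, prop in rows:
    print(f"{component:15} {category:24} protects {prop}")
```

Two components yield twelve rows. An empty cell in that grid is a visible gap rather than a category someone forgot to consider.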

Draw the Data Flow First, Threats Second

You can’t threat model a system you can’t see. Before any AI gets involved, sketch a data flow diagram (DFD): external entities, processes, data stores, and the flows between them. The single most important thing you add is trust boundaries — the lines a request crosses where the level of trust changes. The internet-to-load-balancer edge is a boundary. The app-to-database edge is a boundary. The service-to-third-party-API edge is a boundary. Threats almost always live on these crossings.

Here’s a plain-text DFD description for a small system, the kind you can paste straight into a prompt:

External: Browser (untrusted)
  -> [TLS, trust boundary 1] -> Process: API Gateway (DMZ)
    -> [trust boundary 2] -> Process: Orders Service (app tier)
      -> Data store: Postgres (orders, PII)
      -> [trust boundary 3] -> External: Stripe API (third party)
    -> Data store: Redis (session cache)
Auth: JWT issued by Identity Service, validated at Gateway

That’s it. Six components, three boundaries, one note on auth. You don’t need Visio; a text description is better for AI-assisted work because it’s the format the model reasons over directly, and it lives in your repo next to the code.

Pro Tip: Keep the DFD in version control as plain text or Mermaid. When the design changes, the diff to the diagram tells you exactly which boundaries moved — and which parts of the threat model need re-running.
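To illustrate how a diff of the text DFD can point at the boundaries that moved, here is a rough sketch. It assumes the bracketed "trust boundary N" notation used in the example above; the function names and the added "Email Provider" line are hypothetical.

```python
import re

# Matches markers such as "[TLS, trust boundary 1]" or "[trust boundary 2]"
# and captures the boundary number plus the component the flow leads into.
BOUNDARY = re.compile(r"\[[^\]]*trust boundary\s+(\d+)[^\]]*\]\s*->\s*(.+)")


def boundary_crossings(dfd_text):
    """Map each trust-boundary number to the component it leads into."""
    crossings = {}
    for line in dfd_text.splitlines():
        match = BOUNDARY.search(line)
        if match:
            crossings[int(match.group(1))] = match.group(2).strip()
    return crossings


def changed_boundaries(old_text, new_text):
    """Return boundary numbers that were added, removed, or retargeted."""
    old, new = boundary_crossings(old_text), boundary_crossings(new_text)
    return sorted(n for n in old.keys() | new.keys() if old.get(n) != new.get(n))


OLD_DFD = """\
External: Browser (untrusted)
  -> [TLS, trust boundary 1] -> Process: API Gateway (DMZ)
    -> [trust boundary 2] -> Process: Orders Service (app tier)
      -> Data store: Postgres (orders, PII)
      -> [trust boundary 3] -> External: Stripe API (third party)
    -> Data store: Redis (session cache)
"""

# Hypothetical design change: the Orders Service starts calling an email provider.
NEW_DFD = OLD_DFD + "      -> [trust boundary 4] -> External: Email Provider (third party)\n"

print("Re-run the threat model for boundaries:", changed_boundaries(OLD_DFD, NEW_DFD))
```

The output is a list of boundary numbers to re-examine, which keeps a re-run scoped to what actually changed.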

Use the LLM as a Fast Junior Engineer

This is the central idea: the LLM is a fast, tireless junior engineer that enumerates and drafts. You are the senior security-minded reviewer who verifies, prunes, and decides. It does the boring breadth work, six STRIDE categories across every component, in seconds, but nothing it produces reaches your backlog unreviewed.

Here’s the kind of prompt I use. Note what’s not in it: no real hostnames, no credentials, no secrets, no production data. You feed the model an abstracted architecture, never the keys to it.

You are assisting a defensive threat-modeling session. Do NOT suggest
offensive techniques; focus only on how a defender would identify and
mitigate weaknesses in THIS system.

Architecture (abstracted, no secrets):
<paste the DFD text block above>

For each component and each trust-boundary crossing, enumerate plausible
threats using STRIDE. For every threat output a row:
| Component | STRIDE category | Threat | Assumption it relies on | Suggested mitigation |

Be concrete and tie each threat to a specific flow or boundary. Flag any
threat where you are uncertain or where you lack information to judge
likelihood. Do not invent components that aren't in the diagram.

That last paragraph matters as much as the first. Asking the model to surface its own uncertainty and forbidding it from inventing components are the two cheapest controls you have against hallucinated threats. You can run this in a chat window, but I prefer a saved, versioned prompt I can reuse — a reusable prompt workspace keeps the wording consistent across sessions, and there are STRIDE-style starters in our prompts library if you’d rather not write it from scratch. Any capable model works — Claude is my default, or a local model if your architecture can’t leave the building.
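One way to keep the prompt saved and versioned is to store it as a template in the repo and fill in the DFD at run time. This is a sketch under assumptions: `build_prompt` and `template_version` are hypothetical helper names, and how the resulting string is sent to a model is deliberately left out.

```python
import hashlib

# The prompt from above, with a placeholder where the DFD text goes.
PROMPT_TEMPLATE = """\
You are assisting a defensive threat-modeling session. Do NOT suggest
offensive techniques; focus only on how a defender would identify and
mitigate weaknesses in THIS system.

Architecture (abstracted, no secrets):
{dfd}

For each component and each trust-boundary crossing, enumerate plausible
threats using STRIDE. For every threat output a row:
| Component | STRIDE category | Threat | Assumption it relies on | Suggested mitigation |

Be concrete and tie each threat to a specific flow or boundary. Flag any
threat where you are uncertain or where you lack information to judge
likelihood. Do not invent components that aren't in the diagram.
"""


def template_version(template=PROMPT_TEMPLATE):
    """Short content hash, so a session can record which wording it used."""
    return hashlib.sha256(template.encode("utf-8")).hexdigest()[:12]


def build_prompt(dfd_text):
    """Insert the abstracted DFD into the template; refuse an empty diagram."""
    dfd = dfd_text.strip()
    if not dfd:
        raise ValueError("DFD text is empty; nothing to threat model")
    return PROMPT_TEMPLATE.format(dfd=dfd)


prompt = build_prompt("External: Browser (untrusted)\n  -> [TLS, trust boundary 1] -> Process: API Gateway (DMZ)")
print(f"prompt version {template_version()}, {len(prompt)} characters")
```

Recording the version hash next to each session's output makes it possible to tell later whether two drafts differ because the design changed or because the prompt wording did.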

Turn Threats Into Mitigations and Tickets

Enumeration without action is just a longer document nobody reads. The output of the session should be a table that maps each surviving threat to a concrete mitigation and an owner. Here’s a trimmed example after I’ve reviewed and pruned the model’s draft:

| Component | STRIDE | Threat | Mitigation | Ticket |
|---|---|---|---|---|
| API Gateway | Spoofing | Forged JWT accepted due to missing signature verification | Verify signature + aud/iss/exp on every request; reject alg:none | SEC-412 |
| Orders Service | Elevation of privilege | IDOR: user reads another user’s order by ID | Enforce object-level authz: scope every query to the caller’s subject | SEC-413 |
| Postgres | Information disclosure | PII returned in full to a UI that only needs last 4 digits | Field-level response filtering; minimize at the query | SEC-414 |
| Redis | Tampering | Session cache writable without auth on the app network | Require AUTH + TLS; isolate to private subnet | SEC-415 |
| Stripe flow | Repudiation | No audit trail linking a refund to the operator who issued it | Append signed audit log entry per privileged action | SEC-416 |
| API Gateway | Denial of service | Unbounded pagination triggers expensive full scans | Enforce max page size + query timeouts + per-tenant rate limits | SEC-417 |

Notice each row has a real ticket ID. The discipline that makes threat modeling stick is that the table is a backlog generator, not an artifact. If a threat doesn’t become a ticket, a documented accepted-risk, or a deliberate “won’t fix,” it didn’t really get modeled. When those tickets land, the same defensive-AI workflow can carry into code review so the mitigation is verified in the diff, not just promised in a spreadsheet.
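The "ticket, accepted risk, or won't fix" rule is easy to check automatically. The sketch below assumes a hypothetical `Threat` record and disposition labels of my own choosing; the third example row is invented to show a threat that was enumerated but never triaged.

```python
from dataclasses import dataclass
from typing import Optional

# The three outcomes the text accepts for a modeled threat.
DISPOSITIONS = {"ticket", "accepted-risk", "wont-fix"}


@dataclass
class Threat:
    component: str
    category: str
    summary: str
    disposition: Optional[str] = None  # one of DISPOSITIONS once decided
    reference: Optional[str] = None    # ticket ID or pointer to the decision record


def unresolved(threats):
    """Return threats that lack a valid disposition or a reference backing it."""
    return [
        t for t in threats
        if t.disposition not in DISPOSITIONS or not t.reference
    ]


threats = [
    Threat("API Gateway", "Spoofing", "Forged JWT accepted", "ticket", "SEC-412"),
    Threat("Orders Service", "Elevation of privilege", "IDOR on order ID", "ticket", "SEC-413"),
    # Hypothetical row that was enumerated but never triaged.
    Threat("Redis", "Information disclosure", "Session data readable on app network"),
]

for t in unresolved(threats):
    print(f"UNRESOLVED: {t.component} / {t.category}: {t.summary}")
```

A session is finished when `unresolved` returns an empty list, which gives the exercise a concrete exit condition.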

Threat Modeling as Code

The reason whiteboard sessions evaporate is that the output isn’t a living artifact. Store the model as code — a Threagile-style YAML file in the repo — and it gets reviewed, diffed, and versioned like everything else.

title: Orders Platform Threat Model
data_assets:
  customer-pii:
    description: Names, addresses, partial card data
    confidentiality: confidential
    integrity: critical
technical_assets:
  api-gateway:
    type: process
    inside_trust_boundary: dmz
    technologies: [load-balancer, jwt-validation]
    communication_links:
      orders-service:
        target: orders-service
        protocol: https
        authentication: jwt
trust_boundaries:
  dmz:
    type: network-dedicated-hoster
    technical_assets_inside: [api-gateway]
identified_risks:
  - id: SEC-412
    category: spoofing
    asset: api-gateway
    description: Forged JWT accepted; alg:none not rejected
    mitigation: Verify signature and standard claims on every request
    status: in-progress

This is the format I have the LLM draft and a human ratify. The model is fast at producing well-formed YAML; the human owns whether each identified_risk is real and whether the status is honest. Because it’s code, CI can lint it and a reviewer sees in the diff when a new trust boundary appears without a matching risk analysis — turning a one-time event into a maintained control.
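As an illustration of the kind of check CI could run, here is a small lint sketch. It is an assumption about what "matching risk analysis" might mean in practice: every trust boundary must contain at least one asset with an identified risk. The inline dict stands in for the parsed YAML file (a real pipeline would load it with a YAML parser), and the `app-tier` boundary is a hypothetical addition.

```python
# Stand-in for the parsed YAML file, written inline to keep the sketch
# standard-library only. Field names mirror the example above.
model = {
    "technical_assets": {
        "api-gateway": {"inside_trust_boundary": "dmz"},
        "orders-service": {"inside_trust_boundary": "app-tier"},
    },
    "trust_boundaries": {
        "dmz": {"technical_assets_inside": ["api-gateway"]},
        # Hypothetical boundary added in a design change, with no risks yet.
        "app-tier": {"technical_assets_inside": ["orders-service"]},
    },
    "identified_risks": [
        {"id": "SEC-412", "category": "spoofing", "asset": "api-gateway"},
    ],
}


def boundaries_without_risks(model):
    """Names of trust boundaries where no contained asset has an identified risk."""
    risky_assets = {risk["asset"] for risk in model.get("identified_risks", [])}
    missing = []
    for name, boundary in model.get("trust_boundaries", {}).items():
        inside = set(boundary.get("technical_assets_inside", []))
        if not inside & risky_assets:
            missing.append(name)
    return sorted(missing)


def risks_on_unknown_assets(model):
    """IDs of risks that point at an asset the model does not define."""
    known = set(model.get("technical_assets", {}))
    return sorted(
        risk["id"] for risk in model.get("identified_risks", [])
        if risk["asset"] not in known
    )


missing = boundaries_without_risks(model)
if missing:
    print("Trust boundaries with no identified risks:", ", ".join(missing))
```

In a CI job the script would exit non-zero when either list is non-empty, so a new boundary cannot merge without someone at least recording a risk or an explicit "none found" entry against it.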

Review the AI Output Critically — It Lies in Both Directions

Here is the part teams skip at their peril. The model will hallucinate threats that don’t apply: it’ll warn about SQL injection on a component that never touches a database, or invent a “config-service” you never mentioned. And it will miss real threats that require knowing your business logic — the refund flow that lets a support agent self-approve, the tenant isolation bug that only matters because of how your billing works. The model has no idea your discount_code field is trusted from the client.

So every draft gets three passes from a human:

  1. Prune the fiction — delete threats tied to components or flows that don’t exist. These are usually obvious and cheap to cut.
  2. Pressure-test the plausible — for each remaining threat, ask “what assumption does this rely on, and is it true here?” The model is great at the generic case and blind to your specifics.
  3. Add what’s missing — walk your actual business logic and privileged operations. This is where humans still win decisively. Pair it with Cursor open to the real code so you’re reasoning over implementation, not imagination.
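The first two passes can be partly scripted; the third cannot. The sketch below assumes the model's table has already been parsed into dicts with hypothetical keys `component`, `threat`, and `assumption`. It only sorts and formats; the judgment stays with the reviewer.

```python
def prune_fiction(draft_threats, dfd_components):
    """Split a draft into threats on known components and threats on invented ones."""
    known = {name.strip().lower() for name in dfd_components}
    kept, dropped = [], []
    for threat in draft_threats:
        target = kept if threat["component"].strip().lower() in known else dropped
        target.append(threat)
    return kept, dropped


def review_queue(kept):
    """Pair each surviving threat with the pass-two question a human must answer."""
    return [
        f'{t["component"]}: {t["threat"]}; relies on "{t["assumption"]}". Is that true here?'
        for t in kept
    ]


# Component names taken from the DFD example earlier in the article.
dfd_components = ["Browser", "API Gateway", "Orders Service", "Postgres", "Stripe API", "Redis"]

# Hypothetical model output, including one invented component.
draft = [
    {"component": "API Gateway", "threat": "Forged JWT accepted",
     "assumption": "signature is not verified"},
    {"component": "config-service", "threat": "Config tampering",
     "assumption": "a config service exists"},
]

kept, dropped = prune_fiction(draft, dfd_components)
print("Dropped as fiction:", [t["component"] for t in dropped])
for question in review_queue(kept):
    print(question)
```

Keeping the dropped list, rather than discarding it silently, lets the reviewer confirm that each pruned item really was fiction and not a component missing from the diagram.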

Pro Tip: Never paste real secrets, tokens, connection strings, or production data into the model — abstract the architecture instead. If a threat genuinely can’t be evaluated without sensitive detail, that’s a human-only step. The AI works from the sanitized diagram; you work from the real system.
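A pre-flight scan can catch the most obvious slips before a prompt leaves your machine. This is a rough sketch: the patterns are illustrative and far from complete, the function name `find_secrets` is hypothetical, and a clean result is not proof that the text is safe to send.

```python
import re

# Illustrative patterns only; a real deployment would use a maintained scanner.
SECRET_PATTERNS = {
    "private key block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "JWT-like token": re.compile(
        r"\beyJ[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}"
    ),
    "URL with embedded credentials": re.compile(
        r"\b[a-z][a-z0-9+.-]*://[^\s/:@]+:[^\s/@]+@"
    ),
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def find_secrets(text):
    """Return the labels of every pattern that matches somewhere in the text."""
    return sorted(
        label for label, pattern in SECRET_PATTERNS.items() if pattern.search(text)
    )


clean_dfd = "External: Browser (untrusted)\n  -> [TLS, trust boundary 1] -> Process: API Gateway (DMZ)"
# Made-up connection string, for illustration only.
leaky_dfd = clean_dfd + "\nDB: postgres://orders:example-password@db.internal:5432/orders"

for name, text in [("clean", clean_dfd), ("leaky", leaky_dfd)]:
    hits = find_secrets(text)
    print(name, "->", hits if hits else "no obvious secrets found")
```

Treat a hit as a hard stop and rewrite the diagram in abstract terms rather than redacting the string in place.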

Treat the LLM’s output the way you’d treat a sharp intern’s first draft: full of useful breadth, occasionally confidently wrong, and never the final word. More patterns for this defensive workflow live under security hardening.

Keep It Lightweight Enough To Actually Do

The whole point is repeatability. A model that takes three weeks runs once a year and protects nothing in between. A 30-minute model on a text DFD, AI-enumerated and human-pruned into six tickets, runs on every meaningful design change — and that cadence is what moves your security posture. Let the fast junior engineer do the enumeration; be the senior reviewer who decides what’s real, keep it defensive, keep secrets out of the prompt, and turn what survives into tickets.
