AI-Assisted Threat Modeling With STRIDE That Teams Actually Finish
Use STRIDE and an LLM to threat model systems fast, turning enumerated threats into mitigations and tickets without the design review process stalling out.
- #security
- #hardening
- #threat-modeling
- #stride
- #design-review
The first threat model I ever ran took three weeks, produced a 40-page document, and was obsolete before we shipped. The threats never turned into tickets, and the one mitigation we implemented was the obvious one we’d have built anyway. That taught me the real failure mode of threat modeling: it’s not that teams do it badly, it’s that they don’t do it at all because it feels like a tax. So when I started pairing STRIDE with an LLM, my goal was never “let the AI find the bugs.” It was “remove enough friction that we actually finish the exercise on every meaningful design change.” This is a strictly defensive practice — we’re modeling our own systems to harden them, not probing anyone else’s.
What STRIDE Actually Is
STRIDE is a mnemonic from Microsoft for categories of threats, and its value is that it forces structured coverage. For every component in your system, you walk all six:
- Spoofing — pretending to be someone or something you’re not (forged tokens, impersonated services).
- Tampering — unauthorized modification of data or code (request body manipulation, poisoned cache, modified config).
- Repudiation — performing an action and credibly denying it (missing or forgeable audit logs).
- Information disclosure — exposing data to people who shouldn’t see it (verbose errors, unencrypted transit, over-broad API responses).
- Denial of service — making something unavailable (resource exhaustion, unbounded queries, amplification).
- Elevation of privilege — gaining capabilities you weren’t granted (IDOR, missing authz checks, container escape).
Each category maps to a property you want: authentication, integrity, non-repudiation, confidentiality, availability, and authorization. The point of the mnemonic is exhaustiveness. A reviewer under time pressure skips repudiation entirely. The discipline catches the gaps — exactly the mechanical, checklist-driven enumeration an LLM is good at.
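The exhaustiveness is easy to make mechanical. Here is a minimal Python sketch of the checklist, assuming nothing beyond the six categories and properties listed above; the function name `stride_checklist` and the two component names are placeholders for illustration.

```python
from itertools import product

# STRIDE category -> the security property it threatens (from the list above).
STRIDE = {
    "Spoofing": "authentication",
    "Tampering": "integrity",
    "Repudiation": "non-repudiation",
    "Information disclosure": "confidentiality",
    "Denial of service": "availability",
    "Elevation of privilege": "authorization",
}


def stride_checklist(components):
    """Return one (component, category, property) row per pairing, so no category is skipped."""
    return [(c, cat, STRIDE[cat]) for c, cat in product(components, STRIDE)]


# Hypothetical component names, for illustration only.
rows = stride_checklist(["API Gateway", "Orders Service"])
for component, category, prop in rows:
    print(f"{component}: {category} (threatens {prop})")
```

Two components give twelve rows to consider, and repudiation shows up for both whether or not anyone remembered it.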
Draw the Data Flow First, Threats Second
You can’t threat model a system you can’t see. Before any AI gets involved, sketch a data flow diagram (DFD): external entities, processes, data stores, and the flows between them. The single most important thing you add is trust boundaries — the lines a request crosses where the level of trust changes. The internet-to-load-balancer edge is a boundary. The app-to-database edge is a boundary. The service-to-third-party-API edge is a boundary. Threats almost always live on these crossings.
Here’s a plain-text DFD description for a small system, the kind you can paste straight into a prompt:
External: Browser (untrusted)
  -> [TLS, trust boundary 1] -> Process: API Gateway (DMZ)
      -> [trust boundary 2] -> Process: Orders Service (app tier)
          -> Data store: Postgres (orders, PII)
          -> [trust boundary 3] -> External: Stripe API (third party)
          -> Data store: Redis (session cache)
Auth: JWT issued by Identity Service, validated at Gateway
That’s it. Six components, three boundaries, one note on auth. You don’t need Visio: a text description is better for AI-assisted work because it’s the format the model reasons over directly, and it lives in your repo next to the code.
Pro Tip: Keep the DFD in version control as plain text or Mermaid. When the design changes, the diff to the diagram tells you exactly which boundaries moved — and which parts of the threat model need re-running.
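To make the "which boundaries moved" idea concrete, here is a rough Python sketch that pulls trust-boundary crossings out of the text DFD and compares two versions. It assumes the exact `[... trust boundary N] -> Component` notation used above; the regex, the function names, and the extra "Email API" line in the second version are all made up for illustration.

```python
import re

DFD_V1 = """\
External: Browser (untrusted)
-> [TLS, trust boundary 1] -> Process: API Gateway (DMZ)
-> [trust boundary 2] -> Process: Orders Service (app tier)
-> Data store: Postgres (orders, PII)
-> [trust boundary 3] -> External: Stripe API (third party)
-> Data store: Redis (session cache)
"""

# Hypothetical design change: a new third-party integration is added.
DFD_V2 = DFD_V1 + "-> [trust boundary 4] -> External: Email API (third party)\n"

# Matches "[TLS, trust boundary 1] -> Process: API Gateway (DMZ)" style crossings.
CROSSING = re.compile(r"\[[^\]]*?trust boundary (\d+)\]\s*->\s*(.+)")


def boundary_crossings(dfd_text):
    """Map each trust boundary number to the component on its far side."""
    crossings = {}
    for line in dfd_text.splitlines():
        match = CROSSING.search(line)
        if match:
            crossings[int(match.group(1))] = match.group(2).strip()
    return crossings


def changed_boundaries(old_text, new_text):
    """Return boundaries that were added, removed, or now lead somewhere else."""
    old, new = boundary_crossings(old_text), boundary_crossings(new_text)
    return {n: new.get(n) for n in old.keys() | new.keys() if old.get(n) != new.get(n)}


print(boundary_crossings(DFD_V1))
print(changed_boundaries(DFD_V1, DFD_V2))
```

A removed boundary shows up with a value of `None`, which is as much a reason to re-run the model as a new one.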
Use the LLM as a Fast Junior Engineer
This is the central idea: the LLM is a fast, tireless junior engineer that enumerates and drafts. You are the senior security-minded reviewer who verifies, prunes, and decides. It does the boring breadth work (six STRIDE categories across every component) in seconds, but nothing it drafts reaches your backlog unreviewed.
Here’s the kind of prompt I use. Note what’s not in it: no real hostnames, no credentials, no secrets, no production data. You feed the model an abstracted architecture, never the keys to it.
You are assisting a defensive threat-modeling session. Do NOT suggest
offensive techniques; focus only on how a defender would identify and
mitigate weaknesses in THIS system.
Architecture (abstracted, no secrets):
<paste the DFD text block above>
For each component and each trust-boundary crossing, enumerate plausible
threats using STRIDE. For every threat output a row:
| Component | STRIDE category | Threat | Assumption it relies on | Suggested mitigation |
Be concrete and tie each threat to a specific flow or boundary. Flag any
threat where you are uncertain or where you lack information to judge
likelihood. Do not invent components that aren't in the diagram.
That last paragraph matters as much as the first. Asking the model to surface its own uncertainty and forbidding it from inventing components are the two cheapest controls you have against hallucinated threats. You can run this in a chat window, but I prefer a saved, versioned prompt I can reuse — a reusable prompt workspace keeps the wording consistent across sessions, and there are STRIDE-style starters in our prompts library if you’d rather not write it from scratch. Any capable model works — Claude is my default, or a local model if your architecture can’t leave the building.
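If you want the saved prompt to live in the repo, a small Python sketch like the following is one way to do it. It deliberately stops short of calling any model API; `build_prompt` and `prompt_version` are illustrative names, and the fingerprint is just a convenience I'm assuming you'd record next to each session's output.

```python
import hashlib

# Saved prompt wording from the session above; {dfd} is the only placeholder.
PROMPT_TEMPLATE = """\
You are assisting a defensive threat-modeling session. Do NOT suggest
offensive techniques; focus only on how a defender would identify and
mitigate weaknesses in THIS system.

Architecture (abstracted, no secrets):
{dfd}

For each component and each trust-boundary crossing, enumerate plausible
threats using STRIDE. For every threat output a row:
| Component | STRIDE category | Threat | Assumption it relies on | Suggested mitigation |

Be concrete and tie each threat to a specific flow or boundary. Flag any
threat where you are uncertain or where you lack information to judge
likelihood. Do not invent components that aren't in the diagram.
"""


def prompt_version():
    """Short fingerprint of the template, to record alongside each session's output."""
    return hashlib.sha256(PROMPT_TEMPLATE.encode("utf-8")).hexdigest()[:12]


def build_prompt(dfd_text):
    """Fill the template with a sanitized DFD; refuse to send an empty architecture."""
    dfd_text = dfd_text.strip()
    if not dfd_text:
        raise ValueError("DFD text is empty; nothing to threat model")
    # str.replace instead of str.format, so braces inside the DFD text are left alone.
    return PROMPT_TEMPLATE.replace("{dfd}", dfd_text)


prompt = build_prompt("External: Browser (untrusted)\n-> Process: API Gateway (DMZ)")
print(prompt_version())
print(prompt)
```

Any edit to the template changes the fingerprint, so two threat tables with different fingerprints were not produced by the same wording.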
Turn Threats Into Mitigations and Tickets
Enumeration without action is just a longer document nobody reads. The output of the session should be a table that maps each surviving threat to a concrete mitigation and an owner. Here’s a trimmed example after I’ve reviewed and pruned the model’s draft:
| Component | STRIDE | Threat | Mitigation | Ticket |
|---|---|---|---|---|
| API Gateway | Spoofing | Forged JWT accepted due to missing signature verification | Verify signature + aud/iss/exp on every request; reject alg:none | SEC-412 |
| Orders Service | Elevation of privilege | IDOR — user reads another user’s order by ID | Enforce object-level authz: scope every query to the caller’s subject | SEC-413 |
| Postgres | Information disclosure | PII returned in full to a UI that only needs last 4 digits | Field-level response filtering; minimize at the query | SEC-414 |
| Redis | Tampering | Session cache writable without auth on the app network | Require AUTH + TLS; isolate to private subnet | SEC-415 |
| Stripe flow | Repudiation | No audit trail linking a refund to the operator who issued it | Append signed audit log entry per privileged action | SEC-416 |
| API Gateway | Denial of service | Unbounded pagination triggers expensive full scans | Enforce max page size + query timeouts + per-tenant rate limits | SEC-417 |
Notice each row has a real ticket ID. The discipline that makes threat modeling stick is that the table is a backlog generator, not an artifact. If a threat doesn’t become a ticket, a documented accepted-risk, or a deliberate “won’t fix,” it didn’t really get modeled. When those tickets land, the same defensive-AI workflow can carry into code review so the mitigation is verified in the diff, not just promised in a spreadsheet.
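The "ticket, accepted-risk, or won't fix" rule is simple enough to check automatically. A minimal Python sketch, assuming a hypothetical `Threat` record with a `disposition` field; the third row is invented to show what an untriaged threat looks like.

```python
from dataclasses import dataclass

# The three outcomes that count as "really modeled".
VALID_DISPOSITIONS = {"ticket", "accepted-risk", "wont-fix"}


@dataclass
class Threat:
    component: str
    stride: str
    threat: str
    disposition: str = ""  # one of VALID_DISPOSITIONS once triaged
    reference: str = ""    # ticket ID or pointer to the recorded decision


def unmodeled(threats):
    """Return threats that have no valid disposition or nothing recorded to back it up."""
    return [
        t for t in threats
        if t.disposition not in VALID_DISPOSITIONS or not t.reference
    ]


threats = [
    Threat("API Gateway", "Spoofing", "Forged JWT accepted", "ticket", "SEC-412"),
    Threat("Orders Service", "Elevation of privilege", "IDOR on order ID", "ticket", "SEC-413"),
    # Hypothetical row that came up in the session but was never triaged.
    Threat("Redis", "Denial of service", "Cache flooding evicts sessions"),
]

leftovers = unmodeled(threats)
for t in leftovers:
    print(f"UNRESOLVED: {t.component} / {t.stride}: {t.threat}")
```

An empty `leftovers` list is a reasonable definition of "the session is finished".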
Threat Modeling as Code
The reason whiteboard sessions evaporate is that the output isn’t a living artifact. Store the model as code — a Threagile-style YAML file in the repo — and it gets reviewed, diffed, and versioned like everything else.
title: Orders Platform Threat Model
data_assets:
  customer-pii:
    description: Names, addresses, partial card data
    confidentiality: confidential
    integrity: critical
technical_assets:
  api-gateway:
    type: process
    inside_trust_boundary: dmz
    technologies: [load-balancer, jwt-validation]
    communication_links:
      orders-service:
        target: orders-service
        protocol: https
        authentication: jwt
trust_boundaries:
  dmz:
    type: network-dedicated-hoster
    technical_assets_inside: [api-gateway]
identified_risks:
  - id: SEC-412
    category: spoofing
    asset: api-gateway
    description: Forged JWT accepted; alg:none not rejected
    mitigation: Verify signature and standard claims on every request
    status: in-progress
This is the format I have the LLM draft and a human ratify. The model is fast at producing well-formed YAML; the human owns whether each identified_risk is real and whether the status is honest. Because it’s code, CI can lint it and a reviewer sees in the diff when a new trust boundary appears without a matching risk analysis — turning a one-time event into a maintained control.
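As a sketch of what that CI lint could look like, here is a small Python check. It assumes the YAML has already been loaded into a plain dict (for example with a YAML parser's safe loader) and only looks at the keys used in the example above; `lint_model` and the "app-tier" boundary are hypothetical and not part of Threagile itself.

```python
import copy

# The relevant parts of the example model, as they would look once parsed into a dict.
MODEL = {
    "technical_assets": {"api-gateway": {"type": "process"}},
    "trust_boundaries": {
        "dmz": {"technical_assets_inside": ["api-gateway"]},
    },
    "identified_risks": [
        {"id": "SEC-412", "asset": "api-gateway", "status": "in-progress"},
    ],
}


def lint_model(model):
    """Return human-readable problems; an empty list means the lint passes."""
    problems = []
    assets = set(model.get("technical_assets", {}))
    covered = {risk["asset"] for risk in model.get("identified_risks", [])}
    for name, boundary in model.get("trust_boundaries", {}).items():
        for asset in boundary.get("technical_assets_inside", []):
            if asset not in assets:
                problems.append(f"boundary '{name}' references undeclared asset '{asset}'")
            elif asset not in covered:
                problems.append(f"boundary '{name}': asset '{asset}' has no identified risk")
    for risk in model.get("identified_risks", []):
        if risk["asset"] not in assets:
            problems.append(f"risk {risk['id']} points at undeclared asset '{risk['asset']}'")
    return problems


# Hypothetical design change: a new boundary appears with no risk analysis behind it.
changed = copy.deepcopy(MODEL)
changed["technical_assets"]["orders-service"] = {"type": "process"}
changed["trust_boundaries"]["app-tier"] = {"technical_assets_inside": ["orders-service"]}

print(lint_model(MODEL))
print(lint_model(changed))
```

The check is intentionally crude: it proves that someone wrote a risk down for each asset behind a boundary, not that the risk analysis is any good. That judgment stays with the reviewer.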
Review the AI Output Critically — It Lies in Both Directions
Here is the part teams skip at their peril. The model will hallucinate threats that don’t apply: it’ll warn about SQL injection on a component that never touches a database, or invent a “config-service” you never mentioned. And it will miss real threats that require knowing your business logic — the refund flow that lets a support agent self-approve, the tenant isolation bug that only matters because of how your billing works. The model has no idea your discount_code field is trusted from the client.
So every draft gets three passes from a human:
- Prune the fiction — delete threats tied to components or flows that don’t exist. These are usually obvious and cheap to cut.
- Pressure-test the plausible — for each remaining threat, ask “what assumption does this rely on, and is it true here?” The model is great at the generic case and blind to your specifics.
- Add what’s missing — walk your actual business logic and privileged operations. This is where humans still win decisively. Pair it with Cursor open to the real code so you’re reasoning over implementation, not imagination.
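The first pass, pruning the fiction, is the one that can be partly scripted. Below is a rough Python sketch that parses the model's markdown table and sets aside rows naming components that aren't in the DFD. The draft table is invented for illustration, the parser assumes no cell contains a literal `|`, and a flagged row is a candidate for deletion, not an automatic one.

```python
# Components that actually appear in the DFD.
KNOWN = {"Browser", "API Gateway", "Orders Service", "Postgres", "Stripe API", "Redis"}

# Invented model output, including one component that was never in the diagram.
DRAFT = """\
| Component | STRIDE category | Threat |
|---|---|---|
| API Gateway | Spoofing | Forged JWT accepted |
| Config Service | Tampering | Config values modified in transit |
| Orders Service | Elevation of privilege | IDOR on order ID |
"""


def parse_table(markdown):
    """Turn a simple pipe table into a list of dicts keyed by the header row."""
    rows = []
    for line in markdown.strip().splitlines():
        line = line.strip()
        # Skip blank lines and the |---|---| separator row.
        if set(line) <= set("|-: "):
            continue
        rows.append([cell.strip() for cell in line.strip("|").split("|")])
    header, *body = rows
    return [dict(zip(header, cells)) for cells in body]


def split_by_known(rows, known_components):
    """Separate rows about real components from rows a human should look at first."""
    known = {name.lower() for name in known_components}
    kept = [r for r in rows if r["Component"].lower() in known]
    suspect = [r for r in rows if r["Component"].lower() not in known]
    return kept, suspect


kept, suspect = split_by_known(parse_table(DRAFT), KNOWN)
for row in suspect:
    print(f"Not in DFD: {row['Component']} ({row['Threat']})")
```

This only catches invented components. A threat pinned on a real component that doesn't apply, like SQL injection on something that never touches a database, still needs the second pass.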
Pro Tip: Never paste real secrets, tokens, connection strings, or production data into the model — abstract the architecture instead. If a threat genuinely can’t be evaluated without sensitive detail, that’s a human-only step. The AI works from the sanitized diagram; you work from the real system.
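A cheap guard here is to scan the prompt text before it leaves your machine. The Python sketch below uses a handful of heuristic patterns that I'm assuming are useful starting points; they are nowhere near complete, they will produce false positives, and the sample strings are fabricated. A dedicated secret scanner is the better tool if you already run one.

```python
import re

# Heuristic, deliberately incomplete patterns; extend for your own environment.
SECRET_PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "JWT": re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),
    "connection string with password": re.compile(r"\b[a-z][a-z0-9+]*://[^\s:/@]+:[^\s@]+@"),
    "private key block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def find_secrets(text):
    """Return the names of every pattern that matches somewhere in the text."""
    return [name for name, pattern in SECRET_PATTERNS.items() if pattern.search(text)]


def assert_safe_to_send(prompt_text):
    """Raise instead of sending if the prompt looks like it contains a secret."""
    findings = find_secrets(prompt_text)
    if findings:
        raise ValueError("Refusing to send prompt, possible secrets: " + ", ".join(findings))
    return prompt_text


clean = "Process: Orders Service (app tier) -> Data store: Postgres (orders, PII)"
leaky = "Orders Service connects via postgres://app:hunter2@db.internal:5432/orders"  # fabricated

print(find_secrets(clean))
print(find_secrets(leaky))
```

A clean scan proves little, since a hostname or customer name won't match any of these. The habit of abstracting the architecture first is still the real control.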
Treat the LLM’s output the way you’d treat a sharp intern’s first draft: full of useful breadth, occasionally confidently wrong, and never the final word. More patterns for this defensive workflow live under security hardening.
Keep It Lightweight Enough To Actually Do
The whole point is repeatability. A model that takes three weeks runs once a year and protects nothing in between. A 30-minute model on a text DFD, AI-enumerated and human-pruned into six tickets, runs on every meaningful design change — and that cadence is what moves your security posture. Let the fast junior engineer do the enumeration; be the senior reviewer who decides what’s real, keep it defensive, keep secrets out of the prompt, and turn what survives into tickets.