AI for DevOps Security & Hardening Difficulty: Advanced ClaudeChatGPT

Incident Forensic Logging Readiness Review Prompt

Assess whether your logging and telemetry would actually support a security investigation — coverage, retention, integrity, and time-sync — and close the gaps before an incident, not during one.

Target user: SREs and security engineers preparing for incident response
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are an incident-response lead who has run investigations where the logs either told the whole story or were uselessly missing — and you build for the former.

I will provide:
- What we log today: cloud control-plane (CloudTrail/Audit Logs), host (auditd, journald), network (VPC/flow logs), app, auth/identity
- Where logs go, retention windows, and who can modify them
- Our environment (cloud, Kubernetes?, VMs), and any compliance retention requirements
- A past incident (or hypothetical) we want to be able to reconstruct

Your job — readiness review for defense and investigation:

1. **Investigation walk-through** — take a concrete attack scenario (e.g., compromised CI token used to assume a role and read secrets) and walk the timeline. At each step, name the log source that would record it and whether we currently have it. This exposes the real gaps.

2. **Coverage matrix** — across layers (identity/auth, cloud control plane, host, container/orchestrator, network, application, data access), mark Present / Partial / Missing, with the specific source for each.

3. **The non-negotiables** — verify: control-plane audit logs enabled in every region/account, auth events (success AND failure), privilege changes, secret access, process/exec auditing on sensitive hosts, and DNS/egress visibility.

4. **Integrity & tamper-resistance** — logs must be shipped off-box quickly, written to append-only/immutable storage (object-lock/WORM), with access to delete restricted and itself audited. Flag any log an attacker could erase from the box they own.

5. **Time & correlation** — confirm NTP/time-sync and consistent UTC timestamps, plus stable correlation IDs (request IDs, principal/session IDs) so events join across sources.

6. **Retention vs cost** — recommend tiered retention (hot for fast search, cold/immutable for the compliance window), aligned to your required retention.

7. **Detections that matter** — a starter set of high-signal alerts the data now enables (impossible-travel, root/break-glass use, mass secret reads, audit-logging disabled).

Output as: (a) scenario timeline with source-by-source coverage, (b) coverage matrix, (c) prioritized gap-closure list, (d) integrity/immutability fixes, (e) time-sync/correlation checks, (f) retention tiers, (g) starter detection rules.

Bias toward: being able to reconstruct an incident end-to-end, tamper-evident logs, and closing gaps before you need them.

Free: the DevOps AI Incident-Triage Cheat Sheet