Audit Logging and Threat Detection: Building a Trail You Can Actually Investigate
Logs you can't query are just disk usage. Here's how I build audit logging that survives an incident — auditd, cloud trails, tamper-resistance — and use AI to surface real threats.
- #security
- #hardening
- #audit-logging
- #detection
- #siem
- #ai
The worst moment in any security incident is realizing you can’t answer the most basic question: what actually happened? No audit trail, or a trail that stops exactly where you need it, or logs the attacker could have edited. After 25 years of post-incident reviews, I can tell you the difference between a contained incident and a catastrophe is usually whether you had the logs to investigate.
This is how I build audit logging that survives contact with a real adversary, and how I use AI to turn an ocean of log data into a short list of things worth investigating — without letting it touch anything.
Log the right events, not all events
“Log everything” produces noise you’ll never search and a bill you’ll resent. Audit logging is about capturing the events that matter for detection and investigation:
- Authentication: logins, failures, privilege escalation (sudo, su, role assumption).
- Authorization changes: new users, permission grants, policy edits, key creation.
- Sensitive data access: who read the secrets, the customer table, the prod database.
- Configuration changes: what changed, by whom, when.
- Process and network anomalies: unexpected outbound connections, new listeners, unusual binaries.
These are the events that reconstruct an attacker’s path. Everything else is supporting context.
On the host: auditd
On Linux, auditd is the workhorse. It records syscall-level events against rules you define, so privileged actions and sensitive-file access leave a trail the shell history can’t:
# /etc/audit/rules.d/hardening.rules
# Watch for changes to critical auth files
-w /etc/passwd -p wa -k identity
-w /etc/sudoers -p wa -k privilege
-w /etc/ssh/sshd_config -p wa -k sshd
# Record every command executed with an effective UID of 0 (root)
-a always,exit -F arch=b64 -S execve -F euid=0 -k root-exec
Now “someone edited sudoers at 03:12” is a logged fact with a username attached, not a mystery.
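To show what that logged fact looks like when you pull it back out, here is a minimal Python sketch that parses one raw audit record. It assumes the usual key=value layout of /var/log/audit/audit.log; the function name and the sample line are mine for illustration, not captured from a real host. In practice ausearch -k privilege does this job for you; the sketch only shows which fields carry the answer.

```python
import re
from datetime import datetime, timezone

# Header auditd writes on every record: msg=audit(<epoch>.<millis>:<serial>)
_HEADER = re.compile(r"msg=audit\((\d+)\.(\d+):(\d+)\)")
# key=value pairs; values may be double-quoted
_FIELD = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_audit_line(line: str) -> dict:
    """Pull the fields an investigator usually wants out of one raw audit record."""
    header = _HEADER.search(line)
    if header is None:
        raise ValueError("line does not look like an audit record")
    fields = {name: value.strip('"') for name, value in _FIELD.findall(line)}
    return {
        "time": datetime.fromtimestamp(int(header.group(1)), tz=timezone.utc),
        "serial": int(header.group(3)),
        "type": fields.get("type"),
        "auid": fields.get("auid"),  # login UID: who originally logged in, even after sudo
        "exe": fields.get("exe"),
        "key": fields.get("key"),    # the -k label from the rule that matched
    }


# Illustrative record (hypothetical values)
sample = (
    'type=SYSCALL msg=audit(1700000000.123:456): arch=c000003e syscall=257 '
    'success=yes exit=3 auid=1000 uid=0 euid=0 comm="visudo" '
    'exe="/usr/sbin/visudo" key="privilege"'
)
record = parse_audit_line(sample)
print(record["time"].isoformat(), record["auid"], record["exe"], record["key"])
```

The auid field is the one that matters for attribution: it stays pinned to the original login even when the effective UID becomes 0.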
In the cloud: enable the trail everywhere
In cloud environments, the control-plane audit log is your most valuable evidence. Enable it in every region and account — attackers love the region nobody’s watching.
- AWS: CloudTrail in all regions, plus S3 data events for sensitive buckets.
- Kubernetes: API server audit policy logging requests at Metadata or Request level for sensitive resources.
- GCP/Azure: equivalent admin and data-access audit logs, enabled org-wide.
A trail with a regional blind spot is a trail an attacker will find.
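A rough way to check for that blind spot, sketched in Python. It assumes you have already exported your trail descriptions as a list of dicts carrying the HomeRegion and IsMultiRegionTrail fields that the CloudTrail DescribeTrails response uses; the function name and sample data are hypothetical. It does not verify that a trail is actually logging, which is a separate check.

```python
def uncovered_regions(trails: list[dict], regions_in_use: set[str]) -> set[str]:
    """Return the regions that no trail in this account covers."""
    covered: set[str] = set()
    for trail in trails:
        if trail.get("IsMultiRegionTrail"):
            return set()  # one multi-region trail covers every region
        covered.add(trail["HomeRegion"])
    return regions_in_use - covered


# Hypothetical account with a single-region trail left over from an old setup
trails = [{"Name": "legacy", "HomeRegion": "us-east-1", "IsMultiRegionTrail": False}]
regions = {"us-east-1", "eu-west-1", "ap-southeast-2"}
gaps = uncovered_regions(trails, regions)
print(sorted(gaps))
```

Run it per account; an empty result for one account says nothing about the others.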
Make the logs tamper-resistant
Here’s the part teams forget: an attacker who lands on a box will try to erase their tracks. Logs that live only on the compromised host are logs the attacker controls. The fix is to get them off the host, fast, and make them append-only.
- Ship logs centrally in near-real-time so deletion on the host doesn’t erase the record.
- Restrict write access — the account running workloads should not be able to delete the log archive.
- Use immutability where the platform offers it (object-lock / write-once storage for the central archive).
- Monitor for log gaps — a sudden silence from a host is itself a signal.
If your logs can be edited by the same credential that gets compromised, they’re not evidence. They’re a suggestion.
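The log-gap check in the list above can be sketched in a few lines of Python. This assumes your central store can give you the timestamp of the most recent record per host; the function name, host names, and the 15-minute threshold are illustrative assumptions, not a recommendation.

```python
from datetime import datetime, timedelta, timezone


def silent_hosts(
    last_seen: dict[str, datetime], now: datetime, max_silence: timedelta
) -> list[str]:
    """Return hosts whose newest log record is older than max_silence."""
    return sorted(host for host, seen in last_seen.items() if now - seen > max_silence)


now = datetime(2024, 1, 1, 3, 30, tzinfo=timezone.utc)
last_seen = {
    "web-1": now - timedelta(minutes=2),
    "web-2": now - timedelta(minutes=45),  # went quiet
    "db-1": now - timedelta(seconds=30),
}
quiet = silent_hosts(last_seen, now, max_silence=timedelta(minutes=15))
print(quiet)
```

Tune the threshold per host class: a busy web server going quiet for five minutes is stranger than a batch box going quiet for an hour.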
Detection: turn logs into alerts
Stored logs are forensics. Detection is catching the attack while it’s happening. Build a focused set of detection rules for the patterns that reliably indicate trouble:
- A burst of auth failures followed by a success (brute force that worked).
- A new IAM user or policy granted outside your provisioning pipeline.
- A login that implies impossible travel (two distant locations within a window too short to cover the distance), or one at an unusual time for that account.
- A privileged container spawned, or a workload making outbound connections it never made before.
Keep the rule set small and high-signal. A detection system that cries wolf gets muted, and a muted alert is the same as no alert.
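As one concrete instance, here is a minimal Python sketch of the first rule above: a burst of failures followed by a success. It assumes events arrive sorted by time as (timestamp, account, outcome) tuples; the threshold of five failures in ten minutes and all names are illustrative assumptions you would tune to your own baseline.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def brute_force_successes(events, min_failures=5, window=timedelta(minutes=10)):
    """Flag a success that follows at least min_failures failures inside window.

    events: iterable of (timestamp, account, outcome), sorted by timestamp,
    where outcome is "failure" or "success".
    """
    recent_failures = defaultdict(deque)
    alerts = []
    for ts, account, outcome in events:
        failures = recent_failures[account]
        # Forget failures that have aged out of the window
        while failures and ts - failures[0] > window:
            failures.popleft()
        if outcome == "failure":
            failures.append(ts)
        elif outcome == "success":
            if len(failures) >= min_failures:
                alerts.append((account, ts, len(failures)))
            failures.clear()  # a success resets the streak
    return alerts


base = datetime(2024, 1, 1, 3, 0)
events = [(base + timedelta(minutes=i), "alice", "failure") for i in range(5)]
events.append((base + timedelta(minutes=5), "alice", "success"))
events.append((base + timedelta(minutes=6), "bob", "failure"))
events.append((base + timedelta(minutes=7), "bob", "success"))
alerts = brute_force_successes(events)
print(alerts)
```

Keying on the account catches password guessing against one user; keying on source address instead would catch spraying across many users, which is a different rule.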
Using AI to surface real threats
Audit logs are voluminous and mostly boring, which is exactly why humans miss the one line that matters. AI is well-suited to reading more log data than you can and surfacing the anomalies — as an analyst, never as something with access to your systems.
I feed it a slice of normalized log data and prompt:
“Here is a sample of authentication and IAM audit logs for the last 24 hours. Identify any patterns consistent with credential abuse, privilege escalation, or reconnaissance: unusual login sources or times, permission grants outside business hours, repeated failures followed by success. For each, explain why it’s suspicious and what read-only check would confirm or dismiss it. Don’t recommend any action that changes state.”
That last sentence is the guardrail. During a possible incident you want the model proposing read-only confirmations, not telling you to disable accounts. AI reads and reasons; a human runs every command and makes every containment decision — the same discipline we apply to AI incident triage.
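If you script this, one way to keep the guardrail from being edited out by accident is to build the prompt in code. A minimal sketch, assuming the logs are already normalized to one event per line; the function name and the 500-line cap are my own illustrative choices, and nothing here calls a model.

```python
GUARDRAIL = "Don't recommend any action that changes state."


def build_triage_prompt(log_lines: list[str], max_lines: int = 500) -> str:
    """Assemble the analyst prompt around the most recent max_lines log lines."""
    if max_lines <= 0:
        raise ValueError("max_lines must be positive")
    sample = log_lines[-max_lines:]  # keep the newest lines if the slice is too large
    return "\n".join([
        "Here is a sample of authentication and IAM audit logs for the last 24 hours.",
        "Identify any patterns consistent with credential abuse, privilege escalation, "
        "or reconnaissance.",
        "For each, explain why it's suspicious and what read-only check would confirm "
        "or dismiss it.",
        GUARDRAIL,
        "",
        "LOGS:",
        *sample,
    ])


prompt = build_triage_prompt(["03:12 alice sudo visudo", "03:14 alice iam:CreateUser"])
print(prompt)
```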
For tuning detection rules and reviewing the logging config itself, keep your prompts with the rest of your security hardening prompts, and run detection-rule changes through our Code Review tool so a confident-but-wrong rule doesn’t ship silently.
Test that you can actually investigate
A logging setup you’ve never exercised will fail you under pressure. Run the drill: pick a recent change and try to answer “who did this, when, from where, and what else did they touch?” purely from your logs. If you can’t, you’ve found the gap before the incident instead of during it.
- Confirm logs are reaching the central store with acceptable latency.
- Confirm retention covers your realistic investigation window.
- Confirm the on-call engineer knows how to query them at 3am.
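The first two checks can be automated. A minimal Python sketch, assuming you can sample (event_time, ingested_time) pairs from the central store; the function name, the five-minute latency budget, and the 90-day window are illustrative assumptions. The 3am check has no code equivalent.

```python
from datetime import datetime, timedelta, timezone


def check_pipeline(
    records: list[tuple[datetime, datetime]],
    now: datetime,
    max_latency: timedelta,
    required_retention: timedelta,
) -> dict[str, bool]:
    """records: (event_time, ingested_time) pairs sampled from the central store."""
    worst_latency = max(ingested - event for event, ingested in records)
    oldest_event = min(event for event, _ in records)
    return {
        "latency_ok": worst_latency <= max_latency,
        # The oldest record still present must reach back at least as far as the window
        "retention_ok": now - oldest_event >= required_retention,
    }


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
records = [
    (now - timedelta(days=100), now - timedelta(days=100) + timedelta(seconds=20)),
    (now - timedelta(hours=1), now - timedelta(hours=1) + timedelta(minutes=12)),
]
result = check_pipeline(
    records, now, max_latency=timedelta(minutes=5), required_retention=timedelta(days=90)
)
print(result)
```

A store younger than the retention window will fail the retention check, so read that result alongside the date you turned logging on.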
The short version
Logs you can’t query are just disk usage. Capture the events that reconstruct an attacker’s path — auth, authz changes, sensitive access, config changes — not everything. Use auditd on hosts and full-coverage cloud trails, then ship logs off-host into append-only storage so a compromised credential can’t erase the evidence. Build a small, high-signal set of detection rules, and use AI as a tireless analyst to surface anomalies and propose read-only confirmations, with a human making every containment call. Then drill the investigation so you find the gaps before an adversary does.
AI-generated threat analysis is assistive, not authoritative. Always verify suspected indicators with your own read-only checks before taking any containment action.