By James Joyner IV · 9 min read

Security Error Guide: 'audit: backlog limit exceeded' Audit Event Loss

Fix auditd 'backlog limit exceeded' and lost audit events: diagnose kernel queue overflow, too-broad rules, slow disk, and tune backlog_limit, rate_limit, and failure mode safely.

  • #security-hardening
  • #troubleshooting
  • #errors
  • #auditd

Exact Error Message

When the auditd daemon cannot drain events fast enough, the kernel reports the queue overflow in the kernel log:

kernel: audit: backlog limit exceeded
kernel: audit: audit_backlog=8193 > audit_backlog_limit=8192
kernel: audit: backlog limit exceeded, dropping events

auditctl -s surfaces the same condition through the lost and backlog counters:

enabled 1
failure 1
backlog 8192
backlog_limit 8192
lost 14237

What the Error Means

The kernel audit framework holds pending audit records in an in-kernel queue (the backlog) until the auditd daemon reads them and writes them to disk. backlog_limit caps that queue. When events arrive faster than auditd drains them and the queue fills, the kernel first makes the originating process wait briefly for space (governed by backlog_wait_time). If the queue is still full, the record is dropped and the failure mode decides what happens next: nothing (0), a kernel log message (1), or a panic (2). “backlog limit exceeded” and a rising lost counter mean audit records are being discarded: a compliance and forensics gap, not a crash.

This is a tuning problem that typically appears after hardening adds broad audit rules (for example watching every syscall or a busy directory). The fix is to right-size the rules and the backlog, speed up the drain, and choose a failure mode that matches your compliance posture, without simply turning auditing off.
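The relationship between backlog, backlog_limit, and lost can be made concrete with a small helper. This is a minimal sketch, assuming the "name value" line format of auditctl -s shown above; the function name audit_queue_summary is hypothetical and not part of the audit tooling.

```shell
#!/bin/sh
# Summarise audit queue pressure from `auditctl -s` output read on stdin.
# Assumption: each status line is "name value", as in the sample above.
audit_queue_summary() {
    awk '
        $1 == "backlog"       { backlog = $2 }
        $1 == "backlog_limit" { limit = $2 }
        $1 == "lost"          { lost = $2 }
        END {
            if (limit > 0)
                printf "backlog=%d/%d (%d%%) lost=%d\n",
                       backlog, limit, backlog * 100 / limit, lost
            else
                print "backlog_limit not found"
        }
    '
}

# Usage on a live host (needs root):
#   sudo auditctl -s | audit_queue_summary
```

A queue that sits near 100% while lost is non-zero matches the overflow condition described above; a low percentage with a static lost value suggests the loss happened earlier.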

Common Causes

  • Over-broad audit rules. A rule watches a hot path (every execve, all of /etc, or a high-traffic syscall) and generates more events than auditd can write.
  • backlog_limit too small. The default queue is too shallow for the event rate during bursts (boot, package installs, scans).
  • Slow audit log disk. /var/log/audit is on slow or contended storage, so auditd cannot drain the queue fast enough.
  • No rate_limit set. Without a per-second cap, a runaway process floods the queue.
  • auditd paused or backed up. Log rotation, max_log_file_action, or a stuck dispatcher stalls the reader.
  • Failure mode mismatch. failure 0 drops events silently and failure 1 (printk) only writes a kernel log message, so loss goes unnoticed without alerting; failure 2 (panic) is too strict for the workload.

How to Reproduce the Error

On a disposable test host, set a tiny backlog and an over-broad rule, then generate load:

sudo auditctl -b 64
sudo auditctl -a always,exit -F arch=b64 -S execve
# generate a burst of processes
for i in $(seq 1 2000); do /bin/true; done
sudo auditctl -s

Example output (abridged):

backlog 64
backlog_limit 64
lost 1841

The lost counter climbing confirms events are being dropped because the queue overflowed under load.

Diagnostic Commands

These read-only commands show the queue state, the rules causing the load, and where events are going.

# Current audit status: backlog, backlog_limit, lost, rate_limit, failure
sudo auditctl -s

# List all loaded audit rules (find the over-broad ones)
sudo auditctl -l

# Which syscalls and rule keys are generating the most events?
sudo aureport --summary -i 2>/dev/null
sudo aureport --syscall --summary -i 2>/dev/null | head
sudo aureport --key --summary -i 2>/dev/null | head

# Search recent audit records to see the volume and source
sudo ausearch --start recent --raw 2>/dev/null | wc -l

# Kernel backlog messages
sudo journalctl -k --since '30 min ago' | grep -i 'audit.*backlog'

# Confirm auditd is running and reading the queue
sudo systemctl status auditd --no-pager

auditctl -s plus aureport --syscall --summary together tell you whether to raise the backlog, throttle the rate, or narrow a rule.
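When the aureport summaries are not granular enough, raw records can be counted per rule key directly. The following is a rough sketch, assuming SYSCALL records carry a key="..." field (or key=(null) for unkeyed rules) as in typical raw audit output; the function name events_per_key is hypothetical.

```shell
#!/bin/sh
# Count SYSCALL audit records per rule key from raw records on stdin.
# Assumption: keys appear as key="name" or key=(null) in SYSCALL records.
events_per_key() {
    awk '
        /type=SYSCALL / && match($0, /key=("[^"]*"|\(null\))/) {
            k = substr($0, RSTART + 4, RLENGTH - 4)  # strip the leading "key="
            gsub(/"/, "", k)                         # drop surrounding quotes
            count[k]++
        }
        END { for (k in count) printf "%d %s\n", count[k], k }
    ' | sort -rn
}

# Usage on a live host (needs root):
#   sudo ausearch --start recent --raw 2>/dev/null | events_per_key | head
```

The key at the top of the list points at the rule to narrow first; a large (null) count suggests rules that were loaded without a -k key and are harder to attribute.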

Step-by-Step Resolution

  1. Read the queue state with sudo auditctl -s. A lost value that climbs while backlog sits at backlog_limit confirms overflow. Note the current failure mode.

  2. Find the noisiest rule. Use aureport --syscall --summary and auditctl -l to identify the rule generating the bulk of events. A rule watching every execve or all of /etc is the usual culprit.

  3. Narrow the rule to the assets that matter for your control objective — a specific file, a specific syscall with field filters — rather than a blanket watch. Scope reduces volume at the source, which is the most durable fix.

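  4. Raise the backlog to absorb legitimate bursts, and add a rate limit to cap a runaway source. Note that events above the -r rate are not queued: the kernel applies the failure mode to them, so set the limit above your legitimate peak. In /etc/audit/rules.d/audit.rules: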

    -b 32768
    -r 500

    Then reload: sudo augenrules --load (or sudo service auditd reload).

  5. Speed up the drain if disk is the bottleneck: put /var/log/audit on faster storage, and review flush/freq in /etc/audit/auditd.conf so writes are not stalling.

  6. Choose a failure mode that matches compliance. For environments that must never lose audit events, -f 2 (shown as failure 2 in auditctl -s) enforces no-loss but panics the host on overflow; most environments use -f 1 with a sized backlog and alerting. Re-run auditctl -s and confirm lost stops climbing.
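The steps above can be combined into one rules fragment. The sketch below writes such a fragment to a path you choose so it can be reviewed before it goes anywhere near /etc/audit/rules.d/. The -b, -r, and -f values come from the steps above; the two narrowed rules, their keys (sudoers-change, root-exec), and the function name are hypothetical examples, and the auid>=1000 filter assumes regular user accounts start at UID 1000 on your distribution.

```shell
#!/bin/sh
# Write an example audit tuning fragment to the file given as $1.
# The rules are illustrative only; adapt them to your control objective.
write_audit_tuning_rules() {
    cat > "$1" <<'EOF'
## Queue sizing, rate cap, and failure mode
-b 32768
-r 500
-f 1
## Narrowed rules instead of blanket watches (hypothetical examples)
## Watch one sensitive file rather than all of /etc
-w /etc/sudoers -p wa -k sudoers-change
## Audit execve only when a logged-in user runs something as root
-a always,exit -F arch=b64 -S execve -F euid=0 -F auid>=1000 -F auid!=4294967295 -k root-exec
EOF
}

# Usage: generate, review, then install and load (needs root):
#   write_audit_tuning_rules ./tuning.rules
#   sudo cp ./tuning.rules /etc/audit/rules.d/ && sudo augenrules --load
```

Filtering execve by euid and auid keeps the privileged-command trail most controls ask for while skipping the bulk of system and daemon process launches.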

Prevention and Best Practices

  • Scope audit rules to specific files, syscalls, and field filters; avoid blanket watches on hot paths.
  • Size backlog_limit (-b) and rate_limit (-r) for your peak event rate, not steady state, so boots and scans do not overflow.
  • Keep /var/log/audit on dedicated, fast storage so auditd can drain the queue quickly.
  • Alert on a rising lost counter from auditctl -s so dropped events surface before an audit, not during one.
  • Decide the failure mode deliberately against your compliance requirement and document the trade-off.

Related Errors

  • audit: rate limit exceeded — the per-second -r cap was hit, throttling events.
  • audit_log_start: ... lost — the kernel reporting dropped records, the same overflow symptom.
  • auditd: failure to write audit log — disk-full or permissions on /var/log/audit, covered in the security hardening guides.
  • No space left on device on the audit partition — max_log_file_action/rotation tuning.

Frequently Asked Questions

Does “lost” mean my audit logs are useless? No, but every lost event is a gap. For compliance you should drive lost to zero by sizing the backlog and narrowing rules.

Should I just set failure 2 (panic) to guarantee no loss? Only if your policy truly requires it. Panic halts the host on overflow, which can cause an outage. Most environments size the backlog and alert on lost instead.

Why did this start after I added hardening rules? A new broad rule (for example auditing all execve or all of /etc) increased the event rate beyond what auditd could drain. Narrow it.

Where do I make changes persistent? Add -b, -r, and -f directives to files under /etc/audit/rules.d/ and load them with augenrules --load; runtime auditctl changes do not survive reboot.

How do I confirm the fix worked? Run sudo auditctl -s over time; lost should stop incrementing while backlog stays well below backlog_limit.
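Checking lost over time can be scripted by sampling the counter twice and comparing. This is a minimal sketch, assuming the "lost N" line format of auditctl -s; the function names lost_count and check_lost_delta and the 60-second interval are illustrative choices, not part of the audit tooling.

```shell
#!/bin/sh
# Print the lost counter from `auditctl -s` output on stdin.
# Exits non-zero if no "lost" line is present.
lost_count() {
    awk '$1 == "lost" { print $2; found = 1 } END { if (!found) exit 1 }'
}

# Compare two samples of the lost counter ($1 = earlier, $2 = later).
# Returns non-zero when events were dropped between the samples.
check_lost_delta() {
    delta=$(( $2 - $1 ))
    if [ "$delta" -gt 0 ]; then
        echo "LOSING EVENTS: lost grew by $delta"
        return 1
    elif [ "$delta" -lt 0 ]; then
        echo "UNKNOWN: lost counter was reset between samples"
        return 0
    fi
    echo "OK: lost unchanged at $2"
}

# Usage on a live host (needs root):
#   a=$(sudo auditctl -s | lost_count); sleep 60
#   b=$(sudo auditctl -s | lost_count); check_lost_delta "$a" "$b"
```

The non-zero return code makes the check easy to wire into a cron job or monitoring probe that alerts when the counter moves.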
