Security Error Guide: 'audit: backlog limit exceeded' Audit Event Loss
Fix auditd 'backlog limit exceeded' and lost audit events: diagnose kernel queue overflow, too-broad rules, slow disk, and tune backlog_limit, rate_limit, and failure mode safely.
Exact Error Message
When the Linux audit subsystem cannot drain events fast enough, the kernel reports the queue overflow in the system log:
kernel: audit: backlog limit exceeded
kernel: audit: audit_backlog=8193 > audit_backlog_limit=8192
kernel: audit: backlog limit exceeded, dropping events
auditctl -s surfaces the same condition through the lost and backlog counters:
enabled 1
failure 1
backlog 8192
backlog_limit 8192
lost 14237
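To check these counters from a script rather than by eye, a small parser can help. The sketch below assumes the one-field-per-line layout shown above (some auditctl versions print the status differently), and the function names audit_field and backlog_full are made up for illustration.

```shell
#!/bin/sh
# Print the value of one field from `auditctl -s` style output on stdin.
# Assumes the "name value" layout shown above, one field per line.
audit_field() {
    awk -v name="$1" '$1 == name { print $2; exit }'
}

# Exit 0 when the queue is full, i.e. backlog has reached backlog_limit.
# Reads the same status text on stdin.
backlog_full() {
    status=$(cat)
    backlog=$(printf '%s\n' "$status" | audit_field backlog)
    limit=$(printf '%s\n' "$status" | audit_field backlog_limit)
    [ -n "$backlog" ] && [ -n "$limit" ] && [ "$backlog" -ge "$limit" ]
}
```

For example, sudo auditctl -s | audit_field lost prints only the lost counter, and sudo auditctl -s | backlog_full && echo "queue is full" flags a saturated queue.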
What the Error Means
The kernel audit framework holds pending audit records in an in-kernel queue (the backlog) until the auditd daemon reads them and writes them to disk. backlog_limit caps that queue. When events arrive faster than auditd drains them and the queue fills, the kernel briefly makes the originating process wait for space (backlog_wait_time), then drops the event and handles the loss according to the failure mode: 0 is silent, 1 logs a kernel message, 2 panics. “backlog limit exceeded” and a rising lost counter mean audit records are being discarded: a compliance and forensics gap, not a crash.
This is a tuning problem that typically appears after hardening adds broad audit rules (for example watching every syscall or a busy directory). The fix is to right-size the rules and the backlog, speed up the drain, and choose a failure mode that matches your compliance posture, without simply turning auditing off.
Common Causes
- Over-broad audit rules. A rule watches a hot path (every execve, all of /etc, or a high-traffic syscall) and generates more events than auditd can write.
- backlog_limit too small. The default queue is too shallow for the event rate during bursts (boot, package installs, scans).
- Slow audit log disk. /var/log/audit is on slow or contended storage, so auditd cannot drain the queue fast enough.
- No rate_limit set. Without a per-second cap, a runaway process floods the queue.
- auditd paused or backed up. Log rotation, max_log_file_action, or a stuck dispatcher stalls the reader.
- Failure mode mismatch. failure 1 (printk) drops events and leaves only a kernel log message; failure 2 (panic) is too strict for the workload.
How to Reproduce the Error
On a disposable test host, set a tiny backlog and an over-broad rule, then generate load:
sudo auditctl -b 64
sudo auditctl -a always,exit -F arch=b64 -S execve
# generate a burst of processes
for i in $(seq 1 2000); do /bin/true; done
sudo auditctl -s
backlog 64
backlog_limit 64
lost 1841
The lost counter climbing confirms events are being dropped because the queue overflowed under load.
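After the test, the host should be put back the way it was. The sketch below deletes the execve rule added above and restores a larger backlog. The function name revert_repro and the AUDITCTL override (handy for a dry run) are illustrative assumptions, and 8192 is assumed as the restore value only because it appears in the sample status output earlier in this guide; use whatever your host had before.

```shell
#!/bin/sh
# Undo the reproduction: delete the broad execve rule, then restore the backlog.
# AUDITCTL is a hypothetical override for dry runs; by default the real
# auditctl is invoked through sudo.
revert_repro() {
    # $1: backlog_limit to restore (assumed default: 8192)
    restore_limit=${1:-8192}
    cmd=${AUDITCTL:-sudo auditctl}
    # -d deletes the rule that matches the same specification used with -a
    $cmd -d always,exit -F arch=b64 -S execve &&
    $cmd -b "$restore_limit"
}
```

Running AUDITCTL="echo auditctl" revert_repro 8192 prints the two commands without executing them; calling revert_repro 8192 with AUDITCTL unset applies them.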
Diagnostic Commands
These read-only commands show the queue state, the rules causing the load, and where events are going.
# Current audit status: backlog, backlog_limit, lost, rate_limit, failure
sudo auditctl -s
# List all loaded audit rules (find the over-broad ones)
sudo auditctl -l
# Which rules/keys are generating the most events?
sudo aureport --summary -i 2>/dev/null
sudo aureport --syscall --summary -i 2>/dev/null | head
# Search recent audit records to see the volume and source
sudo ausearch --start recent --raw 2>/dev/null | wc -l
# Kernel backlog messages
sudo journalctl -k --since '30 min ago' | grep -i 'audit.*backlog'
# Confirm auditd is running and reading the queue
sudo systemctl status auditd --no-pager
auditctl -s plus aureport --syscall --summary together tell you whether to raise the backlog, throttle the rate, or narrow a rule.
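If the rules carry -k keys, counting raw records per key is one quick way to see which rule dominates. This is a minimal sketch, assuming the usual raw record layout (type=SYSCALL ... key="name") and a hypothetical function name count_by_key; aureport's own key summary may be a better fit where it is available.

```shell
#!/bin/sh
# Count SYSCALL records per rule key from raw audit records on stdin.
# Assumes each record is one line containing a key="..." field;
# records without a key field are counted under "(none)".
count_by_key() {
    awk '
        /^type=SYSCALL / {
            key = "(none)"
            for (i = 1; i <= NF; i++)
                if ($i ~ /^key=/) { key = substr($i, 5); gsub(/"/, "", key) }
            count[key]++
        }
        END { for (k in count) print count[k], k }
    ' | sort -rn
}
```

A typical invocation would be sudo ausearch --start recent --raw | count_by_key | head, which lists the busiest keys first.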
Step-by-Step Resolution
1. Read the queue state with sudo auditctl -s. A lost value that climbs while backlog sits at backlog_limit confirms overflow. Note the current failure mode.
2. Find the noisiest rule. Use aureport --syscall --summary and auditctl -l to identify the rule generating the bulk of events. A rule watching every execve or all of /etc is the usual culprit.
3. Narrow the rule to the assets that matter for your control objective (a specific file, or a specific syscall with field filters) rather than a blanket watch. Reducing scope cuts volume at the source, which is the most durable fix.
4. Raise the backlog to absorb legitimate bursts and add a rate limit to cap runaway floods. In /etc/audit/rules.d/audit.rules:
-b 32768
-r 500
Then reload: sudo augenrules --load (or sudo service auditd reload).
5. Speed up the drain if disk is the bottleneck: put /var/log/audit on faster storage, and review flush/freq in /etc/audit/auditd.conf so writes are not stalling.
6. Choose a failure mode that matches compliance. For environments that must never lose audit events, -f 2 (failure 2, panic) enforces no-loss but will halt the host on overflow; most use -f 1 with a sized backlog and alerting. Re-run auditctl -s and confirm lost stops climbing.
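Put together, the resolution steps might produce a rules file like the one written by the sketch below. The -b, -r, and -f values are the ones from the steps; the file name 10-tuning.rules, the function name write_audit_tuning, and the two narrowed rules (a single-file watch and an execve rule limited to login users) are illustrative assumptions to adapt to your own control objectives.

```shell
#!/bin/sh
# Write an example tuning file for augenrules to pick up.
# The default path and both sample rules are assumptions for illustration.
write_audit_tuning() {
    target=${1:-/etc/audit/rules.d/10-tuning.rules}
    cat > "$target" <<'EOF'
## Queue sizing and failure mode (values from the resolution steps)
-b 32768
-r 500
-f 1

## Narrow watch: writes and attribute changes to one file, not all of /etc
-w /etc/passwd -p wa -k identity

## Narrow execve: only processes with a real login uid (auid >= 1000, and set)
-a always,exit -F arch=b64 -S execve -F auid>=1000 -F auid!=4294967295 -k user_exec
EOF
}
```

Run write_audit_tuning as root, then sudo augenrules --load, and confirm the result with sudo auditctl -l and sudo auditctl -s.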
Prevention and Best Practices
- Scope audit rules to specific files, syscalls, and field filters; avoid blanket watches on hot paths.
- Size backlog_limit (-b) and rate_limit (-r) for your peak event rate, not steady state, so boots and scans do not overflow.
- Keep /var/log/audit on dedicated, fast storage so auditd can drain the queue quickly.
- Alert on a rising lost counter from auditctl -s so dropped events surface before an audit, not during one.
- Decide the failure mode deliberately against your compliance requirement and document the trade-off.
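The alerting advice can be approximated with a small periodic check that compares the lost counter against the value seen on the previous run. This is a sketch only: the function name check_lost and the state file path are hypothetical, and a real deployment would more likely feed the counter into an existing monitoring system.

```shell
#!/bin/sh
# Compare the current `lost` counter with the value saved by the previous run.
# Prints a warning and returns 1 when the counter has grown; returns 0 otherwise.
check_lost() {
    current=$1      # current value of the lost counter
    state_file=$2   # hypothetical path where the last seen value is kept
    # First run: seed with the current value so old losses do not alert.
    previous=$current
    [ -r "$state_file" ] && previous=$(cat "$state_file")
    printf '%s\n' "$current" > "$state_file"
    if [ "$current" -gt "$previous" ]; then
        echo "audit events lost: $((current - previous)) since last check"
        return 1
    fi
    # Unchanged, or smaller because the counter restarted (e.g. after a reboot).
    return 0
}
```

From cron, something like check_lost "$(sudo auditctl -s | awk '$1 == "lost" { print $2 }')" /var/tmp/audit_lost.state would emit a line only when new events were dropped.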
Related Errors
- 'audit: rate limit exceeded': the per-second -r cap was hit, throttling events.
- 'audit_log_start: ... lost': the kernel reporting dropped records, the same overflow symptom.
- 'auditd: failure to write audit log': disk-full or permissions on /var/log/audit, covered in the security hardening guides.
- 'No space left on device' on the audit partition: max_log_file_action/rotation tuning.
Frequently Asked Questions
Does “lost” mean my audit logs are useless? No, but every lost event is a gap. For compliance you should drive lost to zero by sizing the backlog and narrowing rules.
Should I just set failure 2 (panic) to guarantee no loss? Only if your policy truly requires it. Panic halts the host on overflow, which can cause an outage. Most environments size the backlog and alert on lost instead.
Why did this start after I added hardening rules? A new broad rule (for example auditing all execve or all of /etc) increased the event rate beyond what auditd could drain. Narrow it.
Where do I make changes persistent? Add -b, -r, and -f directives to files under /etc/audit/rules.d/ and load them with augenrules --load; runtime auditctl changes do not survive reboot.
How do I confirm the fix worked? Run sudo auditctl -s over time; lost should stop incrementing while backlog stays well below backlog_limit.