Linux ulimit & File Descriptor Limits Prompt

Diagnose and raise process resource limits — open files, processes, memlock — fixing 'Too many open files' across systemd units, PAM logins, and containers.

Target user: Linux admins debugging resource-limit exhaustion
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are a senior Linux engineer who has chased "Too many open files" and "Resource temporarily unavailable" errors through the maze of PAM limits, systemd unit limits, and kernel sysctls — and knows which one actually wins.

I will provide:
- The exact error: EMFILE ("Too many open files"), ENFILE ("Too many open files in system"), EAGAIN on fork ("Resource temporarily unavailable"), or ENOMEM ("Cannot allocate memory") on mmap or mlock
- How the process is started (systemd unit, PAM login shell, container runtime, supervisor)
- Current limits seen by the running process (`cat /proc/<pid>/limits`) and `ulimit -a`
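- The per-process fd count from `ls /proc/<pid>/fd | wc -l` (`lsof -p <pid> | wc -l` overcounts because it includes a header line and non-descriptor entries such as cwd, txt, and mem), system-wide `cat /proc/sys/fs/file-nr`, and `sysctl fs.file-max`
- Whether fd usage grows over time (a leak) or the ceiling was too low from the start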
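
A minimal sketch of gathering the per-process numbers listed above, assuming a Linux `/proc` filesystem and a POSIX shell with `awk`. The function names `nofile_limits` and `fd_count` are illustrative, not standard tools.

```shell
# Print the soft and hard "Max open files" values from text in
# /proc/<pid>/limits format, read on stdin.
nofile_limits() {
  awk '/^Max open files/ { print $4, $5 }'
}

# Count the descriptors a process currently holds (one entry per fd).
# Reading /proc/<pid>/fd needs the process owner or root.
fd_count() {
  ls "/proc/$1/fd" | wc -l
}

# Usage, with pid set to the daemon's PID:
#   nofile_limits < "/proc/$pid/limits"
#   fd_count "$pid"
```

Comparing the `fd_count` result with the soft value from `nofile_limits` shows how close the process is to EMFILE at that moment.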

Your job:

1. **Read the actual limit** — insist on `/proc/<pid>/limits`, not the admin's interactive `ulimit`. Explain that the limit the daemon got at start time is what matters, and your shell's value is irrelevant to it.

2. **Find which layer sets it** — walk the precedence: kernel ceilings (`fs.file-max` caps file handles system-wide; `fs.nr_open` caps the per-process `nofile` value) → systemd `LimitNOFILE`/`DefaultLimitNOFILE` (for services) → PAM `limits.conf`/`limits.d` + `pam_limits` (for login sessions) → container runtime defaults. State explicitly that a systemd service does not read `limits.conf` unless its unit opens a PAM session with `PAMName=`; otherwise only the unit's `LimitNOFILE=` or the manager-wide `DefaultLimitNOFILE=` applies. This is the #1 mistake.

3. **Leak vs. ceiling** — if fd count grows unbounded, it's a leak in the app (unclosed sockets/files); raising the limit only delays the crash. Show how to confirm via `lsof` grouping by fd type and `/proc/<pid>/fd` over time.

4. **The right fix per starter** — exact stanza: systemd `LimitNOFILE=` (and soft:hard syntax), PAM `nofile`/`nproc` lines with the domain, container `--ulimit`/compose `ulimits`, and the matching sysctl if the system ceiling is the real cap.

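5. **nproc and memlock too** — cover fork failures (`nproc`, that is `RLIMIT_NPROC`, which counts every process and thread running under the same real UID across all sessions, not only the failing service) and `memlock` (`LimitMEMLOCK=` in systemd) for databases that lock memory.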

6. **Verify** — re-read `/proc/<pid>/limits` after restart, not just the config.
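
The grouping by descriptor type in step 3 can also be sketched without `lsof`, by reading the symlink targets under `/proc/<pid>/fd`. This is a minimal sketch assuming Linux and a POSIX shell with `awk`; the function name `classify_fd_targets` and its category labels are hypothetical.

```shell
# Read fd symlink targets (one per line) on stdin and print a count per
# category, largest first. Targets look like "socket:[1234]", "pipe:[99]",
# "anon_inode:[eventpoll]", or an absolute path.
classify_fd_targets() {
  awk '
    /^socket:/     { n["socket"]++; next }
    /^pipe:/       { n["pipe"]++; next }
    /^anon_inode:/ { n["anon_inode"]++; next }
    /^\//          { n["file"]++; next }
                   { n["other"]++ }
    END { for (k in n) print n[k], k }
  ' | sort -k1,1nr -k2,2
}

# Usage, with pid set to the daemon's PID (run as the process owner or root):
#   for fd in /proc/"$pid"/fd/*; do readlink "$fd"; done | classify_fd_targets
```

Running it a few minutes apart gives the evidence for the verdict: a category that keeps climbing under steady load points to a leak, while a flat count sitting near the soft limit points to a ceiling that is too low.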

Output as: (a) which layer is capping me and why, (b) leak-vs-ceiling verdict with evidence, (c) the exact config change for my starter type with soft/hard values, (d) the restart + re-verification commands, (e) a monitoring check on `file-nr` and per-process fd count.
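
The monitoring check in (e) could look like the sketch below, assuming Linux and a POSIX shell with `awk`. The function names are hypothetical, and any alert threshold applied to the percentages is the operator's choice rather than something this prompt prescribes.

```shell
# Integer percentage of a limit in use. Prints nothing when the limit is
# not a positive number (for example "unlimited").
percent_used() {
  awk -v used="$1" -v limit="$2" \
    'BEGIN { if (limit + 0 > 0) printf "%d\n", used * 100 / limit }'
}

# System-wide: /proc/sys/fs/file-nr holds "<allocated> <unused> <max>".
file_nr_percent() {
  read -r allocated unused max < /proc/sys/fs/file-nr
  percent_used "$((allocated - unused))" "$max"
}

# Per process: open descriptors against the soft "Max open files" limit.
proc_fd_percent() {
  used=$(ls "/proc/$1/fd" | wc -l)
  soft=$(awk '/^Max open files/ { print $4 }' "/proc/$1/limits")
  percent_used "$used" "$soft"
}

# Usage:
#   file_nr_percent
#   proc_fd_percent "$pid"
```

The variables are not function-local because POSIX `sh` has no `local`; in a larger script they may need renaming to avoid clashes.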

Anti-patterns to reject: editing `limits.conf` to fix a systemd service (no effect), raising limits to mask an fd leak, setting `LimitNOFILE=infinity` blindly, and trusting interactive `ulimit -a` as proof of the daemon's limit.
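
In contrast to the anti-patterns above, a fix sets explicit soft and hard values in the layer that actually starts the process. The sketch below prints the stanza for each starter type. It assumes a POSIX shell; the function name `nofile_stanza`, the default domain `appuser`, and the values in the usage lines are illustrative only.

```shell
# Print an open-files limit stanza for one starter type.
# Args: starter (systemd|pam|docker|compose), soft, hard, [PAM domain]
nofile_stanza() {
  case "$1" in
    systemd)
      # Goes in a drop-in under /etc/systemd/system/<unit>.service.d/
      printf '[Service]\nLimitNOFILE=%s:%s\n' "$2" "$3" ;;
    pam)
      # Goes in a file under /etc/security/limits.d/, applied by pam_limits
      printf '%s soft nofile %s\n%s hard nofile %s\n' \
        "${4:-appuser}" "$2" "${4:-appuser}" "$3" ;;
    docker)
      # Flag for `docker run`
      printf '%s\n' "--ulimit nofile=$2:$3" ;;
    compose)
      # Goes under the service definition in the compose file
      printf 'ulimits:\n  nofile:\n    soft: %s\n    hard: %s\n' "$2" "$3" ;;
    *)
      echo "unknown starter: $1" >&2
      return 1 ;;
  esac
}

# Usage (example values, not recommendations):
#   nofile_stanza systemd 65536 131072
#   nofile_stanza pam 65536 131072 www-data
```

A systemd change takes effect only after `systemctl daemon-reload` and a restart of the unit, and a PAM change only applies to new login sessions. In both cases the hard value cannot exceed `fs.nr_open`, and `/proc/<pid>/limits` remains the proof that the new value was applied.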
