Replacing setuid Root with Fine-Grained Linux Capabilities
Swap dangerous setuid root binaries for narrow Linux capabilities. Use setcap, getcap, getpcaps and systemd to grant only the privilege a process needs.
I used to wince every time I ran ls -l on /usr/bin and saw that little s bit lighting up in the permission column. Each one of those setuid root binaries is a tiny grenade with the pin half pulled: a single buffer overflow or path-injection bug and the attacker is not “the web user” anymore, they are root. For years I told myself it was fine because the kernel had no better answer. Then I actually sat down with Linux capabilities, the same week I started leaning on an AI assistant to draft and explain the trickier capsh invocations, and I realized I had been carrying that grenade around for no reason. This post is the walkthrough I wish I had been handed.
What setuid Actually Costs You
When a binary has the setuid bit set and is owned by root, it runs with the full root identity regardless of who launched it. ping is the classic example: it needs to open a raw socket, which historically required UID 0, so distributions shipped it setuid root. The problem is that opening a raw socket is one specific power, and setuid hands over all of them: mounting filesystems, loading kernel modules, overwriting any file on disk, killing any process. You wanted a teaspoon of privilege and the kernel gave you the whole ocean.
Capabilities split that monolithic root power into roughly forty discrete units. Instead of “you are root or you are not,” you can say “this process may bind low ports and nothing else.” The relevant ones for day-to-day admin work are CAP_NET_BIND_SERVICE (bind to ports below 1024), CAP_NET_RAW (open raw and packet sockets, which is what ping needs), CAP_SYS_TIME (set the system clock), CAP_DAC_OVERRIDE (bypass file permission checks), and CAP_CHOWN (change file ownership). The full list lives in man 7 capabilities, which is worth reading start to finish at least once.
Inspecting What a Binary Already Carries
Before changing anything, look at the current state with getcap:
getcap /usr/bin/ping
# /usr/bin/ping cap_net_raw=ep
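On many current distributions ping is no longer setuid at all; it carries exactly the one capability it needs. (Some newer releases go further and ship ping with no file capabilities, relying on unprivileged ICMP sockets instead, in which case getcap prints nothing.) The ep suffix records which capability sets the grant lands in, which we will decode in a moment. To scan a whole tree: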
getcap -r /usr/bin 2>/dev/null
This is also a fast audit trick: anything in here with cap_sys_admin or cap_dac_override deserves a second look, because those are nearly as dangerous as full root.
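If you want that second look to be automatic, here is a minimal sketch of a filter for getcap output. The function name flag_risky_caps and the RISKY_CAPS list are my own illustrative choices, not anything libcap ships, and the list is a starting point rather than an authoritative ranking.

```shell
# Sketch: flag binaries whose file capabilities are close to full root.
# RISKY_CAPS is an illustrative list (an assumption); tune it for your hosts.
RISKY_CAPS='cap_sys_admin|cap_dac_override|cap_sys_module|cap_sys_ptrace|cap_setuid|cap_setfcap'

# Reads getcap-style lines on stdin and prints only those that name a
# risky capability. The capability must be preceded by a space or comma
# and followed by ',', '=' or '+', so a path that merely contains the
# string is not matched.
flag_risky_caps() {
    grep -E "[ ,](${RISKY_CAPS})[,=+]" || true   # no match is not an error
}

# Typical use on a host with libcap installed:
#   getcap -r /usr/bin 2>/dev/null | flag_risky_caps
```

Anything this prints still needs a human judgment call; the filter only narrows down where to look.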
The Four (and a Half) Capability Sets
Capabilities live in several sets per process, and the distinction matters once you start granting them. The permitted set is what a process may request. The effective set is what is currently active for permission checks. The inheritable set is preserved across an execve(), but a capability in it only reaches the new program's permitted set if the executed file also marks that capability inheritable. The ambient set, added in kernel 4.3, is the modern bridge: it lets capabilities carry over to an ordinary binary that has no file capabilities of its own. There is also the bounding set, which is a ceiling: a process can never gain a capability from a file that is not in its bounding set, even via setuid.
On a file, setcap writes the permitted, inheritable, and effective bits. The letters map directly: p is permitted, i is inheritable, e is effective. So cap_net_raw=ep means “permitted and effective.”
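Pro Tip: If you only remember one rule, remember this: a capability has to be both permitted and effective to do anything. In the +ep you see in nearly every example, p places the capability in the new process's permitted set and e raises it into the effective set at exec time, which is what makes the grant actually fire.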
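To make the interaction between those sets concrete, here is a rough sketch of the execve() transformation rules as I read them in man 7 capabilities, written as shell arithmetic on hex masks. The helper name exec_caps and its argument order are hypothetical, and it is deliberately simplified: it ignores setuid/setgid files, and it does not model the kernel refusing the exec when a file with the effective bit cannot obtain all of its permitted capabilities.

```shell
# exec_caps P_INH P_AMB P_BND F_INH F_PRM F_EFF
#   P_* are the calling process's inheritable, ambient and bounding masks,
#   F_* are the file's inheritable and permitted masks (hex, no 0x prefix),
#   F_EFF is the file's single effective bit (0 or 1).
# Prints the permitted, effective and ambient masks after execve().
exec_caps() {
    p_inh=$((0x$1)); p_amb=$((0x$2)); p_bnd=$((0x$3))
    f_inh=$((0x$4)); f_prm=$((0x$5)); f_eff=$6

    # A file that carries capabilities is "privileged": ambient is cleared.
    if [ $((f_inh | f_prm)) -ne 0 ] || [ "$f_eff" -ne 0 ]; then
        new_amb=0
    else
        new_amb=$p_amb
    fi

    # P'(permitted) = (P(inh) & F(inh)) | (F(prm) & P(bnd)) | P'(ambient)
    new_prm=$(( (p_inh & f_inh) | (f_prm & p_bnd) | new_amb ))

    # P'(effective) = F(eff) ? P'(permitted) : P'(ambient)
    if [ "$f_eff" -ne 0 ]; then new_eff=$new_prm; else new_eff=$new_amb; fi

    printf 'CapPrm: %016x\nCapEff: %016x\nCapAmb: %016x\n' \
        "$new_prm" "$new_eff" "$new_amb"
}

# A plain user running a binary marked cap_net_bind_service=ep (bit 10):
#   exec_caps 0 0 1ffffffffff 0 400 1
# A process holding the same capability as ambient, running an unmarked binary:
#   exec_caps 400 400 400 0 0 0
```

In both cases bit 10 ends up permitted and effective. Drop the e from the file grant (last argument 0 in the first call) and the effective mask comes back empty, which is the Pro Tip above in arithmetic form.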
Granting a Capability to a Binary
Say you compiled your own network tool and it needs to bind to port 443 without running as root. You grant the narrow capability with setcap:
sudo setcap 'cap_net_bind_service=+ep' /usr/local/bin/mywebd
getcap /usr/local/bin/mywebd
# /usr/local/bin/mywebd cap_net_bind_service=ep
Now any user who can execute mywebd can run it, it can bind to any port below 1024 (80 and 443 included), and it has zero other elevated powers. If someone exploits it, the worst they get is the ability to bind low ports as that unprivileged user. To strip the capability later:
sudo setcap -r /usr/local/bin/mywebd
One caveat: file capabilities are stored as extended attributes, so they do not survive a cp that drops xattrs, and they are ignored on filesystems mounted nosuid. Your deploy pipeline needs to re-apply setcap after copying the binary into place, which is exactly the kind of repetitive, easy-to-forget step I now hand to an AI assistant to script for me.
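A deploy step along these lines might look like the sketch below. The names run and deploy_with_caps and the DRY_RUN switch are hypothetical helpers of mine; install, setcap, and getcap are the real tools, and the setcap call assumes the script runs as root (or otherwise holds CAP_SETFCAP).

```shell
# run CMD...: execute CMD, or just print it when DRY_RUN=1.
# (DRY_RUN is a convention invented for this sketch.)
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# deploy_with_caps SRC DEST CAPS
# Copies the binary into place, re-applies its file capabilities, and
# prints what getcap reports so the result shows up in the deploy log.
deploy_with_caps() {
    src=$1
    dest=$2
    caps=$3
    run install -m 0755 "$src" "$dest" &&
    run setcap "$caps" "$dest" &&
    run getcap "$dest"
}

# Example (as root):
#   deploy_with_caps ./mywebd /usr/local/bin/mywebd cap_net_bind_service=ep
```

Running it once with DRY_RUN=1 prints the three commands without touching the host, which makes the privilege-changing line easy to review before it runs for real.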
Reading Capabilities on a Running Process
For a live process, getpcaps shows the active sets by PID:
getpcaps 1 # systemd / pid 1
# Capabilities for `1': = cap_chown,cap_dac_override,...+ep
You can also read the raw bitmasks straight from /proc:
grep Cap /proc/$(pgrep -n mywebd)/status
# CapInh: 0000000000000000
# CapPrm: 0000000000000400
# CapEff: 0000000000000400
# CapBnd: 000001ffffffffff
# CapAmb: 0000000000000000
Those hex masks are unreadable on their own, so decode them with capsh:
capsh --decode=0000000000000400
# 0x0000000000000400=cap_net_bind_service
Bit 10 (0x400) is CAP_NET_BIND_SERVICE, confirming the process holds exactly what we granted and nothing more. Cross-checking the bitmask against getcap like this is the kind of detail an AI assistant nails instantly and a tired human gets wrong at 2 a.m.
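On a minimal host without capsh, the same decoding can be approximated in plain shell. This is a sketch, not a replacement for capsh: decode_caps is a name I made up, and the table assumes the capability numbering through CAP_CHECKPOINT_RESTORE (bit 40), so bits added by later kernels would be silently ignored.

```shell
# Capability names in bit order, 0 (cap_chown) through 40.
CAP_NAMES='cap_chown cap_dac_override cap_dac_read_search cap_fowner
cap_fsetid cap_kill cap_setgid cap_setuid cap_setpcap cap_linux_immutable
cap_net_bind_service cap_net_broadcast cap_net_admin cap_net_raw
cap_ipc_lock cap_ipc_owner cap_sys_module cap_sys_rawio cap_sys_chroot
cap_sys_ptrace cap_sys_pacct cap_sys_admin cap_sys_boot cap_sys_nice
cap_sys_resource cap_sys_time cap_sys_tty_config cap_mknod cap_lease
cap_audit_write cap_audit_control cap_setfcap cap_mac_override
cap_mac_admin cap_syslog cap_wake_alarm cap_block_suspend cap_audit_read
cap_perfmon cap_bpf cap_checkpoint_restore'

# decode_caps HEXMASK: print the comma-separated names of the set bits.
# HEXMASK is a value as shown in /proc/PID/status, without a 0x prefix.
decode_caps() {
    mask=$((0x$1))
    bit=0
    names=''
    for name in $CAP_NAMES; do
        if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
            names="${names:+$names,}$name"
        fi
        bit=$((bit + 1))
    done
    printf '%s\n' "$names"
}

# decode_caps 0000000000000400   # prints cap_net_bind_service
```

I still prefer capsh --decode when it is available, since it tracks whatever the installed libcap knows about.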
Capabilities for a systemd Service
Most production privilege does not live in a setuid binary anymore; it lives in a service. systemd grants capabilities declaratively, which is cleaner than setcap because there is no extended attribute on the binary to lose in a copy or redeploy. In your unit file:
[Service]
ExecStart=/usr/local/bin/mywebd
User=mywebd
AmbientCapabilities=CAP_NET_BIND_SERVICE
CapabilityBoundingSet=CAP_NET_BIND_SERVICE
NoNewPrivileges=true
AmbientCapabilities= hands the running process the capability even though it launches as the unprivileged mywebd user. CapabilityBoundingSet= clamps the ceiling so the service can never acquire anything beyond that one capability, even if a child process tries something clever. Pair it with NoNewPrivileges=true and you have a service that can bind 443 and is otherwise as harmless as a cat process. Apply it with:
sudo systemctl daemon-reload
sudo systemctl restart mywebd
getpcaps $(systemctl show -p MainPID --value mywebd)
Pro Tip: Set CapabilityBoundingSet= to the exact list you grant, never wider. The bounding set is your blast-radius limit, and an empty bounding set on a service that needs no privilege at all is the strongest hardening you can ship.
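That rule is easy to lint for. The sketch below compares the two settings in a unit file; unit_caps and check_unit_caps are hypothetical helper names, and the parsing is intentionally naive: it reads only the last assignment of each key in a single file, so drop-ins, repeated assignments that accumulate, and the ~ inversion prefix are all out of scope.

```shell
# unit_caps FILE KEY: print the sorted, space-separated value of the last
# KEY= line in FILE (empty if the key is absent).
unit_caps() {
    sed -n "s/^$2=//p" "$1" | tail -n 1 | tr -s ' ' '\n' | sort | tr '\n' ' '
}

# check_unit_caps FILE: succeed only if CapabilityBoundingSet= lists
# exactly the capabilities granted by AmbientCapabilities=.
check_unit_caps() {
    amb=$(unit_caps "$1" AmbientCapabilities)
    bnd=$(unit_caps "$1" CapabilityBoundingSet)
    if [ "$amb" = "$bnd" ]; then
        echo "ok: bounding set matches ambient set"
    else
        echo "mismatch: ambient='$amb' bounding='$bnd'"
        return 1
    fi
}

# Example:
#   check_unit_caps /etc/systemd/system/mywebd.service
```

A mismatch is not always a bug, but it is worth a deliberate decision rather than an accident, so I would wire a check like this into review rather than treat it as a hard gate.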
Why This Beats setuid Root Every Time
Go back to ping. A setuid binary starts life as full root, and it stays that way through command-line parsing, DNS resolution, and output formatting unless the program remembers to drop privileges itself, even though opening the raw socket is the only step that genuinely needs elevation. The capability version never holds anything but CAP_NET_RAW. The attack surface shrinks from “all of root” to “one syscall family.” Multiply that across every privileged tool on the box and you have converted dozens of total-compromise grenades into dozens of contained, boring, single-purpose grants. That is the entire game of least privilege, and capabilities are how Linux finally lets you play it properly.
This is also where I want to be honest about how I work now. I treat my AI assistant the way I would treat a sharp junior engineer: fast, tireless, great at recalling that CAP_NET_RAW is bit 13 or drafting the systemd stanza above. But a junior does not get to merge to production unreviewed, and neither does the AI. I read every setcap line before it touches a host, I keep a human in the loop on anything that changes a privilege boundary, and I never, ever give the model production credentials. It drafts; I verify and execute. If you want a structured place to do that drafting and review, the prompt workspace and our code review dashboard are built for exactly this kind of careful, human-gated loop.
Conclusion
Capabilities are not new, but they are still wildly underused, mostly because the tooling looks intimidating until you have run getcap, setcap, getpcaps, and capsh --decode a few times. Spend an afternoon auditing your setuid binaries and migrating the ones you control to narrow grants or systemd AmbientCapabilities=, and you will permanently shrink your blast radius. Let an AI help you draft and decode the fiddly parts, keep yourself firmly in the review seat, and ship the result. For more in this vein, browse the Linux admins category, grab a ready-made prompt or a curated prompt pack, and if you live in the terminal, see how a capability-aware assistant fits into Warp or Claude.