Grafana Error Guide: 'too many open files' — File Descriptor & ulimit Limits
Fix Grafana 'too many open files' errors — raise the file-descriptor ulimit via systemd LimitNOFILE or container limits, find FD leaks, and tune connections so Grafana stops running out.
- #grafana
- #troubleshooting
- #errors
- #ulimit
Overview
Every open network socket, log file, database handle, and plugin file counts against the Grafana process’s file-descriptor (FD) limit. When Grafana exceeds the nofile ulimit, the kernel refuses new descriptors and operations that need one — accepting HTTP connections, opening data-source connections, writing logs — fail with too many open files. Under load this looks like Grafana intermittently refusing requests or losing its data sources.
The literal errors you will see:
logger=context error="accept tcp [::]:3000: accept4: too many open files"
dial tcp 10.0.0.20:9090: socket: too many open files
sqlite: unable to open database file: too many open files
It occurs under connection load or after a slow FD leak: many concurrent users, many data-source queries, or per-request connections that aren’t being closed.
Symptoms
- Grafana intermittently returns 502s or other errors; the log shows accept4: too many open files.
- Data-source queries fail with socket: too many open files.
- The process’s open-FD count sits at or near its limit.
- Restarting Grafana fixes it temporarily, then it degrades again (leak).
# /proc/PID/fd is readable only by the process owner or root, hence sudo
PID=$(pgrep -x grafana-server); sudo ls /proc/$PID/fd | wc -l
cat /proc/$PID/limits | grep -i "open files"
1021
Max open files 1024 1024 files
Common Root Causes
1. Default ulimit too low for the workload
A distro default of 1024 open files is easily exhausted by a busy Grafana with many users and data sources.
cat /proc/$(pgrep -x grafana-server)/limits | grep 'open files'
2. systemd unit not raising LimitNOFILE
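The service must set LimitNOFILE itself. A ulimit set in a shell or in /etc/security/limits.conf does not apply to a systemd-managed process, which takes its limits from the unit file or from systemd's defaults.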
systemctl show grafana-server -p LimitNOFILE
LimitNOFILE=1024
3. Container runtime FD limit
In Kubernetes/Docker the container’s nofile limit applies, independent of the host.
kubectl -n monitoring exec deploy/grafana -- sh -c 'cat /proc/1/limits | grep "open files"'
4. A genuine FD leak
A misbehaving plugin, a data source that opens per-query connections without closing, or the image renderer can leak descriptors so the count climbs steadily until exhaustion.
5. Excessive concurrent connections / keep-alive
Very high concurrency (dashboards with many live panels, alerting fan-out) opens many sockets at once.
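To see whether the sockets concentrate on one destination (for example a single data source), it can help to group established connections by peer. The helper below is a minimal sketch: count_peers is a hypothetical name, and it assumes input in the column layout of ss -Htnp, where the fifth field is the peer address:port.

```shell
# count_peers: read `ss -Htnp`-style lines on stdin and print the number
# of connections per peer address:port, largest first.
# Assumed columns: State Recv-Q Send-Q Local Peer Process
count_peers() {
    awk '{print $5}' | sort | uniq -c | sort -rn
}
```

A possible invocation, assuming PID holds the Grafana process ID: sudo ss -Htnp | grep "pid=$PID," | count_peers | head. Hundreds of connections to one address:port suggest load or a leak on that specific data source rather than general HTTP traffic.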
Diagnostic Workflow
Step 1: Measure current FD usage vs. the limit
PID=$(pgrep -x grafana-server)
echo "open: $(sudo ls /proc/$PID/fd | wc -l)"   # fd dir needs owner or root access
grep 'open files' /proc/$PID/limits
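The two numbers above can be folded into one utilisation figure. The function below is a sketch under stated assumptions: fd_usage is a hypothetical name, it compares against the soft limit (the fourth field of the "Max open files" row in /proc/PID/limits), and it needs permission to read /proc/PID/fd.

```shell
# fd_usage PID: print "<open> <soft-limit> <percent-used>" for a process.
# Run as the process owner or root so /proc/PID/fd is readable.
fd_usage() {
    pid=$1
    # Each entry in the fd directory is one open descriptor.
    open=$(ls "/proc/$pid/fd" | wc -l)
    # Row format: "Max open files  <soft>  <hard>  files"
    soft=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")
    if [ "$soft" = "unlimited" ]; then
        echo "$open unlimited 0"
    else
        echo "$open $soft $((open * 100 / soft))"
    fi
}
```

For example, fd_usage "$PID" run as root prints the three values on one line; a percentage close to 100 matches the symptom described earlier.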
Step 2: See what the FDs are
# Strip the inode number in brackets so socket:[1234] and socket:[5678] group together
sudo ls -l /proc/$PID/fd | awk '{print $NF}' | sed 's/\[[0-9]*\]$//' | sort | uniq -c | sort -rn | head
sudo lsof -p $PID 2>/dev/null | awk '{print $5}' | sort | uniq -c | sort -rn | head
A large, growing count of socket: entries (from the ls pipeline) or of IPv4/IPv6 entries (the TYPE column in lsof) points at connection leakage.
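The same grouping can be done without parsing ls -l output by reading the fd symlinks directly. This is a sketch only; list_fd_targets and summarize_fd_targets are hypothetical helper names, and the bucketing (sockets, pipes, anon_inode entries, everything else by path) is one reasonable choice among several.

```shell
# list_fd_targets PID: print the target of every fd symlink, one per line.
list_fd_targets() {
    for fd in /proc/"$1"/fd/*; do
        # A descriptor may close between the glob and the readlink.
        readlink "$fd" 2>/dev/null
    done
}

# summarize_fd_targets: read targets on stdin, collapse sockets, pipes and
# anon_inode entries into one bucket each, and print counts, largest first.
summarize_fd_targets() {
    sed -E 's/^(socket|pipe):\[[0-9]+\]$/\1/; s/^anon_inode:.*$/anon_inode/' |
        sort | uniq -c | sort -rn
}
```

Run as root, list_fd_targets "$PID" | summarize_fd_targets | head shows at a glance whether sockets or regular files dominate.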
Step 3: Check whether it’s a leak (trend over time)
for i in 1 2 3; do sudo ls /proc/$PID/fd | wc -l; sleep 30; done
A monotonically climbing count under steady load indicates a leak, not just a low limit.
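The "monotonically climbing" test can be written down as a small check over the collected samples. This is a crude heuristic, not a leak detector: is_climbing is a hypothetical name, it requires every sample to exceed the previous one, and a handful of samples over a minute or two may not be enough to separate a leak from a traffic ramp.

```shell
# is_climbing N1 N2 ...: succeed only if there are at least two samples
# and each one is strictly greater than the one before it.
is_climbing() {
    [ "$#" -ge 2 ] || return 1
    prev=$1
    shift
    for n in "$@"; do
        [ "$n" -gt "$prev" ] || return 1
        prev=$n
    done
    return 0
}
```

For instance, is_climbing 812 840 871 903 succeeds, while is_climbing 812 790 871 fails. Feeding it more samples over a longer window under steady traffic makes the result more meaningful.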
Step 4: Raise the systemd limit
# /etc/systemd/system/grafana-server.service.d/override.conf
[Service]
LimitNOFILE=65536
sudo systemctl daemon-reload
sudo systemctl restart grafana-server
systemctl show grafana-server -p LimitNOFILE
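If the drop-in has to be created from a script (provisioning, configuration management), something like the following could work. It is a sketch under assumptions: write_nofile_override is a hypothetical helper, the default directory is the drop-in path shown above, and it overwrites any existing override.conf in that directory.

```shell
# write_nofile_override LIMIT [DIR]: write a drop-in that sets LimitNOFILE.
# DIR defaults to the grafana-server drop-in directory; pass another
# directory to try it out without touching /etc.
write_nofile_override() {
    limit=$1
    dir=${2:-/etc/systemd/system/grafana-server.service.d}
    # Accept only a positive integer.
    case $limit in
        ''|*[!0-9]*) echo "limit must be a positive integer" >&2; return 1 ;;
    esac
    [ "$limit" -gt 0 ] || { echo "limit must be a positive integer" >&2; return 1; }
    mkdir -p "$dir" || return 1
    printf '[Service]\nLimitNOFILE=%s\n' "$limit" > "$dir/override.conf"
}
```

After running it as root, the daemon-reload and restart from this step are still required before the new limit takes effect.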
Step 5: Raise the container limit (Kubernetes/Docker)
# Docker Compose
services:
  grafana:
    ulimits:
      nofile:
        soft: 65536
        hard: 65536
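For Kubernetes, the pod spec does not currently expose nofile: it is a per-process resource limit, not a sysctl, and securityContext has no field for it. Containers inherit the limit from the node's container runtime, so raise it there (for example via LimitNOFILE on the containerd or CRI-O systemd unit, depending on your platform), then verify inside the pod with the /proc/1/limits check shown earlier.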
Example Root Cause Analysis
An on-call sees Grafana returning intermittent 502s during business hours. The log:
logger=context error="accept tcp [::]:3000: accept4: too many open files"
FD usage sits at the ceiling:
PID=$(pgrep -x grafana-server); sudo ls /proc/$PID/fd | wc -l; grep 'open files' /proc/$PID/limits
1024
Max open files 1024 1024 files
lsof shows ~900 sockets to a Prometheus data source — the org grew and many users now keep heavy dashboards open. It’s not a leak (count is stable at the limit under load), just an outgrown default. systemctl show confirms LimitNOFILE=1024.
Fix: raise the systemd limit:
# /etc/systemd/system/grafana-server.service.d/override.conf
[Service]
LimitNOFILE=65536
sudo systemctl daemon-reload && sudo systemctl restart grafana-server
FD usage now peaks around 2–3k with plenty of headroom and the 502s stop. Root cause: the default 1024 ulimit was too low for the grown connection load — a limit increase, not a leak fix.
Prevention Best Practices
- Set LimitNOFILE explicitly (e.g. 65536) in the systemd unit / container ulimits; don’t rely on distro defaults.
- Monitor open FDs vs. the limit and alert well before exhaustion (e.g. at 80%).
- Distinguish leak from load: a steady climb under constant traffic is a leak. Capture lsof output and check plugin/renderer/data-source versions.
- Keep plugins and the image renderer up to date; leaks are often fixed upstream.
- Use connection pooling settings on SQL data sources to bound concurrent connections.
- See more Grafana guides and the sibling OOMKilled guide.
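The "alert at 80%" practice above can be approximated with a small check suitable for a cron job or a monitoring probe. This is a sketch only: fd_over_threshold is a hypothetical name, the 80% default mirrors the example figure above, and how the alert is delivered is left open.

```shell
# fd_over_threshold USED LIMIT [PCT]: succeed when USED is at or above
# PCT percent of LIMIT (PCT defaults to 80).
fd_over_threshold() {
    used=$1
    limit=$2
    pct=${3:-80}
    # Integer arithmetic only: used/limit >= pct/100
    [ $((used * 100)) -ge $((limit * pct)) ]
}
```

With the open-FD count and the soft limit read from /proc as in the diagnostic steps, fd_over_threshold 820 1024 && echo "grafana FD usage high" would fire, while fd_over_threshold 500 1024 would stay silent.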
Quick Command Reference
# Current usage vs limit
PID=$(pgrep -x grafana-server); sudo ls /proc/$PID/fd | wc -l
grep 'open files' /proc/$PID/limits
systemctl show grafana-server -p LimitNOFILE
# What are the FDs?
sudo lsof -p $PID | awk '{print $5}' | sort | uniq -c | sort -rn | head
# Leak check (trend)
for i in 1 2 3; do sudo ls /proc/$PID/fd | wc -l; sleep 30; done
# Raise systemd limit
# /etc/systemd/system/grafana-server.service.d/override.conf
# [Service]
# LimitNOFILE=65536
sudo systemctl daemon-reload && sudo systemctl restart grafana-server
# In-container check
kubectl -n monitoring exec deploy/grafana -- sh -c 'cat /proc/1/limits | grep "open files"'
Conclusion
too many open files means Grafana hit its file-descriptor ceiling and the kernel is refusing new sockets/handles. Typical root causes:
- A default nofile ulimit (often 1024) too low for the workload.
- The systemd unit not setting LimitNOFILE (shell ulimit doesn’t apply).
- A container runtime nofile limit in Kubernetes/Docker.
- A genuine FD leak from a plugin, data source, or the renderer.
- Very high concurrent connection load.
Measure open FDs vs. the limit and check the trend first — flat-at-limit under load means raise LimitNOFILE; a steady climb means hunt the leak.