AI for Grafana · By James Joyner IV · 9 min read

Grafana Error Guide: 'too many open files' — File Descriptor & ulimit Limits

Fix Grafana 'too many open files' errors — raise the file-descriptor ulimit via systemd LimitNOFILE or container limits, find FD leaks, and tune connections so Grafana stops running out.

  • #grafana
  • #troubleshooting
  • #errors
  • #ulimit

Overview

Every open network socket, log file, database handle, and plugin file counts against the Grafana process’s file-descriptor (FD) limit. Once the process reaches its nofile limit (RLIMIT_NOFILE, the value ulimit -n reports), the kernel rejects any call that would allocate another descriptor, so accepting HTTP connections, opening data-source connections, and opening log or database files all fail with too many open files. Under load this looks like Grafana intermittently refusing requests or losing its data sources.

The literal errors you will see:

logger=context error="accept tcp [::]:3000: accept4: too many open files"
dial tcp 10.0.0.20:9090: socket: too many open files
sqlite: unable to open database file: too many open files

It occurs under connection load or after a slow FD leak: many concurrent users, many data-source queries, or per-request connections that aren’t being closed.

Symptoms

  • Requests to Grafana intermittently fail (often surfaced as 502s by a reverse proxy in front of it); the log shows accept4: too many open files.
  • Data-source queries fail with socket: too many open files.
  • The process’s open-FD count sits at or near its limit.
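  • Restarting Grafana fixes it temporarily, then it degrades again (which suggests a leak).

Compare the open-FD count with the limit. In this sample output the process holds 1021 of 1024 descriptors:

PID=$(pgrep -x grafana-server); ls /proc/$PID/fd | wc -l
grep -i "open files" /proc/$PID/limits
1021
Max open files            1024                 1024                 files

If pgrep finds nothing, the process name may differ on your install (newer packages may run the binary as grafana); systemctl show -p MainPID --value grafana-server returns the service’s main PID regardless.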

Common Root Causes

1. Default ulimit too low for the workload

A distro default of 1024 open files is easily exhausted by a busy Grafana with many users and data sources.

cat /proc/$(pgrep -x grafana-server)/limits | grep 'open files'

2. systemd unit not raising LimitNOFILE

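The service must set LimitNOFILE itself. Resource limits are inherited from the parent process, and a systemd service is started by systemd, not by your login shell, so a shell-level ulimit (or /etc/security/limits.conf, which PAM applies to login sessions) never reaches it.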

systemctl show grafana-server -p LimitNOFILE
LimitNOFILE=1024
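
The reason a shell ulimit never reaches the service is that resource limits are per-process and only inherited by children. A minimal sketch of that inheritance in plain sh follows; the value 256 is arbitrary and demo_ulimit_inheritance is a hypothetical name used only for illustration.

```shell
# Lower the soft nofile limit inside a subshell, then start a child from it.
# The child inherits 256; the shell that called the function keeps its own limit.
demo_ulimit_inheritance() {
  ( ulimit -S -n 256 && sh -c 'ulimit -S -n' )
}

demo_ulimit_inheritance   # child started from the modified subshell
ulimit -S -n              # caller's soft limit, unchanged by the line above
```

A systemd service sits in the same position as the caller here: it is not a child of your shell, so nothing you set in that shell is passed down to it.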

3. Container runtime FD limit

In Kubernetes/Docker the limit that matters is the one the container runtime hands to the container, not the one you see in a shell on the host. Check it from inside the container:

kubectl -n monitoring exec deploy/grafana -- sh -c 'cat /proc/1/limits | grep "open files"'

4. A genuine FD leak

A misbehaving plugin, a data source that opens per-query connections without closing, or the image renderer can leak descriptors so the count climbs steadily until exhaustion.
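
One common signature of unclosed connections is a pile-up of sockets in CLOSE_WAIT: the peer has closed its side but the process never closed its own descriptor. The sketch below tallies TCP states from lsof output. It assumes lsof is installed and that its last column carries the state in parentheses; count_tcp_states is a hypothetical helper name.

```shell
# Tally TCP states from `lsof -nP -a -i -p PID` output read on stdin.
# Lines without a "(STATE)" suffix (header, UDP sockets) are ignored.
count_tcp_states() {
  awk '$NF ~ /^\(.+\)$/ { s = $NF; gsub(/[()]/, "", s); n[s]++ }
       END { for (k in n) print n[k], k }' | sort -rn
}

# Usage (PID must hold Grafana's process ID):
#   sudo lsof -nP -a -i -p "$PID" | count_tcp_states
```

A CLOSE_WAIT count that grows between runs is worth correlating with a specific plugin or data source before raising any limit.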

5. Excessive concurrent connections / keep-alive

Very high concurrency (dashboards with many live panels, alerting fan-out) opens many sockets at once.

Diagnostic Workflow

Step 1: Measure current FD usage vs. the limit

PID=$(pgrep -x grafana-server)
echo "open: $(ls /proc/$PID/fd | wc -l)"
grep 'open files' /proc/$PID/limits
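
The same measurement can be wrapped into one line of output with a percentage, which is easier to log or compare over time. This is a sketch only: fd_usage is a hypothetical helper, and the optional second argument (a /proc-like root) exists purely so the function can be tried against a test directory.

```shell
# Print open FDs, the soft limit, and percent used for one PID.
# Usage: fd_usage PID [PROC_ROOT]   (PROC_ROOT defaults to /proc)
fd_usage() (
  pid=$1
  root=${2:-/proc}
  open=$(ls "$root/$pid/fd" | wc -l | tr -d ' ')
  # /proc/<pid>/limits row: "Max open files  <soft>  <hard>  files"
  soft=$(awk '/^Max open files/ {print $4}' "$root/$pid/limits")
  [ -n "$soft" ] || { echo "cannot read limits for pid $pid" >&2; exit 1; }
  if [ "$soft" = "unlimited" ]; then
    echo "pid=$pid open=$open soft=unlimited"
  else
    echo "pid=$pid open=$open soft=$soft used=$(( open * 100 / soft ))%"
  fi
)

# Usage: sudo sh -c '. ./fd_usage.sh; fd_usage "$(pgrep -x grafana-server)"'
```

Reading another user's /proc/PID/fd needs root, which is why the usage line runs under sudo; the file name fd_usage.sh is an assumption.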

Step 2: See what the FDs are

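sudo ls -l /proc/$PID/fd | awk 'NR>1 {print $NF}' | sed 's/:\[[0-9]*\]$//' | sort | uniq -c | sort -rn | head
sudo lsof -p $PID 2>/dev/null | awk '{print $5}' | sort | uniq -c | sort -rn | head

The first command groups descriptors by link target (socket, pipe, or a file path) after stripping the per-socket inode number; the second groups them by lsof TYPE (IPv4, IPv6, REG, and so on). A large, growing count of socket or IPv4/IPv6 entries points at connection leakage.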
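
When sockets dominate, the next question is usually where they go. The sketch below counts connections per remote endpoint from lsof output, assuming lsof is installed and prints TCP connections as local->remote in its NAME column; count_remote_endpoints is a hypothetical helper name.

```shell
# Count connections per remote endpoint from `lsof -nP -a -i -p PID` on stdin.
# Listening sockets have no "->" in the NAME column and are skipped.
count_remote_endpoints() {
  awk 'NR > 1 && $9 ~ /->/ { split($9, a, "->"); n[a[2]]++ }
       END { for (r in n) print n[r], r }' | sort -rn
}

# Usage:
#   sudo lsof -nP -a -i -p "$PID" | count_remote_endpoints | head
```

One endpoint holding most of the descriptors narrows the search to a single data source or client.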

Step 3: Check whether it’s a leak (trend over time)

for i in 1 2 3; do ls /proc/$PID/fd | wc -l; sleep 30; done

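A count that keeps climbing under steady load indicates a leak, not just a low limit. Three samples 30 seconds apart only give a first hint; repeat the check over a longer window before concluding.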
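
For a longer observation window, the sampling loop can be parameterised and its output checked mechanically. This is a rough sketch: sample_fds and is_climbing are hypothetical helpers, and "every sample strictly higher than the previous one" is a deliberately crude heuristic that should be read alongside the traffic level.

```shell
# Print the FD count of PID every INTERVAL seconds, COUNT times.
# Usage: sample_fds PID [COUNT] [INTERVAL]
sample_fds() (
  pid=$1 count=${2:-10} interval=${3:-60}
  i=0
  while [ "$i" -lt "$count" ]; do
    ls "/proc/$pid/fd" | wc -l
    i=$(( i + 1 ))
    if [ "$i" -lt "$count" ]; then sleep "$interval"; fi
  done
)

# Exit 0 only if stdin has at least two samples and each one is
# strictly greater than the one before it.
is_climbing() {
  awk 'NR > 1 && $1 + 0 <= prev { flat = 1 }
       { prev = $1 + 0 }
       END { if (NR < 2 || flat) exit 1; exit 0 }'
}

# Usage:
#   sample_fds "$PID" 10 60 | tee fd-trend.txt | is_climbing \
#     && echo "steady climb: suspect a leak"
```

Keeping the raw samples in a file (fd-trend.txt here is an arbitrary name) makes it possible to compare the trend before and after a plugin upgrade.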

Step 4: Raise the systemd limit

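Create a drop-in (sudo systemctl edit grafana-server opens one in an editor) containing:

# /etc/systemd/system/grafana-server.service.d/override.conf
[Service]
LimitNOFILE=65536

Then reload systemd, restart Grafana, and confirm the new value:

sudo systemctl daemon-reload
sudo systemctl restart grafana-server
systemctl show grafana-server -p LimitNOFILE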
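
If the drop-in has to be created by a script rather than by hand (for example from a provisioning step), something along these lines would do it. It is a sketch under assumptions: write_nofile_dropin is a hypothetical helper, the optional third argument is a root prefix intended only for dry runs, and the function overwrites any existing override.conf for that unit.

```shell
# Write a LimitNOFILE drop-in for UNIT and print the path that was written.
# Usage: write_nofile_dropin UNIT LIMIT [ROOT_PREFIX]
# Note: replaces an existing override.conf in the unit's drop-in directory.
write_nofile_dropin() (
  unit=$1 limit=$2 root=${3:-}
  dir="$root/etc/systemd/system/$unit.service.d"
  mkdir -p "$dir" &&
    printf '[Service]\nLimitNOFILE=%s\n' "$limit" > "$dir/override.conf" &&
    echo "$dir/override.conf"
)

# Usage (as root), followed by the usual reload and restart:
#   write_nofile_dropin grafana-server 65536
#   systemctl daemon-reload && systemctl restart grafana-server
```

Running it first with a scratch directory as the third argument shows exactly what would be written without touching /etc.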

Step 5: Raise the container limit (Kubernetes/Docker)

# Docker Compose
services:
  grafana:
    ulimits:
      nofile:
        soft: 65536
        hard: 65536

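For Kubernetes, the pod spec has no field for ulimits (securityContext does not cover them, and nofile is not a sysctl). Containers inherit nofile from the container runtime on the node, for example the LimitNOFILE of the containerd or Docker systemd service, so raise it there or through your platform’s node configuration, then verify inside the pod.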
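
The in-pod verification can be turned into a pass/fail check by piping the limits file through a small filter. This sketch assumes Grafana runs as PID 1 in the container, as the kubectl command elsewhere in this guide does; nofile_at_least is a hypothetical helper name.

```shell
# Read /proc/<pid>/limits text on stdin.
# Exit 0 if the soft "Max open files" value is >= WANT (or unlimited),
# 1 if it is lower, 2 if the row is missing.
nofile_at_least() {
  awk -v want="$1" '
    /^Max open files/ { found = 1; soft = $4 }
    END {
      if (!found) exit 2
      if (soft == "unlimited" || soft + 0 >= want + 0) exit 0
      exit 1
    }'
}

# Usage:
#   kubectl -n monitoring exec deploy/grafana -- cat /proc/1/limits \
#     | nofile_at_least 65536 && echo "limit OK"
```

The filter runs on the workstation, so nothing needs to be installed in the Grafana image.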

Example Root Cause Analysis

An on-call sees Grafana returning intermittent 502s during business hours. The log:

logger=context error="accept tcp [::]:3000: accept4: too many open files"

FD usage sits at the ceiling:

PID=$(pgrep -x grafana-server); ls /proc/$PID/fd | wc -l; grep 'open files' /proc/$PID/limits
1024
Max open files            1024                 1024                 files

lsof shows ~900 sockets to a Prometheus data source — the org grew and many users now keep heavy dashboards open. It’s not a leak (count is stable at the limit under load), just an outgrown default. systemctl show confirms LimitNOFILE=1024.

Fix: raise the systemd limit:

# /etc/systemd/system/grafana-server.service.d/override.conf
[Service]
LimitNOFILE=65536

sudo systemctl daemon-reload && sudo systemctl restart grafana-server

FD usage now peaks around 2–3k with plenty of headroom and the 502s stop. Root cause: the default 1024 ulimit was too low for the grown connection load — a limit increase, not a leak fix.

Prevention Best Practices

  • Set LimitNOFILE explicitly (e.g. 65536) in the systemd unit / container ulimits; don’t rely on distro defaults.
  • Monitor open FDs vs. the limit and alert well before exhaustion (e.g. at 80%).
  • Distinguish leak from load: a steady climb under constant traffic is a leak — capture lsof and check plugin/renderer/data-source versions.
  • Keep plugins and the image renderer up to date; leaks are often fixed upstream.
  • Use connection pooling settings on SQL data sources to bound concurrent connections.
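
The monitoring practice above can be sketched against Grafana's own metrics endpoint. This assumes /metrics is enabled and reachable without authentication at localhost:3000, and that it exposes the process_open_fds and process_max_fds gauges that the Go Prometheus client normally publishes; fd_ratio_ok is a hypothetical helper name and 80 is just the example threshold from the list.

```shell
# Read Prometheus text-format metrics on stdin and compare
# process_open_fds / process_max_fds against PCT percent (default 80).
# Exit 0 at or below the threshold, 1 above it, 2 if the limit is missing.
fd_ratio_ok() {
  awk -v pct="${1:-80}" '
    $1 == "process_open_fds" { open = $2 }
    $1 == "process_max_fds"  { max = $2 }
    END {
      if (max + 0 == 0) exit 2
      used = open * 100 / max
      printf "open=%d max=%d used=%.0f%%\n", open, max, used
      if (used > pct + 0) exit 1
      exit 0
    }'
}

# Usage (endpoint and port are assumptions about your setup):
#   curl -s http://localhost:3000/metrics | fd_ratio_ok 80 \
#     || echo "Grafana FD usage above 80% or not readable"
```

If Prometheus already scrapes Grafana, alerting on the ratio of the same two metrics there is the more durable option; the script is mainly useful for cron checks or ad hoc triage.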

Quick Command Reference

# Current usage vs limit
PID=$(pgrep -x grafana-server); ls /proc/$PID/fd | wc -l
grep 'open files' /proc/$PID/limits
systemctl show grafana-server -p LimitNOFILE

# What are the FDs?
sudo lsof -p $PID | awk '{print $5}' | sort | uniq -c | sort -rn | head

# Leak check (trend)
for i in 1 2 3; do ls /proc/$PID/fd | wc -l; sleep 30; done

# Raise systemd limit
#  /etc/systemd/system/grafana-server.service.d/override.conf
#  [Service]
#  LimitNOFILE=65536
sudo systemctl daemon-reload && sudo systemctl restart grafana-server

# In-container check
kubectl -n monitoring exec deploy/grafana -- sh -c 'cat /proc/1/limits | grep "open files"'

Conclusion

too many open files means Grafana hit its file-descriptor ceiling and the kernel is refusing new sockets/handles. Typical root causes:

  1. A default nofile ulimit (often 1024) too low for the workload.
  2. The systemd unit not setting LimitNOFILE (shell ulimit doesn’t apply).
  3. A container runtime nofile limit in Kubernetes/Docker.
  4. A genuine FD leak from a plugin, data source, or the renderer.
  5. Very high concurrent connection load.

Measure open FDs vs. the limit and check the trend first — flat-at-limit under load means raise LimitNOFILE; a steady climb means hunt the leak.
