Prometheus Error Guide: 'lock DB directory: resource temporarily unavailable' Startup Failure
Fix Prometheus 'lock DB directory: resource temporarily unavailable' at startup: find and stop the second process holding the TSDB lock file before restarting.
Exact Error Message
lock DB directory: resource temporarily unavailable is a startup-time failure. Prometheus logs it while opening its TSDB and then exits immediately — it never reaches the point of serving on port 9090:
ts=2026-06-27T09:12:44.501Z caller=main.go:1186 level=info msg="Starting TSDB ..."
ts=2026-06-27T09:12:44.512Z caller=main.go:1213 level=error msg="Error opening storage" err="opening storage failed: lock DB directory: resource temporarily unavailable"
ts=2026-06-27T09:12:44.513Z caller=main.go:1043 level=error err="opening storage failed: lock DB directory: resource temporarily unavailable"
The underlying string is the EAGAIN/EWOULDBLOCK errno text — “resource temporarily unavailable” — returned from a non-blocking flock() on the TSDB lock file. That lock file is literally named lock and lives at the root of the storage path:
/var/lib/prometheus/data/lock
If you see this, the process did not start. There is no partial degradation; the binary fails closed and systemd typically logs it as code=exited, status=1/FAILURE.
What the Error Means
On startup Prometheus opens its data directory and takes an exclusive advisory lock on the lock file using flock(LOCK_EX | LOCK_NB) — a non-blocking exclusive lock. The lock guarantees that exactly one process owns the TSDB at a time, because two writers appending to the same head block and WAL would corrupt the database.
When the lock is already held by another process, the non-blocking call returns EAGAIN, which surfaces as “resource temporarily unavailable.” Prometheus does not wait or retry; it reports opening storage failed and exits.
The critical thing to understand: a flock lock belongs to the open file description held by a running process, not to the lock file's presence on disk. When a process exits, whether cleanly, via a crash, a kill -9, or an OOM kill, the kernel closes its descriptors and the lock is released automatically (unless another process, such as a forked child, still holds the same open descriptor). So a lock file left sitting on disk after a crash is not what blocks startup. A truly orphaned lock file is harmless; Prometheus will re-acquire it on the next boot. If you are hitting this error, it is almost always because another live process is still holding the lock right now, most commonly a second or not-yet-dead Prometheus.
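The lock lifecycle described above can be observed on a throwaway file, without touching a real TSDB. The following is a minimal sketch, assuming the util-linux flock command is installed; the function name demo_lock_lifecycle and the temporary paths are illustrative and not part of Prometheus.

```shell
#!/bin/sh
# Sketch: a non-blocking exclusive flock fails only while a live process
# holds the lock. Works on a temp file, never on the real data directory.

demo_lock_lifecycle() {
  dir=$(mktemp -d) || return 1
  lockfile="$dir/lock"

  # Holder: open fd 9 on the file, lock it, then become `sleep` so that a
  # single PID owns the descriptor (and therefore the lock).
  ( exec 9>"$lockfile"; flock -x 9; exec sleep 30 ) >/dev/null 2>&1 &
  holder=$!
  sleep 1   # crude wait for the holder to acquire the lock

  # -n fails immediately instead of waiting, comparable to LOCK_NB.
  if flock -n -x "$lockfile" true; then
    echo "holder alive: free"
  else
    echo "holder alive: busy"
  fi

  # Simulate a crash. The lock file stays on disk; the lock itself does not.
  kill -KILL "$holder"
  wait "$holder" 2>/dev/null || true

  if [ -e "$lockfile" ] && flock -n -x "$lockfile" true; then
    echo "holder dead: free"
  else
    echo "holder dead: busy"
  fi
  rm -rf "$dir"
}
```

Calling demo_lock_lifecycle should report the lock as busy while the holder runs and as free once the holder has been killed, even though the file still exists at that point.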
This is distinct from WAL corruption, which also fails under “opening storage failed” but is about damaged write-ahead-log segments rather than lock contention.
Common Causes
- Two Prometheus processes pointing at the same --storage.tsdb.path. The classic case: a duplicate unit, a manual ./prometheus launched alongside the service, or two containers mounting the same host directory.
- An old process that did not fully stop during a restart. systemd thinks the service stopped, but the old PID is still draining (slow shutdown, blocked on fsync) and still holds the lock when the new process starts.
- A process that looks crashed but is actually still alive. A wedged process (for example, one stuck in uninterruptible I/O) has not closed its descriptor. This is rare, but the lock stays held until the process actually exits.
- Running Prometheus plus a manual promtool/tsdb operation on the same dir. promtool tsdb commands and snapshot/backfill tooling can open the same data directory; doing so while the server runs collides on the lock.
- systemd restart racing the old PID. A Restart=always policy or a fast systemctl restart can launch the new process before the kernel has torn down the old one’s file locks.
- NFS or shared-volume lock semantics. On NFS (especially older nfs without proper lockd/flock support) advisory locks behave inconsistently, so a lock may appear held when it isn’t, or vice versa.
- Containers sharing a host path. Two pods/containers bind-mounting the same /var/lib/prometheus/data will fight over the single lock file.
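For the NFS and shared-volume cause, it can help to check what filesystem actually backs the data directory. This is a rough sketch: the function name is_network_fstype is hypothetical, and the list of filesystem types is illustrative rather than exhaustive.

```shell
# Sketch: return 0 if the given filesystem type string is network-backed.
# The list below is an assumption covering common cases only.
is_network_fstype() {
  case "$1" in
    nfs|nfs4|cifs|smb3|ceph|glusterfs|fuse.glusterfs|fuse.sshfs|9p) return 0 ;;
    *) return 1 ;;
  esac
}

# Possible usage against the data directory used in this guide:
#   fstype=$(findmnt -n -o FSTYPE -T /var/lib/prometheus/data)
#   if is_network_fstype "$fstype"; then
#     echo "warning: TSDB lives on $fstype; flock may be unreliable"
#   fi
```

A match does not prove the lock is misbehaving; it only flags a mount where flock semantics deserve a closer look.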
How to Reproduce the Error
Start a second Prometheus pointed at a data directory that is already in use:
# First instance is already running against /var/lib/prometheus/data
prometheus --config.file=/etc/prometheus/prometheus.yml \
--storage.tsdb.path=/var/lib/prometheus/data \
--web.listen-address=:9091
level=error caller=main.go:1213 msg="Error opening storage" err="opening storage failed: lock DB directory: resource temporarily unavailable"
Note that the second instance used a different listen port (:9091) and still failed — the conflict is over the data directory lock, not the network port. This is what trips people up: they change the port, the error stays, because the contended resource is the filesystem, not the socket.
Diagnostic Commands
Every command here is a read-only inspection. None of them modify the lock file or the database.
List every Prometheus process — you are looking for more than one, or for an old PID that should be gone:
ps aux | grep '[p]rometheus'
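When several processes show up, comparing their --storage.tsdb.path values by eye is error-prone. The sketch below groups processes by that flag; duplicate_tsdb_paths is a hypothetical helper name, it expects "PID ARGS" lines on stdin, and it assumes paths contain no spaces. Processes that rely on the default relative path are skipped, since that path depends on each process's working directory.

```shell
# Sketch: read "PID ARGS..." lines (the shape of `ps -eo pid=,args=`) and
# print "PATH PID PID..." for every storage path used by two or more PIDs.
duplicate_tsdb_paths() {
  awk '
    {
      path = ""
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^--storage\.tsdb\.path=/) {
          path = $i
          sub(/^[^=]*=/, "", path)          # keep only the value after "="
        } else if ($i == "--storage.tsdb.path" && i < NF) {
          path = $(i + 1)                   # space-separated form
        }
      }
      if (path == "") next                  # flag absent: skip this process
      sub(/\/+$/, "", path)                 # treat /data and /data/ alike
      pids[path] = pids[path] " " $1
      count[path]++
    }
    END { for (p in count) if (count[p] > 1) print p pids[p] }
  '
}

# Possible usage:
#   ps -eo pid=,args= | grep '[p]rometheus' | duplicate_tsdb_paths
```

Any line it prints names a data directory with more than one claimed owner, which is the situation the lock is designed to reject.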
Find exactly which process holds the lock file (this is the definitive answer):
lsof /var/lib/prometheus/data/lock
# or, if lsof is unavailable:
fuser -v /var/lib/prometheus/data/lock
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
prometheus 4821 prometheus 8uW REG 259,1 0 131074 /var/lib/prometheus/data/lock
The W in the FD column confirms a write lock is held by PID 4821. Check what is bound to the metrics port:
ss -ltnp | grep 9090
Inspect the service state and recent logs:
systemctl status prometheus
journalctl -u prometheus -n 50 --no-pager
Confirm the lock file exists and who owns it (its presence alone is not the problem):
ls -la /var/lib/prometheus/data/lock
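The lsof check above can be scripted so that only the PID of the actual lock holder is printed. This is a small sketch, not an official tool: lock_holder_pids is a hypothetical name, and it assumes the default lsof column layout shown earlier, where the FD field ends in W for a write lock on the whole file.

```shell
# Sketch: read lsof output on stdin and print the PID of each row whose FD
# column (field 4) is "<number><mode>W", i.e. a whole-file write lock.
lock_holder_pids() {
  awk 'NR > 1 && $4 ~ /^[0-9]+[rwu-]W$/ { print $2 }'
}

# Possible usage (run as root so other users' descriptors are visible):
#   sudo lsof /var/lib/prometheus/data/lock | lock_holder_pids
```

Processes that merely have the file open without holding the lock are ignored, which keeps the result focused on the one PID that blocks startup.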
Step-by-Step Resolution
1. Identify the holder before touching anything. Run lsof /var/lib/prometheus/data/lock (or fuser) as root, since an unprivileged lsof cannot see descriptors owned by other users. The PID it returns is the real cause. If it returns nothing, the file is orphaned and is not blocking you; your failure is something else (check for WAL corruption or a wrong path).
2. Stop the duplicate or stale process cleanly. If ps/lsof shows a second Prometheus (or an old PID from a botched restart), stop it gracefully so it can flush and release the lock:
sudo systemctl stop prometheus
# then confirm nothing is left:
ps aux | grep '[p]rometheus'
If a manually-launched instance is the culprit, send it SIGTERM and let it shut down:
sudo kill -TERM 4821 # graceful; lets it fsync and release the flock
Avoid kill -9 unless the process is genuinely wedged — SIGTERM lets Prometheus close the head block cleanly. Either way, the kernel releases the lock the instant the process dies.
3. Eliminate the duplicate launcher. If two unit files or a stray container both target the same path, remove or reconfigure one. Two Prometheus servers that must coexist need separate --storage.tsdb.path directories — never a shared one.
4. Fix restart races. If a fast systemctl restart keeps racing the old PID, set TimeoutStopSec long enough for Prometheus to finish shutting down and ensure ExecStop waits, or insert a brief sleep in a wrapper so the new process starts only after the old descriptor is gone. Do not paper over this with retries.
5. Start Prometheus and confirm. Once the holder is gone, start the service; it will re-acquire the lock with no manual cleanup:
sudo systemctl start prometheus
journalctl -u prometheus -n 20 --no-pager | grep -i 'server is ready'
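You should not need to rm the lock file. Deleting it while a process holds the lock does not release that lock (the lock lives on the open descriptor, not the path). Worse, the next Prometheus would create and lock a fresh file at the same path, leaving two writers on the same TSDB. Deleting it when no process holds it is unnecessary.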
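The graceful stop in step 2 can be wrapped so that the restart in step 5 happens only once the old PID is really gone. The helper below is a sketch under stated assumptions: term_and_wait is a hypothetical name, the default 60-second timeout is arbitrary, and the caller is assumed to have permission to signal the process.

```shell
# Sketch: send SIGTERM to PID, then poll up to TIMEOUT seconds for it to exit.
# Returns 0 once the process is gone, 1 if it is still alive at the deadline.
# It never sends SIGKILL; escalating is left to the operator.
term_and_wait() {
  pid=$1
  timeout=${2:-60}
  kill -TERM "$pid" 2>/dev/null || return 0   # assumed: failure means already gone
  while [ "$timeout" -gt 0 ]; do
    kill -0 "$pid" 2>/dev/null || return 0    # signal 0 only probes for existence
    sleep 1
    timeout=$((timeout - 1))
  done
  kill -0 "$pid" 2>/dev/null || return 0
  return 1
}

# Possible usage with the PID from the lsof example:
#   term_and_wait 4821 60 && sudo systemctl start prometheus
```

A non-zero return means the process ignored SIGTERM for the whole window, which is the point at which the wedged-process case from the causes list is worth investigating.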
The --storage.tsdb.no-lockfile tradeoff. This flag disables the lock entirely. It exists for edge cases like NFS where flock is unreliable, but it is dangerous: with no lock, nothing stops two processes from opening the same TSDB and corrupting it. Only use it if you have an external guarantee of single-writer (e.g. a Kubernetes StatefulSet with one replica and ReadWriteOnce), and never as a quick fix to silence this error. The right fix is to stop the second process.
Prevention and Best Practices
- One data directory per Prometheus, always. Treat --storage.tsdb.path as exclusive. If you run multiple instances on a host, give each its own directory.
- Never run promtool tsdb against a live server’s directory. Snapshot first (POST /api/v1/admin/tsdb/snapshot, which requires --web.enable-admin-api) and operate on the copy.
- Set sane systemd stop timeouts so a restart fully tears down the old process before relaunching, avoiding PID races.
- Avoid NFS for the TSDB. If you must, validate flock works on that mount, or run a single replica with a block-device PVC instead.
- In Kubernetes, use a single-replica StatefulSet with ReadWriteOnce storage so two pods can never mount the same volume.
- Leave the lock enabled. It is a corruption guard, not a nuisance. Treat the error as a signal that you have two writers, which is the real problem to fix.
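The snapshot-first practice above can be scripted roughly as follows. This is a sketch, assuming the server listens on localhost:9090 with --web.enable-admin-api set and that the snapshot lands under the snapshots directory of the data path used in this guide; snapshot_name_from_response is a hypothetical helper that avoids a jq dependency.

```shell
# Sketch: extract the snapshot name from the admin API's JSON response.
# Reads the response on stdin and prints the first "name" value it finds.
snapshot_name_from_response() {
  sed -n 's/.*"name"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Possible usage (host, port, and data path are assumptions):
#   name=$(curl -s -XPOST http://localhost:9090/api/v1/admin/tsdb/snapshot \
#            | snapshot_name_from_response)
#   promtool tsdb analyze "/var/lib/prometheus/data/snapshots/$name"
```

Working on the snapshot directory keeps offline tooling away from the live directory and its lock.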
Related Errors
- opening storage failed: ... WAL: also fails under “opening storage failed,” but the cause is damaged write-ahead-log segments after an unclean shutdown, not lock contention. Different fix (repair/truncate the WAL).
- too many open files: a separate startup/runtime failure where the TSDB cannot open enough file descriptors; raise the LimitNOFILE/ulimit rather than touching the lock.
- resource temporarily unavailable in other contexts: this errno (EAGAIN) also appears for socket and fork limits; here it specifically means the flock on lock was already held.
Frequently Asked Questions
Should I just delete the lock file?
No. The lock is held on an open file descriptor, not on the filename. Deleting the file while a process holds the lock does not release it, and it lets the next Prometheus create and lock a fresh file, so two writers end up on the same TSDB. If no process holds the lock, the file is harmless. Find and stop the real holder with lsof instead.
The old process crashed — why is the lock still stuck?
It almost certainly isn’t. flock locks are released by the kernel when a process exits, including on a crash or kill -9. If startup still fails, run lsof /var/lib/prometheus/data/lock; you will usually find a live process you didn’t know about (a slow-draining old instance, a duplicate unit, or a promtool job).
I changed the listen port and still get the error. Why?
Because the conflict is over the data directory’s lock file, not the network port. Two Prometheus instances on different ports still collide if they share --storage.tsdb.path. Give each its own data directory.
Is --storage.tsdb.no-lockfile safe?
Only when something else guarantees a single writer (e.g. a one-replica StatefulSet on ReadWriteOnce storage). Without that guarantee it invites two processes to corrupt the same TSDB. Do not use it to silence this error — stop the duplicate process.