DevOps AI ToolKit
AI for Linux Admins · By James Joyner IV · 9 min read

Linux Error Guide: 'task blocked for more than 120 seconds' Hung Task Detector

Fix the khungtaskd 'task blocked for more than 120 seconds' warning: understand the uninterruptible D state and diagnose the I/O stall behind a hung task.

  • #linux-admins
  • #troubleshooting
  • #errors
  • #kernel

Exact Error Message

The message is emitted by the kernel’s hung task detector (the khungtaskd thread) and shows up in the kernel ring buffer (dmesg) and in the kernel journal (journalctl -k):

INFO: task jbd2/sda1-8:301 blocked for more than 120 seconds.
      Tainted: G           OE     5.15.0-91-generic #101
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:jbd2/sda1-8     state:D stack:    0 pid:  301 ppid:     2 flags:0x00004000
Call Trace:
 <TASK>
 __schedule+0x2cd/0x890
 schedule+0x69/0x110
 io_schedule+0x46/0x80
 bit_wait_io+0x11/0x70
 __wait_on_bit+0x33/0xa0
 out_of_line_wait_on_bit+0x8e/0xb0
 do_get_write_access+0x288/0x3c0
 jbd2_journal_get_write_access+0x69/0x90
 submit_bio+0x...
 nfs_writepages+0x...
 </TASK>

While the task stays blocked, the same warning is usually repeated at every check interval (120 seconds by default) until the underlying I/O completes, the task is killed, or the kernel.hung_task_warnings budget (10 reports by default) is used up. Multiple tasks frequently appear together because they are all waiting on the same stalled device.
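To see at a glance which tasks were reported and how often, the warning lines can be summarized with a small filter. This is a minimal sketch: the function name blocked_tasks is made up for this example, and it assumes the message format shown above.

```shell
# blocked_tasks: read kernel log text on stdin and print, for every task
# named in a hung task warning, "<report count> <task name>:<pid>",
# most frequently reported first. The function name is illustrative.
blocked_tasks() {
    sed -n 's/.*INFO: task \(.*\):\([0-9][0-9]*\) blocked for more than [0-9][0-9]* seconds.*/\1:\2/p' |
        awk '{ n[$0]++ } END { for (t in n) print n[t], t }' |
        sort -rn
}
```

Typical use would be dmesg | blocked_tasks or journalctl -k | blocked_tasks. A task that tops the list is not necessarily the culprit; it is only the waiter the detector saw most often.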

What the Error Means

This is a warning, not a crash. The hung task detector wakes up periodically and scans every task on the system. If a task has been sitting in the uninterruptible sleep state (TASK_UNINTERRUPTIBLE, shown as D in ps) continuously for longer than kernel.hung_task_timeout_secs (default 120), the kernel logs the warning above.

A task in D state is blocked inside the kernel waiting for something that cannot be interrupted by a signal, almost always disk or network I/O. You cannot kill it with SIGKILL while it is genuinely in D state, because the kernel will not deliver the signal until the blocking operation returns. That is by design: the task is in the middle of a kernel operation, and interrupting it could corrupt in-flight data structures. The one exception is a killable wait (TASK_KILLABLE), which ps also shows as D but which does wake up for fatal signals; the hung task detector does not report those tasks.

The key takeaway: the warning is pointing at the symptom (a task stuck waiting), not the cause. The named task (kworker, jbd2/sda1-8, nfsd, your application) is usually an innocent victim. The real problem is whatever it is waiting on: a slow disk, a hung NFS mount, an unresponsive iSCSI target, or a saturated storage backend.

In the example above, jbd2/sda1-8 is the journaling thread for the ext4 filesystem on sda1. The trace reads most recent call first, and the top frames (io_schedule, bit_wait_io) show the thread asleep waiting for I/O, so the block device behind sda1 is not completing requests. Because the journal thread is stuck, every other process that needs to write to that filesystem will eventually pile up behind it in D state too.

By default the kernel only warns. It does not panic or reboot unless you have explicitly set kernel.hung_task_panic = 1.
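The detector's behavior is controlled by a handful of sysctls, so it helps to look at them together before drawing conclusions. The sketch below only reads values. The function name and its optional directory argument are assumptions made for this example, and kernel.hung_task_warnings (the cap on how many reports are printed) is included alongside the two tunables discussed above.

```shell
# show_hung_task_settings: print the hung task detector tunables found
# under a directory (default: /proc/sys/kernel). Read-only.
# The function name and the directory argument are illustrative.
show_hung_task_settings() {
    dir=${1:-/proc/sys/kernel}
    for name in hung_task_timeout_secs hung_task_warnings hung_task_panic; do
        if [ -r "$dir/$name" ]; then
            printf 'kernel.%s = %s\n' "$name" "$(cat "$dir/$name")"
        else
            # Kernels built without the detector do not expose these files
            printf 'kernel.%s = (not available)\n' "$name"
        fi
    done
}
```

Calling show_hung_task_settings with no argument reads the live values. A hung_task_panic value of 0 confirms the warn-only default described above.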

Common Causes

  • Failing or slow physical disk. A disk with reallocated sectors or a dying controller can take many seconds per I/O, easily blowing past 120 seconds under load. Check SMART data, and check dmesg for I/O errors or medium errors.
  • Stalled NFS server or mount. A hard NFS mount whose server has gone away blocks all I/O against it indefinitely. Tasks accessing the mount enter D state and stay there until the server returns. This is one of the most common causes on virtualized and cloud fleets.
  • iSCSI / SAN path loss. Losing the path to an iSCSI target or multipath device stalls every I/O on that LUN. The block layer queues requests until the path recovers or the device times out.
  • Overloaded storage backend. Even healthy storage can stall under extreme load: a noisy neighbor on shared cloud disks, throttled IOPS/burst credits, or a backup job saturating the array. The disk is “fine” but latency spikes past the threshold.
  • Heavy memory pressure / swap thrash. When the system is swapping hard, page-in/page-out becomes the slow I/O that blocks tasks.
  • Filesystem journal contention. A single slow device serializes the journal thread (jbd2), which serializes everyone.

How to Reproduce the Error

You can safely trigger the detector in a throwaway VM by introducing artificial I/O latency. Do not do this on production. Using a hard NFS mount and then making the server unreachable is the classic reproduction:

# In a test VM only: mount a hard NFS share, run I/O against it,
# then block the NFS server with a firewall rule.
# Any process reading/writing the mount will go into D state and,
# after hung_task_timeout_secs, the warning fires in dmesg.

A cleaner lab method is dm-delay, a device-mapper target that injects configurable latency into every I/O, but the NFS approach mirrors what real fleets actually hit.
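For the dm-delay route, the only fiddly part is the table line handed to dmsetup. The helper below is a sketch that builds that line; delay_table is a made-up name, and the device and mapping names in the usage comments are placeholders for a throwaway VM.

```shell
# delay_table: print a device-mapper "delay" table line that maps a whole
# device and delays every I/O by a fixed number of milliseconds.
#   $1 = device size in 512-byte sectors (e.g. from: blockdev --getsz DEV)
#   $2 = backing device path
#   $3 = delay in milliseconds
delay_table() {
    printf '0 %s delay %s 0 %s\n' "$1" "$2" "$3"
}

# Lab usage sketch (root, throwaway VM only; /dev/loop0 and "slowdisk" are
# placeholders chosen for this example):
#   delay_table "$(blockdev --getsz /dev/loop0)" /dev/loop0 150000 | dmsetup create slowdisk
#   ...run I/O against /dev/mapper/slowdisk and watch dmesg...
#   dmsetup remove slowdisk
```

A waiter only trips the detector if it stays blocked longer than hung_task_timeout_secs, so the delay has to exceed the timeout, or the timeout has to be lowered inside the lab VM.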

Diagnostic Commands

All of the following are read-only and safe to run on a live system. Reading /proc/<pid>/stack requires root.

# See the warning with surrounding context (human-readable timestamps)
dmesg -T | grep -A20 'blocked for more'

# Same data via journald
journalctl -k -g 'blocked for more'

# List every task currently in uninterruptible (D) sleep, with its wait channel
ps -eo pid,stat,wchan,comm | awk 'NR == 1 || $2 ~ /^D/'

# Inspect what a specific blocked PID is waiting on (kernel stack + wait channel)
cat /proc/<pid>/stack
cat /proc/<pid>/wchan

# Confirm the current timeout threshold
cat /proc/sys/kernel/hung_task_timeout_secs

# Watch per-device I/O latency and utilization
iostat -x 1 5

# Pressure Stall Information: how much time tasks spend stalled on I/O
cat /proc/pressure/io

# Look for storage / NFS errors in the ring buffer
dmesg | grep -i 'nfs\|i/o error\|timeout'

# Check for NFS mounts (a hung server is a top suspect)
mount | grep nfs

Typical output from the ps scan when storage is stalled:

PID STAT WCHAN              COMMAND
301 D    jbd2_journal_commit jbd2/sda1-8
412 D    io_schedule         kworker/3:1
588 D    nfs_wait_bit_killab kworker/u8:2
901 D    folio_wait_bit      postgres
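When dozens of tasks are stuck, grouping them by wait channel makes the shared bottleneck easier to spot than reading the raw list. A minimal sketch, assuming ps output in the column order used above; the function name is illustrative.

```shell
# d_state_by_wchan: read `ps -eo pid,stat,wchan:32,comm` output on stdin and
# count D-state tasks per wait channel, busiest channel first.
d_state_by_wchan() {
    awk 'NR > 1 && $2 ~ /^D/ { n[$3]++ }
         END { for (w in n) print n[w], w }' |
        sort -k1,1nr -k2,2
}
```

It would be fed with ps -eo pid,stat,wchan:32,comm | d_state_by_wchan. The :32 width keeps ps from truncating long wait channel names.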

Typical /proc/pressure/io output on a system where I/O is the bottleneck:

some avg10=78.42 avg60=71.03 avg300=44.18 total=5821934201
full avg10=61.07 avg60=55.92 avg300=33.40 total=4011298776

On the some line, avg10=78.42 means at least one task was stalled waiting on I/O for roughly 78% of the last 10 seconds; the full line counts the time during which all non-idle tasks were stalled at once. Anything sustained above a few percent points squarely at the storage layer, confirming the hung task is an I/O victim rather than a CPU or scheduling problem.
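The PSI file is easy to parse, which makes it a convenient input for a quick check or a cron-style alert. The sketch below assumes the two-line format shown above. Both function names are invented for this example, and the 10 percent default threshold is an arbitrary placeholder rather than a recommendation.

```shell
# psi_avg: print one averaging window from a PSI file.
#   $1 = line kind: "some" or "full"
#   $2 = window: avg10, avg60 or avg300
#   $3 = file to read (default: /proc/pressure/io)
psi_avg() {
    awk -v kind="$1" -v win="$2" '
        $1 == kind {
            for (i = 2; i <= NF; i++) {
                split($i, kv, "=")
                if (kv[1] == win) print kv[2]
            }
        }' "${3:-/proc/pressure/io}"
}

# psi_io_alert: exit 0 (alert) when "some" avg60 exceeds a threshold percent.
#   $1 = threshold in percent (default 10, an arbitrary example value)
#   $2 = file to read (default: /proc/pressure/io)
psi_io_alert() {
    awk -v v="$(psi_avg some avg60 "${2:-/proc/pressure/io}")" -v t="${1:-10}" \
        'BEGIN { exit !(v + 0 > t + 0) }'
}
```

For example, psi_io_alert 20 && echo "I/O pressure high" would print the message only while the 60-second average is above 20 percent.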

Step-by-Step Resolution

  1. Capture the call traces. Run dmesg -T | grep -A20 'blocked for more' and note which tasks are blocked and the top frames. Frames like nfs_*, io_schedule, submit_bio, or jbd2_* confirm an I/O wait.
  2. Identify the device or mount. Map the task to its backing store. jbd2/sda1-8 points at sda1; an nfs_* frame points at an NFS mount (mount | grep nfs); a dm-* device points at LVM/multipath/iSCSI.
  3. Check device health and latency. Run iostat -x 1 5 and look for %util near 100 together with very high r_await/w_await (average wait per request in ms; older sysstat releases show a single await column). Check dmesg | grep -i 'i/o error' for hardware/medium errors.
  4. For NFS: verify the server is reachable and responsive. If it is down, the only real fix is restoring the server. Tasks on a hard mount will recover automatically once the server returns; tasks on a soft mount will eventually error out.
  5. For local disk: if SMART or dmesg shows a failing drive, plan to replace it. Move the workload off the device if you can.
  6. Relieve load if the backend is merely overloaded. Pause backups, throttle the noisy job, or scale up provisioned IOPS. Once latency drops back under the threshold, the warnings stop on their own.
  7. Do not just raise the timeout to hide it. You can tune kernel.hung_task_timeout_secs higher, but that only silences the symptom. Fix the storage first.
  8. Reboot only as a last resort. Tasks stuck in D cannot be killed, so a truly wedged NFS/iSCSI path sometimes leaves a reboot as the only way to clear them — but confirm the backend is healthy first, or they will just wedge again.
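Step 2 above, mapping a journal thread to its device, is mechanical enough to script. This is a sketch with a made-up helper name. It assumes the usual thread naming of jbd2/<device>-<number>, where the trailing number is typically the journal inode; a filesystem with an external journal may not follow that pattern.

```shell
# jbd2_device: turn a jbd2 thread name such as "jbd2/sda1-8" into the block
# device name it journals for ("sda1"). The helper name is illustrative.
jbd2_device() {
    name=${1#jbd2/}               # drop the "jbd2/" prefix
    printf '%s\n' "${name%-*}"    # drop the trailing "-<number>"
}
```

The result can then be passed to the health checks in step 3, for example iostat -x "$(jbd2_device jbd2/sda1-8)" 1 5.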

Prevention and Best Practices

  • Monitor I/O latency and PSI. Alert on /proc/pressure/io some avg60 and on await/%util from iostat. Catching rising latency early prevents the 120-second cliff.
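  • Use soft NFS mounts with sensible timeo/retrans values for non-critical data so a dead server returns errors instead of hanging forever. The old intr option no longer does anything: kernels newer than 2.6.25 ignore it, and a fatal signal such as SIGKILL is the only thing that can interrupt a pending NFS operation. Reserve hard mounts for data you cannot afford to lose mid-write.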
  • Configure multipath with sane timeouts for iSCSI/SAN so path failures fail over quickly instead of stalling.
  • Right-size cloud disk IOPS and watch burst-credit exhaustion on shared volumes.
  • Watch SMART health and replace disks proactively.
  • Optionally enable kernel.hung_task_panic on dedicated nodes where a wedged box should crash-dump and reboot for HA, rather than sit half-alive. Pair it with kdump so you capture the trace.
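To audit the NFS point above across a fleet, it helps to know which mounts are hard and which are soft. The sketch below reads lines in /proc/mounts format; the function name is invented for this example. It treats any NFS mount without an explicit soft option as hard, which matches the client default, and it does not distinguish newer variants such as softerr.

```shell
# nfs_mount_modes: read /proc/mounts-style lines on stdin and print each NFS
# mount point followed by "soft" or "hard". A mount without an explicit
# "soft" option is reported as hard (the NFS client default).
nfs_mount_modes() {
    awk '$3 ~ /^nfs4?$/ {
        mode = "hard"
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++) if (opts[i] == "soft") mode = "soft"
        print $2, mode
    }'
}
```

Run it as nfs_mount_modes < /proc/mounts. Reading /proc/mounts does not touch the mounted filesystems, so it stays responsive even when an NFS server is already hung.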

For broader kernel-log triage, see our Linux troubleshooting guides and the DevOps AI ToolKit blog.

Related Errors

  • 'soft lockup - CPU#N stuck for Ns!': A soft lockup means a CPU was spinning in the kernel without scheduling, i.e. a CPU/scheduling problem. A hung task is the opposite: the task is scheduled out, sleeping on I/O. Soft lockups point at busy loops or spinlock contention; hung tasks point at storage.
  • 'nfs: server X not responding, still trying': The NFS-specific companion message that usually accompanies hung tasks on NFS mounts.
  • 'Buffer I/O error on device sdX' and 'I/O error, dev sdX, sector ...': Direct evidence of a failing block device behind the stall.
  • 'task <name> blocked ... Workqueue: ...': Same detector, naming a kernel workqueue worker as the victim.

Frequently Asked Questions

Is “blocked for more than 120 seconds” a crash?

No. By default it is only a warning printed to the kernel log. The system keeps running. It becomes fatal only if you have set kernel.hung_task_panic = 1.

Why can’t I kill the stuck process with kill -9?

Because it is in uninterruptible (D) sleep inside a syscall. The kernel will not deliver the signal until the blocking I/O returns. Once the underlying storage or NFS server recovers, the task resumes and any pending signal is processed.

Should I just increase hung_task_timeout_secs?

Only if you have legitimately slow-but-healthy I/O and want fewer log lines. Raising it does not fix anything; the task is still blocked. Always diagnose the storage backend first.

Is the named task the cause of the problem?

Almost never. The task (kworker, jbd2, your app) is the victim waiting on I/O. The real culprit is the device or mount it is waiting on, which the call trace and iostat/PSI output will reveal.

Why do I see many tasks blocked at once?

When one device or filesystem journal stalls, every process that needs it queues up behind it, so they all cross the 120-second threshold together. Find the shared device and you have found the root cause.
