AI for Bash & Python Automation · By James Joyner IV · 9 min read

Bash & Python Error Guide: 'BrokenPipeError' and 'UnicodeDecodeError'

Fix Python BrokenPipeError when piping to head/grep and UnicodeDecodeError reading non-UTF-8 files: SIGPIPE handling, encoding detection, and safe I/O patterns.

  • #automation
  • #troubleshooting
  • #errors
  • #python

Overview

BrokenPipeError and UnicodeDecodeError are two of the most common I/O failures in Python automation, and both stem from a mismatch between your program’s assumptions and the real world at its edges. BrokenPipeError: [Errno 32] Broken pipe happens when your program keeps writing to a pipe whose reader has gone away. The classic case is python3 produce.py | head, where head exits after 10 lines but your script keeps printing. UnicodeDecodeError happens when you read bytes as text using the wrong codec, for example decoding a Latin-1 or binary file as UTF-8 and hitting a byte sequence that is not valid UTF-8.

The two tracebacks:

BrokenPipeError: [Errno 32] Broken pipe
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>

Traceback (most recent call last):
  File "/srv/app/parse.py", line 4, in <module>
    data = open("legacy.csv").read()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 1423: invalid continuation byte

BrokenPipeError occurs at write time when the downstream reader has closed. UnicodeDecodeError occurs at read/decode time on the first byte the codec cannot interpret. Both are about the boundary between your process and something external — a pipe, or a file’s actual encoding.

Symptoms

  • BrokenPipeError: [Errno 32] Broken pipe, often with a trailing “Exception ignored in” at interpreter shutdown.
  • The error appears only when piping into head, less, grep -m, or a consumer that closes early.
  • UnicodeDecodeError: 'utf-8' codec can't decode byte 0x.. in position .. when reading a file or subprocess output.
  • A script works on UTF-8 files but fails on files exported from Excel, Windows, or legacy systems.

python3 produce.py | head -5
line1
...
BrokenPipeError: [Errno 32] Broken pipe

python3 parse.py legacy.csv
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 1423: invalid continuation byte

Common Root Causes

1. Writing to a pipe whose reader exited (BrokenPipeError)

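The downstream command (head, less) closes the pipe after reading what it needs. Your script keeps printing, and the next flush of its output buffer writes to a pipe with no reader. Python ignores SIGPIPE at startup, so that write fails with EPIPE instead of killing the process, and Python raises the failure as BrokenPipeError.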

# produce.py
for i in range(10_000_000):
    print(i)

... | head -3
0
1
2
BrokenPipeError: [Errno 32] Broken pipe

Restore the default SIGPIPE behavior so the process exits quietly like normal Unix tools:

import signal
signal.signal(signal.SIGPIPE, signal.SIG_DFL)
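
Two caveats apply to this one-liner: signal.SIGPIPE is not defined on Windows, and the Python signal documentation warns that with SIG_DFL in place the process is also terminated when a socket peer disconnects mid-write. For a command-line script that only writes to stdout, a guarded version might look like the sketch below. The helper name restore_default_sigpipe is hypothetical, not part of any library.

```python
import signal


def restore_default_sigpipe() -> bool:
    """Restore the default SIGPIPE action where the platform defines it.

    Returns True if the handler was changed, False where SIGPIPE does not
    exist (for example on Windows). Hypothetical helper for illustration.
    """
    sigpipe = getattr(signal, "SIGPIPE", None)
    if sigpipe is None:
        return False
    # Must run in the main thread; call it once at the top of the script.
    signal.signal(sigpipe, signal.SIG_DFL)
    return True
```

Calling restore_default_sigpipe() before the first print() keeps the script portable, at the cost of leaving the traceback in place on platforms without SIGPIPE.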

2. Buffered output flushing after the reader is gone

Even when you stop early, buffered stdout is flushed at exit; if the reader already left, the flush triggers the error during interpreter shutdown (“Exception ignored in”).

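import os
import sys

try:
    main()              # your script's entry point
    sys.stdout.flush()  # force a pending EPIPE to surface inside the try
except BrokenPipeError:
    # Point stdout at /dev/null so the flush at interpreter shutdown
    # cannot raise a second time, then exit with a non-zero status.
    devnull = os.open(os.devnull, os.O_WRONLY)
    os.dup2(devnull, sys.stdout.fileno())
    sys.exit(1)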

3. Reading a non-UTF-8 file with the default codec (UnicodeDecodeError)

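The file is Latin-1 / Windows-1252 / UTF-16, but the script decodes it as UTF-8. Without an explicit encoding=, open() uses the locale’s preferred encoding, which is UTF-8 on most modern Linux systems. In UTF-8, a high byte like 0xe9 (é in Latin-1) announces a three-byte sequence, so decoding fails as soon as the next byte is not a continuation byte.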

file -i legacy.csv
legacy.csv: text/csv; charset=iso-8859-1

Open with the correct encoding:

data = open("legacy.csv", encoding="latin-1").read()
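
The failure can be reproduced in memory without a file. This is a minimal sketch; the sample string is made up for illustration and stands in for one row of legacy.csv.

```python
# "café,1" encoded as Latin-1 contains the single byte 0xe9 for "é".
raw = "café,1".encode("latin-1")  # b'caf\xe9,1'

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    reason = exc.reason    # why the UTF-8 codec rejected the data
    position = exc.start   # index of the offending byte

# Decoding with the codec the data was written in recovers the text.
text = raw.decode("latin-1")
print(reason, position, text)
```

Note that latin-1 maps every possible byte to a character, so it never raises. A wrong guess shows up as garbled characters rather than an exception. Files produced by Windows tools are often cp1252, which differs from latin-1 in the 0x80 to 0x9f range.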

4. A UTF-16 file (BOM) read as UTF-8

Files exported from some Windows tools are UTF-16 with a BOM. The BOM bytes ff fe (or fe ff) can never appear in valid UTF-8, so decoding fails on the very first byte.

hexdump -C export.txt | head -1
00000000  ff fe 48 00 65 00 6c 00  6c 00 6f 00              |..H.e.l.l.o.|

The ff fe BOM signals UTF-16 LE. Use encoding="utf-16" (the BOM auto-selects endianness).
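
The same check can be done from Python before opening the file in text mode. The sketch below uses the BOM constants from the standard codecs module; the function name sniff_bom is hypothetical, and a file without a BOM simply yields None.

```python
import codecs

# UTF-32 is checked first because the UTF-32 LE BOM (ff fe 00 00)
# begins with the UTF-16 LE BOM (ff fe).
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def sniff_bom(raw: bytes):
    """Return a codec name if raw starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    return None


# The bytes from the hexdump above: ff fe, then "Hello" in UTF-16 LE.
sample = bytes.fromhex("fffe480065006c006c006f00")
encoding = sniff_bom(sample)
decoded = sample.decode(encoding)  # the utf-16 codec consumes the BOM
```

In practice you would read only the first four bytes in "rb" mode, pass them to sniff_bom, and then reopen the file with the returned encoding.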

5. Reading binary data as text

The “file” is actually gzip, an image, or a database dump; decoding any of it as text fails on the first non-text byte.

open("data.bin").read()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

Open in binary mode ("rb") and handle bytes, or decompress first.
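
For the gzip case specifically, the 0x8b in the traceback is the second byte of the gzip magic number (1f 8b). A minimal sketch of a reader that checks for it and decompresses first, assuming the payload inside is text; the function name read_text_maybe_gzip is hypothetical:

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def read_text_maybe_gzip(path, encoding="utf-8"):
    """Read a text file, transparently decompressing it if it is gzip."""
    with open(path, "rb") as fh:
        is_gzip = fh.read(2) == GZIP_MAGIC
    opener = gzip.open if is_gzip else open
    # Both open() and gzip.open() accept mode "rt" with an encoding.
    with opener(path, "rt", encoding=encoding) as fh:
        return fh.read()
```

Other binary formats (images, database dumps) have no text to recover, so for those the right move is to stay in "rb" mode.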

6. subprocess output containing non-UTF-8 bytes

subprocess.run(..., text=True) decodes child output with the locale’s preferred encoding (UTF-8 on most Linux systems); tools emitting Latin-1 or raw bytes raise UnicodeDecodeError.

import subprocess
out = subprocess.run(["./tool"], capture_output=True, text=True)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

Capture bytes (no text=True) and decode with errors="replace", or set encoding=/errors=.
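
Both options can be sketched as follows. The child command here is a stand-in for ./tool that deliberately writes one byte that is not valid UTF-8; everything else uses documented subprocess.run parameters.

```python
import subprocess
import sys

# Stand-in for ./tool: a child process that emits the invalid byte 0xff.
cmd = [sys.executable, "-c",
       r"import sys; sys.stdout.buffer.write(b'\xff ok\n')"]

# Option 1: capture raw bytes, then decode deliberately.
raw = subprocess.run(cmd, capture_output=True, check=True).stdout
text = raw.decode("utf-8", errors="replace")

# Option 2: let subprocess decode, but state the codec and error policy.
result = subprocess.run(cmd, capture_output=True, check=True,
                        encoding="utf-8", errors="replace")
print(repr(text), repr(result.stdout))
```

Option 1 keeps the original bytes available if you later need to retry with a different codec; option 2 is shorter when replacement characters are acceptable.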

Diagnostic Workflow

Step 1: Classify which error you have

python3 script.py | head -1     # triggers BrokenPipeError if it's a pipe issue
python3 script.py               # triggers UnicodeDecodeError if it's an encoding issue

A pipe-only failure points at SIGPIPE; a failure reading a file points at encoding.

Step 2 (pipe): Confirm the reader closes early

# bash: $? would report head's status, so read the producer's from PIPESTATUS
python3 script.py | head -3 ; echo "exit=${PIPESTATUS[0]}"

If it only fails with head/less and not when redirected to a file, it is a broken-pipe/SIGPIPE issue.

Step 3 (pipe): Restore default SIGPIPE handling

import signal
signal.signal(signal.SIGPIPE, signal.SIG_DFL)

The process now exits silently when the reader leaves, like cat or yes.

Step 4 (encoding): Detect the file’s real encoding

file -i suspect.txt
hexdump -C suspect.txt | head -2
chardetect suspect.txt 2>/dev/null   # if chardet is installed

charset= and the leading bytes (BOM) tell you which codec to use. On macOS the equivalent flag is file -I.
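
Where neither file nor chardet is available, a rough fallback is to try a short list of candidate codecs in order. This is a heuristic sketch, not real detection: the function name guess_encoding and the candidate list are assumptions, a successful decode does not prove the codec is right, and UTF-16 is not covered (check for a BOM first).

```python
def guess_encoding(raw: bytes,
                   candidates=("utf-8-sig", "cp1252", "latin-1")):
    """Return the first candidate codec that decodes raw without error."""
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


print(guess_encoding("café".encode("utf-8")))  # valid UTF-8
print(guess_encoding(b"caf\xe9"))              # Latin-1 / cp1252 bytes
```

The order matters: strict codecs such as UTF-8 go first, and latin-1 goes last because it accepts any byte sequence.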

Step 5 (encoding): Open with the right codec or tolerate errors

# Known encoding:
open("f.csv", encoding="latin-1")
# Unknown / mixed — never crash, replace bad bytes:
open("f.csv", encoding="utf-8", errors="replace")
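
The error handlers differ in what they leave behind. A small in-memory comparison, using a made-up sample with one Latin-1 byte in it:

```python
raw = b"caf\xe9 ok"  # one byte (0xe9) that is not valid UTF-8 here

# "replace" substitutes U+FFFD, the Unicode replacement character.
replaced = raw.decode("utf-8", errors="replace")

# "ignore" silently drops the offending byte.
ignored = raw.decode("utf-8", errors="ignore")

# "backslashreplace" keeps the byte value visible as an escape.
escaped = raw.decode("utf-8", errors="backslashreplace")

# "surrogateescape" smuggles the byte through so it can be written back.
smuggled = raw.decode("utf-8", errors="surrogateescape")
restored = smuggled.encode("utf-8", errors="surrogateescape")

print(repr(replaced), repr(ignored), repr(escaped), restored == raw)
```

All of these avoid the crash, but only surrogateescape preserves the original bytes, so "replace" and "ignore" are best kept for reports and logs where data loss is acceptable.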

Example Root Cause Analysis

An engineer pipes a report generator into head to preview it and gets a stack trace every time:

$ python3 report.py | head -20
... 20 lines ...
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>
BrokenPipeError: [Errno 32] Broken pipe

Redirecting to a file works fine, which isolates the issue to the pipe:

python3 report.py > /tmp/out.txt ; echo "exit=$?"
exit=0

report.py loops printing thousands of rows. head -20 reads 20 lines, then closes the pipe and exits. Python ignores SIGPIPE at startup, so the next write to the closed pipe fails with EPIPE and surfaces as a BrokenPipeError exception. The leftover buffered stdout is then flushed at shutdown against the now-closed pipe, producing the “Exception ignored in” trailer. Standard Unix tools avoid this by leaving SIGPIPE at its default disposition (terminate quietly).

Fix: restore the default SIGPIPE handler at the top of the script so it behaves like a normal pipeline producer:

import signal
signal.signal(signal.SIGPIPE, signal.SIG_DFL)

def main():
    for row in generate_rows():  # generate_rows() is the report's existing row source
        print(row)

if __name__ == "__main__":
    main()

Now python3 report.py | head -20 prints 20 lines and exits cleanly with no traceback.
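
One way to verify the fix without a shell is to play the role of head from a test. The sketch below is POSIX-only and rests on assumptions: PRODUCER is a hypothetical stand-in for report.py, and run_with_early_reader is an illustrative helper, not part of the original script.

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for report.py: the SIG_DFL line, then a long loop.
PRODUCER = """
import signal
signal.signal(signal.SIGPIPE, signal.SIG_DFL)
for i in range(1_000_000):
    print(i)
"""


def run_with_early_reader(source, lines=3):
    """Run source in a child Python, read a few lines, then close the pipe."""
    proc = subprocess.Popen(
        [sys.executable, "-c", source],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    head = [proc.stdout.readline() for _ in range(lines)]
    proc.stdout.close()          # the reader goes away, as head does
    stderr = proc.stderr.read()  # stays empty if the child died quietly
    proc.stderr.close()
    returncode = proc.wait()
    return head, returncode, stderr


head, returncode, stderr = run_with_early_reader(PRODUCER)
print(head, returncode, stderr)
```

A negative return code equal to -signal.SIGPIPE means the child was terminated by the signal, which is what a shell reports as exit status 141. An empty stderr confirms that no traceback was printed.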

Prevention Best Practices

  • For any script meant to be piped into head/less/grep, set signal.signal(signal.SIGPIPE, signal.SIG_DFL) at startup so early reader exit terminates the process quietly instead of raising.
  • Never rely on the platform default when reading files — pass an explicit encoding= to open(); use file -i or a BOM check to determine the real codec for legacy data.
  • Use errors="replace" (or "ignore") for best-effort reads of mixed or unknown encodings so one bad byte cannot crash a batch job.
  • Read genuinely binary data in "rb" mode and decode deliberately; never read gzip/images/dumps as text.
  • For subprocess, capture bytes and decode with a known encoding/errors rather than blindly trusting text=True to be UTF-8.

Quick Command Reference

# Reproduce / isolate a broken pipe
python3 script.py | head -3 ; echo "exit=${PIPESTATUS[0]}"   # bash: producer's status, not head's
python3 script.py > /tmp/out.txt   # works if it's pipe-only

# Detect a file's encoding
file -i suspect.txt
hexdump -C suspect.txt | head -2

# Restore default SIGPIPE (add at the top of the script; running it as a separate command has no effect)
import signal
signal.signal(signal.SIGPIPE, signal.SIG_DFL)

# Encoding-safe reads
open("f.csv", encoding="latin-1").read()
open("f.csv", encoding="utf-8", errors="replace").read()
open("data.bin", "rb").read()        # binary, no decode

Conclusion

BrokenPipeError and UnicodeDecodeError are boundary errors between your program and its I/O. For BrokenPipeError, the recurring causes are writing to a pipe after the reader exited and buffered output flushing at shutdown — both solved by restoring default SIGPIPE handling. For UnicodeDecodeError:

  1. Reading a non-UTF-8 file (Latin-1/Windows-1252) with the default codec.
  2. A UTF-16/BOM file decoded as UTF-8.
  3. Reading binary data as text.
  4. subprocess output containing non-UTF-8 bytes.

Classify the error first (pipe vs file), then apply the matching fix: signal.SIG_DFL for pipes, and an explicit encoding= (with errors="replace" as a safety net) for reads. Both are one-line fixes once you know which boundary failed.
