Bash & Python Error Guide: 'BrokenPipeError' and 'UnicodeDecodeError'
Fix Python BrokenPipeError when piping to head/grep and UnicodeDecodeError reading non-UTF-8 files: SIGPIPE handling, encoding detection, and safe I/O patterns.
- #automation
- #troubleshooting
- #errors
- #python
Overview
BrokenPipeError and UnicodeDecodeError are two of the most common I/O failures in Python automation, and both stem from a mismatch between your program’s assumptions and the real world at its edges. BrokenPipeError: [Errno 32] Broken pipe happens when your program keeps writing to a pipe whose reader has gone away. The classic case is python3 produce.py | head, where head exits after 10 lines but your script keeps printing. UnicodeDecodeError happens when you decode bytes as text with the wrong codec, for example opening a Latin-1 or binary file as UTF-8 (the usual default on Linux) and hitting a byte sequence that is not valid UTF-8.
The two tracebacks:
BrokenPipeError: [Errno 32] Broken pipe
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>
BrokenPipeError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/srv/app/parse.py", line 4, in <module>
    data = open("legacy.csv").read()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 1423: invalid continuation byte
BrokenPipeError occurs at write time when the downstream reader has closed. UnicodeDecodeError occurs at read/decode time on the first byte the codec cannot interpret. Both are about the boundary between your process and something external — a pipe, or a file’s actual encoding.
Symptoms
- BrokenPipeError: [Errno 32] Broken pipe, often with a trailing “Exception ignored in” message at interpreter shutdown.
- The error appears only when piping into head, less, grep -m, or another consumer that closes early.
- UnicodeDecodeError: 'utf-8' codec can't decode byte 0x.. in position .. when reading a file or subprocess output.
- A script works on UTF-8 files but fails on files exported from Excel, Windows, or legacy systems.
python3 produce.py | head -5
line1
...
BrokenPipeError: [Errno 32] Broken pipe
python3 parse.py legacy.csv
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 1423: invalid continuation byte
Common Root Causes
1. Writing to a pipe whose reader exited (BrokenPipeError)
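The downstream command (head, less) closes the pipe after reading what it needs. Your script’s next write (a print() or a buffer flush) then targets a pipe with no reader, so the kernel sends SIGPIPE and fails the write with EPIPE. Python ignores SIGPIPE at startup, so the EPIPE surfaces as a BrokenPipeError exception instead of quietly ending the process.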
# produce.py
for i in range(10_000_000):
    print(i)
... | head -3
0
1
2
BrokenPipeError: [Errno 32] Broken pipe
Restore the default SIGPIPE behavior so the process exits quietly like normal Unix tools (POSIX only; signal.SIGPIPE does not exist on Windows):
import signal
signal.signal(signal.SIGPIPE, signal.SIG_DFL)
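To see the difference SIG_DFL makes without a shell, the sketch below (POSIX only, Python 3.9+) plays the role of head itself: it starts a child interpreter, reads three lines, then closes the pipe. The two child programs are hypothetical stand-ins for produce.py, one with the signal line and one without, and run_like_head is an illustrative helper name, not a library function.

```python
import signal
import subprocess
import sys

# Hypothetical stand-ins for produce.py: the same loop, with and without SIG_DFL.
WITH_SIG_DFL = (
    "import signal\n"
    "signal.signal(signal.SIGPIPE, signal.SIG_DFL)\n"
    "for i in range(10_000_000):\n"
    "    print(i)\n"
)
WITHOUT_SIG_DFL = "for i in range(10_000_000):\n    print(i)\n"


def run_like_head(source: str, lines: int = 3) -> tuple[int, bytes]:
    """Run `source` in a child interpreter, read a few lines, then close the pipe like head."""
    proc = subprocess.Popen(
        [sys.executable, "-c", source],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    for _ in range(lines):
        proc.stdout.readline()
    proc.stdout.close()       # the reader goes away, which is what head does
    err = proc.stderr.read()  # drain stderr until the child exits
    proc.stderr.close()
    return proc.wait(), err


if __name__ == "__main__":
    code, err = run_like_head(WITH_SIG_DFL)
    # Killed by SIGPIPE: Popen reports a signal death as a negative return code.
    print(code == -signal.SIGPIPE, err == b"")
    code, err = run_like_head(WITHOUT_SIG_DFL)
    # Python's default: the write raises BrokenPipeError and the child exits non-zero.
    print(code != 0, b"BrokenPipeError" in err)
```

The check deliberately asserts only "non-zero" for the default case, since the exact exit status can vary between Python versions once the shutdown flush also fails.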
2. Buffered output flushing after the reader is gone
Even when you stop early, buffered stdout is flushed at exit; if the reader already left, the flush triggers the error during interpreter shutdown (“Exception ignored in”).
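import os
import sys
try:
    main()               # your script's entry point (not defined here)
    sys.stdout.flush()   # push buffered output out while the error is still catchable
except BrokenPipeError:
    # Point stdout at /dev/null so the interpreter's shutdown flush cannot fail again
    devnull = os.open(os.devnull, os.O_WRONLY)
    os.dup2(devnull, sys.stdout.fileno())
    sys.exit(1)          # exit non-zero, as the Python signal docs suggest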
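The os.devnull redirect works because os.dup2 changes what a file descriptor points at without changing its number, so later writes to that number (including Python's own flush of sys.stdout at shutdown) land somewhere that always accepts them. The sketch below demonstrates the mechanism on a plain os.pipe() rather than on stdout; write_after_reader_left is a hypothetical helper name used only for illustration.

```python
import os


def write_after_reader_left() -> tuple[str, int]:
    """Show EPIPE on a dead pipe, then the same write succeeding after dup2 to devnull."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)                 # the reader goes away, as head does
    outcome = "no error"
    try:
        os.write(write_fd, b"late output\n")
    except BrokenPipeError:           # Python ignores SIGPIPE, so EPIPE surfaces here
        outcome = "BrokenPipeError"
    # Re-point the dead descriptor at /dev/null, as the shutdown fix does for stdout.
    devnull = os.open(os.devnull, os.O_WRONLY)
    os.dup2(devnull, write_fd)
    os.close(devnull)
    written = os.write(write_fd, b"late output\n")  # accepted and discarded
    os.close(write_fd)
    return outcome, written


if __name__ == "__main__":
    print(write_after_reader_left())  # ('BrokenPipeError', 12)
```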
3. Reading a non-UTF-8 file with the default codec (UnicodeDecodeError)
The file is Latin-1, Windows-1252, or UTF-16, but open() without encoding= uses the locale’s preferred encoding (UTF-8 on most Linux systems), and a lone high byte like 0xe9 (é in Latin-1) is invalid UTF-8.
file -i legacy.csv
legacy.csv: text/csv; charset=iso-8859-1
Open with the correct encoding:
data = open("legacy.csv", encoding="latin-1").read()
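A quick way to confirm that the file, not your code, is the problem is to decode a few representative bytes both ways. The sketch below fakes a legacy export by encoding a string as Latin-1; the sample text and the helper name utf8_error are illustrative, not from the original. It also shows why Windows-1252 (cp1252) is often the better guess for Excel and Windows exports: it agrees with Latin-1 on letters like é, but maps bytes 0x80 to 0x9F to characters such as curly quotes and the euro sign, where Latin-1 silently yields invisible control characters.

```python
from typing import Optional

# Hypothetical legacy export: text stored as Latin-1 bytes.
LEGACY_BYTES = "Café;Crème brûlée\n".encode("latin-1")
# Hypothetical Windows-1252 bytes: curly quotes (0x93, 0x94) and a euro sign (0x80).
CP1252_BYTES = b"\x93quoted\x94 \x80 5"


def utf8_error(data: bytes) -> Optional[str]:
    """Return the UnicodeDecodeError message for strict UTF-8 decoding, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    # 'utf-8' codec can't decode byte 0xe9 in position 3: invalid continuation byte
    print(utf8_error(LEGACY_BYTES))
    print(ascii(LEGACY_BYTES.decode("latin-1")))   # decodes cleanly
    print(ascii(CP1252_BYTES.decode("cp1252")))    # curly quotes and euro sign
    print(ascii(CP1252_BYTES.decode("latin-1")))   # C1 control characters, no error
```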
4. A UTF-16 file (BOM) read as UTF-8
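Files exported from some Windows tools are UTF-16 with a BOM. The first BOM byte, 0xff, can never appear in UTF-8, so decoding fails at position 0; the null byte after every ASCII character is a second giveaway.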
hexdump -C export.txt | head -1
00000000 ff fe 48 00 65 00 6c 00 6c 00 6f 00 |..H.e.l.l.o.|
The ff fe BOM signals UTF-16 LE. Use encoding="utf-16" (the BOM auto-selects endianness).
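Checking for a BOM before opening is cheap and more reliable than guessing. A minimal sketch, assuming only the standard codecs BOM constants; encoding_from_bom and read_text_with_bom are hypothetical helper names. The UTF-32 checks come first because the UTF-32 LE BOM (ff fe 00 00) begins with the UTF-16 LE BOM (ff fe), and the returned codec names ("utf-32", "utf-16", "utf-8-sig") all consume the BOM while decoding. One known blind spot: a UTF-16 LE file whose first character is NUL would be misclassified as UTF-32, which is rare for real text.

```python
import codecs

# Longest BOMs first: the UTF-32 LE BOM starts with the UTF-16 LE BOM.
BOM_TABLE = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def encoding_from_bom(head: bytes, default: str = "utf-8") -> str:
    """Pick a codec from the first bytes of a file; fall back to `default` if no BOM."""
    for bom, codec in BOM_TABLE:
        if head.startswith(bom):
            return codec
    return default


def read_text_with_bom(path: str, default: str = "utf-8") -> str:
    with open(path, "rb") as f:
        head = f.read(4)  # the longest BOM is 4 bytes
    with open(path, encoding=encoding_from_bom(head, default)) as f:
        return f.read()
```

For the export.txt dump shown above, the ff fe prefix would select "utf-16" and the file would read back as "Hello".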
5. Reading binary data as text
The “file” is actually gzip, an image, or a database dump; decoding any of it as text fails on the first non-text byte.
open("data.bin").read()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
Open in binary mode ("rb") and handle bytes, or decompress first.
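Rather than letting a decode error reveal that a file is binary, you can check its magic number first. A minimal sketch, assuming the input is either gzip-compressed text or plain text: gzip streams start with the bytes 1f 8b (the 0x8b at position 1 in the error above), and gzip.open in "rt" mode decompresses and decodes in one step. The helper names are hypothetical, and images or database dumps still need their own handling in "rb" mode.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def looks_gzipped(head: bytes) -> bool:
    return head[:2] == GZIP_MAGIC


def read_text_maybe_gzip(path: str, encoding: str = "utf-8") -> str:
    with open(path, "rb") as f:
        head = f.read(2)
    if looks_gzipped(head):
        # Decompress and decode in one step.
        with gzip.open(path, "rt", encoding=encoding) as f:
            return f.read()
    with open(path, encoding=encoding) as f:
        return f.read()


if __name__ == "__main__":
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as f:
        f.write(gzip.compress(b"hello from a gzip file\n"))
    try:
        print(read_text_maybe_gzip(path), end="")
    finally:
        os.remove(path)
```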
6. subprocess output containing non-UTF-8 bytes
subprocess.run(..., text=True) decodes child output with the locale’s preferred encoding (usually UTF-8), so tools emitting Latin-1 or raw bytes make it raise.
import subprocess
out = subprocess.run(["./tool"], capture_output=True, text=True)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
Capture bytes (no text=True) and decode with errors="replace", or set encoding=/errors=.
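Both options can be sketched as small wrappers around subprocess.run (Python 3.9+). The command is a hypothetical stand-in for ./tool that writes the bytes 63 61 66 e9 20 ff 0a ("caf", a Latin-1 é, a space, a stray 0xff, a newline); run_tool_bytes and run_tool_latin1 are illustrative names. The first keeps the capture as bytes and decodes with errors="replace"; the second passes encoding= and errors= straight to subprocess.run, which then returns str.

```python
import subprocess
import sys

# Hypothetical stand-in for ./tool: emits non-UTF-8 bytes on stdout.
TOOL = [sys.executable, "-c", r"import sys; sys.stdout.buffer.write(b'caf\xe9 \xff\n')"]


def run_tool_bytes(cmd: list[str]) -> str:
    """Capture raw bytes (no text=True), then decode, replacing undecodable bytes."""
    result = subprocess.run(cmd, capture_output=True, check=True)
    return result.stdout.decode("utf-8", errors="replace")


def run_tool_latin1(cmd: list[str]) -> str:
    """Let subprocess decode, but with an explicit codec instead of the locale default."""
    result = subprocess.run(cmd, capture_output=True, check=True,
                            encoding="latin-1", errors="strict")
    return result.stdout


if __name__ == "__main__":
    print(ascii(run_tool_bytes(TOOL)))   # 'caf\ufffd \ufffd\n'
    print(ascii(run_tool_latin1(TOOL)))  # 'caf\xe9 \xff\n'
```

The bytes-first variant never raises on decoding, which suits logging; the explicit-codec variant preserves every byte when you know what the tool emits.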
Diagnostic Workflow
Step 1: Classify which error you have
python3 script.py | head -1 # triggers BrokenPipeError if it's a pipe issue
python3 script.py # triggers UnicodeDecodeError if it's an encoding issue
A pipe-only failure points at SIGPIPE; a failure reading a file points at encoding.
Step 2 (pipe): Confirm the reader closes early
python3 script.py | head -3 ; echo "exit=${PIPESTATUS[0]}"   # bash: $? alone would report head's status, not python3's
If it only fails with head/less and not when redirected to a file, it is a broken-pipe/SIGPIPE issue.
Step 3 (pipe): Restore default SIGPIPE handling
import signal
signal.signal(signal.SIGPIPE, signal.SIG_DFL)
The process now exits silently when the reader leaves, like cat or yes.
Step 4 (encoding): Detect the file’s real encoding
file -i suspect.txt
hexdump -C suspect.txt | head -2
chardetect suspect.txt 2>/dev/null # if chardet is installed
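The charset= value from file -i and any leading BOM bytes in the hexdump tell you which codec to use. Treat file and chardetect results as heuristic guesses, especially on short files.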
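When file and chardetect disagree or are not installed, a deterministic fallback chain is often good enough for batch jobs. A minimal sketch, not a real detector: it tries strict UTF-8, then Windows-1252, then Latin-1, and reports which codec succeeded so you can log it. The order matters: non-ASCII Latin-1 text rarely happens to be valid UTF-8, cp1252 rejects its five undefined bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D), and Latin-1 maps every byte, so it always succeeds and must come last. decode_with_fallback is a hypothetical name.

```python
def decode_with_fallback(data: bytes) -> tuple[str, str]:
    """Return (text, codec_used), trying strict codecs before the catch-all."""
    for codec in ("utf-8", "cp1252"):
        try:
            return data.decode(codec), codec
        except UnicodeDecodeError:
            continue
    # Latin-1 maps every byte 0x00-0xFF to a code point, so this step cannot fail.
    return data.decode("latin-1"), "latin-1"


if __name__ == "__main__":
    for sample in (b"plain ascii", "caf\u00e9".encode("utf-8"), b"caf\xe9", b"\x81\xe9"):
        text, codec = decode_with_fallback(sample)
        print(codec, ascii(text))
```

Pure-ASCII input always reports utf-8, which is harmless because all three codecs agree on ASCII.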
Step 5 (encoding): Open with the right codec or tolerate errors
# Known encoding:
open("f.csv", encoding="latin-1")
# Unknown / mixed — never crash, replace bad bytes:
open("f.csv", encoding="utf-8", errors="replace")
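The difference between errors="replace" and errors="ignore" is easiest to see on a file with a single bad byte. A minimal sketch with a hypothetical sample (one Latin-1 é inside otherwise ASCII CSV data): "replace" substitutes U+FFFD, so the damage stays visible and countable, while "ignore" drops the byte silently. Counting U+FFFD after a replace-mode read is a cheap way to log how much a batch job lost; read_lenient is a hypothetical helper name.

```python
import os
import tempfile

SAMPLE = b"id;name\n1;caf\xe9\n2;ok\n"  # hypothetical CSV with one Latin-1 byte


def read_lenient(path: str, errors: str = "replace") -> str:
    """Read as UTF-8 without ever raising UnicodeDecodeError."""
    with open(path, encoding="utf-8", errors=errors) as f:
        return f.read()


if __name__ == "__main__":
    fd, path = tempfile.mkstemp(suffix=".csv")
    with os.fdopen(fd, "wb") as f:
        f.write(SAMPLE)
    try:
        replaced = read_lenient(path)
        print(ascii(replaced))                         # 'id;name\n1;caf\ufffd\n2;ok\n'
        print("bad bytes:", replaced.count("\ufffd"))  # bad bytes: 1
        print(ascii(read_lenient(path, "ignore")))     # 'id;name\n1;caf\n2;ok\n'
    finally:
        os.remove(path)
```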
Example Root Cause Analysis
An engineer pipes a report generator into head to preview it and gets a stack trace every time:
$ python3 report.py | head -20
... 20 lines ...
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>
BrokenPipeError: [Errno 32] Broken pipe
Redirecting to a file works fine, which isolates the issue to the pipe:
python3 report.py > /tmp/out.txt ; echo "exit=$?"
exit=0
report.py loops printing thousands of rows. head -20 reads 20 lines, then closes the pipe and exits. Because Python ignores SIGPIPE at startup, the next write to the closed pipe fails with EPIPE and is raised as BrokenPipeError; the leftover buffered stdout is then flushed at shutdown against the same closed pipe, producing the “Exception ignored in” trailer. Standard Unix tools avoid this by leaving SIGPIPE at its default disposition, which terminates the process quietly.
Fix: restore the default SIGPIPE handler at the top of the script so it behaves like a normal pipeline producer:
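import signal
signal.signal(signal.SIGPIPE, signal.SIG_DFL)
def generate_rows():
    # hypothetical stand-in for the report's real row source
    for n in range(100_000):
        yield f"row {n}"
def main():
    for row in generate_rows():
        print(row)
if __name__ == "__main__":
    main()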
Now python3 report.py | head -20 prints 20 lines and exits cleanly with no traceback.
Prevention Best Practices
- For any script meant to be piped into head, less, or grep, set signal.signal(signal.SIGPIPE, signal.SIG_DFL) at startup so an early reader exit terminates the process quietly instead of raising.
- Never rely on the platform default when reading files: pass an explicit encoding= to open(), and use file -i or a BOM check to determine the real codec of legacy data.
- Use errors="replace" (or "ignore") for best-effort reads of mixed or unknown encodings so one bad byte cannot crash a batch job.
- Read genuinely binary data in "rb" mode and decode deliberately; never read gzip, images, or dumps as text.
- For subprocess, capture bytes and decode with a known encoding/errors rather than assuming text=True will produce UTF-8.
Quick Command Reference
# Reproduce / isolate a broken pipe
python3 script.py | head -3 ; echo "exit=${PIPESTATUS[0]}"   # bash: status of python3, not head
python3 script.py > /tmp/out.txt # works if it's pipe-only
# Detect a file's encoding
file -i suspect.txt
hexdump -C suspect.txt | head -2
# Restore default SIGPIPE (in the script)
import signal; signal.signal(signal.SIGPIPE, signal.SIG_DFL)   # at the top of the script; running it via python3 -c affects only that throwaway process
# Encoding-safe reads
open("f.csv", encoding="latin-1").read()
open("f.csv", encoding="utf-8", errors="replace").read()
open("data.bin", "rb").read() # binary, no decode
Conclusion
BrokenPipeError and UnicodeDecodeError are boundary errors between your program and its I/O. For BrokenPipeError, the recurring causes are writing to a pipe after the reader exited and buffered output flushing at shutdown — both solved by restoring default SIGPIPE handling. For UnicodeDecodeError:
- Reading a non-UTF-8 file (Latin-1/Windows-1252) with the default codec.
- A UTF-16/BOM file decoded as UTF-8.
- Reading binary data as text.
- subprocess output containing non-UTF-8 bytes.
Classify the error first (pipe vs file), then apply the matching fix: signal.SIG_DFL for pipes, and an explicit encoding= (with errors="replace" as a safety net) for reads. Both are one-line fixes once you know which boundary failed.