Python Log Parser & Anomaly Extractor Prompt
Build a Python tool that streams large or rotated log files, parses structured and semi-structured lines, aggregates error patterns, and emits a summary report without loading everything into memory.
- Target user: DevOps engineers and SREs analyzing log data
- Difficulty: Intermediate
- Tools: Claude, ChatGPT
The prompt
You are a senior Python engineer who writes memory-efficient log analysis tools. The input may be gigabytes, gzip-rotated, and partly malformed — handle that gracefully.

I will provide:
- A representative sample of the log format(s), including any timestamp and severity convention
- What I want extracted (error rates, top exceptions, slow requests, a specific event)
- Output target (terminal table, JSON, CSV) and whether the source is a file, a glob of rotated files, or stdin

Your job:
1. **Stream, don't slurp** — iterate line by line (transparently opening `.gz` via `gzip`), so memory stays flat regardless of file size. Support reading from stdin so it composes in a pipeline.
2. **Parse defensively** — use a compiled regex or a structured (JSON-line) parser, and count/skip unparseable lines into a "malformed" bucket instead of crashing.
3. **Aggregate efficiently** — use `collections.Counter`/`defaultdict` for top-N patterns, normalizing variable parts (IDs, timestamps, numbers) so similar messages group together.
4. **Add time + severity filters** — let the user scope by `--since/--until` and minimum level via CLI flags.
5. **Emit clean output** — a human table by default, `--json` for machines; include totals, the malformed count, and the time span actually covered.
6. **Be honest about gaps** — surface when the requested window exceeds the data present.

Output: the full script with `argparse`, the regex/parse strategy explained, a sample run against the provided log, and a note on how it stays O(1) in memory.
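The following is an added example, not part of the prompt above and not the page author's code. It sketches the core of steps 1 to 3 and 5 (streaming, the malformed bucket, normalization, time span); the line format, the `SAMPLE` lines and all function names are assumptions constructed for illustration.

```python
import gzip
import re
from collections import Counter
from datetime import datetime

# Assumed line format (constructed for this example): "<ISO timestamp> <LEVEL> <message>"
LINE_RE = re.compile(r"^(?P<ts>\d{4}-\d\d-\d\dT\d\d:\d\d:\d\d) (?P<level>[A-Z]+) (?P<msg>.*)$")
LEVELS = {"DEBUG": 10, "INFO": 20, "WARNING": 30, "ERROR": 40, "CRITICAL": 50}
# Most specific pattern first, so a UUID is not split into separate <n> tokens.
NORMALIZERS = [
    (re.compile(r"\b[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}\b", re.I), "<uuid>"),
    (re.compile(r"\d+"), "<n>"),
]
SAMPLE = [  # constructed log lines, including one unparseable traceback fragment
    "2024-05-01T12:00:00 INFO service started\n",
    "2024-05-01T12:00:05 ERROR timeout after 3000 ms for user 42\n",
    "2024-05-01T12:00:09 ERROR timeout after 5120 ms for user 7\n",
    "Traceback (most recent call last):\n",
    "2024-05-01T12:01:30 CRITICAL job 9f1c2d3e-0000-4a1b-8c2d-1234567890ab failed\n",
    "2024-05-01T12:02:00 WARNING disk at 91%\n",
]

def open_log(path):
    """Open a plain or gzip-rotated file as text; undecodable bytes are replaced."""
    opener = gzip.open if path.endswith(".gz") else open
    return opener(path, "rt", encoding="utf-8", errors="replace")

def normalize(msg):
    for rx, token in NORMALIZERS:
        msg = rx.sub(token, msg)
    return msg

def summarize(lines, min_level="ERROR", top=5):
    """Single pass over any line iterable; memory grows with distinct patterns, not input size."""
    patterns, total, malformed, first, last = Counter(), 0, 0, None, None
    for line in lines:
        total += 1
        m = LINE_RE.match(line.rstrip("\r\n"))
        try:  # no match (m is None), impossible date or unknown level all count as malformed
            ts, level = datetime.fromisoformat(m["ts"]), LEVELS[m["level"]]
        except (TypeError, ValueError, KeyError):
            malformed += 1
            continue
        first = ts if first is None or ts < first else first
        last = ts if last is None or ts > last else last
        if level >= LEVELS[min_level]:
            patterns[normalize(m["msg"])] += 1
    span = tuple(t.isoformat() if t else None for t in (first, last))
    return {"total": total, "malformed": malformed, "span": span, "top": patterns.most_common(top)}
```

`summarize` accepts any iterable of lines, so the same call works on `SAMPLE`, on `sys.stdin`, or on the handle returned by `open_log`. The `span` it returns is what a `--since/--until` check would compare against to report a requested window that exceeds the data.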