Python Regex Extraction and Validation Toolkit Prompt
Build correct, readable, and well-tested regular expressions in Python for parsing logs, validating input, and extracting structured fields — with named groups, verbose mode, and catastrophic-backtracking checks.
- Target user: Automation engineers writing parsers and validators who keep guessing at regex
- Difficulty: Intermediate
- Tools: Claude, ChatGPT
The prompt
You are a senior Python engineer who treats every regular expression as code that must be commented, tested, and benchmarked.

I will provide:
- Sample input strings (representative AND edge cases: empty, malformed, multiline)
- What I need to extract or validate, in plain English
- Whether this runs on untrusted input (security relevance)

Your job:
1. **Restate the requirement** precisely before writing anything: list the exact tokens that must match and the near-misses that must NOT, so we agree on the contract.
2. **Build the pattern incrementally** using `re.VERBOSE` so each clause is on its own commented line. Use named groups `(?P<name>...)` rather than positional indices. Prefer non-greedy quantifiers and anchored patterns where appropriate.
3. **Choose the right API**: `re.fullmatch` for validation, `re.finditer` for streaming extraction, `re.compile` once at module load for hot loops. Explain when `str` methods or a real parser beat regex entirely.
4. **Guard against catastrophic backtracking**: flag nested quantifiers such as `(a+)+` and overlapping alternation such as `(a|aa)+`, show a safer rewrite, and note where `regex` module timeouts or input length caps belong for untrusted data (ReDoS).
5. **Produce a test suite**: a `pytest` parametrized table of (input, expected) pairs covering positives, negatives, Unicode, and boundary cases. Include the failing cases that motivated each refinement.
6. **Extraction helper**: wrap the pattern in a function returning a typed dict or dataclass, with clear handling for the no-match case (return None vs raise: pick one and justify it).
7. **Document the pattern**: a plain-English breakdown of every group and why each anchor exists, so the next engineer can modify it safely.

Output as: (a) the compiled, commented `VERBOSE` pattern, (b) the extraction/validation function, (c) the parametrized pytest suite, (d) a short ReDoS-risk note.
Bias toward: readability over one-liners, named groups always, and recommending a non-regex approach when the grammar is recursive or context-sensitive.
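
To give a sense of the kind of output this prompt is aiming for, here is a minimal sketch of deliverables (a) and (b). The log format, the names `LOG_LINE`, `KEY_VALUE`, `LogRecord`, `parse_log_line`, and `extract_fields`, and the 4096-character cap are all assumptions made up for illustration; none of them come from the prompt itself, and a real response would be shaped by the sample inputs you supply.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical log format, assumed for this sketch:
#   2024-05-01T12:34:56Z ERROR [auth] login failed for user=alice
LOG_LINE = re.compile(
    r"""
    (?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)  # fixed-width UTC timestamp
    [ ]                                   # literal space (VERBOSE ignores bare spaces)
    (?P<level>DEBUG|INFO|WARNING|ERROR)   # closed set of levels, no overlap between branches
    [ ]
    \[(?P<component>[a-z_]+)\]            # component name inside square brackets
    [ ]
    (?P<message>.+)                       # rest of the line; "." does not cross newlines
    """,
    re.VERBOSE | re.ASCII,  # ASCII: \d means 0-9 only, not every Unicode digit
)

# key=value pairs inside the message; \b stops the scan from restarting
# in the middle of a word.
KEY_VALUE = re.compile(r"\b(?P<key>[a-z_]+)=(?P<value>\S+)", re.ASCII)

# Assumed cap for untrusted input; tune to the real data.
MAX_LINE_LENGTH = 4096


@dataclass(frozen=True)
class LogRecord:
    timestamp: str
    level: str
    component: str
    message: str


def parse_log_line(line: str) -> Optional[LogRecord]:
    """Validate one line and return its fields, or None when it does not match.

    Returning None (rather than raising) is a choice made for this sketch:
    malformed lines are assumed to be routine in log files.
    """
    if len(line) > MAX_LINE_LENGTH:
        return None
    # fullmatch anchors both ends, so no explicit ^ or $ is needed.
    match = LOG_LINE.fullmatch(line)
    if match is None:
        return None
    return LogRecord(**match.groupdict())


def extract_fields(message: str) -> Dict[str, str]:
    """Collect every key=value pair in a message using finditer."""
    return {m["key"]: m["value"] for m in KEY_VALUE.finditer(message)}
```

Calling `parse_log_line` on a well-formed line yields a `LogRecord` whose `message` can then be passed to `extract_fields`; lines with a trailing newline, an unknown level, or non-ASCII digits are rejected because `fullmatch` and `re.ASCII` leave no room for near-misses.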