Prometheus Textfile Collector Metrics Exporter Prompt
Write a Python script that emits custom metrics in Prometheus exposition format for the node_exporter textfile collector, with correct types, labels, atomic writes, and HELP/TYPE lines.
- Target user: SREs exporting custom metrics via the node_exporter textfile collector
- Difficulty: Intermediate
- Tools: Claude, ChatGPT
The prompt
You are a senior SRE who writes textfile-collector exporters that Prometheus scrapes cleanly without parse errors or stale-metric surprises.

I will provide:
- What I want to measure (the data source: command output, API, file, DB)
- The metric names, types (gauge/counter), and labels I want
- The textfile collector directory and scrape interval

Build a Python script that:
1. **Produces valid exposition format**: for each metric, emit one `# HELP` and one `# TYPE` line, then the samples. Use `snake_case` metric names with a sensible prefix and unit suffix (`_seconds`, `_bytes`, `_total` for counters). Quote label values and escape backslashes, double quotes, and newlines inside them.
2. **Chooses metric types correctly**: gauges for point-in-time values, counters (monotonic, `_total` suffix) for cumulative counts. Do not reset counters between runs unless the underlying source actually reset.
3. **Writes atomically**: write to a temp file in the collector dir (for example `metrics.prom.<pid>`, the Python equivalent of the shell's `metrics.prom.$$`), then `os.rename` (or `os.replace`) it onto `metrics.prom`. The temp file must be on the same filesystem so the rename is atomic, and its name must not end in `.prom`, because node_exporter parses every `*.prom` file in the directory. node_exporter must never read a half-written file.
4. **Handles failure gracefully**: if the data source is unavailable, either leave the previous file intact OR emit a `*_scrape_success 0` gauge plus a `*_last_success_timestamp_seconds` gauge so staleness is visible. Never emit garbage that breaks the scrape.
5. **Includes meta-metrics**: script duration, last-run timestamp, and a success flag, so I can alert when the exporter itself is broken.
6. **Is idempotent and cheap**: safe to run every minute from cron or a systemd timer, with bounded runtime and a timeout on the data fetch.
7. **Validates output**: optionally pipe the generated file into `promtool check metrics` (it reads from stdin) in CI.

Output:
(a) the full script with a `write_metrics()` that takes a list of metric dataclasses,
(b) the atomic-write helper,
(c) the meta-metrics,
(d) a sample `.prom` output,
(e) a systemd timer + service unit and the node_exporter flag that enables the textfile collector (`--collector.textfile.directory`).

Bias toward: atomic writes, correct TYPE/HELP, staleness visibility, and never breaking the scrape on partial failure.
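
For reference when reviewing what the model returns, here is a minimal sketch of the two core pieces the prompt asks for: rendering exposition format with escaped label values, and an atomic write. It is an illustration under assumptions, not the expected answer: the `Metric` dataclass, `render()`, the escaping helpers, and the sample metric name are all hypothetical, and only `write_metrics()` is a name taken from the prompt itself.

```python
import os
import tempfile
from dataclasses import dataclass, field


@dataclass
class Metric:
    """Hypothetical metric family: name, HELP text, TYPE, and samples."""
    name: str
    help: str
    type: str  # "gauge" or "counter"
    # Each sample is a (labels_dict, value) pair; labels may be empty.
    samples: list = field(default_factory=list)


def escape_label_value(value: str) -> str:
    # Label values escape backslash, double quote, and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def escape_help(text: str) -> str:
    # HELP text escapes only backslash and newline.
    return text.replace("\\", "\\\\").replace("\n", "\\n")


def render(metrics) -> str:
    """Render metric families as text: HELP and TYPE once, then samples."""
    lines = []
    for m in metrics:
        lines.append(f"# HELP {m.name} {escape_help(m.help)}")
        lines.append(f"# TYPE {m.name} {m.type}")
        for labels, value in m.samples:
            if labels:
                pairs = ",".join(
                    f'{k}="{escape_label_value(str(v))}"'
                    for k, v in sorted(labels.items())
                )
                lines.append(f"{m.name}{{{pairs}}} {value}")
            else:
                lines.append(f"{m.name} {value}")
    return "\n".join(lines) + "\n"


def write_metrics(metrics, directory: str, filename: str = "metrics.prom") -> str:
    """Write the rendered metrics atomically and return the final path."""
    final_path = os.path.join(directory, filename)
    # Temp file lives in the same directory so the rename stays on one
    # filesystem. The ".tmp" suffix keeps it out of the *.prom glob.
    fd, tmp_path = tempfile.mkstemp(
        prefix=filename + ".", suffix=".tmp", dir=directory
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(render(metrics))
            fh.flush()
            os.fsync(fh.fileno())
        # mkstemp creates the file as 0600; loosen it on the assumption
        # that node_exporter runs as a different user.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, final_path)  # atomic swap onto the final name
    except BaseException:
        # Clean up the temp file; the previous metrics.prom stays intact.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    return final_path


if __name__ == "__main__":
    demo = Metric(
        name="backup_last_success_timestamp_seconds",  # hypothetical metric
        help="Unix time of the last successful backup.",
        type="gauge",
        samples=[({"job": "nightly"}, 1700000000)],
    )
    print(render([demo]), end="")
```

The sketch uses `os.replace` rather than `os.rename` because it overwrites an existing destination on every platform. On failure the temp file is removed and the previous `metrics.prom` is left in place, which corresponds to the first of the two options in step 4.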