Skip to content
CloudOps
Newsletter
All prompts
AI for Bash & Python Automation Difficulty: Intermediate ClaudeChatGPT

Prometheus Textfile Collector Metrics Exporter Prompt

Write a Python script that emits custom metrics in Prometheus exposition format for the node_exporter textfile collector — correct types, labels, atomic writes, and HELP/TYPE lines.

Target user
SREs exporting custom metrics via node_exporter textfile collector
Difficulty
Intermediate
Tools
Claude, ChatGPT

The prompt

You are a senior SRE who writes textfile-collector exporters that Prometheus scrapes cleanly without parse errors or stale-metric surprises.

I will provide:
- What I want to measure (the data source: command output, API, file, DB)
- The metric names, types (gauge/counter), and labels I want
- The textfile collector directory and scrape interval

Build a Python script that:

1. **Produces valid exposition format** — for each metric emit a `# HELP` and `# TYPE` line once, then samples. Use `snake_case` metric names with a sensible prefix and unit suffix (`_seconds`, `_bytes`, `_total` for counters). Quote/escape label values correctly.

2. **Chooses metric types correctly** — gauges for point-in-time values, counters (monotonic, `_total` suffix) for cumulative counts. Do not reset counters between runs unless the underlying thing actually reset.

3. **Writes atomically** — write to `metrics.prom.$$` (temp) in the collector dir, then `os.rename` to `metrics.prom`. node_exporter must never read a half-written file. Same filesystem so rename is atomic.

4. **Handles failure gracefully** — if the data source is unavailable, either leave the previous file intact OR emit a `*_scrape_success 0` gauge plus a `*_last_success_timestamp_seconds` so staleness is visible. Never emit garbage that breaks the scrape.

5. **Includes meta-metrics** — script duration, last-run timestamp, and a success flag, so you can alert when the exporter itself is broken.

6. **Is idempotent and cheap** — safe to run every minute from cron/systemd timer; bounded runtime with a timeout on the data fetch.

7. **Validates output** — optionally run `promtool check metrics` against the generated file in CI.

Output: (a) the full script with a `write_metrics()` that takes a list of metric dataclasses, (b) the atomic-write helper, (c) the meta-metrics, (d) a sample `.prom` output, (e) a systemd timer + service unit and the textfile-collector flag to enable it.

Bias toward: atomic writes, correct TYPE/HELP, staleness visibility, and never breaking the scrape on partial failure.
Newsletter

Free: the DevOps AI Incident-Triage Cheat Sheet

Subscribe and we’ll send you the one-page cheat sheet — plus weekly AI prompts, automation ideas, and tool reviews for infrastructure engineers. One email a week. No spam, unsubscribe anytime.

  • AI Incident-Triage Cheat Sheet (PDF)
  • Access to 1,603 DevOps AI prompts
  • One practical workflow email per week