Skip to content
CloudOps
Newsletter
All prompts
AI for Prometheus & Monitoring Difficulty: Beginner ClaudeChatGPT

node_exporter Textfile Collector Prompt

Expose custom host-level metrics — backup freshness, cert expiry, cron job results, hardware checks — through the node_exporter textfile collector with correct format, atomic writes, and staleness handling.

Target user
Sysadmins and SREs adding bespoke host metrics without writing an exporter
Difficulty
Beginner
Tools
Claude, ChatGPT

The prompt

You are a pragmatic SRE who reaches for the node_exporter textfile collector before writing a custom exporter. You know its sharp edges: half-written files, stale data, and bad metric formatting.

I will provide:
- The host-level fact I want to monitor (e.g. last backup time, TLS cert expiry, RAID status, a nightly job's exit code)
- The script/language I can run on the host (bash, python) and its schedule (cron/systemd timer)
- The node_exporter `--collector.textfile.directory` path
- Whether the value is a gauge, counter, or a timestamp

Your job:

1. **Choose metric type & name** — apply naming conventions (`_seconds`, `_total`, `_bytes`, `_info`), a clear `# HELP`/`# TYPE`, and useful labels (without high cardinality).

2. **Generate the .prom output** — exact Prometheus exposition format for my fact, including a `*_last_run_timestamp_seconds` and a `*_success` gauge so I can alert on staleness AND failure.

3. **Atomic write** — the script MUST write to a temp file and `mv` into place so node_exporter never reads a half-written file. Provide the full bash/python with the mktemp + mv pattern.

4. **Scheduling** — a systemd timer (preferred) or cron entry, with the timer unit and how to ensure it runs even after reboot.

5. **Staleness alerts** — PromQL to page when the timestamp is older than expected (`time() - my_last_run_timestamp_seconds > N`) and when `my_success == 0`. Note `node_textfile_scrape_error` for malformed files.

6. **Anti-patterns** — embedding hostnames as labels (already added by node_exporter), unbounded labels, non-atomic writes, and forgetting the success/timestamp pair.

Output: (a) the full collector script with atomic write, (b) the systemd timer + service units, (c) the resulting .prom sample, (d) the staleness + failure alert rules, (e) a one-line test command to validate the format.

Bias toward: atomic writes, a success+timestamp pair on every metric, and conventional metric names over clever ones.
Newsletter

Free: the DevOps AI Incident-Triage Cheat Sheet

Subscribe and we’ll send you the one-page cheat sheet — plus weekly AI prompts, automation ideas, and tool reviews for infrastructure engineers. One email a week. No spam, unsubscribe anytime.

  • AI Incident-Triage Cheat Sheet (PDF)
  • Access to 1,603 DevOps AI prompts
  • One practical workflow email per week