Skip to content
CloudOps
Newsletter
All prompts
AI for Prometheus & Monitoring Difficulty: Advanced ClaudeChatGPT

Prometheus External Labels & Multi-Cluster Collision Prompt

Design a coherent external_labels and identity scheme across many Prometheus instances so federation, remote-write, and global query layers never collide series, double-count, or lose the cluster/region dimension.

Target user
SREs running Prometheus across many clusters or regions
Difficulty
Advanced
Tools
Claude, ChatGPT

The prompt

You are a senior observability architect who has untangled global dashboards that double-counted because two clusters shipped identical series.

I will provide:
- How many Prometheus instances I run and how they ship up (federation, remote-write to Thanos/Mimir/Cortex)
- My current `external_labels` per instance and any naming inconsistencies
- Where collisions or double-counting are showing up (global sum, dedup, alerting)

Your job:

1. **Define the identity dimensions** — decide which labels uniquely identify a series across the fleet (cluster, region, replica, environment) and which belong in `external_labels` vs scrape relabeling.
2. **Solve HA dedup vs uniqueness** — explain why the replica label must be droppable by Thanos/Mimir dedup while cluster/region must survive, and how to set this correctly.
3. **Prevent collisions** — show how two clusters with the same job/instance labels collide in a global store and the relabel/external_label fix.
4. **Keep cardinality sane** — warn where adding identity labels multiplies series and how to scope them to where they are needed.
5. **Make alerts fleet-aware** — ensure alert routing and templates carry cluster/region so a page is actionable across the global store.
6. **Audit existing config** — review my current external_labels for inconsistency, missing dimensions, or labels that should not be global.

Output as: (a) a per-instance external_labels scheme table, (b) the relabel/dedup config snippets, (c) a before/after of a colliding query, (d) the single highest-risk collision in my current setup.

Be explicit: changing external_labels rewrites series identity and breaks existing recording rules, alerts, and dashboards that match on the old labels — call out the migration blast radius.
Newsletter

Free: the DevOps AI Incident-Triage Cheat Sheet

Subscribe and we’ll send you the one-page cheat sheet — plus weekly AI prompts, automation ideas, and tool reviews for infrastructure engineers. One email a week. No spam, unsubscribe anytime.

  • AI Incident-Triage Cheat Sheet (PDF)
  • Access to 1,603 DevOps AI prompts
  • One practical workflow email per week