
Prometheus Operator ServiceMonitor & PodMonitor Prompt

Author and debug ServiceMonitor, PodMonitor, and PrometheusRule custom resources for the Prometheus Operator so targets actually get discovered and scraped, with the right label and namespace selectors and relabeling.

Target user: Kubernetes platform engineers running kube-prometheus-stack
Difficulty: Intermediate
Tools: Claude, ChatGPT

The prompt

You are a Kubernetes monitoring specialist who has debugged hundreds of "my ServiceMonitor exists but Prometheus isn't scraping it" tickets. You know the selector chain by heart.

I will provide:
- The workload (Deployment/StatefulSet) and its Service definition
- The metrics port name/number and path
- The Prometheus CR's `serviceMonitorSelector` / `serviceMonitorNamespaceSelector`
- Symptoms (target missing, wrong labels, 0 scrapes, dropped by relabeling)

Your job:

1. **Walk the discovery chain** — Prometheus CR selector → ServiceMonitor labels → ServiceMonitor `selector` → Service labels → Service `port.name` → Endpoints. Show exactly where it breaks and how to confirm each hop (`/api/v1/targets`, `kubectl get servicemonitor`, label diff).

2. **Write the custom resource** — a correct ServiceMonitor with `endpoints[].port` matching the Service port NAME (not number), plus `interval`, `scrapeTimeout`, `path`, `scheme`, and `tlsConfig` if the endpoint uses TLS or mTLS.

3. **PodMonitor vs ServiceMonitor** — when there is no Service (DaemonSets, sidecars), use a PodMonitor to reach pods directly; show the equivalent PodMonitor.

4. **Relabeling & target labels** — use `relabelings` to attach `namespace`, `pod`, `node` via `__meta_kubernetes_*`, and `metricRelabelings` to drop noisy series. Explain `honorLabels` pitfalls.

5. **Namespace scoping** — fix the common case where `serviceMonitorNamespaceSelector` is unset (null) and therefore only matches the Prometheus CR's own namespace (an empty `{}` selector matches all namespaces); show the selector or namespace label to add.

6. **PrometheusRule** — convert any inline alert rules into a PrometheusRule CR carrying the labels the Prometheus CR's `ruleSelector` matches (in kube-prometheus-stack, typically `release: <helm-release-name>`).

Output: (a) the full ServiceMonitor (or PodMonitor) YAML, (b) any Service/label patches needed, (c) the PrometheusRule, (d) a 6-step verification checklist with the exact kubectl/API commands, (e) the single most likely root cause given my symptoms.

Bias toward: matching by port NAME, explicit namespace selectors, and verifying in `/api/v1/targets` before declaring victory.
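
For orientation, here is a minimal sketch of the kind of manifests the prompt asks for. Every name in it is hypothetical: the app `demo-api`, the namespace `apps`, the port name `http-metrics`, the namespace label `monitoring: enabled`, and the Helm release name `kube-prometheus-stack`. It assumes the Prometheus CR selects ServiceMonitors and PrometheusRules on `release: kube-prometheus-stack`; check your own CR's selectors before copying any label.

```yaml
# Namespace label: only relevant if the Prometheus CR's
# serviceMonitorNamespaceSelector matches on it (assumed label key/value).
apiVersion: v1
kind: Namespace
metadata:
  name: apps
  labels:
    monitoring: enabled
---
apiVersion: v1
kind: Service
metadata:
  name: demo-api
  namespace: apps
  labels:
    app.kubernetes.io/name: demo-api   # matched by the ServiceMonitor's selector
spec:
  selector:
    app.kubernetes.io/name: demo-api   # must match the pod labels, or Endpoints stay empty
  ports:
    - name: http-metrics               # the port NAME the ServiceMonitor refers to
      port: 9090
      targetPort: 9090
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: demo-api
  namespace: apps
  labels:
    release: kube-prometheus-stack     # must satisfy the Prometheus CR's serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: demo-api # Service labels, not pod labels
  namespaceSelector:
    matchNames:
      - apps                           # explicit, even though it is the default here
  endpoints:
    - port: http-metrics               # Service port name, not the number
      path: /metrics
      scheme: http
      interval: 30s
      scrapeTimeout: 10s               # keep this below the interval
      relabelings:
        - sourceLabels: [__meta_kubernetes_pod_node_name]
          targetLabel: node            # attach the node the pod runs on
      metricRelabelings:
        - sourceLabels: [__name__]
          regex: go_gc_duration_seconds.*
          action: drop                 # example of dropping a noisy series
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: demo-api
  namespace: apps
  labels:
    release: kube-prometheus-stack     # must satisfy the Prometheus CR's ruleSelector
spec:
  groups:
    - name: demo-api.rules
      rules:
        - alert: DemoApiTargetDown
          # job defaults to the Service name when jobLabel is not set
          expr: up{job="demo-api"} == 0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: demo-api target has been down for 5 minutes
```

The labels under `metadata.labels` on the ServiceMonitor and PrometheusRule are what the Prometheus CR selects on, while `spec.selector` on the ServiceMonitor selects Services. Mixing those two up is a common way for the chain to break silently.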
