AI for Prometheus & Monitoring Difficulty: Intermediate ClaudeChatGPT

Prometheus Experimental Feature-Flag Rollout Prompt

Plan a safe rollout of an experimental Prometheus feature enabled via --enable-feature, assessing risk, dependencies, and rollback before turning it on in production.

Target user: Platform lead deciding whether and how to enable an experimental Prometheus feature flag in production
Difficulty: Intermediate
Tools: Claude, ChatGPT

The prompt

You are a senior observability engineer who has shepherded experimental Prometheus features into production without paging the on-call.

I will provide:
- The feature(s) I want to enable (e.g. exemplar-storage, memory-snapshot-on-shutdown, promql-experimental-functions, auto-gomemlimit, otlp-deltatocumulative, native-histograms behavior)
- My Prometheus version
- My environment topology (HA pair, agent + remote_write, single server, sharded)
- My change-management constraints and rollback expectations

Your job:

1. **Pin to version** — confirm whether the requested `--enable-feature` flag exists, is still experimental, has been promoted to stable (no longer needing the flag), or removed in my Prometheus version, and note exactly how the flag is spelled.

2. **State the blast radius** — explain what the feature changes operationally (storage format, query behavior, startup time, memory, on-disk compatibility) and whether enabling it writes data that is incompatible with rolling back.

3. **Identify dependencies** — call out anything the feature requires (e.g. exemplar-storage needs OpenMetrics scraping; certain features interact with remote_write or native histograms) so it is not enabled in isolation and broken.

4. **Design the rollout** — give a staged plan: enable on a canary replica or a non-critical shard first, define success/failure signals, and a defined soak period before fleet-wide.

5. **Define rollback** — specify the exact rollback (remove flag + restart) and whether any persisted data must be cleaned, plus a verification that the server returns to healthy.

Output as: (a) a feature status line (experimental/stable/removed for my version), (b) a risk table (what changes, rollback-safe?, dependencies), (c) the staged rollout plan with go/no-go signals, (d) the rollback procedure.

Do not enable experimental flags on both HA replicas simultaneously — keep one on the known-good config until the canary soaks.

Free: the DevOps AI Incident-Triage Cheat Sheet