Long-Term Metrics Storage Backend Selection Prompt

Choose between Thanos, Grafana Mimir, and VictoriaMetrics for long-term, scalable Prometheus storage based on your scale, team size, object-storage strategy, and multi-tenancy needs.

Target user: Platform architects selecting a metrics storage backend
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are a metrics-platform architect who has run Thanos, Mimir, and VictoriaMetrics in production and can give an unbiased recommendation.

I will provide:
- Active series count and ingestion rate (samples/sec), with growth projection
- Retention requirement and query patterns (dashboards, long-range, ad hoc)
- Team size and operational maturity (k8s expertise, on-call appetite)
- Whether object storage (S3/GCS) is available, or whether I want to avoid it
- Multi-tenancy and global-query requirements

Your job:

1. **Frame the decision axes**: operational complexity, multi-tenancy, object-storage dependency, long-range query performance, resource cost per active series, and downsampling support. Use these as the columns of a comparison table.

2. **Thanos**: strengths (sidecar model that augments existing Prometheus servers, mature object-storage tiering, downsampling, global query view) and costs (many components to run: sidecar, store gateway, querier, compactor, ruler; eventual consistency; a compactor that is easy to misconfigure, since it must run as a single instance per bucket).

3. **Mimir**: strengths (horizontally scalable microservices, strong multi-tenancy, query sharding, blocks storage) and costs (larger operational surface, runs best on Kubernetes, hash-ring and consistency concepts to learn).

4. **VictoriaMetrics**: strengths (low resource footprint, simple single-binary or cluster mode, fast ingestion, MetricsQL as a largely PromQL-compatible query language with extensions, ability to run without object storage) and costs (MetricsQL deviates from PromQL in some behaviors, smaller ecosystem, fewer Prometheus-native features).

5. **Map to my profile**: given my numbers, give a primary recommendation and a runner-up, each with a specific reason tied to my scale, team, and storage answers rather than generic praise.

6. **Migration sketch**: outline the lowest-risk path from vanilla Prometheus to the chosen backend (remote_write vs. sidecar), and a rollback plan.

7. **Cost reality check**: give a rough resource and cost comparison at my active-series count, including object-storage egress and compaction overhead.

Output as: (a) a feature/complexity comparison table, (b) a primary + runner-up recommendation justified by my inputs, (c) a migration outline with rollback, (d) a cost estimate at my scale, (e) the single factor that should most drive my decision.

Bias toward: matching operational complexity to team maturity, honest tradeoffs over hype, and a clear primary pick.
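
Before pasting the prompt, it can help to have the numbers it asks for (ingestion rate, growth projection, rough storage at your retention) worked out in advance. The following is a minimal Python sketch of that arithmetic, not a sizing tool. The `MetricsProfile` class and the function names are hypothetical, and the default of 1.5 bytes per sample is an assumption based on the 1 to 2 bytes per sample that the Prometheus documentation cites for its own TSDB; Thanos, Mimir, and VictoriaMetrics will each differ, so measure your own where you can.

```python
from dataclasses import dataclass


@dataclass
class MetricsProfile:
    """Hypothetical container for the inputs the prompt asks for."""
    active_series: int            # current active series
    scrape_interval_s: float      # average scrape interval, in seconds
    retention_days: int           # required retention
    monthly_growth: float         # assumed compound monthly growth, e.g. 0.05 for 5%
    bytes_per_sample: float = 1.5 # ASSUMPTION: compressed bytes per sample


def samples_per_second(p: MetricsProfile) -> float:
    """Ingestion rate implied by series count and scrape interval."""
    return p.active_series / p.scrape_interval_s


def projected_series(p: MetricsProfile, months: int) -> int:
    """Active series after compound growth over the given number of months."""
    return round(p.active_series * (1 + p.monthly_growth) ** months)


def raw_storage_gib(p: MetricsProfile) -> float:
    """Rough stored size for the full retention window, in GiB.

    Ignores replication, index overhead, and downsampling, so treat it
    as an order-of-magnitude figure only.
    """
    total_samples = samples_per_second(p) * 86_400 * p.retention_days
    return total_samples * p.bytes_per_sample / 2**30


if __name__ == "__main__":
    # Example values are illustrative, not taken from any real deployment.
    profile = MetricsProfile(
        active_series=2_000_000,
        scrape_interval_s=30,
        retention_days=395,
        monthly_growth=0.05,
    )
    print(f"ingestion rate: {samples_per_second(profile):,.0f} samples/sec")
    print(f"series in 12 months: {projected_series(profile, 12):,}")
    print(f"raw storage at retention: {raw_storage_gib(profile):,.1f} GiB")
```

The three printed figures map directly onto the first two input bullets of the prompt, and the storage figure gives you a baseline to sanity-check the model's cost estimate against.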
