Long-Term Metrics Storage Backend Selection Prompt
Choose between Thanos, Grafana Mimir, and VictoriaMetrics for long-term, scalable Prometheus storage based on your scale, team size, object-storage strategy, and multi-tenancy needs.
- Target user: Platform architects selecting a metrics storage backend
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a metrics-platform architect who has run Thanos, Mimir, and VictoriaMetrics in production and can give an unbiased selection.

I will provide:
- Active series count and ingestion rate (samples/sec), with growth projection
- Retention requirement and query patterns (dashboards, long-range, ad hoc)
- Team size and operational maturity (k8s expertise, on-call appetite)
- Object storage available (S3/GCS) or whether I want to avoid it
- Multi-tenancy and global-query requirements

Your job:
1. **Frame the decision axes**: operational complexity, multi-tenancy, object-storage dependency, query performance at long range, resource cost per active series, and downsampling support. Make these the columns of a comparison.
2. **Thanos**: strengths (sidecar model that augments existing Prometheus, mature object-storage tiering, downsampling, global query view) and costs (many components: sidecar, store, query, compactor, ruler; eventual consistency; compactor as a footgun).
3. **Mimir**: strengths (horizontally scalable microservices, strong multi-tenancy, query sharding, blocks storage) and costs (heavier operational surface, best on k8s, ring/consistency concepts to learn).
4. **VictoriaMetrics**: strengths (low resource footprint, simple single-binary or cluster mode, fast ingestion, MetricsQL superset, optional no-object-storage) and costs (PromQL dialect differences, smaller ecosystem, fewer Prometheus-native features).
5. **Map to my profile**: given my numbers, give a primary recommendation and a runner-up, with the specific reason tied to my scale/team/storage answers, not generic praise.
6. **Migration sketch**: outline the lowest-risk path from vanilla Prometheus to the chosen backend (remote_write vs sidecar), and a rollback story.
7. **Cost reality check**: rough resource/cost comparison at my active-series count, including object-storage egress and compaction overhead.

Output as: (a) a feature/complexity comparison table, (b) a primary + runner-up recommendation justified by my inputs, (c) a migration outline with rollback, (d) a cost estimate at my scale, (e) the single factor that should most drive my decision. Bias toward: matching operational complexity to team maturity, honest tradeoffs over hype, and a clear primary pick.
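
The prompt asks for an ingestion rate and a growth projection, and those two inputs are easy to estimate from numbers most teams already have. The sketch below is one rough way to derive them, assuming every active series is scraped once per interval and that growth compounds monthly. The function names and the example figures are illustrative only and are not part of the prompt.

```python
def samples_per_second(active_series: int, scrape_interval_s: float) -> float:
    """Estimate ingestion rate, assuming one sample per series per scrape."""
    if scrape_interval_s <= 0:
        raise ValueError("scrape interval must be positive")
    return active_series / scrape_interval_s


def project_series(active_series: int, monthly_growth: float, months: int) -> int:
    """Project the active series count under compound monthly growth (assumed model)."""
    return round(active_series * (1 + monthly_growth) ** months)


if __name__ == "__main__":
    # Hypothetical figures: 1.5M active series scraped every 15 seconds.
    print(samples_per_second(1_500_000, 15))  # 100000.0
    # Hypothetical 10% monthly growth over 2 months from 1M series.
    print(project_series(1_000_000, 0.10, 2))  # 1210000
```

Real ingestion rates can differ from this estimate when scrape intervals vary across jobs or when series churn is high, so treat the result as a starting value to paste into the prompt rather than a measured one.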