You are a senior platform engineer who has integrated Prometheus with long-term storage (Thanos, Cortex/Mimir, VictoriaMetrics) for global query and multi-year retention.

I will provide:
- Long-term storage backend choice
- Symptom (remote write queue growing, samples dropped, slow ingest)
- Current `remote_write` config

Your job:

1. **When to use remote write**:
   - Long-term retention (months/years) beyond local TSDB
   - Multi-Prometheus aggregation
   - Disaster recovery
2. **Backend choices**:
   - **Thanos**: Sidecar uploads TSDB blocks to object storage such as S3 (no remote write involved), or Receive accepts remote write; Querier fans queries out across these stores
   - **Mimir / Cortex**: multi-tenant, Prometheus-compatible, accepts remote write directly
   - **VictoriaMetrics**: open source, single-binary or cluster
   - **Grafana Cloud**: managed
3. **For `remote_write` config**:
   - `url`: remote endpoint
   - `queue_config`: per-shard buffer (`capacity`), parallelism (`min_shards` / `max_shards`), batch size (`max_samples_per_send`), flush interval (`batch_send_deadline`)
   - `write_relabel_configs`: drop / transform series before send
4. **For "queue growing"**:
   - Remote is slower than the ingest rate
   - Tune the queue: raise `max_shards` (parallelism) and `max_samples_per_send` (batch size); keep `capacity` a few times `max_samples_per_send`
   - Or: backend is undersized
5. **For "samples dropped"**:
   - `prometheus_remote_storage_samples_dropped_total` counts samples discarded before sending, mostly by `write_relabel_configs` (intentional)
   - `prometheus_remote_storage_samples_failed_total` counts samples rejected with non-retryable errors (e.g. HTTP 4xx)
   - Real data loss happens when the sender falls further behind than the WAL retains (roughly two hours)
   - Reduce the ingest rate or scale the backend
6. **For "back-pressure"**:
   - Remote write reads from the WAL, so a slow remote does not block scrapes or local TSDB writes
   - Instead the sender lags, shards scale toward `max_shards`, and memory grows with buffered samples
   - Critical to monitor (lag, shard count, memory)
7. **For authentication**:
   - Bearer token, basic auth, mTLS, SigV4
   - Secret management (prefer `*_file` fields over inline secrets)
8. **For metric filtering** before send:
   - `write_relabel_configs` to drop noise
   - Saves bandwidth + backend cost

Mark DESTRUCTIVE: removing remote write while the backend depends on it (gap in long-term history), changing the endpoint without verifying (data loss), queue or filter settings that discard samples.

---

Backend: [Thanos / Mimir / VictoriaMetrics / Grafana Cloud]
Symptom: [DESCRIBE]
`remote_write` config:
```yaml
[PASTE]
```
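Item 7 lists the authentication options. Below is a minimal sketch of how each one looks in `remote_write`. All endpoints and secret file paths are hypothetical. The three entries are alternatives: Prometheus sends every sample to every entry listed, so a real config would normally keep only the one its backend expects. The SigV4 entry assumes Amazon Managed Service for Prometheus and credentials from the default AWS chain.

```yaml
remote_write:
# Option 1: bearer token read from a file (secret stays out of the config)
- url: "https://metrics.example.com/api/v1/push"        # hypothetical endpoint
  authorization:
    type: Bearer
    credentials_file: /etc/secret/remote-write-token    # hypothetical path
# Option 2: mutual TLS with a client certificate
- url: "https://metrics-mtls.example.com/api/v1/push"   # hypothetical endpoint
  tls_config:
    ca_file: /etc/prometheus/tls/ca.crt                 # hypothetical paths
    cert_file: /etc/prometheus/tls/client.crt
    key_file: /etc/prometheus/tls/client.key
# Option 3: AWS SigV4; <workspace-id> is a placeholder
- url: "https://aps-workspaces.us-east-1.amazonaws.com/workspaces/<workspace-id>/api/v1/remote_write"
  sigv4:
    region: us-east-1
```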

Why this prompt works

Long-term storage at scale depends on a healthy remote write pipeline. This prompt walks through it: backend choice, queue tuning, authentication, and filtering.

How to use it

Pick a backend based on your needs.
Tune the queue to match backend throughput.
Filter at the source to save bandwidth.
Monitor queue health (lag, failures, shard count).

Useful commands

```promql
# Remote write metrics
prometheus_remote_storage_samples_in_total
prometheus_remote_storage_samples_failed_total
prometheus_remote_storage_samples_dropped_total
prometheus_remote_storage_shard_capacity
prometheus_remote_storage_shards
prometheus_remote_storage_shards_desired
# Older name; newer releases expose prometheus_remote_storage_samples_pending
prometheus_remote_storage_pending_samples
prometheus_remote_storage_queue_highest_sent_timestamp_seconds
prometheus_remote_storage_highest_timestamp_in_seconds
```

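```promql
# Lag in seconds (newest sample ingested vs newest sample sent), per queue.
# The ingest-side metric has no remote_name/url labels, so ignore them when matching.
prometheus_remote_storage_highest_timestamp_in_seconds
  - ignoring(remote_name, url) group_right
prometheus_remote_storage_queue_highest_sent_timestamp_seconds
```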
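The lag and failure metrics above can be turned into alerting rules. The sketch below assumes a 2-minute lag threshold and a 15-minute `for` window, both arbitrary starting points to tune per environment, and the group name is hypothetical. The lag rule uses `ignoring(remote_name, url) group_right`, similar to the upstream Prometheus mixin, because only the sent-side metric carries per-queue labels.

```yaml
groups:
- name: remote-write-health          # hypothetical group name
  rules:
  - alert: RemoteWriteBehind
    # Newest ingested sample minus newest sent sample, per remote queue
    expr: |
      (
        max_over_time(prometheus_remote_storage_highest_timestamp_in_seconds[5m])
      - ignoring(remote_name, url) group_right
        max_over_time(prometheus_remote_storage_queue_highest_sent_timestamp_seconds[5m])
      ) > 120
    for: 15m
    labels:
      severity: warning
  - alert: RemoteWriteSamplesFailing
    # Non-retryable send failures (e.g. HTTP 4xx) mean samples are being discarded
    expr: rate(prometheus_remote_storage_samples_failed_total[5m]) > 0
    for: 15m
    labels:
      severity: warning
```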

Config patterns

Thanos Receive

```yaml
remote_write:
- url: "http://thanos-receive:19291/api/v1/receive"
  queue_config:
    capacity: 10000
    max_samples_per_send: 2000
    batch_send_deadline: 5s
    min_shards: 1
    max_shards: 50
  write_relabel_configs:
  # Drop noisy Go runtime metrics
  - source_labels: [__name__]
    regex: 'go_.*'
    action: drop
```
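A rough sizing estimate helps judge whether queue settings or the backend is the bottleneck. This sketch plugs in the Thanos Receive values above. It assumes each shard has one request in flight at a time, batches are full, and a hypothetical 100 ms round-trip per request.

```latex
\begin{aligned}
\text{throughput}_{\max} &\approx \frac{\texttt{max\_shards} \times \texttt{max\_samples\_per\_send}}{\text{request latency}}
  = \frac{50 \times 2000}{0.1\ \text{s}} = 1{,}000{,}000\ \text{samples/s} \\
\text{buffered samples}_{\max} &\approx \texttt{max\_shards} \times \texttt{capacity}
  = 50 \times 10{,}000 = 500{,}000
\end{aligned}
```

If observed send throughput sits well below the first figure while shards are at `max_shards`, slow backend responses are the likelier cause. The second figure is a rough upper bound on samples held in memory, and actual bytes per sample depend on label sizes.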

Mimir

```yaml
remote_write:
- url: "https://mimir.example.com/api/v1/push"
  basic_auth:
    username: tenant1
    password_file: /etc/secret/mimir-password
  queue_config:
    capacity: 10000
    max_samples_per_send: 2000
```
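The `basic_auth` form above typically assumes an authenticating gateway in front of Mimir that maps the username to a tenant. Self-hosted Mimir with multi-tenancy enabled reads the tenant ID from the `X-Scope-OrgID` header instead. A minimal sketch, with a hypothetical in-cluster address:

```yaml
remote_write:
- url: "http://mimir-distributor.example.internal/api/v1/push"   # hypothetical address
  headers:
    X-Scope-OrgID: tenant1        # tenant ID Mimir uses to isolate this data
  queue_config:
    capacity: 10000
    max_samples_per_send: 2000
```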

VictoriaMetrics

```yaml
remote_write:
- url: "https://victoriametrics.example.com/api/v1/write"
  queue_config:
    capacity: 10000
    max_samples_per_send: 5000        # VM tolerates large batches
```
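The URL above targets single-node VictoriaMetrics. The cluster version accepts writes on `vminsert`, with the tenant (account ID) in the path. A sketch assuming a hypothetical host name and account ID `0`:

```yaml
remote_write:
# Cluster layout: /insert/<accountID>/prometheus/api/v1/write on vminsert (port 8480)
- url: "http://vminsert.example.internal:8480/insert/0/prometheus/api/v1/write"
  queue_config:
    capacity: 10000
    max_samples_per_send: 5000
```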

Filter (drop high-cardinality at source)

```yaml
remote_write:
- url: "..."
  write_relabel_configs:
  # Keep only essentials
  - source_labels: [__name__]
    regex: 'up|node_.*|http_requests_total|http_request_duration.*'
    action: keep
  # Drop the pod_uid label
  - regex: 'pod_uid'
    action: labeldrop
```
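Filtering can be finer-grained than whole metric names. This sketch drops histogram buckets for one metric while its `_sum` and `_count` still go out. It also drops a metric family only for one namespace by matching on two joined labels, which relabeling joins with `;` by default. The metric and namespace names are hypothetical examples.

```yaml
remote_write:
- url: "..."
  write_relabel_configs:
  # Buckets are usually the highest-cardinality part of a histogram
  - source_labels: [__name__]
    regex: 'http_request_duration_seconds_bucket'
    action: drop
  # Matches "<metric name>;<namespace>", e.g. container_cpu_usage_seconds_total;kube-system
  - source_labels: [__name__, namespace]
    regex: 'container_.*;kube-system'
    action: drop
```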

Common findings this catches

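Queue growing constantly → backend too slow; scale the backend or filter at source.
Samples dropped → usually `write_relabel_configs` doing its job; unexpected drops point to a relabel mistake.
Samples failed → non-retryable backend rejections (HTTP 4xx); check backend limits and logs.
Lag growing → ingest > send; raise `max_shards` if the backend has headroom.
Auth failures (401/403) → expired or misconfigured credentials.
Local Prometheus OOM while the remote lags → too many samples buffered (roughly shards × capacity); cap `max_shards` or fix the backend.
Backend ingest issues at scale → backend capacity.
Network partition → the sender retries and falls behind in the WAL; once the lag exceeds WAL retention (roughly two hours), unsent samples are lost.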
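When lag grows because shards are already pinned at the ceiling, comparing desired and maximum shards makes that visible. This is a sketch in the same rule format, with a hypothetical group name and an arbitrary 15-minute window. Both metrics carry the same per-queue labels, so they match one-to-one.

```yaml
groups:
- name: remote-write-shards          # hypothetical group name
  rules:
  - alert: RemoteWriteDesiredShardsAboveMax
    # Prometheus wants more parallelism than max_shards allows: the queue cannot keep up
    expr: |
      max_over_time(prometheus_remote_storage_shards_desired[5m])
        > max_over_time(prometheus_remote_storage_shards_max[5m])
    for: 15m
    labels:
      severity: warning
```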

When to escalate

Backend capacity planning — strategic.
Multi-region replication — DR.
Migration between backends — staged.
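One way to stage a migration between backends is to dual-write. List both endpoints, each with its own queue and shards, and remove the old one only after the new backend's history has been verified. This avoids the "changing endpoint without verifying" risk. A sketch with hypothetical names and URLs; `name` must be unique and shows up as the `remote_name` label on the remote storage metrics.

```yaml
remote_write:
# Existing backend: keep until the new one holds a verified, complete history
- name: old-backend
  url: "https://old-backend.example.com/api/v1/push"    # hypothetical
# New backend: runs in parallel with an independent queue
- name: new-backend
  url: "https://new-backend.example.com/api/v1/write"   # hypothetical
```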

Prometheus Remote Write & Long-term Storage Prompt

Related prompts

Prometheus Performance Tuning Prompt

Prometheus Storage, Retention & TSDB Prompt

Thanos Architecture & Component Debug Prompt
