
RabbitMQ Prometheus Monitoring & Alerting Design Prompt

Design a RabbitMQ observability stack with the right Prometheus metrics, dashboards, and alert thresholds for queue depth, memory/disk alarms, flow control, and node health, so that problems surface before they become incidents.

Target user: SRE and observability engineers
Difficulty: Intermediate
Tools: Claude, ChatGPT

The prompt

You are a senior SRE designing RabbitMQ monitoring and alerting. Produce a configuration for me to review; do not apply anything to a live system.

I will provide:
- How RabbitMQ exposes metrics (built-in `rabbitmq_prometheus` plugin, exporter, or management API) and the scrape setup
- Cluster size, queue types in use, and rough message rates / queue-depth expectations
- The current alert rules (if any) and the on-call team's noise tolerance
- SLOs or business expectations (max acceptable consumer lag, delivery latency)

Your job:

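1. **Pick the signal metrics** — choose the high-value series: `rabbitmq_queue_messages_ready`, `rabbitmq_queue_messages_unacked`, `rabbitmq_queue_consumers`, resident memory used vs. limit (`rabbitmq_process_resident_memory_bytes` / `rabbitmq_resident_memory_limit_bytes`), disk free (`rabbitmq_disk_space_available_bytes`), `rabbitmq_connections` / `rabbitmq_channels`, and the resource-alarm gauges (`rabbitmq_alarms_*`) plus flow-control state. These names follow the built-in `rabbitmq_prometheus` plugin; map them to their equivalents if I use a third-party exporter or the management API.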
2. **Define alert tiers** — propose page vs ticket vs info alerts for: memory/disk alarm active, partition/node down, queue depth growing unbounded, zero consumers on a live queue, and high redelivery rate.
3. **Set thresholds smartly** — prefer rate-of-change and "growing for N minutes" over static depth thresholds to avoid flapping on bursts; suggest `for:` durations.
4. **Cover cluster health** — alert on node count drop, unsynchronized mirrors (classic mirrored queues, which exist only on RabbitMQ versions before 4.0), under-replicated quorum queues, and high file-descriptor usage.
5. **Design dashboards** — group panels by node health, queue throughput, consumer lag, and resource alarms.
6. **Reduce noise** — recommend grouping/inhibition (suppress queue-depth alerts when a node is down) and runbook links per alert.

Output: (a) metric catalog with why-it-matters, (b) tiered alert rules with thresholds and `for:`, (c) dashboard layout, (d) noise-reduction notes.

Validate thresholds against a few weeks of real data before paging on them.
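
To see why the prompt prefers "growing for N minutes" over a static depth threshold, the following minimal Python sketch compares the two on hypothetical queue-depth samples. The function names, sample values, and parameters are illustrative assumptions, not part of RabbitMQ or Prometheus; the sustained check only loosely mimics a rate-based expression combined with a `for:` duration, where the condition must hold at every evaluation in the window.

```python
from typing import Sequence


def static_threshold(depths: Sequence[float], limit: float) -> bool:
    """Fire as soon as the latest sample exceeds a fixed depth."""
    return bool(depths) and depths[-1] > limit


def sustained_growth(depths: Sequence[float], window: int, min_rate: float) -> bool:
    """Fire only if depth grew by at least `min_rate` per sample
    across each of the last `window` intervals."""
    if window < 1 or len(depths) < window + 1:
        return False  # not enough history to judge
    recent = depths[-(window + 1):]
    deltas = [later - earlier for earlier, later in zip(recent, recent[1:])]
    return all(delta >= min_rate for delta in deltas)


# Hypothetical samples, one per scrape interval.
burst = [100, 120, 5000, 300, 150, 120]      # short spike that drains on its own
backlog = [100, 300, 500, 700, 900, 1100]    # consumers falling steadily behind

# The static rule fires at the peak of the burst (third sample)...
burst_pages_static = static_threshold(burst[:3], limit=1000)
# ...while the sustained rule stays quiet for the burst and fires for the backlog.
burst_pages_sustained = sustained_growth(burst, window=3, min_rate=50)
backlog_pages_sustained = sustained_growth(backlog, window=5, min_rate=50)
```

In this sketch the burst would page under the static rule but not under the sustained rule, while the steadily growing backlog pages under the sustained rule. Real window lengths and rates should come from the historical data mentioned above.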
