CloudOps
AI for OpenStack

Aodh Alarm Evaluation Debug Prompt

Diagnose Aodh alarms that never change state, fire false positives, or fail to call the action URL that drives auto-scaling or alerting.

Target user
OpenStack operators running Aodh telemetry alarming
Difficulty
Intermediate
Tools
Claude, ChatGPT

The prompt

You are a senior OpenStack operator who has run Aodh (telemetry alarming) in production and understands its evaluator, listener, and notifier services as well as the metric backend (Gnocchi) that alarms query.

I will provide:
- The symptom (alarm stuck in `insufficient data`, never fires, fires constantly, action URL not called)
- The alarm definition (`openstack alarm show`), including type, threshold, comparison_operator, aggregation_method, granularity, and evaluation_periods
- Evaluator/notifier logs (`aodh-evaluator.log`, `aodh-notifier.log`)
- The metric source (Gnocchi resource + metric) the alarm references

Your job:

1. **Classify the alarm type**: `gnocchi_resources_threshold`, `gnocchi_aggregation_by_metrics_threshold`, `gnocchi_aggregation_by_resources_threshold`, `composite`, or `event`, and explain what each one evaluates.
2. **Verify the data exists**: confirm the referenced Gnocchi metric has measures at the alarm's granularity; `insufficient data` almost always means missing or misaligned measures.
3. **Check granularity alignment**: ensure the alarm's granularity matches a granularity defined in the metric's archive policy, and that the policy keeps enough points to cover granularity × evaluation_periods.
4. **Walk the evaluation math**: apply threshold, comparison_operator, aggregation_method, and evaluation_periods to the real measures to see what state Aodh should compute.
5. **Debug the action path**: verify that the notifier resolved and called the alarm_actions URL (webhook, Heat, log) and trace any HTTP or authentication failure.
6. **Find false-positive causes**: flapping, a too-short evaluation window, or the wrong aggregation method skewing the value.
7. **Recommend a corrected definition**: a tuned threshold and window, plus monitoring of the evaluator itself.
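Step 4's arithmetic can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Aodh's source: it assumes the measures were already fetched from Gnocchi (for example the timestamp/granularity/value rows of `openstack metric measures show`) with the alarm's aggregation_method applied, and it follows our reading of the threshold evaluator: keep only points at the alarm's granularity, take the last evaluation_periods of them, go to `alarm` if all breach, `ok` if none do, and keep the current state on a mixed result (except when leaving `insufficient data`, where the latest point decides). The helper `evaluate_threshold` and the sample measures are hypothetical.

```python
import operator

# Aodh threshold rules accept these comparison_operator values; map them to Python.
COMPARISONS = {
    "lt": operator.lt,
    "le": operator.le,
    "eq": operator.eq,
    "ne": operator.ne,
    "ge": operator.ge,
    "gt": operator.gt,
}

INSUFFICIENT = "insufficient data"


def evaluate_threshold(measures, granularity, threshold, comparison_operator,
                       evaluation_periods, current_state=INSUFFICIENT):
    """Simplified model of one threshold evaluation (hypothetical helper).

    measures: (timestamp, granularity_seconds, value) tuples, oldest first,
    already aggregated with the alarm's aggregation_method.
    Returns (new_state, breached_flags, values_used).
    """
    # Only points at the alarm's granularity count; other archive levels are ignored.
    values = [value for _, gran, value in measures if gran == granularity]
    # Evaluate only the most recent evaluation_periods points.
    values = values[-evaluation_periods:]
    if len(values) < evaluation_periods:
        return INSUFFICIENT, [], values

    compare = COMPARISONS[comparison_operator]
    breached = [compare(value, threshold) for value in values]

    if all(breached):
        return "alarm", breached, values
    if not any(breached):
        return "ok", breached, values
    # Mixed results: keep the current state, except when leaving
    # insufficient data, where the most recent point decides.
    if current_state == INSUFFICIENT:
        return ("alarm" if breached[-1] else "ok"), breached, values
    return current_state, breached, values


# Illustrative measures (made-up values) at 300 s plus one coarser 3600 s point.
measures = [
    ("2024-01-01T10:00:00+00:00", 300.0, 72.0),
    ("2024-01-01T10:05:00+00:00", 300.0, 85.5),
    ("2024-01-01T10:10:00+00:00", 300.0, 91.2),
    ("2024-01-01T10:00:00+00:00", 3600.0, 80.0),
]
print(evaluate_threshold(measures, 300.0, 80.0, "gt", 2, current_state="ok"))
# ('alarm', [True, True], [85.5, 91.2])
```

Laying the arithmetic out this way makes it easy to see when evaluation_periods is so short that a single spike flips the state, which is the flapping case in step 6.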

Output as: a data-availability check, the recomputed alarm state with the arithmetic shown, a root cause, then the corrected `openstack alarm create` or `openstack alarm update` command and how to verify that the action fires.
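One way to verify that the action fires is to point a test alarm's alarm_actions at a throwaway HTTP receiver and watch for the POST. The sketch below uses only the Python standard library; the helper names, the file name, and the port are hypothetical, and the payload keys (`alarm_name`, `previous`, `current`, `reason`) are what we understand the Aodh REST notifier sends, so confirm them against a real request before relying on them.

```python
import json
import sys
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_alarm_payload(body: bytes) -> str:
    """Return a one-line summary of an alarm notification body.

    Assumes a JSON object with keys such as alarm_name, previous, current and
    reason; any missing key is shown as '?'.
    """
    payload = json.loads(body.decode("utf-8"))
    return "{} {} -> {}: {}".format(
        payload.get("alarm_name", "?"),
        payload.get("previous", "?"),
        payload.get("current", "?"),
        payload.get("reason", "?"),
    )


class AlarmWebhookHandler(BaseHTTPRequestHandler):
    """Logs every POST it receives and answers 200 so the notifier sees success."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            print(summarize_alarm_payload(body), flush=True)
        except (ValueError, AttributeError):
            # Not JSON (or not a JSON object): log the start of the raw body instead.
            print("unexpected body:", body[:200], flush=True)
        self.send_response(200)
        self.end_headers()


def serve(port: int) -> None:
    """Listen on all interfaces until interrupted."""
    HTTPServer(("0.0.0.0", port), AlarmWebhookHandler).serve_forever()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python3 alarm_receiver.py 8080
    serve(int(sys.argv[1]))
```

Register it with something like `openstack alarm update <alarm-id> --alarm-action http://<receiver-host>:8080/`, then compare the receiver's output with `aodh-notifier.log`: if the notifier logs a call but nothing arrives, the fault is in network reachability or the URL, not in evaluation.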

Caution: tightening an alarm (for example, lowering its granularity so it reacts faster) without checking that the underlying archive policy stores that granularity will put it back into `insufficient data` and silently disable it.
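That check can be done mechanically before changing the alarm. The sketch assumes you have read the metric's archive policy (for example with `openstack metric archive-policy show <policy-name>`) and converted each definition's granularity to seconds; `ArchiveLevel` and `check_alarm_against_policy` are hypothetical names, and the checks are a simplification rather than Aodh's or Gnocchi's own validation. Besides granularity, it also flags an aggregation_method the policy does not store, since Gnocchi only keeps the aggregation methods listed in the policy.

```python
from dataclasses import dataclass


@dataclass
class ArchiveLevel:
    """One archive-policy definition entry (hypothetical type): granularity in seconds, points kept."""
    granularity: float
    points: int

    @property
    def timespan(self) -> float:
        # Gnocchi relates the three fields as timespan = granularity * points.
        return self.granularity * self.points


def check_alarm_against_policy(levels, policy_methods, granularity,
                               evaluation_periods, aggregation_method):
    """Return problems likely to leave the alarm in `insufficient data` (hypothetical helper).

    An empty list means these checks found no mismatch; it does not prove data exists.
    """
    problems = []
    matching = [level for level in levels if level.granularity == granularity]
    if not matching:
        stored = sorted(level.granularity for level in levels)
        problems.append(f"granularity {granularity:g}s is not stored; policy has {stored}")
    else:
        window = granularity * evaluation_periods
        if matching[0].timespan < window:
            problems.append(
                f"timespan {matching[0].timespan:g}s is shorter than the {window:g}s window")
    if aggregation_method not in policy_methods:
        problems.append(
            f"aggregation_method {aggregation_method!r} is not in the policy's methods")
    return problems


# Hypothetical policy: one 5-minute level kept for 30 days.
policy = [ArchiveLevel(granularity=300, points=8640)]
methods = {"mean", "max", "min"}
print(check_alarm_against_policy(policy, methods, 60, 3, "mean"))
# ['granularity 60s is not stored; policy has [300]']
```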