Monitoring-as-a-Service with OpenStack Monasca and AI

Most OpenStack monitoring stories start with someone bolting Prometheus onto the side and calling it done. That works until you need multi-tenant monitoring — where each project sees only its own metrics and can set its own alarms — and suddenly you are reinventing a whole access model. OpenStack Monasca is the project that already solved this: a high-throughput, multi-tenant monitoring-as-a-service that ingests metrics, evaluates alarm definitions, and notifies, all with Keystone-scoped tenancy baked in.

Monasca’s alarm expression language and its API surface are deep, and I will be honest: I do not memorize the expression grammar. I describe what I want to alert on, an AI assistant drafts the expression, and I verify it against real metric names before it goes live. The model is a fast junior engineer — quick with syntax, oblivious to whether the alarm will page someone at 3 a.m. for nothing.

Confirming Metrics Are Flowing

Monasca is useless without data. The agent pushes metrics to the API; verify they are arriving:

monasca metric-list
monasca metric-name-list

If metric-name-list is sparse or empty, your Monasca agents are not reporting — check the agent config on your hosts before blaming the API. A common gotcha is the agent running but pointed at the wrong Keystone endpoint, so metrics land in the wrong tenant.

Pull a specific metric’s recent statistics:

monasca metric-statistics cpu.percent avg \
  -260 --dimensions hostname=compute-01

Building Alarm Definitions

An alarm definition is an expression plus matching dimensions. The expression is where the power and the footguns both live:

monasca alarm-definition-create \
  'High CPU on compute' \
  'avg(cpu.percent{hostname=compute-01}) > 90 times 3'

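That fires when the average CPU exceeds 90 percent for three consecutive periods (60 seconds each unless the expression sets a different period). The times 3 is the kind of detail people get wrong: leave it out and the alarm fires on a single period over the threshold, set it too high for your collection interval and it may never fire. That is why a brand-new alarm either never fires or fires constantly.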
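The arithmetic behind the times count is easy to get wrong by one, so here is a minimal sketch of it. It assumes a 60-second evaluation period (which I understand to be Monasca's default) and that the sub-expression already carries any non-default period; periods_for and with_times are hypothetical helpers, not part of any Monasca client.

```python
import math

def periods_for(duration_seconds: int, period_seconds: int = 60) -> int:
    """Return how many consecutive periods are needed to cover a duration."""
    if duration_seconds <= 0 or period_seconds <= 0:
        raise ValueError("duration and period must be positive")
    # Round up so a partial period still counts as a full one.
    return math.ceil(duration_seconds / period_seconds)

def with_times(sub_expression: str, duration_seconds: int,
               period_seconds: int = 60) -> str:
    """Append a 'times N' clause sized for the requested duration."""
    count = periods_for(duration_seconds, period_seconds)
    return f"{sub_expression} times {count}"

# Three minutes at the assumed 60-second period gives "times 3".
print(with_times("avg(cpu.percent{hostname=compute-01}) > 90", 180))
```

Ten minutes at the same period works out to times 10, which is the number to compare against whatever the model drafts.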

This is precisely the work I delegate to AI. I give the model my available metric names (from metric-name-list) and say “alert me when a host’s memory usage stays above 85 percent for ten minutes.” It writes the expression, including the right times count for my collection interval. I then sanity-check that the metric name actually exists, because models will cheerfully invent mem.usage_pct when your real metric is mem.usage_perc.

Pro Tip: Always paste your actual metric-name-list output into the prompt before asking for an alarm expression. Hallucinated metric names are the number-one cause of alarms that silently never evaluate, and grounding the model in real names eliminates almost all of them.
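The name check can also be automated as a backstop. The sketch below is a rough illustration, not a real expression parser: it assumes metric names always appear inside a statistical function, and the function list and the known set are stand-ins for your own metric-name-list output.

```python
import re
from typing import Iterable, List

# Assumed set of statistical functions; adjust to what your deployment accepts.
FUNCTIONS = ("avg", "min", "max", "sum", "count", "last")
METRIC_RE = re.compile(r"\b(?:%s)\(\s*([A-Za-z0-9_.\-]+)" % "|".join(FUNCTIONS))

def unknown_metrics(expression: str, known_names: Iterable[str]) -> List[str]:
    """Return metric names used in the expression that are not in known_names."""
    known = set(known_names)
    found = METRIC_RE.findall(expression)
    return sorted({name for name in found if name not in known})

# Hypothetical inputs: names copied from metric-name-list, plus a model draft.
known = {"cpu.percent", "mem.usage_perc", "disk.space_used_perc"}
draft = "avg(mem.usage_pct) > 85 times 10"
print(unknown_metrics(draft, known))  # ['mem.usage_pct']
```

An empty list does not prove the alarm is right, only that it references metrics that exist.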

Compound Alarms and Sub-Expressions

Monasca supports boolean combinations, which is where you encode real operational knowledge:

monasca alarm-definition-create \
  'Service degraded' \
  '(avg(cpu.percent) > 80) and (max(disk.space_used_perc) > 90)'

Compound expressions get hard to read fast. I lean on Claude both to write these and to explain an existing one back to me in English: “this fires when CPU is high AND disk is nearly full simultaneously.” That round-trip review catches logic errors a glance would miss.
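A deterministic breakdown is a useful cross-check on the model's explanation. This is a loose sketch that pulls each sub-expression out with a regular expression; it assumes the common function(metric{dims}) operator threshold shape, ignores the and/or connectors, and is not a substitute for Monasca's own parser. The describe helper is hypothetical.

```python
import re

# Matches: func(metric{dims}) OP threshold [times N]
SUB_RE = re.compile(
    r"(avg|min|max|sum|count|last)\(\s*([\w.\-]+)\s*(\{[^}]*\})?[^)]*\)"
    r"\s*(<=|>=|<|>)\s*([\d.]+)(?:\s+times\s+(\d+))?",
    re.IGNORECASE,
)

def describe(expression):
    """Return one plain-text line per sub-expression found."""
    lines = []
    for func, metric, dims, op, threshold, times in SUB_RE.findall(expression):
        scope = f" where {dims.strip('{}')}" if dims else ""
        periods = times or "1"  # no 'times' clause means a single period
        lines.append(
            f"{func}({metric}){scope} {op} {threshold} for {periods} period(s)"
        )
    return lines

for line in describe("(avg(cpu.percent) > 80) and (max(disk.space_used_perc) > 90)"):
    print(line)
```

If the list has fewer entries than you expected, the expression contains something the sketch does not recognize, which is itself a reason to read it more closely.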

Reading Alarm State

When something fires, list the alarms and inspect history:

monasca alarm-list --metric-name cpu.percent
monasca alarm-show <alarm-id>
monasca alarm-history <alarm-id>

The history tells you the transition path — OK to ALARM and back — which is gold for tuning a noisy definition. I route Monasca notifications through my monitoring alerts dashboard so the firing, the acknowledgment, and the fix all live in one place instead of three Slack threads.
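Counting transitions is a quick way to turn that history into a tuning signal. The sketch below assumes each history record has been loaded as a dict with old_state and new_state keys; the sample records and the flapping threshold are made up for illustration.

```python
from collections import Counter

def transition_counts(history):
    """Count (old_state, new_state) pairs across history records."""
    return Counter((h["old_state"], h["new_state"]) for h in history)

def is_flapping(history, max_alarm_entries=3):
    """Flag an alarm that entered ALARM from OK more than the allowed number of times."""
    return transition_counts(history)[("OK", "ALARM")] > max_alarm_entries

# Hypothetical history: the alarm bounced between OK and ALARM four times.
sample = [
    {"old_state": "OK", "new_state": "ALARM"},
    {"old_state": "ALARM", "new_state": "OK"},
] * 4

print(transition_counts(sample))
print(is_flapping(sample))  # True: four OK-to-ALARM entries exceed the limit of 3
```

A definition that trips this check is usually a candidate for a higher times count or a wider threshold.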

Dimensions Are Everything

The feature that makes Monasca scale to a real multi-tenant cloud is dimensions — the key/value tags attached to every metric. A single metric name like cpu.percent carries dimensions for hostname, instance ID, project, and region, and your alarm expressions filter on them. Get the dimensions wrong and your alarm either matches nothing or matches every host in the fleet at once.

monasca metric-name-list --dimensions hostname=compute-01

This is a place I am very deliberate about AI help. When I describe an alarm “for one specific project’s instances,” the model needs to know which dimension carries the project — and dimension keys vary by deployment. So I always paste a sample metric’s actual dimensions into the prompt before asking for an expression. With that grounding, the model writes a correctly scoped filter; without it, it guesses project= when your cloud uses tenant_id=, and the alarm quietly evaluates against nothing. Reviewing the dimension filter is non-negotiable, because a too-broad filter is how one host’s CPU spike pages the entire on-call rotation.
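The same grounding can be checked mechanically. This sketch compares the dimension keys used in a drafted expression with the keys on a sample metric; the sample dimensions and the draft are placeholders, and the helpers are hypothetical.

```python
import re

def filter_keys(expression):
    """Return the dimension keys used inside {...} filters of an expression."""
    keys = set()
    for block in re.findall(r"\{([^}]*)\}", expression):
        for pair in block.split(","):
            if "=" in pair:
                keys.add(pair.split("=", 1)[0].strip())
    return keys

def unknown_dimension_keys(expression, sample_dimensions):
    """Return filter keys that do not exist on the sample metric."""
    return sorted(filter_keys(expression) - set(sample_dimensions))

# Placeholder dimensions copied from one real metric in your deployment.
sample = {"hostname": "compute-01", "tenant_id": "<project-id>"}
draft = "avg(cpu.percent{project=demo}) > 90 times 3"
print(unknown_dimension_keys(draft, sample))  # ['project']
```

It only catches keys that do not exist at all. A filter that is valid but too broad still needs a human read.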

Notifications and Routing

Alarms are pointless if nobody hears them. Create a notification method and wire it to definitions:

monasca notification-create pager EMAIL ops@example.com

Then attach it via the alarm definition’s --alarm-actions. I ask AI to generate the full set of create commands for a routing policy I describe (“warnings to email, criticals to the pager”), then I review the addresses by hand. Misrouting a critical alarm is its own incident.
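Writing the routing policy down as data makes the review easier than reading a pile of create commands. A minimal sketch, assuming Monasca's four severity levels (LOW, MEDIUM, HIGH, CRITICAL) and purely hypothetical notification method names; how "warnings" and "criticals" map onto those levels is my own assumption.

```python
# Hypothetical notification method names; real IDs come from your deployment.
ROUTES = {
    "LOW": ["ops-email"],
    "MEDIUM": ["ops-email"],
    "HIGH": ["pager"],
    "CRITICAL": ["pager"],
}

def actions_for(severity, routes=ROUTES):
    """Return the notification methods for a severity, failing loudly if unrouted."""
    key = severity.upper()
    if key not in routes or not routes[key]:
        raise KeyError(f"no notification route for severity {severity!r}")
    return list(routes[key])

print(actions_for("critical"))  # ['pager']
```

Failing on an unrouted severity is deliberate: a missing route should stop the rollout rather than produce an alarm that notifies nobody.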

Keeping the Human in the Loop

Monasca evaluates and notifies; it does not provision. But a bad alarm definition causes alert fatigue, and alert fatigue causes missed real incidents — so the stakes are real even without a destructive blast radius. My rules:

The AI drafts alarm expressions and notification configs; it never holds production credentials.
Every new definition is tested against real metric statistics before it pages anyone.
I review compound logic by having the model explain it back, then confirming against what I actually meant.
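The second rule can be rehearsed offline. The sketch below replays a threshold and times count against a list of per-period averages, such as values copied out of metric-statistics; the numbers are invented, and the logic is a simplification of how Monasca evaluates alarms.

```python
def would_fire(values, threshold, times):
    """Return True if `times` consecutive values exceed `threshold`."""
    streak = 0
    for value in values:
        # Extend the streak on a breach, reset it otherwise.
        streak = streak + 1 if value > threshold else 0
        if streak >= times:
            return True
    return False

# Made-up per-period CPU averages for one host.
recent = [72.0, 91.5, 93.0, 88.0, 92.0, 95.0, 97.0]
print(would_fire(recent, 90, 3))      # True: 92.0, 95.0, 97.0 are consecutive
print(would_fire(recent[:5], 90, 3))  # False: the streak resets at 88.0
```

Replaying a week of real statistics this way shows how often the definition would have paged before it ever does.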

When a Monasca alarm escalates into a genuine outage, I track the diagnosis in my incident response dashboard. My vetted Monasca prompts live in the prompt workspace, and the reusable templates are in the OpenStack prompt pack. For lighter local drafting I sometimes use Gemma when I do not need a frontier model.

Wrapping Up

Monasca gives you real multi-tenant monitoring without bolting together five tools, and an AI assistant takes the pain out of its expression language. The pattern holds: describe the alert in English, let the model write the expression, ground it in real metric names, and review before it can page anyone. Do that and you get monitoring that catches problems instead of monitoring that cries wolf.

The lasting value is that tenants can own their own alarms. Because Monasca is Keystone-scoped, a project team can define the alerts that matter to their workload without you mediating every request — and an AI assistant lowers the bar enough that non-experts can write a sane expression on the first try. Monitoring stops being a central bottleneck and becomes a self-service capability, which is exactly what a multi-tenant cloud is supposed to deliver. That shift, more than any single alarm, is what makes Monasca worth the setup.

If you want Monasca tuned to alert on the things that actually matter for your cloud, work with me, or keep reading across the OpenStack category and the prompt library.