Enriching Prometheus Alert Annotations With Live Query

The difference between a good page and a bad one is what’s in the annotation. A bad page says “HighErrorRate firing on api.” A good page says “error rate is 47/s (8% of traffic), p95 latency 2.3s, last deploy 12 minutes ago — runbook attached.” The first sends on-call scrambling to a dashboard at 3am; the second lets them start diagnosing from the notification itself. The good annotation is built from Prometheus’s templating, which can embed label values and even query results into the message — and writing those templates correctly is fiddly Go-template-meets-PromQL work that AI is genuinely good at drafting. The trap is that a wrong template renders silently and ships a page with a blank where the context should be. Here’s how I get rich, correct annotations.

What alert annotation templating can actually do

Prometheus annotations are Go templates with access to the alert’s labels, its value, and helper functions — including the ability to run additional queries at render time. Most teams use a fraction of this, producing bare annotations because the templating syntax is unfamiliar. AI knows the syntax well, so it’s a fast way to level up your annotations from “what fired” to “here’s the context to act”:

annotations:
  summary: "Error rate {{ $value | humanize }}/s on {{ $labels.service }}"
  description: |
    Service {{ $labels.service }} is returning errors at
    {{ $value | humanize }}/s. Investigate the recent deploy and
    check downstream dependencies before escalating.
  runbook_url: "https://runbooks.internal/{{ $labels.service }}/errors"

The {{ $value }} and {{ $labels.x }} references are the foundation, and the model gets these right because it’s seen thousands of rules. The risk starts when you want richer context.

Pulling live context with template queries

Prometheus templates can run queries at render time via the query function, which is how you embed “and by the way, latency is also high” into an error-rate page. This is powerful and where AI both helps and stumbles, because the syntax is gnarly:

annotations:
  description: |
    Error rate is {{ $value | humanize }}/s.
    Current p95 latency:
    {{ with query (printf "histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket{service='%s'}[5m])))" .Labels.service) }}
    {{ . | first | value | humanizeDuration }}{{ end }}

The model drafts this faster than I can recall the with query ... first | value dance, but it occasionally mismatches a label selector or forgets the first, which renders an empty string. So I always test-render before trusting it.

Always test-render the template

A broken annotation template doesn’t throw — it renders a blank or a literal <no value> and the page ships degraded. The only way to catch it is to actually render it, which I do with amtool or by firing a test alert:

# Render an alert's annotations against a live Prometheus
amtool alert add alertname=HighErrorRate service=api \
  --alertmanager.url=http://localhost:9093

If the latency line comes back blank, the template query has a bad selector — usually a label name that doesn’t match. This is the verification step the model can’t do for you, because it can’t see whether service is the real label name in your metrics or whether the embedded query returns data.

Pro Tip: For every annotation that embeds a query, ask the AI to write out the exact PromQL it’s embedding as a standalone query first, run that in the expression browser to confirm it returns data, and only then wrap it in the template. Debugging a flat query is trivial; debugging it through three layers of Go templating at 3am is not.

Don’t over-stuff the page

There’s a failure mode in the other direction: an annotation so crammed with embedded queries that the page is a wall of text on-call won’t read, and so many render-time queries that the alert is slow to fire. I have the model propose context, then I cut it to the three or four things that actually change the first action: current value, a correlated metric, time since last deploy, and the runbook link. More than that is noise, and noise in a page is its own kind of alert fatigue. The judgment about what matters for this service is mine; the model just drafts the templating.

Make the embedded queries cheap

Render-time queries run every time the alert evaluates, so an expensive embedded query is a self-inflicted load problem. I check that any query in an annotation is bounded and cheap — a single instant value, not a heavy aggregation over high cardinality. If the context I want is expensive to compute, it belongs in a recording rule that the annotation reads from, not a raw query in the template. The model won’t think about this cost unless I raise it; it optimizes for “does it render,” not “is it cheap to render.”

Build it once, template it everywhere

The real win is consistency: every alert from a service should carry the same shape of context so on-call learns to read it at a glance. I draft a standard annotation block with the model, verify it renders, and then reuse it across the service’s rules. The Alert Rule Generator emits alerts with consistent summary and runbook annotations as a baseline, and I layer the richer context templates on top. I keep the verified template snippets in the prompt workspace so the team isn’t re-deriving the with query syntax each time, and the rules go through our code review dashboard where “did you test-render the annotation?” is a standing check.

The human owns what’s worth saying

The division of labor is the same as always: AI is a fast junior engineer who knows the Go-template and PromQL syntax cold and will draft a rich annotation in seconds. What it can’t know is which context actually helps your on-call make the first decision, whether the embedded query references real labels, and whether it’s cheap enough to run every evaluation. So the model drafts, you test-render, you cut to what matters, and you confirm you can explain why each line earns its place on the page. An annotation you can’t explain is clutter; one you can is the difference between a five-minute response and a thirty-minute one.

Conclusion

The annotation is where an alert becomes actionable, and AI is a genuinely good drafter of the fiddly templating that pulls live context into a page. But test-render every template, keep embedded queries cheap and real, cut the context down to what changes the first action, and never ship an annotation you can’t explain. A page that diagnoses itself is the highest-leverage alerting improvement you can make. More alert-quality patterns are in the monitoring guides, and annotation prompt templates are in the prompts library.

Enriching Prometheus Alert Annotations With Live Query Context