Building Incident Timelines From Prometheus Data With AI

The most tedious part of writing a postmortem is reconstructing the timeline — scrolling back through Grafana, eyeballing when latency spiked, cross-referencing the deploy log, and writing “at 14:32, error rate began climbing.” It’s an hour of mechanical work that AI can do in minutes, because the raw material is right there in Prometheus as queryable time series. But a timeline isn’t just a list of events; it’s an implicit causal narrative, and the moment you let a model order events it starts implying causality that the data doesn’t support. “Latency rose after the deploy” becomes “the deploy caused the latency” in the model’s prose, and now your postmortem asserts a root cause you never verified. Here’s how I build timelines that are fast and honest.

Metrics give you the what, never the why

Prometheus knows precisely when error rate crossed a threshold; it has no idea why. This is the central discipline of AI-assisted timelines: the model can extract and order the what — when each metric moved — but every why is a hypothesis a human must verify. I keep these strictly separated. The timeline is evidence; the causal story is analysis, and I never let the model blur them.

Step one: query the raw movements, don’t summarize from memory

The common mistake is to describe the incident to the model and ask it to write the timeline. It will produce fluent prose untethered from data. Instead I pull the actual time series for the incident window and feed them in as evidence. A range query over the incident gives the model real numbers to anchor on:

curl -s 'http://localhost:9090/api/v1/query_range' \
  --data-urlencode 'query=sum(rate(http_requests_total{code=~"5.."}[1m]))' \
  --data-urlencode 'start=2026-06-15T14:00:00Z' \
  --data-urlencode 'end=2026-06-15T15:00:00Z' \
  --data-urlencode 'step=30s' | jq '.data.result'

Now when the model writes “error rate climbed from near zero to 40/s at 14:32,” that’s a fact pulled from the data, not a plausible invention. Every timestamp in the timeline must trace back to a query result.
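To hand the model rows instead of raw JSON, the range-query response can be flattened first. The following is a minimal sketch, assuming the standard matrix response shape from the Prometheus HTTP API; the function name parse_range_result and the SAMPLE payload are illustrative, and the sample values are made up rather than taken from a real incident.

```python
from datetime import datetime, timezone


def parse_range_result(payload):
    """Flatten a query_range response into sorted (iso_timestamp, value) rows."""
    if payload.get("status") != "success":
        raise ValueError("query failed: %r" % payload.get("error"))
    rows = []
    for series in payload["data"]["result"]:
        # Each sample is [unix_seconds, "value-as-string"].
        for ts, raw in series["values"]:
            when = datetime.fromtimestamp(float(ts), tz=timezone.utc)
            rows.append((when.strftime("%Y-%m-%dT%H:%M:%SZ"), float(raw)))
    return sorted(rows)


# Hypothetical payload in the matrix shape; the numbers are invented.
SAMPLE = {
    "status": "success",
    "data": {
        "resultType": "matrix",
        "result": [
            {"metric": {}, "values": [[1781532000, "0.2"], [1781532030, "0.3"]]}
        ],
    },
}

if __name__ == "__main__":
    for stamp, value in parse_range_result(SAMPLE):
        print(stamp, value)
```

Each row keeps its timestamp, so any sentence the model writes can be checked against a specific sample.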

Step two: identify the change points with PromQL, not prose

Rather than asking the model to eyeball when things changed, I have it identify change points with queries it can verify. A deriv() on a gauge, or a simple threshold comparison evaluated as a range query, pins the moment to within one query step:

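# When did p95 latency first cross 1s in the window?
# Run as a range query: the comparison drops samples at or below 1,
# so the first returned timestamp is the first step above 1s.
histogram_quantile(0.95,
  sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) > 1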

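I ask the model to report the exact timestamp each key metric crossed its threshold, and I re-run the query to confirm. This turns “around 14:30-ish” into “14:32:30,” which matters when you’re correlating with a deploy that happened at 14:31. The precision is bounded by the query step and by the rate() window: a 5m window smooths the onset, so the reported crossing can lag the real one.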
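The same check can be done client-side on an unfiltered series, which is handy for confirming what the model reports. This is a minimal sketch, assuming samples arrive as time-sorted (unix_seconds, value) pairs; first_crossing is a hypothetical helper, not part of any Prometheus client.

```python
def first_crossing(samples, threshold):
    """Return the timestamp of the first upward threshold crossing, or None.

    samples: list of (unix_seconds, value) pairs sorted by time.
    A crossing needs a prior sample at or below the threshold, so a series
    that is already above it at the start of the window is not reported.
    """
    previous = None
    for ts, value in samples:
        if previous is not None and previous <= threshold < value:
            return ts
        previous = value
    return None


if __name__ == "__main__":
    # Invented p95 latency samples at a 30s step.
    p95 = [(0, 0.4), (30, 0.8), (60, 1.3), (90, 2.0)]
    print(first_crossing(p95, 1.0))
```

Requiring a prior sample below the threshold avoids reporting the window start as the onset when the incident actually began earlier; in that case the window needs widening.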

Pro Tip: Have the AI build the timeline as a table with a column for the evidence query behind each row. A timeline entry with no query backing it is a claim, not a fact — and the empty cells are exactly where the model started narrating instead of reporting. The query column makes fabrication visible.
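One way to enforce the evidence column is to treat the timeline as structured rows and list the ones with no query. A minimal sketch follows; TimelineEntry, unbacked, and render are hypothetical names chosen for illustration, and the pipe-separated layout is just one plain-text option.

```python
from dataclasses import dataclass


@dataclass
class TimelineEntry:
    time: str        # e.g. "14:32:30"
    event: str       # what moved, stated without causal language
    query: str = ""  # evidence behind the row; empty means unbacked


def unbacked(entries):
    """Return the rows that have no evidence query behind them."""
    return [e for e in entries if not e.query.strip()]


def render(entries):
    """Render the timeline as plain text, marking rows with no evidence."""
    lines = ["time | event | evidence query"]
    for e in entries:
        lines.append(f"{e.time} | {e.event} | {e.query.strip() or 'MISSING'}")
    return "\n".join(lines)
```

Anything returned by unbacked goes back to the model with a request for the query, or gets cut from the draft.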

Step three: correlate, but force the model to hedge causality

Once the metric movements are pinned, I overlay other evidence — deploys, config changes, dependency alerts — and ask the model to correlate, explicitly forbidding causal language unless the data forces it. The phrasing matters:

Bad (asserts cause):   "The 14:31 deploy caused error rates to spike."
Good (states timing):  "Error rate began rising at 14:32, one minute after
                        the 14:31 deploy. The two are correlated in time;
                        causation is unconfirmed pending log review."

This is where the model behaves like a fast junior engineer, eager to look insightful by declaring a root cause. My job is to make it report the sequence and flag the correlation while leaving the causal call to the humans in the review. A postmortem that asserts an unverified root cause is worse than one that honestly says “correlated, unconfirmed.”
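A crude lint pass over the draft catches most causal phrasing before it reaches review. The sketch below assumes a small hand-picked phrase list; the list and the flag_causal helper are illustrative, and a match only means a human should look at the sentence, not that it is wrong.

```python
import re

# Hypothetical phrase list; extend it to match how your team writes.
CAUSAL = re.compile(
    r"\b(caused|causing|because of|due to|led to|resulted in|triggered|root cause)\b",
    re.IGNORECASE,
)


def flag_causal(sentences):
    """Return (sentence, matched_phrases) pairs for lines needing human review."""
    flagged = []
    for sentence in sentences:
        hits = [m.group(0).lower() for m in CAUSAL.finditer(sentence)]
        if hits:
            flagged.append((sentence, hits))
    return flagged


if __name__ == "__main__":
    draft = [
        "The 14:31 deploy caused error rates to spike.",
        "Error rate began rising at 14:32, one minute after the 14:31 deploy.",
    ]
    for sentence, hits in flag_causal(draft):
        print(hits, "->", sentence)
```

The hedged wording passes because “causation is unconfirmed” states the absence of a causal claim and does not match any listed phrase.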

Step four: cross-check against the alert firing times

Prometheus records alert state in its ALERTS series, so you can see when an alert actually fired rather than when the metric moved. I have the model reconcile its metric-derived timeline against that series, because a discrepancy is informative. If error rate crossed the threshold at 14:32 but the alert didn’t fire until 14:40, that eight-minute gap is itself a finding, most likely a for: duration that’s too long (the same series with alertstate="pending" shows when the condition first held):

ALERTS{alertname="HighErrorRate", alertstate="firing"}

This reconciliation often surfaces the most actionable item in the whole postmortem: the alert was slow, and that’s why the response was slow.
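The gap itself is simple arithmetic, and it helps to put it next to the rule's for: value. A minimal sketch, assuming UTC timestamps in the same format as the range query above; alert_delay is a hypothetical helper, and the 7-minute for: in the example is an invented value, not one taken from a real rule.

```python
from datetime import datetime, timezone


def parse_utc(stamp):
    """Parse a timestamp like 2026-06-15T14:32:00Z as an aware UTC datetime."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def alert_delay(crossed_at, fired_at, for_seconds):
    """Compare the crossing-to-firing delay against the rule's for: duration."""
    delay = int((parse_utc(fired_at) - parse_utc(crossed_at)).total_seconds())
    return {
        "delay_seconds": delay,
        # What is left once for: is accounted for. Some remainder is normal,
        # since rule evaluation intervals and rate() windows also add lag.
        "beyond_for_seconds": delay - for_seconds,
    }


if __name__ == "__main__":
    print(alert_delay("2026-06-15T14:32:00Z", "2026-06-15T14:40:00Z", 7 * 60))
```

If most of the delay is the for: duration, the follow-up is a rule change; a large remainder points at evaluation lag or a smoothing window instead.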

Step five: human review owns the narrative

The model produces a draft timeline — ordered, timestamped, evidence-backed, causally hedged. Then a human who was in the incident reviews it, because they know things the metrics don’t: that the 14:35 “recovery” was actually a failover, that the deploy was unrelated, that the real trigger was a partner API the metrics didn’t cover. The model’s draft saves the hour of mechanical reconstruction; the human adds the context that turns a timeline into a postmortem. Our incident response dashboard is where I assemble the draft alongside the alert and log evidence, keeping the metric timeline and the human analysis in one place.

Keep the explainability bar high

The principle that keeps AI-built timelines trustworthy: every line must be explainable from evidence, and every causal claim must come from a human, not the model. The AI is fast at the tedious extraction and ordering, which is a real time saver on the part of postmortems everyone hates. But the narrative, the root cause, and the lessons are human work, because those are exactly where a confident model invents a story the data can’t back. I draft these in Claude, and the alert-tuning follow-ups, like fixing that slow for:, go through the Alert Rule Generator.

Conclusion

Prometheus holds a precise record of what happened during an incident, and AI can turn that into a timeline in minutes instead of an hour. But metrics never explain why, so the discipline is to make the model extract and order facts with a query behind every line, hedge all causality, reconcile against actual alert firing times, and hand the narrative to a human who was there. A fast, honest timeline beats a fast, confident fiction every time. More incident and alerting patterns are in the monitoring guides, and postmortem prompt templates are in the prompts library.