Skip to content
DevOps AI ToolKit
Newsletter
All guides
AI for Prometheus & Monitoring By James Joyner IV · · 9 min read

Prometheus Error Guide: 'found error when loading rules' Invalid Rule Group

Fix Prometheus 'found error when loading rules' and 'could not parse expression' failures: validate PromQL, fix templating, dedupe rule names, and unit-test with promtool.

  • #prometheus-monitoring
  • #troubleshooting
  • #errors
  • #alerting-rules

Exact Error Message

A bad recording or alerting rule fails the same load/reload path that parses prometheus.yml. At startup the log reads:

ts=2026-06-27T10:02:41.553Z caller=manager.go:1009 level=error component="rule manager" msg="loading groups failed" err="/etc/prometheus/rules/app.yml: group \"app-latency\", rule 2, \"HighErrorRate\": could not parse expression: 1:18: parse error: unexpected character: '%'"

promtool check rules reports the same thing, prefixed with found error when loading rules:

$ promtool check rules /etc/prometheus/rules/app.yml
Checking /etc/prometheus/rules/app.yml
  FAILED:
found error when loading rules:
group "app-latency", rule 2, "HighErrorRate": could not parse expression: 1:18: parse error: unexpected character: '%'

On a live reload it comes back as HTTP 400:

$ curl -s -XPOST http://localhost:9090/-/reload
failed to reload config: error loading rules, previous rule set restored

What the Error Means

Rule files referenced by rule_files in prometheus.yml are loaded into named groups, each containing an ordered list of recording rules (record:) and alerting rules (alert:). Every rule’s expr is parsed as PromQL, and alert labels/annotations are parsed as Go templates. If any rule in any group fails to parse, the entire rule set is rejected — Prometheus restores the previous rules and the new ones never load.

The error always names the file, the group, the rule index, the rule name, and the precise parse position (1:18), so you can jump straight to the broken character. This is purely a parse-time failure; it has nothing to do with whether the query returns data.

Common Causes

  1. PromQL syntax error in an expr — a stray %, unbalanced parentheses, or a misspelled function (could not parse expression).
  2. Duplicate rule names — two alerting rules with the same alert: name in one group, or two recording rules writing the same record: metric.
  3. Invalid for durationfor: 5 instead of for: 5m (not a valid duration string).
  4. Broken templating in labels/annotations{{ $value }} or {{ $labels.instance }} with a missing brace, a bad pipeline, or {{ .Value }} (wrong variable).
  5. Missing record/alert key — a rule entry that is neither a recording rule nor an alerting rule.
  6. rule_files glob not matching — the pattern matches nothing, so rules silently never load (no error, but no alerts either).
  7. Indentation errors — a labels: block under the wrong parent, collapsing two rules into one malformed entry.

How to Reproduce the Error

Write a rule with a PromQL typo and a templating mistake:

# /etc/prometheus/rules/app.yml
groups:
  - name: app-latency
    rules:
      - record: job:request_errors:rate5m
        expr: sum(rate(http_requests_total{code="5xx"}[5m])) by (job)
      - alert: HighErrorRate
        expr: job:request_errors:rate5m / job:request:rate5m > 5%   # '%' is not valid PromQL
        for: 5                                                       # missing unit
        labels:
          severity: page
        annotations:
          summary: "Errors at {{ $value }}"                         # ok
          detail: "instance {{ $labels.instance }"                  # missing closing brace
promtool check rules /etc/prometheus/rules/app.yml
# -> found error when loading rules:
#    group "app-latency", rule 2, "HighErrorRate": could not parse expression ...

Diagnostic Commands

Validate the specific rule file (or a glob of them):

promtool check rules /etc/prometheus/rules/*.yml
Checking /etc/prometheus/rules/app.yml
  FAILED:
found error when loading rules:
group "app-latency", rule 2, "HighErrorRate": could not parse expression: 1:42: parse error: unexpected character: '%'

Confirm the config that references the rules is itself valid (a broken rule_files glob hides here):

promtool check config /etc/prometheus/prometheus.yml

See whether the running server kept the old rules after a failed reload:

curl -s http://localhost:9090/api/v1/rules | jq -r '.data.groups[].name'
journalctl -u prometheus --no-pager | grep -iE 'loading groups failed|rule manager|err=' | tail -20

Run unit tests against the rules to verify behaviour, not just syntax:

promtool test rules /etc/prometheus/rules/tests/app_test.yml
Unit Testing:  /etc/prometheus/rules/tests/app_test.yml
  SUCCESS

Step-by-Step Resolution

  1. Read the position pointer. The error gives group, rule N, the rule name, and a line:col like 1:42. Open that rule’s expr and look at that column.

  2. Fix the PromQL. Express percentages as ratios, not %. Replace > 5% with > 0.05:

    - alert: HighErrorRate
      expr: job:request_errors:rate5m / job:request:rate5m > 0.05
      for: 5m
  3. Add duration units to for (and keep_firing_for if used): 5m, 1h, 30s. A bare number fails to parse.

  4. Fix templating. labels and annotations use Go templates. The sample value is {{ $value }}, labels are {{ $labels.instance }}, and humanize helpers are common. Balance every {{ }}:

    annotations:
      summary: "Error ratio {{ $value | humanizePercentage }}"
      detail: "instance {{ $labels.instance }}"
  5. Dedupe names. Within a group, alert names should be unique and recording rules must not write the same record: metric twice. Follow the level:metric:operations recording-rule naming convention (e.g., job:http_requests:rate5m) so collisions are obvious.

  6. Ensure each rule has exactly one of record: or alert: and that indentation places labels/annotations/expr under the correct rule.

  7. Fix the rule_files glob so it actually matches. If promtool check config passes but no rules appear in /api/v1/rules, the pattern matched nothing:

    rule_files:
      - /etc/prometheus/rules/*.yml
  8. Validate, reload, confirm:

    promtool check rules /etc/prometheus/rules/*.yml \
      && curl -s -XPOST http://localhost:9090/-/reload \
      && curl -s http://localhost:9090/api/v1/rules | jq -r '.data.groups[].name'

Prevention and Best Practices

  • Run promtool check rules on every rule file in CI; one broken rule rejects the whole set, so a single bad merge can drop all your alerts.
  • Add promtool test rules unit tests for critical alerts so you validate firing behaviour, not just syntax.
  • Adopt the level:metric:operations recording-rule naming convention to prevent duplicate record: collisions.
  • Keep alerting and config validation in the same pipeline as config reload checks — both share the reload path and both can return HTTP 400.
  • Alert on prometheus_rule_group_last_evaluation_samples == 0 and on reload failures so silently dropped rules surface.
  • The free incident assistant can parse a found error when loading rules message and point at the exact expr or template fragment; see more under Prometheus and monitoring.
  • Config reload failed / HTTP 400 — the parent reload error; a bad rule file makes the whole reload fail the same way.
  • Duplicate scrape job name — another load-time Error loading config variant in the scrape section.
  • out of order sample — an ingestion error affecting recording-rule output written back to the TSDB.

Frequently Asked Questions

Does one broken rule disable all my alerts? Yes. If any rule in any group fails to parse, the entire rule set is rejected and the previous one is restored. Validate every file before reloading.

What does the 1:42 in the error mean? It is the line and column within the rule’s expr where parsing failed. Most rule expressions are a single line, so it usually points at a character offset in that one expression.

How do I write a percentage threshold without %? PromQL has no % operator. Use a ratio: errors / total > 0.05 for 5 percent, optionally formatting the display value with {{ $value | humanizePercentage }}.

Why do my rules pass check config but never fire? promtool check config validates that rule_files paths/globs are syntactically fine but a glob matching zero files produces no error. Check GET /api/v1/rules to confirm the groups actually loaded.

What’s the difference between promtool check rules and promtool test rules? check rules validates syntax (PromQL parses, durations valid, templates balanced). test rules runs a synthetic time series through your rules and asserts the expected alerts and recorded values — it catches logic errors, not just typos.

Free download · 368-page PDF

Download the Free 500-Prompt DevOps AI Toolkit

500 battle-tested, copy-paste AI prompts engineered by a senior systems engineer — every one with fill-in placeholders and safety/back-out notes. Drop your email and it's yours.

  • 500 prompts: Linux · Kubernetes · Terraform · OpenStack · GitLab · Docker · Monitoring · Incident Response
  • Instant PDF download — yours free, forever
  • Plus one practical AI-workflow email a week (no spam)

Single opt-in · unsubscribe anytime · no spam.