Prometheus Error Guide: 'found error when loading rules' Invalid Rule Group
Fix Prometheus 'found error when loading rules' and 'could not parse expression' failures: validate PromQL, fix templating, dedupe rule names, and unit-test with promtool.
Exact Error Message
A bad recording or alerting rule fails the same load/reload path that parses prometheus.yml. At startup the log reads:
ts=2026-06-27T10:02:41.553Z caller=manager.go:1009 level=error component="rule manager" msg="loading groups failed" err="/etc/prometheus/rules/app.yml: group \"app-latency\", rule 2, \"HighErrorRate\": could not parse expression: 1:18: parse error: unexpected character: '%'"
promtool check rules reports the same thing, prefixed with found error when loading rules:
$ promtool check rules /etc/prometheus/rules/app.yml
Checking /etc/prometheus/rules/app.yml
FAILED:
found error when loading rules:
group "app-latency", rule 2, "HighErrorRate": could not parse expression: 1:18: parse error: unexpected character: '%'
On a live reload, the endpoint returns an HTTP error status (500 in current Prometheus releases) with the failure in the response body:
$ curl -s -XPOST http://localhost:9090/-/reload
failed to reload config: error loading rules, previous rule set restored
What the Error Means
Rule files referenced by rule_files in prometheus.yml are loaded into named groups, each containing an ordered list of recording rules (record:) and alerting rules (alert:). Every rule’s expr is parsed as PromQL, and alert labels/annotations are parsed as Go templates. If any rule in any group fails to parse, the entire rule set is rejected — Prometheus restores the previous rules and the new ones never load.
The error names the group, the rule index, the rule name, and the parse position as line:column (1:18 in the log above); the server log also prefixes the file path. That is enough to jump straight to the broken character. This is purely a parse-time failure; it has nothing to do with whether the query returns data.
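Because the message has a fixed shape, those fields can be pulled out mechanically, for example to annotate a CI log. The following is a minimal sketch, assuming the unescaped promtool output format shown above. parse_rule_error is a hypothetical helper name, and the pattern is an assumption that may need adjusting for other Prometheus versions or for server log lines with escaped quotes.

```shell
# parse_rule_error LINE
# Hypothetical helper: extracts group, rule index, rule name, and line:col
# from one promtool error line. Prints nothing if the line does not match.
# Assumes the unescaped promtool format, not the escaped server log line.
parse_rule_error() {
  printf '%s\n' "$1" | sed -n -E \
    's/.*group "([^"]+)", rule ([0-9]+), "([^"]+)": .*: ([0-9]+):([0-9]+): parse error.*/group=\1 rule=\2 name=\3 line=\4 col=\5/p'
}

# Example with the message from this guide:
parse_rule_error 'group "app-latency", rule 2, "HighErrorRate": could not parse expression: 1:18: parse error: unexpected character'
# prints: group=app-latency rule=2 name=HighErrorRate line=1 col=18
```

The column counts characters within the expr string itself, not within the YAML line, so the offset of expr: in the file has to be added by hand.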
Common Causes
- PromQL syntax error in an expr: a stray %, unbalanced parentheses, or a misspelled function (could not parse expression).
- Duplicate rule names: two alerting rules with the same alert: name in one group, or two recording rules writing the same record: metric.
- Invalid for duration: for: 5 instead of for: 5m (not a valid duration string).
- Broken templating in labels/annotations: {{ $value }} or {{ $labels.instance }} with a missing brace, a bad pipeline, or a misspelled variable such as {{ $values }}.
- Missing record/alert key: a rule entry that is neither a recording rule nor an alerting rule.
- rule_files glob not matching: the pattern matches nothing, so rules silently never load (no error, but no alerts either).
- Indentation errors: a labels: block under the wrong parent, collapsing two rules into one malformed entry.
How to Reproduce the Error
Write a rule with a PromQL typo and a templating mistake:
# /etc/prometheus/rules/app.yml
groups:
  - name: app-latency
    rules:
      - record: job:request_errors:rate5m
        expr: sum(rate(http_requests_total{code="5xx"}[5m])) by (job)
      - alert: HighErrorRate
        expr: job:request_errors:rate5m / job:request:rate5m > 5%   # '%' is modulo in PromQL, not a percent sign
        for: 5                                       # missing unit
        labels:
          severity: page
        annotations:
          summary: "Errors at {{ $value }}"            # ok
          detail: "instance {{ $labels.instance }"     # missing closing brace

promtool check rules /etc/prometheus/rules/app.yml
# -> found error when loading rules:
# group "app-latency", rule 2, "HighErrorRate": could not parse expression ...
Diagnostic Commands
Validate the specific rule file (or a glob of them):
promtool check rules /etc/prometheus/rules/*.yml
Checking /etc/prometheus/rules/app.yml
FAILED:
found error when loading rules:
group "app-latency", rule 2, "HighErrorRate": could not parse expression: 1:42: parse error: unexpected character: '%'
Confirm the config that references the rules is itself valid (a broken rule_files glob hides here):
promtool check config /etc/prometheus/prometheus.yml
See whether the running server kept the old rules after a failed reload:
curl -s http://localhost:9090/api/v1/rules | jq -r '.data.groups[].name'
journalctl -u prometheus --no-pager | grep -iE 'loading groups failed|rule manager|err=' | tail -20
Run unit tests against the rules to verify behaviour, not just syntax:
promtool test rules /etc/prometheus/rules/tests/app_test.yml
Unit Testing: /etc/prometheus/rules/tests/app_test.yml
SUCCESS
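The test file passed to promtool test rules is plain YAML that names the rule files, feeds in synthetic series, and lists the alerts expected at a given time. The following is a minimal sketch of what it might contain, written from shell heredocs so it can be tried in a scratch directory. Everything here is an assumption for illustration: the scratch directory, the simplified single-alert rule file, the job="api" label, and the sample values. The expected annotations must match the rule's rendered templates exactly, so adjust them to your own rule.

```shell
# Scratch directory for the illustration (hypothetical; not /etc/prometheus).
workdir=$(mktemp -d)

# Simplified copy of a corrected HighErrorRate alert (ratio threshold, unit on for).
cat > "$workdir/app.yml" <<'EOF'
groups:
  - name: app-latency
    rules:
      - alert: HighErrorRate
        expr: job:request_errors:rate5m / job:request:rate5m > 0.05
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "Error ratio {{ $value | humanizePercentage }}"
EOF

# Synthetic input: a constant 1/10 error ratio sampled every minute for 10 minutes.
cat > "$workdir/app_test.yml" <<'EOF'
rule_files:
  - app.yml
evaluation_interval: 1m
tests:
  - interval: 1m
    input_series:
      - series: 'job:request_errors:rate5m{job="api"}'
        values: '1+0x10'
      - series: 'job:request:rate5m{job="api"}'
        values: '10+0x10'
    alert_rule_test:
      - eval_time: 10m
        alertname: HighErrorRate
        exp_alerts:
          - exp_labels:
              severity: page
              job: api
            exp_annotations:
              summary: "Error ratio 10%"
EOF

# Run the test only when promtool is installed.
if command -v promtool >/dev/null 2>&1; then
  promtool test rules "$workdir/app_test.yml"
else
  echo "promtool not found; wrote $workdir/app_test.yml"
fi
```

The recorded series are supplied directly as input rather than derived from http_requests_total, which keeps the test focused on the alert's threshold and for clause.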
Step-by-Step Resolution
- Read the position pointer. The error gives the group, rule N, the rule name, and a line:col position like 1:42. Open that rule's expr and look at that column.
- Fix the PromQL. Express percentages as ratios, not %. Replace > 5% with > 0.05:
    - alert: HighErrorRate
      expr: job:request_errors:rate5m / job:request:rate5m > 0.05
      for: 5m
- Add duration units to for (and keep_firing_for if used): 5m, 1h, 30s. A bare number fails to parse.
- Fix templating. labels and annotations use Go templates. The sample value is {{ $value }}, labels are {{ $labels.instance }}, and humanize helpers are common. Balance every {{ }}:
    annotations:
      summary: "Error ratio {{ $value | humanizePercentage }}"
      detail: "instance {{ $labels.instance }}"
- Dedupe names. Within a group, alert names should be unique and recording rules must not write the same record: metric twice. Follow the level:metric:operations recording-rule naming convention (e.g., job:http_requests:rate5m) so collisions are obvious.
- Ensure each rule has exactly one of record: or alert: and that indentation places labels/annotations/expr under the correct rule.
- Fix the rule_files glob so it actually matches. If promtool check config passes but no rules appear in /api/v1/rules, the pattern matched nothing:
    rule_files:
      - /etc/prometheus/rules/*.yml
- Validate, reload, confirm. The -f flag makes curl exit non-zero on an HTTP error, so the chain stops if the reload is rejected:
    promtool check rules /etc/prometheus/rules/*.yml \
      && curl -fsS -XPOST http://localhost:9090/-/reload \
      && curl -s http://localhost:9090/api/v1/rules | jq -r '.data.groups[].name'
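For the rule_files step above, an empty glob can be detected before reloading. The following is a minimal sketch; glob_match_count is a hypothetical helper that uses shell globbing as an approximation of Prometheus's own pattern matching, on the assumption that the two agree for simple patterns such as *.yml.

```shell
# glob_match_count PATTERN
# Hypothetical helper: prints how many existing paths a glob matches.
# Pass the pattern quoted so it is expanded inside the function.
# Limitation: patterns or paths containing whitespace are not handled.
glob_match_count() {
  # Unquoted on purpose: the shell expands the pattern into "$@".
  set -- $1
  # An unmatched glob is left as the literal pattern, which does not exist.
  if [ -e "$1" ]; then
    echo "$#"
  else
    echo 0
  fi
}
```

For example, glob_match_count '/etc/prometheus/rules/*.yml' printing 0 means the rule_files entry loads nothing, even though promtool check config reports no error.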
Prevention and Best Practices
- Run promtool check rules on every rule file in CI; one broken rule rejects the whole set, so a single bad merge blocks every rule update, not just the broken one.
- Add promtool test rules unit tests for critical alerts so you validate firing behaviour, not just syntax.
- Adopt the level:metric:operations recording-rule naming convention to prevent duplicate record: collisions.
- Keep rule validation in the same pipeline as config validation: both share the reload path, and a failure in either fails the reload.
- Alert on prometheus_rule_group_last_evaluation_samples == 0 and on reload failures (for example prometheus_config_last_reload_successful == 0) so silently dropped rules surface.
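The first practice above could be wired into a CI job roughly as follows. This is a minimal sketch under stated assumptions: check_all_rules and the PROMTOOL override variable are hypothetical names, rule files are assumed to end in .yml, and the helper uses plain global shell variables.

```shell
# check_all_rules DIR
# Hypothetical CI helper: runs `promtool check rules` on each *.yml in DIR.
# Returns 0 if every file passes, 1 if any file fails, 2 if DIR has no rule files.
# PROMTOOL may point at a different binary; it defaults to promtool on PATH.
check_all_rules() {
  dir=$1
  failed=0
  for f in "$dir"/*.yml; do
    # An unmatched glob stays literal, so a missing path means "no files".
    if [ ! -e "$f" ]; then
      echo "no rule files in $dir" >&2
      return 2
    fi
    # Keep going after a failure so every broken file is reported in one run.
    "${PROMTOOL:-promtool}" check rules "$f" || failed=1
  done
  return "$failed"
}
```

Treating an empty directory as its own failure (exit 2) is a design choice here: it catches the case where a path typo would otherwise make the check pass vacuously.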
Related Errors
- Config reload failed / HTTP 400: the parent reload error; a bad rule file makes the whole reload fail the same way.
- Duplicate scrape job name: another load-time Error loading config variant in the scrape section.
- out of order sample: an ingestion error affecting recording-rule output written back to the TSDB.
Frequently Asked Questions
Does one broken rule disable all my alerts?
Yes. If any rule in any group fails to parse, the entire rule set is rejected and the previous one is restored. Validate every file before reloading.
What does the 1:42 in the error mean?
It is the line and column within the rule’s expr where parsing failed. Most rule expressions are a single line, so it usually points at a character offset in that one expression.
How do I write a percentage threshold without %?
In PromQL, % is the modulo operator, not a percent sign, so 5% cannot be written as a literal. Use a ratio: errors / total > 0.05 for 5 percent, optionally formatting the display value with {{ $value | humanizePercentage }}.
Why do my rules pass check config but never fire?
promtool check config validates that rule_files paths/globs are syntactically fine, but a glob matching zero files produces no error. Check GET /api/v1/rules to confirm the groups actually loaded.
What’s the difference between promtool check rules and promtool test rules?
check rules validates syntax (PromQL parses, durations valid, templates balanced). test rules runs a synthetic time series through your rules and asserts the expected alerts and recorded values — it catches logic errors, not just typos.