Prometheus Error Guide: 'scrape sample limit exceeded' Target Down on Cardinality
Fix Prometheus 'sample limit exceeded' target-down errors: count exposed series, identify high-cardinality exporters, drop noisy metrics, and raise sample_limit safely.
Exact Error Message
A target trips sample_limit and is marked down, with the reason shown on the targets page and in the log:
ts=2026-06-27T12:48:13.907Z caller=scrape.go:1701 level=warn component="scrape manager" scrape_pool=kube-state-metrics target=http://10.0.4.22:8080/metrics msg="Append failed" err="sample limit exceeded"
On /api/v1/targets the target’s lastError reads:
"health": "down",
"lastError": "sample limit exceeded",
"scrapeUrl": "http://10.0.4.22:8080/metrics"
Older builds phrase it as target exceeded the sample limit or scrape sample limit exceeded. In every case the entire scrape is discarded — not just the overflowing samples — so the target goes from up 1 to up 0.
What the Error Means
sample_limit is a per-scrape cap on how many samples (series) Prometheus will accept from a single scrape, counted after metric_relabel_configs are applied. When a target’s /metrics response exceeds it, Prometheus rejects the whole scrape: no samples from that response are stored and the target’s up becomes 0. The limit exists to protect the TSDB from a single exporter that suddenly explodes in cardinality (a runaway label, per-request IDs) and inflates memory and disk usage. It is opt-in: the default of 0 means no limit.
This is a cardinality guardrail, not a label-name guardrail. It is distinct from label_limit (which caps labels per series) and from the query-time too many samples error (which is about evaluating a PromQL expression over too wide a range, not scraping).
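The all-or-nothing behaviour described above can be illustrated with a small shell sketch. scrape_outcome is a hypothetical helper, not a Prometheus command; it only mimics the accept-or-reject decision, assuming the count is taken after metric relabeling and that a limit of 0 disables the check.

```shell
# scrape_outcome EXPOSED LIMIT
# Hypothetical illustration only: prints what happens to a scrape that
# exposes EXPOSED samples (after metric relabeling) under sample_limit LIMIT.
# "stored" refers to the target's own samples, not the synthetic up series.
scrape_outcome() {
  # A limit of 0 means "no limit"; exactly LIMIT samples is still accepted.
  if [ "$2" -gt 0 ] && [ "$1" -gt "$2" ]; then
    echo "up=0 stored=0"        # whole scrape rejected, nothing kept
  else
    echo "up=1 stored=$1"       # whole scrape accepted
  fi
}

# Example with the numbers used in this guide:
#   scrape_outcome 14732 1000    # prints: up=0 stored=0
#   scrape_outcome 14732 50000   # prints: up=1 stored=14732
```

There is no partial outcome: a scrape one sample over the limit stores as little as one that is ten times over.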
Common Causes
- Target legitimately exposes more series than sample_limit allows: the fleet grew and the cap was never revisited.
- High-cardinality exporter: labels carry per-request data, user IDs, full URLs, or UUIDs, multiplying series without bound.
- A single misbehaving exporter: one buggy instance emits an unbounded label (e.g., a path with embedded IDs) while peers are fine.
- sample_limit set too low: a conservative default (e.g., 1000) applied to a naturally large exporter like kube-state-metrics or cAdvisor.
- cAdvisor / kube-state-metrics explosion: short-lived pods, many containers, or churned objects inflate series counts past the cap.
How to Reproduce the Error
Set a deliberately low limit on a job and point it at a large exporter:
# /etc/prometheus/prometheus.yml
scrape_configs:
  - job_name: "kube-state-metrics"
    sample_limit: 1000   # too low for a real KSM target
    static_configs:
      - targets: ["10.0.4.22:8080"]
# the reload endpoint requires Prometheus to run with --web.enable-lifecycle
curl -s -XPOST http://localhost:9090/-/reload
# next scrape:
# err="sample limit exceeded" and target health -> down
Diagnostic Commands
Find which targets are down for sample-limit reasons:
curl -s http://localhost:9090/api/v1/targets \
| jq -r '.data.activeTargets[] | select(.lastError|test("sample limit")) | [.scrapePool,.scrapeUrl,.lastError] | @tsv'
kube-state-metrics http://10.0.4.22:8080/metrics sample limit exceeded
Count how many series the target actually exposes (count non-comment, non-blank lines):
curl -s http://10.0.4.22:8080/metrics | grep -vcE '^(#|$)'
14732
Find the configured sample_limit for that job:
curl -s http://localhost:9090/api/v1/status/config | jq -r '.data.yaml' \
| grep -A6 'kube-state-metrics' | grep sample_limit
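See how many samples the target exposed on its last scrape (scrape_samples_scraped is still recorded when the scrape is rejected), and rank the biggest metrics. The ranking query only sees series that were actually ingested, so evaluate it at a time before the target went down: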
scrape_samples_scraped{job="kube-state-metrics"}
topk(10, count by (__name__) ({job="kube-state-metrics"}))
{__name__="kube_pod_status_phase"} 4821
{__name__="kube_pod_container_status_restarts_total"} 2960
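While a target is failing on sample_limit, Prometheus stores none of its series, so a PromQL ranking may come back empty. A rough alternative is to rank metric names straight from the exporter's text output. The sketch below assumes the classic Prometheus text exposition format and standard POSIX tools; rank_series is a hypothetical helper name, not part of any Prometheus tooling. It counts raw exposed samples, before any metric_relabel_configs are applied, and treats histogram _bucket, _sum, and _count names as separate metrics.

```shell
# rank_series [N]
# Reads Prometheus text exposition on stdin and prints
# "<sample count> <metric name>" for the N largest metrics (default 10).
rank_series() {
  grep -vE '^(#|[[:space:]]*$)' \
    | sed -E 's/[{[:space:]].*$//' \
    | sort | uniq -c \
    | awk '{ print $1, $2 }' \
    | sort -k1,1nr -k2,2 \
    | awk -v n="${1:-10}" 'NR <= n'
}
# Stages: drop comments and blank lines, keep only the metric name,
# count occurrences per name, normalise to "count name", sort largest first.

# Usage against the target from this guide (not executed here):
#   curl -s http://10.0.4.22:8080/metrics | rank_series 10
```

Because it reads the exporter directly, this works whether or not the target is currently marked down.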
Read the log for the rejection:
journalctl -u prometheus --no-pager | grep -iE 'sample limit' | tail -10
Step-by-Step Resolution
- Confirm the cause. Compare the target’s actual line count against the configured sample_limit. If the count is a legitimate, stable number, the limit is simply too low. If it is climbing fast, you have a cardinality bug.
- If the volume is legitimate, raise the limit deliberately. Pick a value with headroom over the real series count, not an unbounded one:
    - job_name: "kube-state-metrics"
      sample_limit: 50000
      static_configs:
        - targets: ["10.0.4.22:8080"]
- If a high-cardinality metric is the culprit, drop it with metric_relabel_configs rather than raising the cap. Drop the noisy series entirely:
    - job_name: "kube-state-metrics"
      sample_limit: 20000
      metric_relabel_configs:
        - source_labels: [__name__]
          regex: kube_pod_container_status_last_terminated_reason
          action: drop
- Strip an unbounded label (keep the metric, remove the high-cardinality dimension) when the metric is useful but a label explodes:
    metric_relabel_configs:
      - regex: "id|request_id|path"
        action: labeldrop
- Fix the exporter when a single instance misbehaves: replace per-request IDs or full URLs in labels with bounded values (status class, route template). This is the only durable fix for a true cardinality bug.
- Validate, reload, confirm the target recovers:
    promtool check config /etc/prometheus/prometheus.yml \
      && curl -s -XPOST http://localhost:9090/-/reload \
      && sleep 20 \
      && curl -s http://localhost:9090/api/v1/targets \
      | jq -r '.data.activeTargets[] | select(.scrapePool=="kube-state-metrics") | [.health,.lastError] | @tsv'
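Picking a limit "with headroom" in the steps above is simple arithmetic, sketched here as a shell helper. suggest_sample_limit is a hypothetical name, and the 50% headroom and rounding to the nearest 1000 are illustrative assumptions, not Prometheus recommendations; choose values that fit how quickly your target grows.

```shell
# suggest_sample_limit COUNT [HEADROOM_PCT] [ROUND_TO]
# Prints COUNT plus HEADROOM_PCT percent (default 50, an assumption),
# rounded up to a multiple of ROUND_TO (default 1000, also an assumption).
suggest_sample_limit() {
  awk -v c="$1" -v h="${2:-50}" -v r="${3:-1000}" 'BEGIN {
    want  = c * (100 + h) / 100      # observed count plus headroom
    limit = int(want / r) * r        # round down to a multiple of r ...
    if (limit < want) limit += r     # ... then bump up if that fell short
    print limit
  }'
}

# With the series count measured earlier in this guide:
#   suggest_sample_limit 14732          # prints: 23000
#   suggest_sample_limit 14732 20 500   # prints: 18000
```

Feed it the count from the exporter's /metrics output, then re-run it whenever the cluster grows so the limit stays a tripwire rather than a ceiling you hit in normal operation.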
Prevention and Best Practices
- Set sample_limit on every job as a tripwire, sized with headroom over the current count. It catches cardinality explosions before they fill the TSDB.
- Alert on approach: scrape_samples_scraped / scrape_sample_limit > 0.8 warns you before a scrape is rejected. The scrape_sample_limit series is only exported when extra scrape metrics are enabled (the extra-scrape-metrics feature flag).
- Use metric_relabel_configs to drop high-cardinality series at scrape time rather than raising limits indefinitely.
- Never put unbounded values (request IDs, raw paths, UUIDs) in labels; use route templates and status classes.
- Review cAdvisor and kube-state-metrics limits whenever cluster size changes; they scale with pods and objects.
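Where the limit is not available as a metric, the same "alert on approach" idea can be approximated outside Prometheus. The sketch below is a hypothetical helper, flag_near_limit, that reads one "target samples limit" triple per line; that input format and the 80% default threshold are assumptions made for illustration.

```shell
# flag_near_limit [THRESHOLD_PCT]
# stdin: whitespace-separated "target samples limit" per line (assumed format).
# Prints targets whose sample count exceeds THRESHOLD_PCT of their limit.
# Lines with a limit of 0 (no limit configured) are skipped.
flag_near_limit() {
  awk -v t="${1:-80}" '
    $3 > 0 && ($2 * 100 / $3) > t {
      printf "%s %d/%d (%.0f%%)\n", $1, $2, $3, $2 * 100 / $3
    }'
}

# Example:
#   printf 'kube-state-metrics 18000 20000\n' | flag_near_limit 80
#   # prints: kube-state-metrics 18000/20000 (90%)
```

The sample counts can come from scrape_samples_scraped and the limits from the job configuration; how the triples are assembled is left to the surrounding tooling.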
Related Errors
- label_limit / label_name_length_limit: related but distinct guardrails that cap the number of labels per series and the length of label names, not total samples per scrape.
- The query-time query processing would load too many samples error: about evaluating PromQL over too wide a range, not scraping; do not confuse the two.
- Config reload failed (HTTP error from /-/reload): what you hit if the sample_limit value or metric_relabel_configs regex is malformed.
Frequently Asked Questions
Does hitting sample_limit drop only the extra samples?
No. The entire scrape is rejected and the target is marked down. You lose every sample from that response, not just the overflow.
Should I raise sample_limit or drop metrics?
Raise it only when the series count is legitimate and stable. If a label or metric is exploding, drop it with metric_relabel_configs and fix the exporter — raising the cap just postpones the problem.
How is this different from label_limit?
sample_limit caps total series per scrape (cardinality). label_limit caps the number of labels on a single series. They guard different failure modes and are configured separately.
Is this the same as the query ‘too many samples’ error?
No. That error happens at query evaluation when a PromQL expression spans too many samples. sample limit exceeded happens at scrape time. Different subsystem, different fix.
How do I see how many series a target exposes?
Count non-comment, non-blank lines of its /metrics (grep -vcE '^(#|$)') or read scrape_samples_scraped{job="..."} in Prometheus. Compare that to the job’s sample_limit.