Prometheus Error Guide: 'scrape sample limit exceeded'

Exact Error Message

A target trips sample_limit and is marked down, with the reason shown on the targets page and in the log:

ts=2026-06-27T12:48:13.907Z caller=scrape.go:1701 level=warn component="scrape manager" scrape_pool=kube-state-metrics target=http://10.0.4.22:8080/metrics msg="Append failed" err="sample limit exceeded"

On /api/v1/targets the target’s lastError reads:

"health": "down",
"lastError": "sample limit exceeded",
"scrapeUrl": "http://10.0.4.22:8080/metrics"

Older builds phrase it as target exceeded the sample limit or scrape sample limit exceeded. In every case the entire scrape is discarded — not just the overflowing samples — so the target goes from up 1 to up 0.

What the Error Means

sample_limit is a per-scrape cap on how many samples a single scrape may contribute, counted after metric_relabel_configs have been applied. The default of 0 means no limit. When a target’s /metrics response still exceeds the cap after relabeling, Prometheus rejects the whole scrape: no samples from that response are stored and the target’s up becomes 0. The limit exists to protect the TSDB from a single exporter that suddenly explodes in cardinality (a runaway label, per-request IDs) and blows up memory and disk.

This is a cardinality guardrail, not a label-name guardrail. It is distinct from label_limit (which caps labels per series) and from the query-time too many samples error (which is about evaluating a PromQL expression over too wide a range, not scraping).

Common Causes

Target legitimately exposes more series than sample_limit allows — the fleet grew and the cap was never revisited.
High-cardinality exporter — labels carry per-request data, user IDs, full URLs, or UUIDs, multiplying series without bound.
A single misbehaving exporter — one buggy instance emits an unbounded label (e.g., a path with embedded IDs) while peers are fine.
sample_limit set too low: a conservative organization-wide value (e.g., 1000) applied to a naturally large exporter like kube-state-metrics or cAdvisor.
cAdvisor / kube-state-metrics explosion — short-lived pods, many containers, or churned objects inflate series counts past the cap.

How to Reproduce the Error

Set a deliberately low limit on a job and point it at a large exporter:

# /etc/prometheus/prometheus.yml
scrape_configs:
  - job_name: "kube-state-metrics"
    sample_limit: 1000          # too low for a real KSM target
    static_configs:
      - targets: ["10.0.4.22:8080"]

# Reload (the endpoint only works when Prometheus runs with --web.enable-lifecycle)
curl -s -XPOST http://localhost:9090/-/reload
# next scrape:
# err="sample limit exceeded"   and target health -> down

Diagnostic Commands

Find which targets are down for sample-limit reasons:

curl -s http://localhost:9090/api/v1/targets \
  | jq -r '.data.activeTargets[] | select(.lastError|test("sample limit")) | [.scrapePool,.scrapeUrl,.lastError] | @tsv'

kube-state-metrics  http://10.0.4.22:8080/metrics  sample limit exceeded

Count how many series the target actually exposes (count non-comment, non-blank lines):

curl -s http://10.0.4.22:8080/metrics | grep -vcE '^(#|$)'

Find the configured sample_limit for that job:

curl -s http://localhost:9090/api/v1/status/config | jq -r '.data.yaml' \
  | grep -A6 'kube-state-metrics' | grep sample_limit

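See how many samples the target exposed on each scrape, and rank the biggest metrics. scrape_samples_scraped is still recorded when a scrape is rejected, but the target’s own series are not stored, so evaluate the topk query at a time before the failures began: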

scrape_samples_scraped{job="kube-state-metrics"}

topk(10, count by (__name__) ({job="kube-state-metrics"}))

{__name__="kube_pod_status_phase"}  4821
{__name__="kube_pod_container_status_restarts_total"}  2960
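
While scrapes are being rejected, Prometheus stores none of the target's own series, so a ranking query may come back empty for the failing window. A rough equivalent can be computed from the raw exposition text instead. The sketch below assumes the plain-text exposition format; rank_metrics is a hypothetical helper name, not part of Prometheus or promtool:

```shell
# rank_metrics: read Prometheus text exposition on stdin and print
# "count name" for the N metric names with the most samples (default 10).
rank_metrics() {
  n="${1:-10}"
  grep -vE '^(#|[[:space:]]*$)' \
    | sed -E 's/[{ ].*$//' \
    | sort | uniq -c | sort -k1,1nr -k2,2 \
    | awk '{print $1, $2}' \
    | head -n "$n"
}
```

For example: curl -s http://10.0.4.22:8080/metrics | rank_metrics 10. The counts are taken before any metric_relabel_configs run, so they show what the exporter emits rather than what Prometheus would keep.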

Read the log for the rejection:

journalctl -u prometheus --no-pager | grep -iE 'sample limit' | tail -10

Step-by-Step Resolution

Confirm the cause. Compare the target’s actual line count against the configured sample_limit. If the count is a legitimate, stable number, the limit is simply too low. If it is climbing fast, you have a cardinality bug.
If the volume is legitimate, raise the limit deliberately — pick a value with headroom over the real series count, not an unbounded one:
```
- job_name: "kube-state-metrics"
  sample_limit: 50000
  static_configs:
    - targets: ["10.0.4.22:8080"]
```
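
One way to turn "headroom over the real series count" into a concrete number is to add a fixed percentage and round up to a tidy step. This is only a sketch: suggest_limit is a hypothetical helper, and the 50% headroom and 1000 step are arbitrary assumptions to adjust for your own growth rate.

```shell
# suggest_limit COUNT [HEADROOM_PCT] [STEP]
# Prints COUNT plus HEADROOM_PCT percent, rounded up to a multiple of STEP.
suggest_limit() {
  count="$1"
  pct="${2:-50}"     # assumed default: 50% headroom
  step="${3:-1000}"  # assumed default: round up to the next 1000
  # Integer ceiling of count * (100 + pct) / 100
  target=$(( (count * (100 + pct) + 99) / 100 ))
  # Round up to the next multiple of step
  echo $(( (target + step - 1) / step * step ))
}
```

Feed it the measured line count, for example suggest_limit "$(curl -s http://10.0.4.22:8080/metrics | grep -vcE '^(#|$)')", and review the result before putting it in the config.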

If a high-cardinality metric is the culprit, drop it with metric_relabel_configs rather than raising the cap. Drop the noisy series entirely:

- job_name: "kube-state-metrics"
  sample_limit: 20000
  metric_relabel_configs:
    - source_labels: [__name__]
      regex: kube_pod_container_status_last_terminated_reason
      action: drop
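
Because sample_limit is checked after metric relabeling, a drop rule can be sanity-checked against the raw /metrics output before reloading. The sketch below estimates how many samples would remain once a name-based drop rule is applied. count_after_drop is a hypothetical helper, and it uses grep's POSIX extended regex rather than the RE2 dialect Prometheus uses, so it is only a fair approximation for simple names and alternations:

```shell
# count_after_drop REGEX
# Reads text exposition on stdin and prints how many samples have a metric
# name that does NOT fully match REGEX (Prometheus anchors relabel regexes).
count_after_drop() {
  grep -vE '^(#|[[:space:]]*$)' \
    | sed -E 's/[{ ].*$//' \
    | grep -vcE "^($1)\$"
}
```

For example: curl -s http://10.0.4.22:8080/metrics | count_after_drop 'kube_pod_container_status_last_terminated_reason'. If the result is still above sample_limit, the rule alone will not bring the target back up.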

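Strip an unbounded label (keep the metric, remove the high-cardinality dimension) when the metric is useful but a label explodes. Two caveats: the remaining labels must still identify each series uniquely, and labeldrop does not reduce the number of samples in the scrape, so on its own it will not bring a target back under sample_limit. Treat it as a way to contain stored cardinality until the exporter is fixed: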
```
metric_relabel_configs:
  - regex: "id|request_id|path"
    action: labeldrop
```
Fix the exporter when a single instance misbehaves — replace per-request IDs or full URLs in labels with bounded values (status class, route template). This is the only durable fix for a true cardinality bug.

Validate, reload, confirm the target recovers:

promtool check config /etc/prometheus/prometheus.yml \
  && curl -sf -XPOST http://localhost:9090/-/reload \
  && sleep 20 \
  && curl -s http://localhost:9090/api/v1/targets \
     | jq -r '.data.activeTargets[] | select(.scrapePool=="kube-state-metrics") | [.health,.lastError] | @tsv'

Prevention and Best Practices

Set sample_limit on every job as a tripwire, sized with headroom over the current count — it catches cardinality explosions before they fill the TSDB.
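Alert on approach: scrape_samples_post_metric_relabeling / (scrape_sample_limit > 0) > 0.8 warns you before a scrape is rejected. The scrape_sample_limit series is only exposed when the extra scrape metrics are enabled (--enable-feature=extra-scrape-metrics), and the > 0 filter skips targets that have no limit configured.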
Use metric_relabel_configs to drop high-cardinality series at scrape time rather than raising limits indefinitely.
Never put unbounded values (request IDs, raw paths, UUIDs) in labels; use route templates and status classes.
Review cAdvisor and kube-state-metrics limits whenever cluster size changes; they scale with pods and objects.
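
The alert-on-approach idea can be sketched as a rule file written from the shell. Everything named here is an assumption for illustration: the helper write_sample_limit_alert, the group and alert names, the 15m hold time, and the severity label. The expression relies on scrape_sample_limit, which Prometheus only exposes with --enable-feature=extra-scrape-metrics.

```shell
# write_sample_limit_alert FILE
# Writes a Prometheus rule file that fires when a target uses more than
# 80% of its sample_limit. The quoted heredoc keeps {{ $labels }} literal.
write_sample_limit_alert() {
  cat > "$1" <<'EOF'
groups:
  - name: scrape-limits
    rules:
      - alert: ScrapeSamplesNearLimit
        # "> 0" skips targets with no limit configured (sample_limit: 0).
        expr: scrape_samples_post_metric_relabeling / (scrape_sample_limit > 0) > 0.8
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.job }} / {{ $labels.instance }} is above 80% of its sample_limit"
EOF
}
```

After writing the file, promtool check rules <file> validates it; the file also has to be listed under rule_files in prometheus.yml before Prometheus will load it.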

Related Errors

label_limit / label_name_length_limit: related but distinct guardrails. label_limit caps the number of labels per series and label_name_length_limit caps the length of each label name; neither caps total samples per scrape.
'query processing would load too many samples': a query-time error about evaluating PromQL over too wide a range, not scraping; do not confuse the two.
Config reload failed: the error returned by /-/reload if the sample_limit value or a metric_relabel_configs regex is malformed. Running promtool check config first catches it before the reload.

Frequently Asked Questions

Does hitting sample_limit drop only the extra samples? No. The entire scrape is rejected and the target is marked down. You lose every sample from that response, not just the overflow.

Should I raise sample_limit or drop metrics? Raise it only when the series count is legitimate and stable. If a label or metric is exploding, drop it with metric_relabel_configs and fix the exporter — raising the cap just postpones the problem.

How is this different from label_limit? sample_limit caps total series per scrape (cardinality). label_limit caps the number of labels on a single series. They guard different failure modes and are configured separately.

Is this the same as the query ‘too many samples’ error? No. That error happens at query evaluation when a PromQL expression spans too many samples. sample limit exceeded happens at scrape time. Different subsystem, different fix.

How do I see how many series a target exposes? Count non-comment, non-blank lines of its /metrics (grep -vcE '^(#|$)') or read scrape_samples_scraped{job="..."} in Prometheus. Compare that to the job’s sample_limit.
