DevOps AI ToolKit
By James Joyner IV · 9 min read

Prometheus Error Guide: 'scrape sample limit exceeded' Target Down on Cardinality

Fix Prometheus 'sample limit exceeded' target-down errors: count exposed series, identify high-cardinality exporters, drop noisy metrics, and raise sample_limit safely.

  • #prometheus-monitoring
  • #troubleshooting
  • #errors
  • #cardinality

Exact Error Message

A target trips sample_limit and is marked down, with the reason shown on the targets page and in the log:

ts=2026-06-27T12:48:13.907Z caller=scrape.go:1701 level=warn component="scrape manager" scrape_pool=kube-state-metrics target=http://10.0.4.22:8080/metrics msg="Append failed" err="sample limit exceeded"

On /api/v1/targets the target’s lastError reads:

"health": "down",
"lastError": "sample limit exceeded",
"scrapeUrl": "http://10.0.4.22:8080/metrics"

Older builds phrase it as target exceeded the sample limit or scrape sample limit exceeded. In every case the entire scrape is discarded — not just the overflowing samples — so the target goes from up 1 to up 0.

What the Error Means

sample_limit is a per-scrape cap on how many samples (one per series) a single scrape may contribute, counted after metric_relabel_configs are applied; a value of 0, the Prometheus default, disables the check. When a target’s /metrics response still exceeds the cap after relabeling, Prometheus rejects the whole scrape: no samples from that response are stored and the target’s up becomes 0. The limit exists to protect the TSDB from a single exporter that suddenly explodes in cardinality (a runaway label, per-request IDs) and drives up memory and disk usage.

This is a cardinality guardrail, not a label-name guardrail. It is distinct from label_limit (which caps labels per series) and from the query-time too many samples error (which is about evaluating a PromQL expression over too wide a range, not scraping).
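
The all-or-nothing behaviour is easy to misread, so here is a minimal sketch of the decision as described above. scrape_verdict is a hypothetical helper written for illustration, not part of Prometheus; it assumes a limit of 0 means "no limit" and that only a count strictly above the limit fails the scrape.

```shell
# scrape_verdict SAMPLES LIMIT
# Illustrative only: mimics the all-or-nothing rule for a single scrape.
# A limit of 0 disables the check; a count above the limit fails everything.
scrape_verdict() {
  samples="$1"
  limit="$2"
  if [ "$limit" -gt 0 ] && [ "$samples" -gt "$limit" ]; then
    echo "rejected: sample limit exceeded (0 of $samples samples stored, up=0)"
  else
    echo "accepted: $samples samples stored, up=1"
  fi
}
```

For example, scrape_verdict 14732 1000 reports a rejection with nothing stored, while scrape_verdict 1000 1000 is accepted because the count does not exceed the cap.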

Common Causes

  1. Target legitimately exposes more series than sample_limit allows: the fleet grew and the cap was never revisited.
  2. High-cardinality exporter: labels carry per-request data, user IDs, full URLs, or UUIDs, multiplying series without bound.
  3. A single misbehaving exporter: one buggy instance emits an unbounded label (e.g., a path with embedded IDs) while peers are fine.
  4. sample_limit set too low: a conservative organization-wide value (e.g., 1000) applied to a naturally large exporter like kube-state-metrics or cAdvisor. Prometheus itself defaults to 0 (no limit), so a low cap always comes from your own configuration or tooling.
  5. cAdvisor / kube-state-metrics explosion: short-lived pods, many containers, or churned objects inflate series counts past the cap.
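
To tell an unbounded label (causes 2 and 3) apart from a target that is simply large, it can help to see which label names carry the most distinct values. The following is a rough sketch, not an official tool: label_cardinality is a hypothetical helper that reads text exposition on stdin, and it assumes label values contain no escaped quotes.

```shell
# label_cardinality: read Prometheus text exposition on stdin and print
# "<distinct_values> <label_name>", highest first.
# Assumption: label values contain no escaped double quotes.
label_cardinality() {
  awk '
    /^#/ || /^[[:space:]]*$/ { next }          # skip comments and blank lines
    {
      rest = $0
      # walk every name="value" pair on the line
      while (match(rest, /[a-zA-Z_][a-zA-Z0-9_]*="[^"]*"/)) {
        pair = substr(rest, RSTART, RLENGTH)
        rest = substr(rest, RSTART + RLENGTH)
        label = substr(pair, 1, index(pair, "=") - 1)
        key = label SUBSEP pair
        if (!(key in seen)) { seen[key] = 1; distinct[label]++ }
      }
    }
    END { for (l in distinct) print distinct[l], l }
  ' | sort -k1,1nr -k2,2
}
```

A typical invocation would be curl -s http://10.0.4.22:8080/metrics | label_cardinality | head. A label whose distinct-value count is close to the total sample count is the usual suspect.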

How to Reproduce the Error

Set a deliberately low limit on a job and point it at a large exporter:

# /etc/prometheus/prometheus.yml
scrape_configs:
  - job_name: "kube-state-metrics"
    sample_limit: 1000          # too low for a real KSM target
    static_configs:
      - targets: ["10.0.4.22:8080"]

# reload over HTTP requires Prometheus to run with --web.enable-lifecycle
curl -s -XPOST http://localhost:9090/-/reload
# next scrape:
# err="sample limit exceeded"   and target health -> down

Diagnostic Commands

Find which targets are down for sample-limit reasons:

curl -s http://localhost:9090/api/v1/targets \
  | jq -r '.data.activeTargets[] | select(.lastError|test("sample limit")) | [.scrapePool,.scrapeUrl,.lastError] | @tsv'
kube-state-metrics  http://10.0.4.22:8080/metrics  sample limit exceeded

Count how many series the target actually exposes (count non-comment, non-blank lines):

curl -s http://10.0.4.22:8080/metrics | grep -vcE '^(#|$)'
14732
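
While the target is down, Prometheus holds no fresh series for it, so ranking metrics straight from the exposition text is often the quickest route. A minimal sketch follows; rank_metrics is a hypothetical helper, and it counts each exposed line under its literal metric name (so histogram _bucket, _sum, and _count lines are ranked separately).

```shell
# rank_metrics [N]: read Prometheus text exposition on stdin and print
# "<sample_count> <metric_name>" for the N largest metric names (default 10).
rank_metrics() {
  awk '
    /^#/ || /^[[:space:]]*$/ { next }   # skip comments and blank lines
    {
      name = $0
      sub(/[{ ].*$/, "", name)          # keep text before the first "{" or space
      count[name]++
    }
    END { for (n in count) print count[n], n }
  ' | sort -k1,1nr -k2,2 | head -n "${1:-10}"
}
```

Usage would look like curl -s http://10.0.4.22:8080/metrics | rank_metrics 10, which works before any relabeling is applied and therefore shows what the exporter really sends.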

Find the configured sample_limit for that job:

curl -s http://localhost:9090/api/v1/status/config | jq -r '.data.yaml' \
  | grep -A6 'kube-state-metrics' | grep sample_limit

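See how many samples the last scrape exposed (scrape_samples_scraped reports the full count even when the scrape is rejected), and rank the biggest metrics. The ranking query only returns data for a period when scrapes were still accepted, because a rejected scrape stores no series: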

scrape_samples_scraped{job="kube-state-metrics"}
topk(10, count by (__name__) ({job="kube-state-metrics"}))
{__name__="kube_pod_status_phase"}  4821
{__name__="kube_pod_container_status_restarts_total"}  2960

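Read the log for the rejection. If nothing shows up, the message may be logged at debug level in your version; run Prometheus with --log.level=debug or rely on the targets API above: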

journalctl -u prometheus --no-pager | grep -iE 'sample limit' | tail -10

Step-by-Step Resolution

  1. Confirm the cause. Compare the target’s actual line count against the configured sample_limit. If the count is a legitimate, stable number, the limit is simply too low. If it is climbing fast, you have a cardinality bug.

  2. If the volume is legitimate, raise the limit deliberately — pick a value with headroom over the real series count, not an unbounded one:

    - job_name: "kube-state-metrics"
      sample_limit: 50000
      static_configs:
        - targets: ["10.0.4.22:8080"]

  3. If a high-cardinality metric is the culprit, drop it with metric_relabel_configs rather than raising the cap. Because sample_limit is checked after metric relabeling, dropped series no longer count toward it. Drop the noisy series entirely:

    - job_name: "kube-state-metrics"
      sample_limit: 20000
      metric_relabel_configs:
        - source_labels: [__name__]
          regex: kube_pod_container_status_last_terminated_reason
          action: drop
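
  4. Strip an unbounded label (keep the metric, remove the high-cardinality dimension) when the metric is useful but a label explodes. labeldrop removes the label, not the samples: series that differed only in that label collapse into duplicates instead of being aggregated, and the scrape still carries the same number of samples. Use it for labels that are redundant with others, and fall back to dropping the metric or fixing the exporter when the label is the only thing distinguishing the series: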

    metric_relabel_configs:
      - regex: "id|request_id|path"
        action: labeldrop

  5. Fix the exporter when a single instance misbehaves: replace per-request IDs or full URLs in labels with bounded values (status class, route template). This is the only durable fix for a true cardinality bug.

  6. Validate, reload, confirm the target recovers:

    promtool check config /etc/prometheus/prometheus.yml \
      && curl -s -XPOST http://localhost:9090/-/reload \
      && sleep 20 \
      && curl -s http://localhost:9090/api/v1/targets \
         | jq -r '.data.activeTargets[] | select(.scrapePool=="kube-state-metrics") | [.health,.lastError] | @tsv'
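
The "headroom over the real series count" advice in step 2 comes down to simple arithmetic. The sketch below is one way to do it; suggest_limit is a hypothetical helper, and both the 50% default headroom and the rounding up to the next 1000 are assumptions chosen for illustration, not Prometheus recommendations.

```shell
# suggest_limit COUNT [HEADROOM_PCT]
# Pad the observed sample count by HEADROOM_PCT (default 50, an assumption)
# and round up to the next multiple of 1000 for a tidy config value.
suggest_limit() {
  count="$1"
  headroom_pct="${2:-50}"
  padded=$(( count * (100 + headroom_pct) / 100 ))
  echo $(( (padded + 999) / 1000 * 1000 ))
}
```

With the count measured earlier, suggest_limit 14732 prints 23000. Whatever value you pick, the point is that it stays bounded so the limit still works as a tripwire.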

Prevention and Best Practices

  • Set sample_limit on every job as a tripwire, sized with headroom over the current count — it catches cardinality explosions before they fill the TSDB.
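  • Alert on approach: scrape_samples_post_metric_relabeling / (scrape_sample_limit > 0) > 0.8 warns you before a scrape is rejected. The numerator is the post-relabeling count that the limit is actually compared against, and the > 0 filter skips targets without a limit. scrape_sample_limit is an optional metric; on many releases it is only exported with --enable-feature=extra-scrape-metrics.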
  • Use metric_relabel_configs to drop high-cardinality series at scrape time rather than raising limits indefinitely.
  • Never put unbounded values (request IDs, raw paths, UUIDs) in labels; use route templates and status classes.
  • Review cAdvisor and kube-state-metrics limits whenever cluster size changes; they scale with pods and objects.
  • The free incident assistant can rank a down target’s metrics by series count and suggest the metric_relabel_configs drop rule; see more under Prometheus and monitoring.

Related Errors

  • label_limit / label_name_length_limit: related but distinct guardrails that cap the number of labels per series and the length of label names, not total samples per scrape.
  • The query-time 'query processing would load too many samples' error: about evaluating PromQL over too wide a range, not scraping; do not confuse the two.
  • Config reload failed, reported as an HTTP 500 from /-/reload: what you hit if the sample_limit value or metric_relabel_configs regex is malformed. promtool check config catches this before you reload.
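
The alert-on-approach practice from the prevention list can be sketched as an alerting rule. The helper below only writes a rule file; write_sample_limit_rule, the group name, the alert name, the 15m hold time, and the severity label are all illustrative assumptions. It also assumes scrape_sample_limit is being exported, which on many releases requires --enable-feature=extra-scrape-metrics.

```shell
# write_sample_limit_rule FILE
# Writes a hypothetical alerting rule that fires when a target uses more
# than 80% of its configured sample_limit.
write_sample_limit_rule() {
  cat > "$1" <<'EOF'
groups:
  - name: scrape-limits
    rules:
      - alert: ScrapeSampleLimitApproaching
        # "> 0" skips targets with no limit configured (sample_limit: 0)
        expr: |
          scrape_samples_post_metric_relabeling
            / (scrape_sample_limit > 0) > 0.8
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.job }} on {{ $labels.instance }} is above 80% of its sample_limit"
EOF
}
```

After writing the file, promtool check rules on that path validates the syntax before you add it to rule_files and reload.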

Frequently Asked Questions

Does hitting sample_limit drop only the extra samples? No. The entire scrape is rejected and the target is marked down. You lose every sample from that response, not just the overflow.

Should I raise sample_limit or drop metrics? Raise it only when the series count is legitimate and stable. If a label or metric is exploding, drop it with metric_relabel_configs and fix the exporter — raising the cap just postpones the problem.

How is this different from label_limit? sample_limit caps total series per scrape (cardinality). label_limit caps the number of labels on a single series. They guard different failure modes and are configured separately.

Is this the same as the query ‘too many samples’ error? No. That error happens at query evaluation when a PromQL expression spans too many samples. sample limit exceeded happens at scrape time. Different subsystem, different fix.

How do I see how many series a target exposes? Count non-comment, non-blank lines of its /metrics (grep -vcE '^(#|$)') or read scrape_samples_scraped{job="..."} in Prometheus. Compare that to the job’s sample_limit.
