Prometheus Operator and kube-prometheus-stack Explained

The first time I ran Prometheus in Kubernetes, I tried to manage prometheus.yml by hand in a ConfigMap. It was miserable: every new service meant editing a central file, reloading, and praying I hadn’t broken the YAML. The Prometheus Operator exists to kill that workflow, and kube-prometheus-stack packages it with everything else you need. Here’s how the pieces fit, because the abstraction confuses people until it clicks.

The core idea: scrape config becomes CRDs

The Prometheus Operator introduces Custom Resource Definitions that you create like any other Kubernetes object, and the Operator translates them into the Prometheus config you’d otherwise write by hand. The ones that matter:

Prometheus — declares a Prometheus instance (replicas, retention, resources, which monitors to select).
ServiceMonitor — says “scrape the endpoints behind this Service.”
PodMonitor — scrape pods directly, no Service required.
PrometheusRule — your recording and alerting rules.
Alertmanager — declares an Alertmanager instance.
Probe — blackbox-style probes of external targets.

You never edit prometheus.yml again. You create a ServiceMonitor, and the Operator regenerates the config and triggers a Prometheus reload. That’s the whole value proposition, and it’s a big one.
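The other monitor types follow the same pattern. As a rough sketch, a PodMonitor for a workload that has no Service might look like the following. The name batch-worker, the app label, and the container port name metrics are all hypothetical, and the release label assumes a Helm release named kube-prometheus-stack.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: batch-worker                 # hypothetical workload name
  labels:
    release: kube-prometheus-stack   # assumed label required by the Prometheus CR's podMonitorSelector
spec:
  selector:
    matchLabels:
      app: batch-worker              # matches labels on the pods themselves
  podMetricsEndpoints:
    - port: metrics                  # the NAMED container port on the pod
      path: /metrics
      interval: 30s
```

The shape mirrors a ServiceMonitor, except that the selector targets pod labels and the endpoints live under podMetricsEndpoints.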

How a ServiceMonitor actually finds your target

This is where people get stuck, so let’s be precise. A ServiceMonitor selects Services by label, then scrapes the named port on the endpoints behind them:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: payments-api
  labels:
    release: kube-prometheus-stack   # must match the Prometheus selector
spec:
  selector:
    matchLabels:
      app: payments-api              # matches your Service's labels
  endpoints:
    - port: metrics                  # the NAMED port on the Service
      path: /metrics
      interval: 30s
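For reference, here is a sketch of a Service that this ServiceMonitor would match. The port number 8080 and the pod label are assumptions for illustration; the parts that matter are the Service's own label and the port name.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: payments-api
  labels:
    app: payments-api      # what the ServiceMonitor's selector.matchLabels looks at
spec:
  selector:
    app: payments-api      # assumed pod label; this picks the pods behind the Service
  ports:
    - name: metrics        # the ServiceMonitor's "port: metrics" refers to this name
      port: 8080           # hypothetical port number
      targetPort: 8080
```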

Three things must line up or you’ll get silent nothing:

The ServiceMonitor’s selector.matchLabels must match labels on a real Service (not the Deployment).
The Service must expose a named port — port: metrics refers to the port’s name, not its number.
The ServiceMonitor itself must carry whatever label the Prometheus CR’s serviceMonitorSelector requires. With kube-prometheus-stack that’s usually release: <helm-release-name>.

That last one bites everyone exactly once. If your ServiceMonitor exists but nothing scrapes, check that the Prometheus CR is configured to select it:

# inside the Prometheus CR
spec:
  serviceMonitorSelector:
    matchLabels:
      release: kube-prometheus-stack

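Some teams set serviceMonitorSelector: {} to select all ServiceMonitors regardless of labels. That is simpler, at the cost of less control over what gets scraped. An empty selector is not the same as an unset one: if the field is left out entirely (nil), the Operator selects no ServiceMonitors at all.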
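To make the select-everything variant concrete, here is a minimal sketch of a standalone Prometheus CR. With kube-prometheus-stack the chart renders this object for you, so treat it as an illustration only. The name example and the service account prometheus are hypothetical, and the account is assumed to already have the RBAC permissions Prometheus needs for discovery.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example                        # hypothetical instance name
  namespace: monitoring
spec:
  replicas: 1
  retention: 15d
  serviceAccountName: prometheus       # assumed to exist with suitable RBAC
  serviceMonitorSelector: {}           # empty selector: match every ServiceMonitor
  serviceMonitorNamespaceSelector: {}  # empty selector: look in every namespace
  ruleSelector: {}                     # same idea for PrometheusRule objects
```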

What kube-prometheus-stack bundles

kube-prometheus-stack is the Helm chart that gives you a working observability platform in one install: the Operator, a Prometheus instance, Alertmanager, Grafana with dashboards pre-loaded, node_exporter, and kube-state-metrics — all wired together. It’s the fastest path from “fresh cluster” to “I can see my pods’ metrics.”

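helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
# the release name becomes the "release" label used in the examples here
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  -n monitoring --create-namespace \
  -f values.yaml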

A values file I reach for early sets sane retention and storage:

prometheus:
  prometheusSpec:
    retention: 15d
    retentionSize: 45GB
    storageSpec:
      volumeClaimTemplate:
        spec:
          resources:
            requests:
              storage: 50Gi
    serviceMonitorSelectorNilUsesHelmValues: false  # select ALL ServiceMonitors

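That last setting is the one I always flip. By default the chart generates a serviceMonitorSelector that matches only ServiceMonitors carrying its own release label. Setting the flag to false, with no explicit serviceMonitorSelector in your values, renders an empty selector instead, so any ServiceMonitor in the cluster is picked up. That is usually what application teams expect.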
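The chart has sibling flags for the other CRD types, and I usually flip them together so PodMonitors, rules, and Probes behave the same way. A sketch, assuming these keys are present in the chart version you install (check its values.yaml):

```yaml
prometheus:
  prometheusSpec:
    serviceMonitorSelectorNilUsesHelmValues: false  # select ALL ServiceMonitors
    podMonitorSelectorNilUsesHelmValues: false      # select ALL PodMonitors
    ruleSelectorNilUsesHelmValues: false            # select ALL PrometheusRules
    probeSelectorNilUsesHelmValues: false           # select ALL Probes
```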

Alerts as CRDs

PrometheusRule turns your alert YAML into a Kubernetes object that the Operator loads automatically:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: api-alerts
  labels:
    release: kube-prometheus-stack
spec:
  groups:
    - name: api.rules
      rules:
        - alert: HighErrorRate
          expr: |
            sum(rate(http_requests_total{status=~"5.."}[5m]))
              / sum(rate(http_requests_total[5m])) > 0.05
          for: 10m
          labels:
            severity: page
          annotations:
            summary: "5xx error rate above 5% for 10m"

Same label-matching rules apply: the PrometheusRule needs the label the Prometheus CR’s ruleSelector expects. Ship rules next to the service they monitor, in the same repo, reviewed in the same PR.
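PrometheusRule carries recording rules too, in the same groups structure. Here is a sketch that precomputes the two rates used by the alert above. The rule names follow the common level:metric:operations naming style and are my own choice, and http_requests_total with a status label is assumed to be what your app exports.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: api-recording                # hypothetical name
  labels:
    release: kube-prometheus-stack   # assumed label required by the Prometheus CR's ruleSelector
spec:
  groups:
    - name: api.recording
      rules:
        # total request rate per job over 5 minutes
        - record: job:http_requests:rate5m
          expr: sum by (job) (rate(http_requests_total[5m]))
        # 5xx request rate per job over 5 minutes
        - record: job:http_requests_errors:rate5m
          expr: sum by (job) (rate(http_requests_total{status=~"5.."}[5m]))
```

Alerts and dashboards can then query the recorded series instead of recomputing the rate on every evaluation.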

Debugging the silence

When a target isn’t showing up, walk this checklist in order:

1. Service exists with the right labels: kubectl get svc -l app=payments-api.
2. Endpoints are populated: kubectl get endpoints payments-api. An empty list means the Service’s pod selector matches no ready pods.
3. Port name matches: the ServiceMonitor’s port is the name of a port on the Service, not its number.
4. ServiceMonitor is selected: check Prometheus’ Status > Targets page, or confirm that the ServiceMonitor carries the release label.
5. Operator generated config: kubectl get secret prometheus-...-prometheus -o ... to confirm the scrape job appeared.

Ninety percent of “Prometheus isn’t scraping my app” tickets die at step 1 or 3.
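One more cause worth ruling out when the labels and port name check out: namespaces. A ServiceMonitor looks for Services in its own namespace unless told otherwise. If your ServiceMonitor lives in monitoring but the Service lives elsewhere, a namespaceSelector bridges the gap. In this sketch the namespace payments is hypothetical.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: payments-api
  namespace: monitoring              # where the ServiceMonitor itself lives
  labels:
    release: kube-prometheus-stack
spec:
  namespaceSelector:
    matchNames:
      - payments                     # hypothetical namespace holding the Service
  selector:
    matchLabels:
      app: payments-api
  endpoints:
    - port: metrics
      path: /metrics
      interval: 30s
```

The Prometheus CR also needs to be allowed to look at the namespace the ServiceMonitor is in, which is what serviceMonitorNamespaceSelector controls.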

Why it’s worth it

Hand-managed Prometheus config doesn’t scale past a handful of services. With the Operator, every team owns the ServiceMonitor and PrometheusRule for their own app, in their own repo, and the platform team owns the Prometheus CR. Responsibility is distributed exactly where it belongs, and there’s no central YAML file to fight over.


CRD apiVersions and chart values change across releases. Confirm against the operator and chart versions you actually install.