You are a senior Kubernetes engineer who can read `kubectl get events` to spot cluster-wide trouble fast. You know which events are noise (NodeReady), which are signal (FailedScheduling, FailedMount), and how to deduplicate.

I will provide:
- The investigation context (cluster-wide health check, specific namespace, specific workload)
- Recent event dump: `kubectl get events -A --sort-by='.lastTimestamp'` (recent few hundred)
- Optional: timeframe of interest

Your job:

1. **Filter to Warning events first**: `kubectl get events -A --field-selector type=Warning`

2. **Identify event categories**:
   - **Scheduling**: `FailedScheduling`, `Preempted`, `NotTriggerScaleUp`
   - **Image / Container**: `Failed`, `BackOff`, `ImagePullBackOff`, `ErrImagePull`, `InspectFailed`
   - **Volume**: `FailedMount`, `FailedAttachVolume`, `VolumeFailedAlreadyOnNode`, `ProvisioningFailed`
   - **Node**: `NodeNotReady`, `NodeHasInsufficientMemory`, `NodeHasDiskPressure`, `Rebooted`
   - **Pod lifecycle**: `Killing`, `Unhealthy`, `BackOff`, `Pulled`, `Created`, `Started`
   - **Admission**: webhook errors, validation failures
   - **Autoscaler**: scale-up/down decisions, `ScaleUpFailed`
   - **Helm / GitOps controller** events (ArgoCD, Flux): `SyncFailed`, `OutOfSync`

3. **Deduplicate**: the same event from many objects often indicates a single root cause:
   - 50 pods with `FailedScheduling: 0/3 nodes are available: 3 Insufficient cpu` → cluster CPU exhausted
   - 20 pods with `FailedMount` on same PV → CSI driver issue
   - All `aws-load-balancer-controller` events failing → controller down

4. **For each notable cluster of events**:
   - **What** is the event reason
   - **Who** is affected (count, namespaces, workloads)
   - **When** did it start (timing pattern)
   - **Why** (likely root cause)
   - **Next step** (where to look deeper)

5. **Cross-reference timing**:
   - Many events at the same minute → cluster-wide trigger (deploy, node death, autoscaler decision)
   - Periodic events (every 5 min) → cron-like; CronJob or controller reconcile
   - Recurring same-object events → loop (e.g. a failing Helm rollout retrying)

6. **For event noise control**:
   - Default event TTL: 1 hour; older events drop
   - Set `--event-ttl` on the kube-apiserver to adjust retention
   - Use an event-shipping tool (Eventrouter, kubernetes-event-exporter) for longer retention

7. **For "no events but problem exists"**:
   - Events may have aged out (>1h)
   - The object's controller might not emit events (some custom controllers are silent)
   - Use logs from the controller instead

Mark DESTRUCTIVE: clearing all events (`kubectl delete events -A --all`), interpreting normal events (Created, Pulled) as warnings, attempting cluster-wide fixes from a noisy event stream without root-cause analysis.

---

Investigation context: [DESCRIBE]

Recent events (last few hundred):
```
[PASTE `kubectl get events -A --sort-by='.lastTimestamp' | tail -200`]
```

Or filtered: `kubectl get events -A --field-selector type=Warning`:
```
[PASTE]
```

Timeframe of interest: [DESCRIBE]

Why this prompt works

`kubectl get events` is the cluster’s stream of consciousness: what the scheduler decided, what the kubelet rejected, which controller failed. Many engineers skip events because they’re noisy. This prompt forces a filtered, categorized analysis.

How to use it

Filter to Warning first. Normal events are noise for problem-solving.
Group by reason and object. Patterns emerge.
Look at the first event in a chain, not the latest.
For retention beyond 1 hour, you need an event shipper.
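
The "first event in a chain" advice can be sketched as a small shell helper. This is a minimal illustration, not part of the original prompt: the function name `first_event_per_object` and the tab-separated input layout are assumptions chosen for the example, and the `jq` command in the comment is just one way to produce that input.

```shell
# Hypothetical helper: print the earliest event seen for each object.
# Input (stdin): tab-separated rows of
#   firstTimestamp, namespace, object name, reason
# One possible way to produce that input (assumes jq is installed):
#   kubectl get events -A --field-selector type=Warning -o json |
#     jq -r '.items[] | [.firstTimestamp, .involvedObject.namespace,
#                        .involvedObject.name, .reason] | @tsv' |
#     first_event_per_object
first_event_per_object() {
  awk -F'\t' '
    {
      key = $2 "/" $3
      # ISO-8601 UTC timestamps order correctly as plain strings;
      # appending "" forces a string comparison.
      if (!(key in first) || ($1 "") < (first[key] "")) {
        first[key] = $1
        reason[key] = $4
      }
    }
    END {
      for (key in first) print first[key] "\t" key "\t" reason[key]
    }
  ' | LC_ALL=C sort
}
```

The output is sorted by timestamp, so the top rows show where a chain of failures most likely began.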

Useful commands

# Sorted by time
kubectl get events -A --sort-by='.lastTimestamp' | tail -100
kubectl get events -A --sort-by='.firstTimestamp'

# Warnings only
kubectl get events -A --field-selector type=Warning --sort-by='.lastTimestamp'

# By namespace
kubectl get events -n <ns>

# By specific object
kubectl get events -n <ns> --field-selector involvedObject.name=<pod>

# By reason
kubectl get events -A --field-selector reason=FailedScheduling
kubectl get events -A --field-selector reason=FailedMount

# JSON for tooling
kubectl get events -A --sort-by='.lastTimestamp' -o json | jq -r '.items[] | "\(.lastTimestamp) \(.type) \(.reason) \(.involvedObject.namespace)/\(.involvedObject.name): \(.message)"' | tail

# Count events by reason
kubectl get events -A -o json | jq -r '.items[].reason' | sort | uniq -c | sort -nr

# Count events by reason + namespace
kubectl get events -A -o json | \
  jq -r '.items[] | "\(.involvedObject.namespace) \(.reason)"' | \
  sort | uniq -c | sort -nr | head

# Watch live
kubectl get events -A --watch-only       # only new events

# Most recent warning per workload
kubectl get events -A --field-selector type=Warning -o json | \
  jq -r '.items | group_by(.involvedObject.namespace + "/" + .involvedObject.name) | .[] | sort_by(.lastTimestamp) | .[-1] | "\(.lastTimestamp) \(.involvedObject.namespace)/\(.involvedObject.name) \(.reason): \(.message)"'

Event categories

| Reason | What it means | Where to look |
|---|---|---|
| `FailedScheduling` | Scheduler couldn’t place pod | Node resources, taints, affinity |
| `Preempted` | Higher-priority pod evicted this one | PriorityClass usage |
| `FailedMount` | Volume mount failed | PVC binding, CSI driver |
| `ProvisioningFailed` | PV couldn’t be created | StorageClass provisioner, cloud quotas |
| `ImagePullBackOff` / `ErrImagePull` | Image fetch failed | Registry, secret, network |
| `BackOff` | Kubelet is backing off a crashing container (CrashLoopBackOff) or a failing image pull | Pod logs, `kubectl describe pod` |
| `Unhealthy` | Probe failed | Probe config + app state |
| `NodeNotReady` | Node went NotReady | Kubelet, container runtime |
| `NodeHasDiskPressure` | Node disk filling | Image GC, log volume |
| `Killing` | Container being terminated | Eviction, rollout, OOM |
| `FailedKillPod` | Couldn’t terminate; stuck | Finalizer, stuck mount |
| `Created` / `Pulled` / `Started` | Normal lifecycle | (noise during normal ops) |
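
The reason-to-category mapping above can be applied mechanically. The sketch below is illustrative only: the function names `event_category` and `count_by_category` and the category labels are assumptions for this example, and the mapping covers only reasons listed on this page.

```shell
# Hypothetical helper: map an event reason to a coarse category.
event_category() {
  case "$1" in
    FailedScheduling|Preempted|NotTriggerScaleUp) echo scheduling ;;
    ImagePullBackOff|ErrImagePull|InspectFailed) echo image ;;
    FailedMount|FailedAttachVolume|ProvisioningFailed) echo volume ;;
    NodeNotReady|NodeHasDiskPressure|NodeHasInsufficientMemory|Rebooted) echo node ;;
    Killing|Unhealthy|BackOff|FailedKillPod) echo pod-lifecycle ;;
    Created|Pulled|Started) echo normal ;;
    *) echo other ;;
  esac
}

# Hypothetical helper: read one reason per line on stdin and print
# "category<TAB>count", highest count first.
# Possible usage (assumes jq is installed):
#   kubectl get events -A -o json | jq -r '.items[].reason' | count_by_category
count_by_category() {
  tab=$(printf '\t')
  while IFS= read -r reason; do
    event_category "$reason"
  done | LC_ALL=C sort | uniq -c |
    awk '{ print $2 "\t" $1 }' |
    LC_ALL=C sort -t "$tab" -k2,2nr -k1,1
}
```

A large `other` bucket is a hint that the mapping needs extending for the controllers running in your cluster.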

Analysis patterns

Burst at a single timestamp

kubectl get events -A -o json | \
  jq -r '.items[].lastTimestamp' | \
  cut -c1-16 | \
  sort | uniq -c | sort -nr | head
# Spikes at one minute = cluster event (deploy, node death)
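
To surface only the minutes that look like a burst, the same per-minute bucketing can be wrapped with a threshold. This is a rough sketch: the function name `burst_minutes` and the default threshold of 20 events per minute are arbitrary assumptions, so tune the threshold to your cluster's normal event rate.

```shell
# Hypothetical helper: read ISO-8601 timestamps (one per line) on stdin
# and print "minute<TAB>count" for every minute with at least
# $1 events (default 20, an arbitrary starting point).
# Possible usage (assumes jq is installed):
#   kubectl get events -A -o json | jq -r '.items[].lastTimestamp' | burst_minutes 50
burst_minutes() {
  burst_threshold="${1:-20}"
  # The first 16 characters of a timestamp are YYYY-MM-DDTHH:MM.
  cut -c1-16 | LC_ALL=C sort | uniq -c |
    awk -v threshold="$burst_threshold" '$1 >= threshold { print $2 "\t" $1 }'
}
```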

Recurring events on one object (controller loop)

kubectl get events -n <ns> --field-selector involvedObject.name=<pod> -o json | \
  jq -r '.items | sort_by(.firstTimestamp) | .[] | "\(.firstTimestamp) \(.count)x \(.reason)"'
# `count` field high = same event over and over; controller retry loop

Cluster-wide problem detection

# Count Warning events by reason in last 10 minutes
# (uses GNU date; on macOS/BSD use: date -u -v-10M +%Y-%m-%dT%H:%M:%SZ)
kubectl get events -A --field-selector type=Warning -o json | \
  jq -r --arg cutoff "$(date -u -d '10 minutes ago' +%Y-%m-%dT%H:%M:%SZ)" \
  '.items[] | select(.lastTimestamp > $cutoff) | .reason' | \
  sort | uniq -c | sort -nr

Common findings this catches

50 pods FailedScheduling: 0/N nodes are available: N Insufficient cpu → cluster out of CPU; add nodes or evict noisy workloads.
All pods in a namespace FailedMount → CSI driver / PVC issue affecting that namespace.
NodeHasDiskPressure on multiple nodes → image cleanup not running; check kubelet image GC.
Cluster-wide FailedKillPod → kubelet container runtime issue.
Cluster autoscaler ScaleUpFailed → cloud quota / IAM issue.
BackOff events settling at one every 5 min → CrashLoopBackOff; the kubelet’s restart backoff doubles from 10s and caps at 5 minutes.
Periodic Killing of Job pods → CronJob concurrencyPolicy: Replace killing previous run.
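
Findings like these are easier to spot once per-object details are stripped from messages so that near-identical events collapse into one line. The sketch below is one possible approach, not a definitive normalizer: the function name `dedupe_messages` and the normalization rules (quoted strings become `"X"`, digit runs become `N`) are assumptions and will over-collapse some messages.

```shell
# Hypothetical helper: collapse near-identical event messages.
# Input (stdin): one "Reason: message" line per event.
# Output: "count<TAB>normalized line", highest count first.
# Possible usage (assumes jq is installed):
#   kubectl get events -A --field-selector type=Warning -o json |
#     jq -r '.items[] | "\(.reason): \(.message)"' | dedupe_messages
dedupe_messages() {
  tab=$(printf '\t')
  # Replace quoted names with "X" first, then digit runs with N.
  sed -E 's/"[^"]*"/"X"/g; s/[0-9]+/N/g' |
    awk '{ n[$0]++ } END { for (m in n) print n[m] "\t" m }' |
    LC_ALL=C sort -t "$tab" -k1,1nr -k2
}
```

A single normalized line with a high count points at one shared root cause rather than many separate pod problems.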

Event retention beyond 1 hour

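# kubernetes-event-exporter ships events to Elasticsearch, Slack, or a log file
# https://github.com/resmoio/kubernetes-event-exporter
apiVersion: apps/v1
kind: Deployment
metadata:
  name: event-exporter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: event-exporter        # required; must match the template labels
  template:
    metadata:
      labels:
        app: event-exporter
    spec:
      # Also needs a ServiceAccount with RBAC to read events (not shown)
      containers:
      - name: event-exporter
        image: ghcr.io/resmoio/kubernetes-event-exporter:latest  # pin a release tag in production
        # config in ConfigMap routes events to receivers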

When to escalate

Cluster-wide event burst correlating with a control-plane issue — engage cluster admin.
Same event reason flooded from a specific controller — coordinate with controller’s team.
Loss of historical events for an incident — install event shipper before next incident.

Kubernetes Events Analysis Prompt