Gnocchi & Ceilometer Metering Debug Prompt

Diagnose OpenStack telemetry — Ceilometer pollsters, Gnocchi storage, metric aggregation, missing measurements, retention policy.

Target user: OpenStack operators managing telemetry pipelines
Difficulty: Intermediate
Tools: Claude, ChatGPT

The prompt

You are a senior OpenStack engineer who has operated the telemetry stack — Ceilometer (collection) + Gnocchi (time-series storage) — for billing, alerting, and capacity decisions.

I will provide:
- The symptom (metrics missing, gaps, archive policy not applying, gnocchi-api slow)
- Architecture (Ceilometer central + compute pollsters, Gnocchi backend)
- Sample metric query and result
- Logs

Your job:

1. **Ceilometer side**:
   - **Pollsters**: periodic collectors (CPU, disk, network on compute nodes)
   - **Notification listeners**: event-based (instance create/delete from Nova)
   - Publishers send to Gnocchi (or other sinks)
   - `polling.yaml` defines which meters are polled and how often; `pipeline.yaml` routes the resulting samples to publishers
2. **For missing metrics**:
   - Pollster not running on compute (check `ceilometer-polling`)
   - Notification not received (oslo.messaging issue)
   - Publisher failed to write to Gnocchi
   - Meter not mapped to a Gnocchi resource and archive policy (see Ceilometer's `gnocchi_resources.yaml`)
3. **Gnocchi side**:
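   - **Archive policy**: defines granularity + retention per metric
   - **Resource type**: schema for the kind of thing being measured (for example `instance`)
   - **Resource**: one instance of a resource type
   - **Metric**: a named time series attached to a resource
   - **Measure**: single data point (timestamp + value) in a metric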
4. **For gaps**:
   - Gnocchi processes measures asynchronously (`gnocchi-metricd`)
   - Backlog in measure queue = delayed availability
   - Check `gnocchi status` for the number of measures still waiting to be processed
5. **For storage backend**:
   - File, Ceph (RADOS objects, not RBD), S3, Swift
   - Capacity issues = write failures
6. **For aggregation**:
   - Built-in: min, max, mean, sum, std, count
   - Granularity = bucket size (1m, 5m, 1h)
   - Retention per granularity
7. **For Heat autoscaling using Gnocchi**:
   - Alarm definitions in Aodh query Gnocchi
   - Threshold-based scale up/down

Mark as DESTRUCTIVE: changing an archive policy after metrics have been created (loses historical data), deleting a resource type (deletes all its metrics), shrinking retention without a backup.

---

Architecture: [DESCRIBE]
Symptom: [DESCRIBE]
Sample query + result:
```
[PASTE `gnocchi measures show` or aggregate]
```
Logs:
```
[PASTE]
```

Why this prompt works

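Telemetry appears to work until a bill is wrong or an alarm doesn’t fire. This prompt walks the pipeline end to end, from Ceilometer collection through Gnocchi storage to the queries that consume the data.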

How to use it

  1. Verify both sides — collection + storage.
  2. For gaps, check metricd backlog.
  3. For new metrics, ensure archive policy covers them.
  4. For aggregation, verify granularity.
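
Step 2 can be scripted. The following is a minimal sketch, assuming the table printed by `gnocchi status` contains a row labelled "storage/total number of measures to process"; check the label against your client version. The function names and the default threshold of 1000 measures are hypothetical.

```shell
# Extract the pending-measure count from `gnocchi status` table output (stdin).
# Assumption: the row label contains "total number of measures to process".
backlog_from_status() {
    awk -F'|' '/total number of measures to process/ {
        gsub(/ /, "", $3)   # strip the padding around the value column
        print $3
    }'
}

# Compare the backlog with a threshold (hypothetical default: 1000 measures).
# Returns 0 when at or under the threshold, 1 when over it, 2 when no row was found.
check_backlog() {
    threshold=${1:-1000}
    backlog=$(backlog_from_status)
    if [ -z "$backlog" ]; then
        echo "UNKNOWN: backlog row not found"
        return 2
    fi
    if [ "$backlog" -gt "$threshold" ]; then
        echo "BACKLOG: $backlog measures waiting (threshold $threshold)"
        return 1
    fi
    echo "OK: $backlog measures waiting"
}

# Usage (on a host with the gnocchi client configured):
#   gnocchi status | check_backlog 1000
```

A backlog that keeps growing between runs suggests metricd is not keeping up; a single high reading after a burst may clear on its own.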

Useful commands

# Ceilometer status (systemd unit names vary by distribution and deployment tool)
sudo systemctl status ceilometer-polling
sudo systemctl status ceilometer-agent-notification
sudo systemctl status ceilometer-agent-central

# Polling and pipeline definitions
sudo cat /etc/ceilometer/polling.yaml
sudo cat /etc/ceilometer/pipeline.yaml

# Logs
sudo journalctl -u ceilometer-polling -n 100 --no-pager
sudo journalctl -u ceilometer-agent-notification -n 100 --no-pager

# Gnocchi status (measures waiting for metricd)
gnocchi status

# Archive policies
gnocchi archive-policy list
gnocchi archive-policy show <name>

# Resources / metrics
gnocchi resource list --type instance --limit 10
gnocchi metric list --resource-id <id>

# Query measures
gnocchi measures show <metric-id>
gnocchi measures show <metric-id> --aggregation max

# Aggregation across resources
gnocchi aggregates --resource-type instance \
    '(aggregate sum (metric cpu mean))' \
    'started_at < "2026-05-30"'

# Gnocchi logs
sudo journalctl -u gnocchi-api -n 100 --no-pager
sudo journalctl -u gnocchi-metricd -n 100 --no-pager
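
To turn the output of the measures query into a list of gaps, a small filter can compare consecutive timestamps against the granularity. This is a sketch under several assumptions: input lines are whitespace-separated `timestamp granularity value` (as produced by the client's `-f value` formatter), all lines share one granularity, timestamps are ISO 8601, and GNU `date` is available. The function name `find_gaps` is hypothetical.

```shell
# Read "timestamp granularity value" lines on stdin and report missing points.
# Assumes a single granularity and GNU date (for `date -d`).
find_gaps() {
    prev_epoch=""
    prev_ts=""
    while read -r ts gran _value; do
        [ -n "$ts" ] || continue
        epoch=$(date -u -d "$ts" +%s) || return 2
        step=${gran%.*}                 # "300.0" -> "300"
        if [ -n "$prev_epoch" ] && [ $((epoch - prev_epoch)) -gt "$step" ]; then
            missing=$(( (epoch - prev_epoch) / step - 1 ))
            echo "gap after $prev_ts: $missing missing point(s)"
        fi
        prev_epoch=$epoch
        prev_ts=$ts
    done
}

# Usage (restrict the query to one granularity first):
#   gnocchi measures show <metric-id> --granularity 300 -f value | find_gaps
```

No output means every consecutive pair of points is exactly one granularity step apart.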

Patterns

Define archive policy

gnocchi archive-policy create \
    --back-window 0 \
    --definition granularity:1m,timespan:1h \
    --definition granularity:5m,timespan:1d \
    --definition granularity:1h,timespan:30d \
    --definition granularity:1d,timespan:365d \
    standard
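
Each definition stores timespan / granularity points per aggregation method, which is worth checking before creating a policy. The sketch below does that arithmetic for the four definitions above; the helper names `to_seconds` and `points` are hypothetical, and only the `s`, `m`, `h`, and `d` suffixes are handled.

```shell
# Convert a duration such as 1m, 5m, 1h or 30d to seconds.
# Only the s/m/h/d suffixes are supported in this sketch.
to_seconds() {
    n=${1%[smhd]}
    case $1 in
        *s) echo "$n" ;;
        *m) echo $((n * 60)) ;;
        *h) echo $((n * 3600)) ;;
        *d) echo $((n * 86400)) ;;
        *)  echo "unsupported duration: $1" >&2; return 1 ;;
    esac
}

# points GRANULARITY TIMESPAN: number of points one definition retains.
points() {
    g=$(to_seconds "$1") || return 1
    t=$(to_seconds "$2") || return 1
    echo $((t / g))
}

# Usage:
#   points 1m 1h     # 1-minute buckets kept for one hour
#   points 1d 365d   # daily buckets kept for one year
```

For the `standard` policy this gives 60, 288, 720, and 365 points, or 1433 points per metric for each aggregation method.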

Common findings this catches

  • Metrics missing entirely → pollster not running OR publisher misconfigured.
  • Gaps in metrics → gnocchi-metricd backlog; restart metricd or add workers.
  • Old metrics gone unexpectedly → archive policy retention reached.
  • Aggregations slow → backend storage I/O bound; tune backend.
  • Notification flood → Nova event volume; filter at Ceilometer pipeline.
  • New resource type not appearing → confirm it is defined with gnocchi resource-type list.
  • Aodh alarm not firing → metric query returning no data due to gap.
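
Several of these findings show up first as a metric whose newest measure is too old. A minimal staleness check, assuming GNU `date` and treating a metric as stale once its latest measure is older than twice the polling interval (that factor is an arbitrary choice, not a Gnocchi or Aodh setting), could look like this. The function name `is_stale` is hypothetical.

```shell
# is_stale LAST_TIMESTAMP INTERVAL_SECONDS [NOW_EPOCH]
# Returns 0 when fresh, 1 when stale, 2 when the timestamp cannot be parsed.
# Assumption: "stale" means older than 2 x the polling interval.
is_stale() {
    last_epoch=$(date -u -d "$1" +%s) || return 2
    interval=$2
    now=${3:-$(date -u +%s)}            # optional override, useful for testing
    age=$((now - last_epoch))
    if [ "$age" -gt $((interval * 2)) ]; then
        echo "STALE: last measure is ${age}s old (interval ${interval}s)"
        return 1
    fi
    echo "FRESH: last measure is ${age}s old (interval ${interval}s)"
}

# Usage:
#   last=$(gnocchi measures show <metric-id> --granularity 300 -f value \
#            | tail -n 1 | cut -d' ' -f1)
#   is_stale "$last" 300
```

A stale result points back at the collection side (pollster, notification agent, publisher) or at a metricd backlog, which the earlier checks can separate.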

When to escalate

  • Storage backend capacity issues — engage storage team.
  • Telemetry data integrity for billing — audit and reconcile.
  • Major Ceilometer config redesign — staged rollout.
