Gnocchi & Ceilometer Metering Debug Prompt
Diagnose OpenStack telemetry — Ceilometer pollsters, Gnocchi storage, metric aggregation, missing measurements, retention policy.
- Target user: OpenStack operators managing telemetry pipelines
- Difficulty: Intermediate
- Tools: Claude, ChatGPT
The prompt
You are a senior OpenStack engineer who has operated the telemetry stack, Ceilometer (collection) plus Gnocchi (time-series storage), for billing, alerting, and capacity decisions.

I will provide:
- The symptom (metrics missing, gaps, archive policy not applying, gnocchi-api slow)
- Architecture (Ceilometer central + compute pollsters, Gnocchi backend)
- Sample metric query and result
- Logs

Your job:

1. **Ceilometer side**:
   - **Pollsters**: periodic collectors (CPU, disk, network on compute nodes)
   - **Notification listeners**: event-based (instance create/delete from Nova)
   - Publishers send to Gnocchi (or other sinks)
   - `pipeline.yaml` defines how samples are routed to publishers (recent releases set pollsters and intervals in `polling.yaml`)
2. **For missing metrics**:
   - Pollster not running on compute (check `ceilometer-polling`)
   - Notification not received (oslo.messaging issue)
   - Publisher write failed
   - Gnocchi archive policy doesn't cover the metric
3. **Gnocchi side**:
   - **Archive policy**: defines granularity + retention per metric
   - **Resource type**: describes what's being measured
   - **Metric**: a named time series attached to a resource (a resource is an instance of a resource type)
   - **Measure**: single data point
4. **For gaps**:
   - Gnocchi processes measures asynchronously (`gnocchi-metricd`)
   - Backlog in measure queue = delayed availability
   - Check `gnocchi status` for the number of measures still waiting to be processed
5. **For storage backend**:
   - File, Ceph, S3, Swift
   - Capacity issues = write failures
6. **For aggregation**:
   - Built-in: min, max, mean, sum, std, count
   - Granularity = bucket size (1m, 5m, 1h)
   - Retention per granularity
7. **For Heat autoscaling using Gnocchi**:
   - Alarm definitions in Aodh query Gnocchi
   - Threshold-based scale up/down

Mark DESTRUCTIVE: changing archive policy after metrics created (loses historical), deleting resource type (deletes all metrics), shrinking retention without backup.

---

Architecture: [DESCRIBE]
Symptom: [DESCRIBE]
Sample query + result:
```
[PASTE `gnocchi measures show` or aggregate]
```
Logs:
```
[PASTE]
```
Why this prompt works
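Telemetry appears to work until billing is wrong or alarms fail to fire. This prompt walks the pipeline end to end, from Ceilometer collection through Gnocchi storage to the queries that read the data back.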
How to use it
- Verify both sides — collection + storage.
- For gaps, check metricd backlog.
- For new metrics, ensure archive policy covers them.
- For aggregation, verify granularity.
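The backlog check above is easier to act on when it is sampled twice, so you can tell whether `gnocchi-metricd` is catching up or falling behind. The following is a minimal sketch in plain POSIX shell arithmetic. The function name `backlog_eta` and the sample numbers are hypothetical, and the two backlog values are assumed to be copied by hand from the "measures to process" figure that `gnocchi status` reports.

```shell
# backlog_eta BEFORE AFTER INTERVAL_SECONDS
# Prints the estimated number of seconds until the backlog is empty,
# or "not draining" when the second sample is not smaller than the first.
backlog_eta() {
  before=$1; after=$2; interval=$3
  drained=$(( before - after ))
  if [ "$drained" -le 0 ]; then
    echo "not draining"
    return 0
  fi
  # seconds left = remaining / (drained per second), rounded up
  echo $(( (after * interval + drained - 1) / drained ))
}

backlog_eta 12000 9000 60   # 3000 measures drained in 60 s
backlog_eta 9000 9500 60    # backlog grew between the two samples
```

A "not draining" result over several samples suggests metricd needs more workers or the storage backend is the bottleneck; a finite estimate suggests the gap is only delayed data.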
Useful commands
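# Ceilometer status (unit names vary by distribution and release; where polling
# is unified under ceilometer-polling, ceilometer-agent-central may not exist)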
sudo systemctl status ceilometer-polling
sudo systemctl status ceilometer-agent-notification
sudo systemctl status ceilometer-agent-central
# Pipeline
sudo cat /etc/ceilometer/pipeline.yaml
# Logs
sudo journalctl -u ceilometer-polling -n 100 --no-pager
sudo journalctl -u ceilometer-agent-notification -n 100 --no-pager
# Gnocchi status
gnocchi status
# Archive policies
gnocchi archive-policy list
gnocchi archive-policy show <name>
# Resources / metrics
gnocchi resource list --type instance --limit 10
gnocchi metric list --resource-id <id>
# Query measures
gnocchi measures show <metric-id>
gnocchi measures show <metric-id> --aggregation max
# Aggregation across resources
gnocchi aggregates --resource-type instance \
  '(aggregate sum (metric cpu mean))' \
  'started_at < "2026-05-30"'
# Gnocchi logs
sudo journalctl -u gnocchi-api -n 100 --no-pager
sudo journalctl -u gnocchi-metricd -n 100 --no-pager
Patterns
Define archive policy
gnocchi archive-policy create \
  --back-window 0 \
  --definition granularity:1m,timespan:1h \
  --definition granularity:5m,timespan:1d \
  --definition granularity:1h,timespan:30d \
  --definition granularity:1d,timespan:365d \
  standard
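Each definition in an archive policy keeps timespan divided by granularity points, which is worth checking before creating the policy because it drives storage use per metric. The sketch below works through the four definitions above in POSIX shell. The helper name `points` is hypothetical, and the durations are converted to seconds by hand.

```shell
# points GRANULARITY_SECONDS TIMESPAN_SECONDS
# Number of points one definition retains: timespan / granularity.
points() { echo $(( $2 / $1 )); }

total=0
# granularity:timespan pairs in seconds for 1m/1h, 5m/1d, 1h/30d, 1d/365d
for def in 60:3600 300:86400 3600:2592000 86400:31536000; do
  g=${def%%:*}
  t=${def##*:}
  p=$(points "$g" "$t")
  echo "granularity=${g}s timespan=${t}s points=$p"
  total=$(( total + p ))
done
echo "points per metric per aggregation method: $total"
```

The total is per aggregation method, so a policy that stores several methods multiplies it accordingly.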
Common findings this catches
- Metrics missing entirely → pollster not running or publisher misconfigured.
- Gaps in metrics → gnocchi-metricd backlog; restart metricd or add workers.
- Old metrics gone unexpectedly → archive policy retention reached.
- Aggregations slow → storage backend is I/O bound; tune the backend.
- Notification flood → high Nova event volume; filter in the Ceilometer pipeline.
- New resource type not appearing → check that the resource type is defined: gnocchi resource-type list.
- Aodh alarm not firing → metric query returns no data because of a gap.
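Several of these findings come down to locating holes in a series. The following is a rough sketch of a gap finder using POSIX shell and awk. The function name `find_gaps` and the sample series are hypothetical, and the input is assumed to be ascending epoch-second timestamps, one per line. `gnocchi measures show` prints ISO 8601 timestamps, so a conversion step (not shown) would be needed before piping real output in.

```shell
# find_gaps GRANULARITY_SECONDS
# Reads ascending epoch-second timestamps on stdin, one per line, and prints
# "last_seen next_seen missing_points" for every hole wider than one bucket.
find_gaps() {
  awk -v g="$1" '
    NR > 1 && $1 - prev > g { print prev, $1, ($1 - prev) / g - 1 }
    { prev = $1 }
  '
}

# Hypothetical 60 s series with holes after 120 and after 360.
printf '%s\n' 0 60 120 300 360 480 | find_gaps 60
```

An empty result means the series is contiguous at that granularity, which points the investigation away from Gnocchi and toward the alarm query or its time window.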
When to escalate
- Storage backend capacity issues — engage storage team.
- Telemetry data integrity for billing — audit and reconcile.
- Major Ceilometer config redesign — staged rollout.
Related prompts
- OpenStack Capacity Planning Prompt: Plan OpenStack capacity (CPU/RAM/disk oversubscription, growth modeling, hypervisor sizing, Cinder backend planning, network bandwidth).
- OpenStack Request-ID Log Trace Prompt: Correlate a single API request across services (nova-api → conductor → scheduler → compute → neutron → cinder) using OpenStack request IDs.
- OpenStack VM Troubleshooting Prompt: Diagnose Nova VM boot failures, networking issues, and stuck instances using nova/openstack CLI output.