
OpenStack Request-ID Log Trace Prompt

Correlate a single API request across services (nova-api → conductor → scheduler → compute → neutron → cinder) using OpenStack request IDs.

Target user: OpenStack operators debugging cross-service issues
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are a senior OpenStack operator with deep experience tracing a single user request as it fans out across nova-api, conductor, scheduler, compute, neutron-server, OVS/OVN agents, cinder-api, cinder-volume, glance, keystone, and placement.

I will provide:
- The user-facing symptom (boot failed, attach hung, etc.)
- The initial `X-Openstack-Request-Id` (req-XXXXXXXX) from the failing API call
- Raw log excerpts from multiple services — possibly out of order, possibly partial
- The OpenStack release

Your job:

1. **Build a timeline** of the request as it crosses service boundaries. Each row: `timestamp | service | host | event | child request-id (if any)`.
2. **Track the request-id chain**: when service A calls service B over REST, A sends its own request-id in the `X-Openstack-Request-Id` header; B records it as the *global* request-id and logs a new *local* request-id next to it. Reconstruct that chain.
3. **Identify the first error or anomaly** in the timeline (not the user-visible failure, which is downstream).
4. **Flag missing hops**: if you expect e.g. nova-conductor → nova-scheduler but the scheduler log doesn't show the request, treat that gap as the most likely fault location, unless the excerpt for that service is simply missing or truncated.
5. **Suggest the next log to fetch** if the trace is incomplete (be exact: which host, which service, which log file, what time window).
6. **Conclude** with the root-cause hypothesis grounded in the timeline.

Format the timeline as a markdown table. Use UTC timestamps consistently. If timezones differ across logs, normalize and note assumptions.

---

OpenStack release: [yoga / zed / antelope / bobcat / caracal / dalmatian / epoxy]
Symptom: [DESCRIBE]
Initial request-id: [req-XXXXXXXX]
Affected resource (server/volume/network UUID): [UUID]

Log excerpts (label each block with `service @ host`):

```
# nova-api @ ctrl-01
[PASTE]
```

```
# nova-conductor @ ctrl-01
[PASTE]
```

```
# nova-scheduler @ ctrl-02
[PASTE]
```

```
# nova-compute @ compute-17
[PASTE]
```

```
# neutron-server @ ctrl-01
[PASTE]
```

```
# cinder-volume @ storage-03 (if applicable)
[PASTE]
```

```
# other services / agents
[PASTE]
```

Why this prompt works

OpenStack is a microservices system that’s older than the term “microservice.” A single `openstack server create` call touches Keystone (auth), nova-api, nova-conductor, nova-scheduler, Placement, Glance, Neutron, possibly Cinder, and finally libvirt on the compute node. The user sees ERROR, but the actual error happened five hops in.

Without forcing the model to build a timeline, it tends to fixate on the most recent log line and miss the upstream cause. This prompt produces a row-by-row trace that surfaces the first anomaly.
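The idea of "first anomaly, not last log line" can be sketched in a few lines of Python. This is only an illustration: the `Row` type, the `ANOMALY_LEVELS` set, and the sample rows are hypothetical, and it assumes timestamps are already normalized to UTC strings that sort lexically.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    timestamp: str  # UTC "YYYY-MM-DD HH:MM:SS.mmm", so string order is time order
    service: str
    level: str
    event: str


ANOMALY_LEVELS = {"WARNING", "ERROR", "CRITICAL"}  # assumed level names


def first_anomaly(rows: list[Row]) -> Optional[Row]:
    """Return the earliest row logged at an anomalous level, or None."""
    for row in sorted(rows, key=lambda r: r.timestamp):
        if row.level in ANOMALY_LEVELS:
            return row
    return None


# Hypothetical rows, deliberately out of order like pasted excerpts often are.
rows = [
    Row("2026-05-21 14:31:09.410", "nova-api", "ERROR", "instance went to ERROR"),
    Row("2026-05-21 14:31:02.120", "nova-api", "INFO", "POST /servers accepted"),
    Row("2026-05-21 14:31:04.870", "nova-scheduler", "WARNING", "no hosts passed filters"),
]
culprit = first_anomaly(rows)
```

Here `culprit` is the nova-scheduler row, even though the nova-api ERROR is the last line and the one the user actually sees.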

How to use it

  1. Find the request-id from the failing API response header: X-Openstack-Request-Id: req-XXXXXXXX.
  2. On each candidate host, narrow logs to a tight time window:
    sudo journalctl --since "2026-05-21 14:30" --until "2026-05-21 14:35" -u nova-api -u nova-conductor
  3. Then grep the request-id within that window. Don’t grep the whole log — too much noise.
  4. Paste each service’s relevant lines under a clearly-labeled block.
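If the logs are already exported to files, steps 2 and 3 can be approximated offline. The sketch below assumes each line starts with an oslo.log-style `YYYY-MM-DD HH:MM:SS.mmm` timestamp; `filter_lines` and the sample lines are hypothetical, not part of any OpenStack tool.

```python
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"  # assumed timestamp prefix of each line


def filter_lines(lines, request_id, start, end):
    """Return lines inside [start, end] that mention request_id."""
    kept = []
    for line in lines:
        try:
            # First 23 characters: "YYYY-MM-DD HH:MM:SS.mmm"
            stamp = datetime.strptime(line[:23], TS_FORMAT)
        except ValueError:
            continue  # no leading timestamp (e.g. traceback continuation): skip
        if start <= stamp <= end and request_id in line:
            kept.append(line)
    return kept


# Made-up sample lines for illustration only.
sample = [
    "2026-05-21 14:29:59.900 4242 INFO nova.api [req-XXXXXXXX] too early",
    "2026-05-21 14:31:02.120 4242 INFO nova.api [req-XXXXXXXX] POST /servers",
    "2026-05-21 14:31:05.000 4242 INFO nova.api [req-YYYYYYYY] other request",
]
window = (datetime(2026, 5, 21, 14, 30), datetime(2026, 5, 21, 14, 35))
matches = filter_lines(sample, "req-XXXXXXXX", *window)
```

Only the second sample line survives: the first is outside the window and the third belongs to another request. Note that this sketch drops lines without a leading timestamp, so multi-line tracebacks need to be fetched separately.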

Useful one-liner: pull a request across all services

# On each controller / compute / storage host:
sudo grep -rh "req-XXXXXXXX" /var/log/{nova,neutron,cinder,keystone,glance,placement}/ 2>/dev/null | \
  sort -k1,2

# Or via journalctl (systemd-journal):
sudo journalctl --since "1 hour ago" | grep "req-XXXXXXXX"
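The grep one-liner uses `-h`, so the merged output no longer says which service each line came from. A small Python sketch can keep that label while still producing one time-ordered stream. It assumes each line begins with a sortable `YYYY-MM-DD HH:MM:SS.mmm` timestamp; `merge_by_timestamp` and the sample data are hypothetical.

```python
import heapq


def merge_by_timestamp(per_service):
    """Merge per-service line lists into one time-ordered list of tuples.

    per_service maps a label such as "nova-api @ ctrl-01" to lines that
    each start with a "YYYY-MM-DD HH:MM:SS.mmm" timestamp.
    Returns (timestamp, label, rest_of_line) tuples.
    """
    tagged = (
        sorted((line[:23], label, line[23:].strip()) for line in lines)
        for label, lines in per_service.items()
    )
    # heapq.merge combines the already-sorted per-service lists.
    return list(heapq.merge(*tagged))


# Made-up excerpts for illustration only.
logs = {
    "nova-conductor @ ctrl-01": ["2026-05-21 14:31:03.000 INFO scheduling"],
    "nova-api @ ctrl-01": [
        "2026-05-21 14:31:02.120 INFO POST /servers",
        "2026-05-21 14:31:09.410 ERROR build failed",
    ],
}
timeline = merge_by_timestamp(logs)
for stamp, label, event in timeline:
    print(f"{stamp} | {label} | {event}")
```

The printed rows already have the `timestamp | service @ host | event` shape the prompt asks for, which makes the pasted input easier for the model to check.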

What “good propagation” looks like

A request-id chain typically looks like:

req-AAAA  user → nova-api
  req-AAAA  nova-api → keystone (auth)
  req-AAAA  nova-api → nova-conductor
    req-AAAA  nova-conductor → nova-scheduler
      req-AAAA  nova-scheduler → placement (resource claim)
    req-BBBB  nova-conductor → glance (image lookup, new global req-id sometimes)
    req-AAAA  nova-conductor → neutron-server (port allocate)
    req-AAAA  nova-conductor → cinder-api (volume attach)
    req-AAAA  nova-conductor → nova-compute (build task)
      req-AAAA  nova-compute → libvirt (define + start)
      req-AAAA  nova-compute → neutron-l2-agent (port wire-up)

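If your trace shows req-AAAA reaching nova-scheduler but nothing logged in placement, the call most likely never reached Placement. Treat that gap as the first place to look, after confirming that the placement log for that time window was actually collected.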
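The missing-hop check is simple enough to sketch. The hop order below is an assumption for a server boot and should be adjusted per operation; `EXPECTED_HOPS` and `first_missing_hop` are hypothetical names.

```python
EXPECTED_HOPS = [  # assumed order for a server boot; adjust per operation
    "nova-api",
    "nova-conductor",
    "nova-scheduler",
    "placement",
    "nova-compute",
]


def first_missing_hop(seen_services, expected=EXPECTED_HOPS):
    """Return (last_seen, first_missing), or None if every hop logged."""
    last_seen = None
    for service in expected:
        if service not in seen_services:
            return last_seen, service
        last_seen = service
    return None


# Services whose logs mention the request-id in this hypothetical trace.
gap = first_missing_hop({"nova-api", "nova-conductor", "nova-scheduler"})
```

In this example `gap` is `("nova-scheduler", "placement")`: the request was last seen in the scheduler and never shows up in placement, which is the boundary to investigate next.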

Common findings this catches

  • Request times out at the scheduler because `[scheduler] max_attempts` is exhausted, but the nova-api log just shows NoValidHost. The trace reveals the retry storm.
  • Neutron port allocation succeeds but the compute log shows the port wiring never happened — agent on that compute is down.
  • Cinder attach returns 200 OK to nova-conductor but os-brick on compute fails minutes later. Two separate request IDs, one timeline.
  • Auth failures invisible upstream: Keystone validates, but a downstream service’s local policy.yaml rejects. The Keystone log looks clean.
  • Cross-cell calls (in cells v2 deployments) routed to the wrong cell: the request-id appears in cell0 logs when it should be in cell1.

When use_global_request_id matters

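Propagation of the global request-id is normally handled by the request-id middleware without extra configuration. Some deployments may still need explicit config in [DEFAULT]:

[DEFAULT]
use_global_request_id = true

to ensure the same request-id propagates rather than each service generating its own. This option is not documented for every service or release, so confirm it exists in your release’s configuration reference before relying on it. If your trace shows brand-new request-ids appearing at each hop with no link, also check that logging_context_format_string includes %(global_request_id)s; otherwise the global ID is propagated but never printed.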
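To check whether the request-ids at each hop are linked, the pairing of global and local IDs can be extracted from the log context. The sketch below assumes the context bracket has the layout `[<global-id> <local-id> <user identity>]`, with `None` or `-` where no global ID was received; verify that against your own `logging_context_format_string` first. `link_ids`, `unlinked`, and the sample lines are hypothetical.

```python
import re

# Assumed layout: "... [<global-id> <local-id> <user identity>] message"
CONTEXT_RE = re.compile(r"\[(?P<global>req-\S+|None|-) (?P<local>req-[^\s\]]+)")


def link_ids(lines):
    """Map each local request-id to the global id logged beside it."""
    links = {}
    for line in lines:
        match = CONTEXT_RE.search(line)
        if match:
            global_id = match.group("global")
            links[match.group("local")] = (
                global_id if global_id.startswith("req-") else None
            )
    return links


def unlinked(links, root_id):
    """Local ids that are neither the root nor linked back to it."""
    return sorted(
        local for local, parent in links.items()
        if local != root_id and parent != root_id
    )


# Made-up lines for illustration only.
sample = [
    "2026-05-21 14:31:02.120 1 INFO nova.api [None req-AAAA demo demo] accepted",
    "2026-05-21 14:31:03.500 1 INFO neutron.api [req-AAAA req-BBBB demo demo] port created",
    "2026-05-21 14:31:04.100 1 INFO cinder.api [None req-CCCC demo demo] attach",
]
links = link_ids(sample)
orphans = unlinked(links, "req-AAAA")
```

Here `req-BBBB` is linked to `req-AAAA`, while `req-CCCC` carries no global ID, so `orphans` contains only `req-CCCC`. An orphan like that is the "brand-new request-id with no link" case described above.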

When to escalate

If the timeline shows the request leaving service A but never arriving at service B, the hop is an RPC call inside one project (REST calls between projects do not cross the broker), and both services’ clocks are NTP-synced, the cause is almost always the message bus (RabbitMQ): a queue is stuck, a binding is wrong, or the consumer is stalled, for example in a slow GC cycle. Time to look at the broker, not the apps.
