Skip to content
CloudOps
Newsletter
All prompts
AI for OpenStack Difficulty: Advanced ClaudeChatGPT

Ironic Bare Metal Deployment Debug Prompt

Diagnose Ironic deploys — node stuck in cleaning, deploy fails, PXE/iPXE issues, BMC unreachable, RAID config not applied.

Target user
OpenStack engineers running Ironic for bare-metal provisioning
Difficulty
Advanced
Tools
Claude, ChatGPT

The prompt

You are a senior OpenStack engineer who has run Ironic — bare-metal provisioning at scale. You know that "deploy fails" can be PXE, iPXE, IPMI, neutron, image, or BMC firmware.

I will provide:
- The symptom (node stuck cleaning, deploy fails, BMC unreachable, RAID/BIOS not applied)
- `openstack baremetal node show <id>` and `node validate`
- Ironic logs (conductor, API, inspector)
- The deploy interface (direct, agent, ansible)
- Hardware vendor

Your job:

1. **Verify node state machine**:
   - `enroll → manageable → available → active`
   - `cleaning` happens between active → available (drive wipe)
   - **Stuck cleaning** = drive wipe failed or BMC not responding
2. **For BMC connectivity**:
   - `node validate` checks BMC reachability
   - Test directly: `ipmitool -I lanplus -H <bmc-ip> -U user -P pass power status`
   - Common: wrong cipher, encrypted password, wrong BMC interface
3. **For PXE / iPXE**:
   - Bare metal node PXE-boots from neutron DHCP
   - iPXE chain to TFTP server
   - Common: DHCP option 67 (filename) wrong, TFTP server unreachable
4. **For deploy image**:
   - Ironic Python Agent (IPA) ramdisk runs on the node during deploy
   - Built with diskimage-builder
   - Must match the hardware (drivers, firmware)
5. **For RAID config**:
   - Set `raid_config` on node
   - `manage` state required to apply
   - Vendor-specific driver actions
6. **For BIOS / firmware updates**:
   - Built-in clean steps
   - Vendor driver support (Dell, HPE, Cisco UCS)
   - Failed firmware update can brick BMC — don't enable without testing
7. **For inspection**:
   - `inspect` action discovers hardware
   - Populates node properties (CPU, RAM, NICs)
   - Uses IPA ramdisk

Mark DESTRUCTIVE: `node delete` while in active state, force BIOS update on production firmware, RAID config changes on existing data drives.

---

Symptom: [DESCRIBE]
Node state:
```
[PASTE `openstack baremetal node show <id>`]
```
`node validate` output:
```
[PASTE]
```
Deploy interface: [direct / agent / ansible]
Hardware vendor: [Dell / HPE / Supermicro / etc.]
Ironic conductor logs:
```
[PASTE]
```

Why this prompt works

Ironic spans many physical layers. This prompt walks them.

How to use it

  1. State the deploy interface and vendor.
  2. Test BMC directly with ipmitool.
  3. For PXE issues, check TFTP/DHCP separately.
  4. For RAID/BIOS, vendor docs essential.

Useful commands

# Node management
openstack baremetal node list
openstack baremetal node show <node-id>
openstack baremetal node validate <node-id>
openstack baremetal node set <node-id> --target-provision-state manage
openstack baremetal node manage <node-id>
openstack baremetal node provide <node-id>
openstack baremetal node clean <node-id> --clean-steps '...'

# BMC direct test
ipmitool -I lanplus -H <bmc-ip> -U admin -P pass power status
ipmitool -I lanplus -H <bmc-ip> -U admin -P pass sensor list

# Deploy
openstack server create --flavor <baremetal-flavor> --image <image> --network <net> myvm

# Logs
sudo journalctl -u ironic-conductor -n 200 --no-pager
sudo journalctl -u ironic-api -n 100 --no-pager
sudo journalctl -u ironic-inspector -n 100 --no-pager

# IPA agent logs (during deploy/clean — on the node, captured to conductor)
# Look in ironic conductor logs for agent messages

# PXE / TFTP
sudo systemctl status xinetd       # or tftpd
sudo tail /var/log/syslog | grep tftp

# Test PXE boot manually
# Reboot node; watch console for PXE messages

Common findings this catches

  • BMC unreachable → wrong creds, wrong cipher, network issue.
  • Cleaning stuck > 1h → drive failure during wipe; or IPA crashed.
  • Deploy fails at PXE → DHCP/TFTP issue; verify with tcpdump.
  • RAID not applied → driver doesn’t support; or node wrong state.
  • Inspection misses NICs → IPA missing drivers for that hardware.
  • node validate shows interfaces=False → BMC test fails; check ipmitool.
  • Deploy succeeds but node not in Nova → conductor failed to notify Nova.

When to escalate

  • Vendor firmware bugs — vendor support.
  • Mass cleaning failures across same hardware model — IPA needs rebuild.
  • Cluster-wide BMC creds rotation — coordinated change.

Related prompts

Newsletter

Free: the DevOps AI Incident-Triage Cheat Sheet

Subscribe and we’ll send you the one-page cheat sheet — plus weekly AI prompts, automation ideas, and tool reviews for infrastructure engineers. One email a week. No spam, unsubscribe anytime.

  • AI Incident-Triage Cheat Sheet (PDF)
  • Access to 1,603 DevOps AI prompts
  • One practical workflow email per week