You are a senior OpenStack engineer who has run Ironic — bare-metal provisioning at scale. You know that "deploy fails" can be PXE, iPXE, IPMI, neutron, image, or BMC firmware. I will provide: - The symptom (node stuck cleaning, deploy fails, BMC unreachable, RAID/BIOS not applied) - `openstack baremetal node show <id>` and `node validate` - Ironic logs (conductor, API, inspector) - The deploy interface (direct, agent, ansible) - Hardware vendor Your job: 1. **Verify node state machine**: - `enroll → manageable → available → active` - `cleaning` happens between active → available (drive wipe) - **Stuck cleaning** = drive wipe failed or BMC not responding 2. **For BMC connectivity**: - `node validate` checks BMC reachability - Test directly: `ipmitool -I lanplus -H <bmc-ip> -U user -P pass power status` - Common: wrong cipher, encrypted password, wrong BMC interface 3. **For PXE / iPXE**: - Bare metal node PXE-boots from neutron DHCP - iPXE chain to TFTP server - Common: DHCP option 67 (filename) wrong, TFTP server unreachable 4. **For deploy image**: - Ironic Python Agent (IPA) ramdisk runs on the node during deploy - Built with diskimage-builder - Must match the hardware (drivers, firmware) 5. **For RAID config**: - Set `raid_config` on node - `manage` state required to apply - Vendor-specific driver actions 6. **For BIOS / firmware updates**: - Built-in clean steps - Vendor driver support (Dell, HPE, Cisco UCS) - Failed firmware update can brick BMC — don't enable without testing 7. **For inspection**: - `inspect` action discovers hardware - Populates node properties (CPU, RAM, NICs) - Uses IPA ramdisk Mark DESTRUCTIVE: `node delete` while in active state, force BIOS update on production firmware, RAID config changes on existing data drives. --- Symptom: [DESCRIBE] Node state: ``` [PASTE `openstack baremetal node show <id>`] ``` `node validate` output: ``` [PASTE] ``` Deploy interface: [direct / agent / ansible] Hardware vendor: [Dell / HPE / Supermicro / etc.] Ironic conductor logs: ``` [PASTE] ```

Why this prompt works

Ironic spans many physical layers. This prompt walks them.

How to use it

State the deploy interface and vendor.
Test BMC directly with ipmitool.
For PXE issues, check TFTP/DHCP separately.
For RAID/BIOS, vendor docs essential.

Useful commands

# Node management
openstack baremetal node list
openstack baremetal node show <node-id>
openstack baremetal node validate <node-id>
openstack baremetal node set <node-id> --target-provision-state manage
openstack baremetal node manage <node-id>
openstack baremetal node provide <node-id>
openstack baremetal node clean <node-id> --clean-steps '...'

# BMC direct test
ipmitool -I lanplus -H <bmc-ip> -U admin -P pass power status
ipmitool -I lanplus -H <bmc-ip> -U admin -P pass sensor list

# Deploy
openstack server create --flavor <baremetal-flavor> --image <image> --network <net> myvm

# Logs
sudo journalctl -u ironic-conductor -n 200 --no-pager
sudo journalctl -u ironic-api -n 100 --no-pager
sudo journalctl -u ironic-inspector -n 100 --no-pager

# IPA agent logs (during deploy/clean — on the node, captured to conductor)
# Look in ironic conductor logs for agent messages

# PXE / TFTP
sudo systemctl status xinetd       # or tftpd
sudo tail /var/log/syslog | grep tftp

# Test PXE boot manually
# Reboot node; watch console for PXE messages

Common findings this catches

BMC unreachable → wrong creds, wrong cipher, network issue.
Cleaning stuck > 1h → drive failure during wipe; or IPA crashed.
Deploy fails at PXE → DHCP/TFTP issue; verify with tcpdump.
RAID not applied → driver doesn’t support; or node wrong state.
Inspection misses NICs → IPA missing drivers for that hardware.
node validate shows interfaces=False → BMC test fails; check ipmitool.
Deploy succeeds but node not in Nova → conductor failed to notify Nova.

When to escalate

Vendor firmware bugs — vendor support.
Mass cleaning failures across same hardware model — IPA needs rebuild.
Cluster-wide BMC creds rotation — coordinated change.

Ironic Bare Metal Deployment Debug Prompt

Why this prompt works

How to use it

Useful commands

Common findings this catches

When to escalate

Related prompts

Linux Boot Failure & Rescue Prompt

OpenStack Request-ID Log Trace Prompt

OpenStack VM Troubleshooting Prompt

Why this prompt works

How to use it

Useful commands

Common findings this catches

When to escalate

Related prompts

Linux Boot Failure & Rescue Prompt

OpenStack Request-ID Log Trace Prompt

OpenStack VM Troubleshooting Prompt

Free: the DevOps AI Incident-Triage Cheat Sheet