You are a senior OpenStack engineer who has run Magnum-provisioned Kubernetes clusters in production. You can debug failures across the Magnum → Heat → Nova → Neutron → cloud-init layer chain.

I will provide:
- The symptom (cluster creation failed, scaling failed, master/worker not joining, kubectl not working)
- `openstack coe cluster show <id>`
- The cluster template (`openstack coe cluster template show`)
- Heat stack status (`openstack stack show` of the underlying stack)
- Magnum / Heat logs

Your job:

1. **Understand the stack**:
   - **Magnum** receives the cluster create request → spawns a Heat stack
   - **Heat stack** creates Nova instances (masters + workers)
   - Instances boot from a Glance image and run **cloud-init** to install / configure K8s
   - Magnum monitors the stack and updates the `openstack coe cluster` status
2. **For "cluster create failed"**:
   - Check Heat stack status: `openstack stack show <stack-id>`
   - Resource failures cascade: find the first one
   - Common: Nova quota, Neutron port allocation, image issue
3. **For "nodes booting but not joining the cluster"**:
   - Cloud-init failure on the node
   - SSH or console into a worker; check `/var/log/cloud-init.log` and `/var/log/cloud-init-output.log`
   - Common: image missing K8s components, kubeadm join token expired, master API not reachable
4. **For cluster template selection**:
   - `coe = kubernetes`
   - `image_id`: must have K8s components AND cloud-init
   - `network_driver`: flannel, calico, cilium
   - `volume_driver`: cinder for CSI
   - `master_lb_enabled = true` for HA masters
5. **For certificate / kubectl access**:
   - `openstack coe cluster config <id>` writes a kubeconfig (to the current directory, or to `--dir <path>`)
   - The cert authority is generated at cluster create
   - Cert rotation: `openstack coe ca rotate <id>` where the driver supports it, or manually
6. **For scaling**:
   - `openstack coe cluster resize <id> <node-count>`
   - Triggers a Heat stack update; new nodes are provisioned and joined
   - Failures usually have the same root causes as create
7. **For Magnum + Octavia (master LB)**:
   - The master LB enables multi-master HA
   - If the LB fails to create, the cluster fails
8. **For upgrade**:
   - `openstack coe cluster upgrade <id> <new-template>`
   - Rolling upgrade: masters first, then workers

Mark DESTRUCTIVE: deleting a cluster (drops all workloads), force-deleting a failed cluster (orphans the Heat stack), modifying a cluster template referenced by running clusters (they are not auto-updated).

---

Symptom: [DESCRIBE]

Cluster state:
```
[PASTE `openstack coe cluster show <id>`]
```

Cluster template:
```
[PASTE]
```

Heat stack:
```
[PASTE `openstack stack show <stack-id>`]
```

Cloud-init logs from a failing node:
```
[PASTE]
```
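The prompt asks for several CLI outputs at once. A minimal sketch that gathers them into one file for pasting, assuming a Heat-based cluster driver (so `openstack coe cluster show` has a `stack_id`), that the Magnum and Heat client plugins are installed, and that credentials are already sourced. The function name `collect_magnum_debug` and the output file name are hypothetical.

```shell
# collect_magnum_debug: hypothetical helper that writes the prompt's CLI inputs
# to one file. Assumes the Magnum and Heat OSC plugins are installed and
# OpenStack credentials are already sourced.
collect_magnum_debug() {
  local cluster="$1" out="${2:-magnum-debug.txt}"
  local template stack

  # Resolve the cluster template and the Heat stack behind this cluster.
  template=$(openstack coe cluster show "$cluster" -f value -c cluster_template_id)
  stack=$(openstack coe cluster show "$cluster" -f value -c stack_id)

  # Capture stdout and stderr so CLI errors (e.g. a missing stack) land in the file too.
  {
    echo "=== openstack coe cluster show $cluster"
    openstack coe cluster show "$cluster"
    echo "=== openstack coe cluster template show $template"
    openstack coe cluster template show "$template"
    echo "=== openstack stack show $stack"
    openstack stack show "$stack"
    echo "=== openstack stack failures list $stack --long"
    openstack stack failures list "$stack" --long
  } > "$out" 2>&1

  # Print the path so the caller knows where to look.
  echo "$out"
}
```

For example, `collect_magnum_debug my-cluster` writes `magnum-debug.txt`; cloud-init logs still have to be fetched from the node itself.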

Why this prompt works

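Magnum delegates provisioning to Heat, which delegates to Nova and Neutron, and each node then configures itself with cloud-init. A failure can sit four layers below the status Magnum reports, so this prompt walks the chain from the top down.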
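Walking from Magnum to Heat usually means finding the earliest FAILED event across all nested stacks, since later failures on parent resources are consequences of it. A minimal sketch, assuming `openstack stack event list` returns events oldest-first and that each output line contains a status string such as `CREATE_FAILED`; the helper names `first_failed_event` and `root_cause_event` are hypothetical.

```shell
# first_failed_event: print the first event line carrying a *_FAILED status.
# Reads event-list output on stdin; relies on the events being oldest-first.
first_failed_event() {
  grep -m1 -E '[A-Z]+_FAILED'
}

# root_cause_event: Magnum -> Heat. Resolve the cluster's stack, then print the
# earliest failure across nested stacks (up to 5 levels deep).
root_cause_event() {
  local stack
  stack=$(openstack coe cluster show "$1" -f value -c stack_id)
  openstack stack event list "$stack" --nested-depth 5 | first_failed_event
}
```

Usage: `root_cause_event my-cluster`. If the ordering assumption does not hold in your client version, read the event timestamps instead of trusting line order.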

How to use it

Always check the Heat stack first: Magnum reports a generic failure, while Heat shows the specific resource and reason.
For cloud-init issues, get onto the failing node via SSH or the Nova console.
Check the image: it must contain all the components the cluster driver expects.
For scaling failures, debug the new nodes as you would a failed create.
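When SSH is not possible, the Nova console log often carries the cloud-init output. A minimal sketch that dumps it for every server in the cluster's stack, assuming the nodes appear as `OS::Nova::Server` resources and that `openstack stack resource list ... -f value` prints the three selected columns space-separated in the order physical_resource_id, resource_type, resource_status. The helper names are hypothetical. The status filter defaults to "any" because a node that boots but never joins can still show `CREATE_COMPLETE` on its server resource.

```shell
# server_ids_from_resources: hypothetical filter over
#   openstack stack resource list <stack> --nested-depth 5 \
#     -f value -c physical_resource_id -c resource_type -c resource_status
# Prints Nova server IDs whose status matches the regex in $1 (default: any).
server_ids_from_resources() {
  local status_re="${1:-.}"
  # Skip rows with an empty physical_resource_id (NF < 3): Nova never created them.
  awk -v re="$status_re" 'NF == 3 && $2 == "OS::Nova::Server" && $3 ~ re { print $1 }'
}

# dump_console_logs: print the last 100 console lines of each matching server.
dump_console_logs() {
  local stack="$1" status_re="${2:-.}" id
  openstack stack resource list "$stack" --nested-depth 5 \
      -f value -c physical_resource_id -c resource_type -c resource_status |
    server_ids_from_resources "$status_re" |
    while read -r id; do
      echo "=== console log: $id"
      # </dev/null keeps the CLI from consuming the loop's input.
      openstack console log show --lines 100 "$id" </dev/null
    done
}
```

Usage: `dump_console_logs "$STACK_ID"` dumps every node; pass a regex such as `FAILED` as the second argument to narrow it.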

Useful commands

```shell
# Cluster
openstack coe cluster list
openstack coe cluster show <id>
openstack coe cluster template show <template>
```

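```shell
# Get kubeconfig: the command writes <dir>/config and prints the matching
# "export KUBECONFIG=..." line (add --force to overwrite an existing file)
mkdir -p ~/clusters/<id>
openstack coe cluster config <id> --dir ~/clusters/<id>
export KUBECONFIG=~/clusters/<id>/config
kubectl get nodes
```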
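When the cluster is ACTIVE but kubectl fails, first check whether the API endpoint answers at all; that separates a network problem (security group, floating IP, master LB) from a credentials problem. A minimal sketch, assuming the cluster's `api_address` field holds the full `https://host:port` URL and that `curl` is installed. Any HTTP status, even 401 or 403, proves the endpoint is reachable, while `000` means no HTTP response arrived. The helper names are hypothetical.

```shell
# api_verdict: hypothetical mapping from curl's %{http_code} to a first diagnosis.
api_verdict() {
  case "$1" in
    000) echo "unreachable: check security group, floating IP, master LB" ;;
    401|403) echo "reachable: endpoint answers, check kubeconfig credentials" ;;
    *) echo "reachable: HTTP $1" ;;
  esac
}

# check_api: probe the cluster's API endpoint from the machine running kubectl.
check_api() {
  local api code
  api=$(openstack coe cluster show "$1" -f value -c api_address)
  # -k skips TLS verification: we only care whether anything answers.
  # curl writes 000 (and exits non-zero) when no HTTP response arrives.
  code=$(curl -k -s -o /dev/null --max-time 5 -w '%{http_code}' "$api/version") || true
  api_verdict "${code:-000}"
}
```

Run it from the same network location as kubectl; a probe from inside the cloud can succeed while an external client is blocked.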

```shell
# Heat stack underlying the cluster
STACK_ID=$(openstack coe cluster show <cluster-id> -f value -c stack_id)
openstack stack show "$STACK_ID"
openstack stack event list "$STACK_ID" --nested-depth 5
openstack stack failures list "$STACK_ID" --long
```

```shell
# Scaling (the node count is a positional argument)
openstack coe cluster resize <id> 5
```
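Resize returns as soon as Magnum accepts the request while Heat does the work, so a wait loop helps before you start debugging. A minimal sketch, assuming a resize ends in `UPDATE_COMPLETE` or `UPDATE_FAILED`; `wait_for_cluster`, its poll interval and its attempt limit are hypothetical.

```shell
# wait_for_cluster: poll until the cluster reaches <ACTION>_COMPLETE or
# <ACTION>_FAILED (ACTION defaults to UPDATE). Any other status, including
# the pre-resize CREATE_COMPLETE, means "keep waiting".
# Returns 0 on success, 1 on failure, 2 on timeout.
wait_for_cluster() {
  local cluster="$1" action="${2:-UPDATE}" interval="${3:-30}" tries="${4:-120}" status
  while [ "$tries" -gt 0 ]; do
    status=$(openstack coe cluster show "$cluster" -f value -c status)
    case "$status" in
      "${action}_COMPLETE") echo "$status"; return 0 ;;
      "${action}_FAILED") echo "$status"; return 1 ;;
    esac
    tries=$((tries - 1))
    sleep "$interval"
  done
  echo "timeout: last status $status"
  return 2
}
```

For example, `openstack coe cluster resize my-cluster 5 && wait_for_cluster my-cluster`. If the cluster's previous action was also an update, the first poll can see the stale `UPDATE_COMPLETE`, so allow a short pause after the resize call.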

```shell
# Upgrade (the new template is a positional argument)
openstack coe cluster upgrade <id> <new-template-id>
```

```shell
# Logs (Magnum)
sudo journalctl -u magnum-api -n 100 --no-pager
sudo journalctl -u magnum-conductor -n 100 --no-pager
```

```shell
# Cloud-init on a node (SSH to the node)
sudo less /var/log/cloud-init.log
sudo less /var/log/cloud-init-output.log
```
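On the node itself, pulling the likely error lines out of both cloud-init logs usually finds the failing step faster than paging through them. A minimal sketch; the grep pattern is an assumption (common failure markers, not an exhaustive list, and it will match some harmless lines), and `cloudinit_errors` is a hypothetical name. Run it with sudo if the logs are not readable by your user.

```shell
# cloudinit_errors: hypothetical scan of cloud-init logs for likely failure lines.
# Prints file:line:text; pass paths to override the two default logs.
cloudinit_errors() {
  local f
  [ "$#" -gt 0 ] || set -- /var/log/cloud-init.log /var/log/cloud-init-output.log
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      echo "skip (unreadable): $f" >&2
      continue
    fi
    # Heuristic pattern; "|| true" keeps "no match" from counting as an error.
    grep -H -n -i -E 'error|traceback|failed|fatal' "$f" || true
  done
}
```

The first match in time order is usually the one that matters; later errors are often fallout from it.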

```shell
# K8s side (after the node joins)
ssh ubuntu@<master-ip>   # login user depends on the image (e.g. core on Fedora CoreOS)
sudo systemctl status kubelet
sudo journalctl -u kubelet -n 100 --no-pager
```

Common findings this catches

Heat stack failed at OS::Nova::Server → Nova quota or scheduler issue.
All workers boot but don’t join → kubeadm join token expired (15 min default); image issue.
Master LB not creating → Octavia not configured or quota.
Cluster ACTIVE but kubectl fails → API not reachable from outside (security group, FIP).
Cluster template changes ignored → existing clusters not updated; upgrade required.
Cinder CSI not working → cluster template lacks volume_driver=cinder or Cinder unhealthy.
Scaling down stuck → drain step failed; pods can’t reschedule.
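The first few findings can be matched mechanically against the failure reason Heat reports. A minimal sketch of such a lookup; the message fragments are assumptions based on typical Nova, Neutron and Octavia error text and may differ between releases, and `classify_reason` is a hypothetical name.

```shell
# classify_reason: map a Heat failure reason string to a likely cause.
# Patterns are heuristics checked in order; extend them for your cloud.
classify_reason() {
  case "$1" in
    *[Qq]uota*) echo "quota: check Nova/Neutron/Cinder/Octavia quotas for the project" ;;
    *"No valid host"*) echo "scheduler: no compute host fits the flavor/image/AZ" ;;
    *"No more IP addresses"*) echo "neutron: subnet allocation pool exhausted" ;;
    *LoadBalancer*|*[Oo]ctavia*) echo "octavia: master LB failed; check Octavia service and quota" ;;
    *[Ii]mage*) echo "image: check the Glance image and its properties" ;;
    *) echo "unknown: read the full reason and the nested stack events" ;;
  esac
}
```

Pass it the reason text from `openstack stack failures list`; anything classified as unknown still needs a human read.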

When to escalate

Magnum cluster types not supported in your release: engage upstream.
Custom Magnum image building: coordinate with the platform team.
Multi-tenant K8s sharing: review isolation requirements; Magnum provides only basic isolation.

Magnum Kubernetes Cluster Debug Prompt


Related prompts

Heat Stack Failure Diagnosis Prompt

Kubernetes Node NotReady Diagnosis Prompt

OpenStack VM Troubleshooting Prompt

