Ansible WinRM and Kerberos Connection Troubleshooting Prompt
Diagnose Ansible-to-Windows connection failures over WinRM with Kerberos auth and produce a ranked, low-risk fix list instead of guesswork.
- Target user: Engineers hitting WinRM/Kerberos auth, SPN, or transport errors when running Ansible against Windows hosts
- Difficulty: Advanced
- Tools: Claude, ChatGPT, Cursor
The prompt
You are a senior automation engineer who has untangled the full zoo of Ansible-on-Windows connection failures: WinRM transport mismatches, Kerberos SPN and realm problems, certificate/HTTPS issues, and CredSSP quirks. You know these errors are cryptic and that the right fix depends on the exact transport and auth method in play. I will give you the connection variables and the error output. Diagnose the failure and produce a ranked fix list.
Steps:
1. **Restate the connection profile**: from the vars, state the transport (`ansible_winrm_transport`: kerberos / ntlm / credssp / basic / certificate), scheme (http/https), port, and whether message encryption (`ansible_winrm_message_encryption`) is in play. Flag any insecure combination (basic over http, certificate validation disabled).
2. **Parse the error**: classify it as an auth failure, SPN/realm mismatch, certificate/validation error, transport/port mismatch, or timeout, quoting the line that proves the class.
3. **Kerberos specifics** (if kerberos): check the controller's krb5 realm/KDC config, the SPN format (`HTTP/host.fqdn`), time skew, and that `kinit` succeeds for the account; list what to verify.
4. **Rank fixes by safety**: order them least-risky first (e.g. fix client-side krb5.conf, correct SPN) before anything that changes the Windows host's WinRM listener or auth config.
5. **Verification per fix**: give the exact command to confirm each fix (e.g. `kinit`, `ansible -m win_ping`, a raw WinRM probe) before moving to the next.
6. **Security note**: never recommend disabling certificate validation or enabling basic-over-http as the "fix" without flagging it as a security regression and offering the secure alternative.
Fill in:
- Connection vars (host/group_vars, redact secrets): [PASTE]
- Full error output (verbose -vvv): [PASTE]
- Controller OS and Windows host version: [DESCRIBE]
- Domain-joined or workgroup: [DESCRIBE]
Output format: the restated connection profile, the error classification with the proving line, a ranked fix table (fix, why, risk, verification command), and a one-line note on the most likely root cause.
Do not apply changes to the Windows host's WinRM config blindly. Verify each fix with win_ping or a raw probe before the next, and treat any change to listeners, auth methods, or validation as a security-relevant change needing review.
Why this prompt works
Ansible-on-Windows connection failures are uniquely miserable because the error messages are cryptic and the correct fix depends entirely on a configuration you have to reconstruct from variables: which transport, which scheme, which auth method. The most common debugging failure is reaching for the first plausible fix — usually disabling certificate validation or flipping to basic auth — because it makes the error go away. This prompt blocks that reflex by forcing the connection profile to be restated first and by explicitly refusing to treat security regressions as fixes, which is the difference between a real diagnosis and a credential leak waiting to happen.
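To make the "restate the profile first" step concrete, here is a minimal Python sketch of what that restatement might look like when done mechanically. The function name `restate_profile` is hypothetical, and the fallback defaults (port 5986, https unless the port is 5985, `auto` message encryption, `validate` for certificate validation) are assumptions about a typical setup rather than a guarantee of how any given Ansible version behaves. Only single-string transport values are handled.

```python
from typing import Any, Dict, List


def restate_profile(conn_vars: Dict[str, Any]) -> Dict[str, Any]:
    """Summarize WinRM connection vars and flag insecure combinations."""
    # Assumed defaults: port 5986, and https unless the port is 5985.
    port = int(conn_vars.get("ansible_port", 5986))
    default_scheme = "http" if port == 5985 else "https"
    scheme = str(conn_vars.get("ansible_winrm_scheme", default_scheme)).lower()
    transport = str(conn_vars.get("ansible_winrm_transport", "unspecified")).lower()
    encryption = str(conn_vars.get("ansible_winrm_message_encryption", "auto")).lower()
    cert_validation = str(
        conn_vars.get("ansible_winrm_server_cert_validation", "validate")
    ).lower()

    warnings: List[str] = []
    if transport == "basic" and scheme == "http":
        warnings.append("basic auth over http: credentials cross the wire unencrypted")
    if scheme == "https" and cert_validation == "ignore":
        warnings.append("certificate validation disabled: server identity is unchecked")
    if scheme == "http" and encryption == "never":
        warnings.append("message encryption off over http: payloads are unencrypted")

    return {
        "transport": transport,
        "scheme": scheme,
        "port": port,
        "message_encryption": encryption,
        "cert_validation": cert_validation,
        "warnings": warnings,
    }


if __name__ == "__main__":
    # Hypothetical vars, as they might appear in group_vars with secrets redacted.
    profile = restate_profile(
        {"ansible_winrm_transport": "basic", "ansible_port": 5985}
    )
    for key, value in profile.items():
        print(f"{key}: {value}")
```

The point of the sketch is the ordering: the warnings are computed from the profile before any error text is read, so an insecure combination is surfaced as a finding instead of being proposed later as a fix.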
Kerberos is where most of the genuinely hard cases live, and the prompt gives it dedicated attention: SPN format, realm and KDC config in krb5.conf, clock skew, and a kinit that actually succeeds for the account Ansible connects as. In practice these four checks cover most “Kerberos authentication failed” errors, and working through them in order turns an opaque failure into a short verification checklist. Pinning each conclusion to the line of -vvv output that proves it keeps the diagnosis honest rather than pattern-matched.
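The Kerberos checklist can be partly sketched as an offline pre-check, run before touching `kinit` or the host. The Python below is illustrative only: `expected_spn` and `kerberos_precheck` are hypothetical helpers, the five-minute tolerance is the common Kerberos default and may differ in a given domain, and the upper-case realm rule reflects Active Directory convention rather than a hard protocol requirement. It does not contact a KDC; the KDC time is passed in by the caller.

```python
import ipaddress
from datetime import datetime, timedelta, timezone
from typing import List

# Common default tolerance; a site may configure a different value.
MAX_SKEW = timedelta(minutes=5)


def expected_spn(fqdn: str) -> str:
    """Build the HTTP/host.fqdn SPN the prompt asks to verify."""
    return "HTTP/" + fqdn.strip().rstrip(".")


def kerberos_precheck(
    host: str, controller_now: datetime, kdc_now: datetime, principal: str
) -> List[str]:
    """Return a list of findings; an empty list means nothing obvious is wrong."""
    findings: List[str] = []

    # Kerberos matches on names, so an IP or short name will not match the SPN.
    try:
        ipaddress.ip_address(host)
        findings.append("host is an IP address; use the FQDN that matches the SPN")
    except ValueError:
        if "." not in host:
            findings.append("host is a short name; use the FQDN that matches the SPN")

    if abs(controller_now - kdc_now) > MAX_SKEW:
        findings.append("clock skew exceeds the assumed 5 minute tolerance")

    _user, sep, realm = principal.partition("@")
    if not sep or not realm:
        findings.append("principal has no realm; expected user@REALM")
    elif realm != realm.upper():
        findings.append("realm is not upper-case; check it against krb5.conf")

    return findings


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Hypothetical host and principal for illustration.
    print(expected_spn("web01.corp.example.com"))
    print(kerberos_precheck("web01", now, now, "svc_ansible@corp.example.com"))
```

A clean result here does not prove that authentication will work. It only rules out the cheapest causes before moving on to `kinit` and an SPN lookup against the directory.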
The ranked, verify-as-you-go structure is what makes the output safe to act on. Ordering fixes least-risky-first means you exhaust client-side controller changes before touching the Windows host’s WinRM listener, and confirming each fix with win_ping before moving on prevents the classic mistake of stacking three changes and not knowing which one helped or what else broke. Treating listener and auth changes as security-relevant keeps a connection-debugging session from quietly weakening the host’s posture.
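As a rough illustration of the least-risky-first ordering, the sketch below sorts candidate fixes by where the change lands. The `Fix` class, the three scope labels, and their risk ordering are assumptions made for this example; the sample fixes and verification commands are placeholders, with `<host>` and `<fqdn>` left for the reader to fill in.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical scope labels, ordered from least to most risky to change.
SCOPE_RISK = {"controller": 0, "directory": 1, "windows-host": 2}


@dataclass
class Fix:
    name: str
    scope: str   # one of the SCOPE_RISK keys
    verify: str  # command to run before moving to the next fix

    @property
    def needs_security_review(self) -> bool:
        # Listener, auth-method, and validation changes all live on the host.
        return self.scope == "windows-host"


def rank_fixes(fixes: List[Fix]) -> List[Fix]:
    """Order fixes least-risky first; unknown scopes sort last."""
    return sorted(fixes, key=lambda f: SCOPE_RISK.get(f.scope, len(SCOPE_RISK)))


if __name__ == "__main__":
    candidates = [
        Fix("enable Kerberos auth on the WinRM service", "windows-host",
            "ansible <host> -m ansible.windows.win_ping"),
        Fix("register the HTTP SPN for the host", "directory",
            "setspn -Q HTTP/<fqdn>"),
        Fix("correct the realm and KDC entries in krb5.conf", "controller",
            "kinit user@REALM && klist"),
    ]
    for rank, fix in enumerate(rank_fixes(candidates), start=1):
        review = " [security review]" if fix.needs_security_review else ""
        print(f"{rank}. {fix.name}{review}\n   verify: {fix.verify}")
```

Because `sorted` is stable, fixes within the same scope keep the order in which they were proposed, so a diagnosis that already lists the most likely cause first is not reshuffled within a risk tier.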