Automating Windows With Ansible WinRM and Kerberos Using AI

Managing Windows with Ansible is one of those things that’s straightforward once it works and maddening until it does. The Linux mental model (SSH, a key, you’re in) does not transfer. Windows hosts are reached over WinRM, the remote management protocol, and in any domain environment the authentication you actually want is Kerberos. The gap between those two facts and a working connection is filled with cryptic errors about transports, SPNs, realms, and certificates, each of which sends people down the wrong rabbit hole. The most common wrong turn is also the dangerous one: making the error go away by disabling certificate validation or falling back to basic auth over plain HTTP, which quietly exposes credentials in transit.

I lean on AI to reason through the connection profile and the failure modes here, because the right fix depends entirely on which transport and auth method are in play. But I treat every “just disable validation” suggestion as a red flag, and I verify each change with win_ping before moving on. Here’s the path that actually gets you connected without weakening security.

The connection profile is everything

Before debugging anything, you have to be able to state your connection profile, because the fix depends on it. The variables that define it:

# group_vars/windows.yml
ansible_connection: winrm
ansible_port: 5986
ansible_winrm_transport: kerberos
ansible_winrm_scheme: https
ansible_winrm_server_cert_validation: validate

Read that top to bottom. The connection is WinRM. Port 5986 is the HTTPS WinRM listener (5985 is plain HTTP — avoid it for real credentials). The transport is kerberos, which is what you want in a domain. The scheme is HTTPS, and certificate validation is on. That last line is the one people are tempted to flip to ignore when certs misbehave, and it’s exactly the wrong move — it disables the protection HTTPS is there to provide.
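One way to keep that profile honest is to lint the vars file for the settings that weaken it. The following is a minimal sketch, not an Ansible feature: check_winrm_profile is a hypothetical helper name, and it assumes the values are written unquoted, one per line, as in the example above.

```shell
# Sketch: flag settings that weaken the WinRM profile.
# check_winrm_profile is a hypothetical helper, not an Ansible command.
# Assumes unquoted YAML values, one "key: value" pair per line.
check_winrm_profile() {
    vars_file="$1"
    rc=0

    # Certificate validation switched off.
    if grep -Eq '^[[:space:]]*ansible_winrm_server_cert_validation:[[:space:]]*ignore' "$vars_file"; then
        echo "INSECURE: certificate validation is set to ignore"
        rc=1
    fi
    # Plain HTTP scheme (must not match "https").
    if grep -Eq '^[[:space:]]*ansible_winrm_scheme:[[:space:]]*http[[:space:]]*$' "$vars_file"; then
        echo "INSECURE: scheme is plain http"
        rc=1
    fi
    # The plain HTTP listener port.
    if grep -Eq '^[[:space:]]*ansible_port:[[:space:]]*5985[[:space:]]*$' "$vars_file"; then
        echo "INSECURE: port 5985 is the plain HTTP listener"
        rc=1
    fi
    # Basic auth transport.
    if grep -Eq '^[[:space:]]*ansible_winrm_transport:[[:space:]]*basic' "$vars_file"; then
        echo "INSECURE: transport is basic"
        rc=1
    fi

    return "$rc"
}
```

Running check_winrm_profile group_vars/windows.yml prints one line per finding and returns non-zero if there are any, so it can sit in a pre-commit hook or CI step.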

Why Kerberos, and what it needs

Kerberos is the right auth method in a domain because it’s ticket-based, integrates with Active Directory, and doesn’t ship credentials around the way basic auth does. The cost is that it has more moving parts on the controller side, and those parts are where most failures live. The controller needs Kerberos client packages (on Ubuntu, krb5-user plus the pywinrm[kerberos] Python extra) and a krb5.conf that knows your realm and KDC:

# /etc/krb5.conf on the Ansible controller
[libdefaults]
    default_realm = CORP.EXAMPLE.COM
    dns_lookup_realm = false
    dns_lookup_kdc = true

[realms]
    CORP.EXAMPLE.COM = {
        kdc = dc01.corp.example.com
        admin_server = dc01.corp.example.com
    }

The single best diagnostic is to take Ansible out of the loop and test Kerberos directly:

# If this fails, Ansible will too — fix Kerberos first
kinit administrator@CORP.EXAMPLE.COM
klist
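A frequent reason kinit fails is a realm typed in lower case: realm names are case-sensitive, and an Active Directory realm is the upper-case form of the domain name. As a small sketch, the helpers below read default_realm from krb5.conf and check its case. Both function names are hypothetical, and the parsing assumes a single "default_realm = VALUE" line like the one in the example config.

```shell
# Sketch: read default_realm from a krb5.conf and check that it is upper-case.
# krb5_default_realm and realm_is_uppercase are hypothetical helper names.
krb5_default_realm() {
    conf="${1:-/etc/krb5.conf}"
    # Print the value after "default_realm =", without surrounding whitespace.
    sed -n 's/^[[:space:]]*default_realm[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*$/\1/p' "$conf" | head -n 1
}

realm_is_uppercase() {
    realm="$1"
    upper="$(printf '%s' "$realm" | tr '[:lower:]' '[:upper:]')"
    # Succeeds only for a non-empty realm that is already upper-case.
    [ -n "$realm" ] && [ "$realm" = "$upper" ]
}
```

With these defined, kinit "administrator@$(krb5_default_realm)" requests a ticket against whatever realm the controller is actually configured for, which removes one source of typos from the test.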

When I’m stuck, I describe the setup to AI and ask it to focus on the Kerberos specifics:

Ansible to a domain-joined Windows host over WinRM HTTPS with kerberos transport fails with “Server not found in Kerberos database.” The controller is Ubuntu, the host is Windows Server 2022 in realm CORP.EXAMPLE.COM. Walk me through SPN format, krb5.conf realm/KDC settings, and time skew — ranked by likelihood — and give me the command to verify each, without suggesting I disable cert validation or switch to basic auth.

That last clause matters. I explicitly forbid the insecure shortcuts so the model has to give me the real fix.

The errors that send people the wrong way

A few classes of error account for most of the pain, and each has a correct fix and a tempting wrong one:

“Server not found in Kerberos database” usually means an SPN problem. The host needs an SPN like HTTP/host.fqdn registered, and the FQDN you connect with has to match. The wrong fix is switching to NTLM or basic; the right one is fixing the SPN and connecting by the correct name.
Clock skew errors come from Kerberos’s tight time tolerance: controller and KDC must agree to within five minutes by default. The fix is NTP, not loosening auth.
Certificate validation failures mean the controller doesn’t trust the host’s WinRM cert. The right fix is trusting the proper CA; the wrong, common one is ansible_winrm_server_cert_validation: ignore, which throws away HTTPS’s protection.
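The first two failure classes can be caught before Ansible ever runs. The sketch below is illustrative only: both function names are hypothetical, the HTTP/ service prefix follows the SPN form mentioned above, and the 300-second default mirrors the usual five-minute Kerberos tolerance, which your domain policy may override. How you obtain the KDC’s clock reading is left out.

```shell
# Sketch: pre-flight checks for SPN and clock-skew problems.
# expected_spn and skew_within_tolerance are hypothetical helper names.

# Print the SPN a given inventory hostname implies. IP addresses and short
# names are rejected because they generally will not match HTTP/host.fqdn.
expected_spn() {
    host="$1"
    case "$host" in
        *[!0-9.]*) ;;   # contains something other than digits and dots
        *) echo "ERROR: $host looks like an IP address" >&2; return 1 ;;
    esac
    case "$host" in
        *.*) ;;         # has at least one dot: treat it as an FQDN
        *) echo "ERROR: $host is not fully qualified" >&2; return 1 ;;
    esac
    printf 'HTTP/%s\n' "$host"
}

# Compare two epoch timestamps against a tolerance in seconds (default 300).
skew_within_tolerance() {
    a="$1"
    b="$2"
    tolerance="${3:-300}"
    diff=$(( a - b ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( 0 - diff ))
    fi
    [ "$diff" -le "$tolerance" ]
}
```

The string printed by expected_spn web-win01.corp.example.com is what to look for when listing the host’s registered SPNs on the Windows side (for example with setspn -L).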

The pattern is consistent: every cryptic WinRM error has a proper fix that keeps the channel secure and a shortcut that compromises it. AI is good at laying out both; your job is to refuse the shortcut.

Pro Tip: Keep a plain HTTP listener (5985) disabled or firewalled in any environment with real credentials. Its existence is a standing temptation to “just try basic auth” the next time Kerberos acts up, and that’s a credential-in-cleartext incident waiting for a bad day.

Verify with win_ping, one change at a time

The cardinal rule of WinRM debugging is to verify each change before making the next, because stacking three changes and re-testing tells you nothing about which one helped. win_ping is the Windows counterpart of Ansible’s ping module, a trivial round trip that confirms the whole connection and auth chain:

ansible windows -m ansible.windows.win_ping --limit web-win01 -vvv

web-win01 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}

I change one thing (fix the SPN, correct krb5.conf, or trust the CA), then run win_ping and read the result. The -vvv output is verbose, but it’s where the diagnostic detail lives when something’s still wrong. Only once win_ping returns pong do I run a real playbook, and only then do I widen beyond the single test host.
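That ordering can be enforced rather than remembered. Here is a minimal sketch of a wrapper that refuses to run a playbook until win_ping succeeds on the one test host. ping_then_play and the two *_BIN variables are hypothetical names I’ve introduced, and the variables exist only so the commands can be swapped out.

```shell
# Sketch: run a playbook against one host only after win_ping succeeds there.
# ping_then_play, ANSIBLE_BIN and ANSIBLE_PLAYBOOK_BIN are hypothetical names.
ANSIBLE_BIN="${ANSIBLE_BIN:-ansible}"
ANSIBLE_PLAYBOOK_BIN="${ANSIBLE_PLAYBOOK_BIN:-ansible-playbook}"

ping_then_play() {
    host="$1"
    playbook="$2"

    # Gate: the full connection and auth chain has to work first.
    if ! "$ANSIBLE_BIN" "$host" -m ansible.windows.win_ping; then
        echo "win_ping failed on $host; not running $playbook" >&2
        return 1
    fi

    # Still scoped to the single test host; widen only after this succeeds.
    "$ANSIBLE_PLAYBOOK_BIN" "$playbook" --limit "$host"
}
```

Calling ping_then_play web-win01 site.yml (site.yml is a placeholder playbook name) keeps the first real run on the same single host that just answered pong.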

Keep the secure path the easy path

The throughline of Windows automation is that the secure configuration — WinRM over HTTPS on 5986 with Kerberos and validation on — is also the one that, once correct, stays reliable. The insecure fallbacks feel faster in the moment of frustration and cost you later. Let AI help you reason through SPNs, realms, and certificate trust, but hold the line on the security-relevant settings and verify every step with win_ping. A connection you debugged by weakening it isn’t fixed; it’s a liability you haven’t noticed yet.
