AI for OpenStack · By James Joyner IV · 8 min read

Debugging Keystone Identity and Authentication in OpenStack

401s, token expiry, and role mistakes block every other OpenStack service. Here's how to debug Keystone identity, tokens, and RBAC methodically.

  • #openstack
  • #keystone
  • #identity
  • #authentication
  • #rbac
  • #tokens

Keystone is the front door to the entire cloud. When it misbehaves, everything misbehaves — Nova, Neutron, Cinder all return cryptic errors that are really just “I couldn’t validate your token.” After 25 years in infrastructure, I’ve learned that most “OpenStack is down” pages are actually Keystone auth problems wearing another service’s mask.

Here’s how I debug identity and authentication without flailing.

Step 1: Prove it’s actually auth

Before anything, confirm the failure is authentication and not the downstream service. The fastest test is to request a token directly:

openstack token issue

If that fails with 401, the problem is your credentials or Keystone itself. If it succeeds but openstack server list still fails, the problem is authorization (roles/scope) or the other service’s endpoint — not authentication.

This one command splits your entire problem space in half.
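
If you want that split as something you can paste into a runbook, here is a minimal sketch. It assumes the openstack client is on your PATH and your credentials are already exported. The function names classify_auth and run_auth_split are hypothetical, and the classifier takes plain exit codes so the logic can be checked without a cloud.

```shell
# classify_auth TOKEN_RC LIST_RC
# Hypothetical helper: turns the exit codes of `openstack token issue`
# and `openstack server list` into a one-word verdict.
classify_auth() {
    local token_rc=$1 list_rc=$2
    if [ "$token_rc" -ne 0 ]; then
        # Could not even obtain a token: credentials or Keystone itself.
        echo "authentication-or-keystone"
    elif [ "$list_rc" -ne 0 ]; then
        # Token was issued, but the downstream call failed.
        echo "authorization-or-endpoint"
    else
        echo "ok"
    fi
}

# Runs the two real commands and classifies the result.
# Only defined here; nothing contacts the cloud until you call it.
run_auth_split() {
    local token_rc=0 list_rc=0
    openstack token issue >/dev/null 2>&1 || token_rc=$?
    if [ "$token_rc" -eq 0 ]; then
        openstack server list >/dev/null 2>&1 || list_rc=$?
    fi
    classify_auth "$token_rc" "$list_rc"
}
```

Calling run_auth_split prints one of the three verdicts. A nonzero exit from the client does not prove a 401 by itself, so treat the verdict as a pointer to the next step, not a diagnosis.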

Step 2: Inspect what you’re actually sending

Authentication failures are usually a mismatch between what you think your environment says and what it actually says:

env | grep OS_

The usual offenders:

  • OS_AUTH_URL pointing at the wrong version: /v3 vs /v2.0. The v2.0 API is long dead; if anything references it, that’s your bug.
  • Missing OS_PROJECT_DOMAIN_NAME / OS_USER_DOMAIN_NAME. v3 looks up user and project names inside a domain, so it needs both (or the matching _DOMAIN_ID variables). A token request that “works for admin but not the user” is almost always a missing domain variable.
  • Stale OS_TOKEN — a cached token that expired.
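
The three offenders above are easy to check mechanically. This is a sketch, not a complete linter: check_os_env is a hypothetical name, it only looks at the variables mentioned in the list plus their _DOMAIN_ID equivalents, and it cannot tell whether a set OS_TOKEN is actually expired.

```shell
# check_os_env: prints one line per suspicious setting in the current
# environment and returns nonzero if it found any.
check_os_env() {
    local problems=0

    case "${OS_AUTH_URL:-}" in
        "")
            echo "OS_AUTH_URL is not set"
            problems=$((problems + 1)) ;;
        *v2.0*)
            echo "OS_AUTH_URL references v2.0: ${OS_AUTH_URL}"
            problems=$((problems + 1)) ;;
    esac

    # Either the _NAME or the _ID form is enough for each domain.
    if [ -z "${OS_USER_DOMAIN_NAME:-}${OS_USER_DOMAIN_ID:-}" ]; then
        echo "no user domain set (OS_USER_DOMAIN_NAME or OS_USER_DOMAIN_ID)"
        problems=$((problems + 1))
    fi
    if [ -z "${OS_PROJECT_DOMAIN_NAME:-}${OS_PROJECT_DOMAIN_ID:-}" ]; then
        echo "no project domain set (OS_PROJECT_DOMAIN_NAME or OS_PROJECT_DOMAIN_ID)"
        problems=$((problems + 1))
    fi

    # A leftover token can be stale; this only flags that one is present.
    if [ -n "${OS_TOKEN:-}" ]; then
        echo "OS_TOKEN is set and may be stale"
        problems=$((problems + 1))
    fi

    [ "$problems" -eq 0 ]
}
```

Source your openrc file, run check_os_env, and read whatever it prints before you touch a password.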

Step 3: Distinguish 401 from 403

These look similar and mean opposite things:

  • 401 Unauthorized — Keystone couldn’t authenticate you. Wrong password, expired token, wrong domain, or clock skew.
  • 403 Forbidden — You authenticated fine, but your role doesn’t grant the action. This is RBAC, not identity.

For a 403, check the role assignments:

openstack role assignment list --user <user> --project <project> --names

A user with no role on the target project gets a token but can do nothing with it. The fix is a role grant, not a password reset.
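
When the CLI error hides the status code, you can ask the service for it directly. The sketch below assumes curl is installed and that you pass a real service URL of your own; http_status_for and next_step_for_status are hypothetical names. The token comes from openstack token issue and is sent in the X-Auth-Token header.

```shell
# next_step_for_status CODE: maps an HTTP status to where to look next.
next_step_for_status() {
    case "$1" in
        401) echo "identity: check password, domain variables, token expiry, clock skew" ;;
        403) echo "rbac: check role assignments and policy" ;;
        *)   echo "other: not an auth status, look at the service itself" ;;
    esac
}

# http_status_for URL: prints only the HTTP status code for a GET on URL,
# using a freshly issued token. Not run until you call it.
http_status_for() {
    local url=$1 token
    # If this fails you never got a token, which is already an identity problem.
    token=$(openstack token issue -f value -c id) || return 1
    curl -s -o /dev/null -w '%{http_code}' -H "X-Auth-Token: $token" "$url"
}
```

For example, next_step_for_status "$(http_status_for <service-url>)" prints the branch to follow, where <service-url> is whichever API endpoint returned the error.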

Step 4: Clock skew is the silent killer

Fernet tokens (the modern default) encode timestamps. If the controller issuing tokens and the service validating them disagree on the time, tokens are rejected as expired the moment they’re issued. I’ve watched a whole cloud “lose auth” because one controller’s NTP died.

chronyc tracking      # on every controller

Any controller more than a few seconds off is suspect. This is the first thing I check when auth fails intermittently or only against certain nodes.
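
Reading that output by eye on every controller gets old. Here is a sketch that pulls the two fields I care about out of chronyc tracking output fed on stdin. The function names are hypothetical, and the threshold is your call. Note that the “System time” line is the offset relative to chrony’s own sources, so a node whose NTP has died can show a small offset; that is why the leap status is extracted too.

```shell
# system_time_offset: prints the numeric offset (seconds) from the
# "System time" line of `chronyc tracking` output read on stdin.
system_time_offset() {
    awk -F': *' '/^System time/ { split($2, f, " "); print f[1]; exit }'
}

# leap_status: prints the value of the "Leap status" line,
# e.g. "Normal" or "Not synchronised".
leap_status() {
    awk -F': *' '/^Leap status/ { print $2; exit }'
}

# offset_exceeds OFFSET LIMIT: succeeds (exit 0) if OFFSET > LIMIT.
# awk does the comparison because the shell has no floating point.
offset_exceeds() {
    awk -v o="$1" -v l="$2" 'BEGIN { exit !(o + 0 > l + 0) }'
}
```

On a controller, chronyc tracking | system_time_offset gives the number to compare, and offset_exceeds "$offset" 1 succeeds when it is more than one second out. Anything other than Normal from leap_status deserves a look regardless of the offset.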

Step 5: Fernet key rotation gone wrong

Fernet keys must be identical across all Keystone nodes. If key rotation didn’t sync the key repository everywhere, tokens issued by node A fail on node B:

ls -la /etc/keystone/fernet-keys/
md5sum /etc/keystone/fernet-keys/*

Compare the hashes across controllers. A mismatch means your rotation/distribution broke: re-sync the key repo and restart Keystone. The symptom is the giveaway: auth works sometimes and fails sometimes, depending on which node the load balancer landed you on.
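
Comparing a screenful of hashes by eye is error-prone, so I collapse each repository to a single fingerprint. This is a sketch with hypothetical function names; it assumes md5sum is available and that you run it as a user who can read the key directory.

```shell
# repo_fingerprint DIR: prints one hash that summarises every file in DIR.
# File names are part of the input, so the same keys under different
# index names produce a different fingerprint.
repo_fingerprint() {
    ( cd "$1" && md5sum -- * | LC_ALL=C sort | md5sum | cut -d' ' -f1 )
}

# fingerprints_match: reads one fingerprint per line on stdin and
# succeeds only if every line is identical.
fingerprints_match() {
    [ "$(sort -u | wc -l | tr -d ' ')" -eq 1 ]
}
```

Run repo_fingerprint /etc/keystone/fernet-keys on each controller, paste the results one per line into fingerprints_match, and a nonzero exit tells you at least one node is out of step.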

Step 6: Read the Keystone log with debug context

When the cause still isn’t obvious:

grep -iE 'authentication failed|invalid|expired' /var/log/keystone/keystone.log

For deep dives, temporarily set debug = true and insecure_debug = true in the [DEFAULT] section of keystone.conf (non-production only: insecure_debug returns the reason auth failed in API responses). Turn both back off when you’re done.
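
On a busy controller the grep can return thousands of lines, and a count per term tells you which thread to pull first. A minimal sketch: summarise_auth_log is a hypothetical name, it uses the same three search terms as the grep in this step, and it matches case-insensitively. A line containing more than one term is counted under each.

```shell
# summarise_auth_log: reads log text on stdin and prints how many lines
# mention each search term.
summarise_auth_log() {
    awk '
        { line = tolower($0) }
        line ~ /authentication failed/ { failed++ }
        line ~ /invalid/               { invalid++ }
        line ~ /expired/               { expired++ }
        END {
            printf "authentication failed: %d\n", failed
            printf "invalid: %d\n", invalid
            printf "expired: %d\n", expired
        }'
}
```

Feed it the log with summarise_auth_log < /var/log/keystone/keystone.log, then grep for whichever term dominates.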

Using AI to untangle RBAC and policy

Keystone’s role/scope/policy model is where people get lost — system vs domain vs project scope, implied roles, and policy.yaml overrides interact in non-obvious ways. I describe the setup to an LLM and ask it to reason about it:

“A user has the ‘reader’ role on project X but gets 403 creating a volume. Here is the role assignment list and the relevant policy.yaml rules for Cinder. Explain why the request is denied and exactly which role or policy change would allow it — without weakening other permissions. Read-only analysis.”

It’s genuinely good at tracing “this rule requires role:admin, the user only has reader, so it’s denied here” through a policy file you’d otherwise read line by line. I keep these RBAC-explainer prompts with my other OpenStack prompts so policy debugging is repeatable rather than archaeological.
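
To keep those prompts repeatable, I assemble the context the same way every time. This is a sketch under assumptions: build_rbac_prompt is a hypothetical helper, and the two input files are ones you create yourself, for example by redirecting the role assignment list from Step 3 into one file and copying the relevant policy.yaml rules into another.

```shell
# build_rbac_prompt QUESTION ASSIGNMENTS_FILE POLICY_FILE
# Prints a single prompt: the question, then the role assignments,
# then the policy rules, then a read-only instruction.
build_rbac_prompt() {
    local question=$1 assignments_file=$2 policy_file=$3
    printf '%s\n\n' "$question"
    printf 'Role assignments:\n'
    cat "$assignments_file"
    printf '\nPolicy rules:\n'
    cat "$policy_file"
    printf '\nRead-only analysis. Do not suggest changes that weaken other permissions.\n'
}
```

Redirect the output to a file and paste it into whichever LLM you use. Review both input files before sharing them outside your organisation.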

The auth debugging checklist

When the pager says “OpenStack is broken,” I run this in order:

  1. openstack token issue — auth or authz?
  2. env | grep OS_ — right URL, right domains?
  3. 401 vs 403 — identity or RBAC?
  4. chronyc tracking on all controllers — clock skew?
  5. Compare Fernet keys across nodes — rotation synced?
  6. Grep the Keystone log for the real reason.

That sequence resolves the overwhelming majority of identity incidents, and it does it without touching downstream services that were never actually broken.

For more auth and RBAC prompts tuned to OpenStack, browse our prompt library. The mindset that makes Keystone manageable is simple: prove it’s auth first, then split identity from authorization, and only then go digging.

AI policy analysis is assistive, not authoritative. Validate role and policy changes in a non-production environment before applying them.
