mTLS and Service Identity with SPIFFE: Giving Every Workload a Real Name

For years we authenticated services to each other with the worst credential ever invented: the source IP address. “Traffic from this subnet is trusted.” Then we autoscaled, containers got ephemeral IPs, pods churned every few minutes, and the whole model quietly stopped meaning anything. The IP allowlist became a comforting fiction in a firewall rule nobody dared touch.

The fix is to stop authenticating where a workload runs and start authenticating what it is. That’s service identity, and SPIFFE is the open standard that makes it portable across Kubernetes, VMs, and clouds.

What SPIFFE actually gives you

SPIFFE (Secure Production Identity Framework For Everyone) defines a universal identity for workloads: the SPIFFE ID, a URI like spiffe://prod.example.com/ns/payments/sa/checkout. That ID is delivered to the workload as a short-lived X.509 certificate (an SVID — SPIFFE Verifiable Identity Document) or a JWT.
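To make the ID format concrete, here is a minimal Go sketch that splits a SPIFFE ID into its trust domain and path using only the standard library. The parseSPIFFEID helper is hypothetical and checks only the basics; in real code the go-spiffe library's spiffeid package does stricter validation.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// parseSPIFFEID splits a SPIFFE ID into trust domain and workload path.
// Hypothetical helper for illustration: it covers the obvious rules only
// and is not a complete implementation of the SPIFFE ID specification.
func parseSPIFFEID(raw string) (string, string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", err
	}
	if u.Scheme != "spiffe" {
		return "", "", errors.New("scheme must be spiffe")
	}
	if u.Host == "" || u.User != nil || u.Port() != "" {
		return "", "", errors.New("trust domain must be a bare host name")
	}
	if u.RawQuery != "" || u.Fragment != "" {
		return "", "", errors.New("query and fragment are not allowed")
	}
	return u.Host, u.Path, nil
}

func main() {
	td, path, err := parseSPIFFEID("spiffe://prod.example.com/ns/payments/sa/checkout")
	if err != nil {
		panic(err)
	}
	fmt.Println("trust domain:", td)
	fmt.Println("path:", path)
}
```

The trust domain is the part you federate and pin roots of trust to; the path is whatever naming scheme you choose, here the namespace and service account.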

The important properties:

No long-lived secrets. SVIDs rotate automatically, often hourly, so a leaked key and certificate stop being useful as soon as the SVID expires.
Identity is attested, not declared. A workload doesn’t claim to be the checkout service; the platform proves it by inspecting the kernel, the Kubernetes API, or the cloud instance metadata.
It’s mutual. Both sides present and verify SVIDs. The client knows it’s talking to the real payments service, and payments knows the caller is really checkout.
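At the certificate level, an X.509-SVID is an ordinary certificate whose SPIFFE ID sits in a single URI SAN and whose validity window is short. The sketch below builds a self-signed look-alike with the Go standard library and reads the ID back. Self-signing is a shortcut for illustration only (SPIRE signs real SVIDs with its own CA), and issueDemoSVID and svidID are hypothetical names.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"fmt"
	"math/big"
	"net/url"
	"time"
)

// issueDemoSVID creates a self-signed certificate shaped like an X.509-SVID:
// one URI SAN carrying the SPIFFE ID and a short validity window.
// Illustration only: a real SVID is signed by the SPIRE server's CA.
func issueDemoSVID(spiffeID string, ttlSeconds int) (*x509.Certificate, error) {
	id, err := url.Parse(spiffeID)
	if err != nil {
		return nil, err
	}
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	now := time.Now()
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		NotBefore:    now,
		NotAfter:     now.Add(time.Duration(ttlSeconds) * time.Second),
		URIs:         []*url.URL{id},
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth, x509.ExtKeyUsageServerAuth},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

// svidID returns the SPIFFE ID from the certificate's URI SAN,
// or an empty string if there is not exactly one URI SAN.
func svidID(cert *x509.Certificate) string {
	if len(cert.URIs) != 1 {
		return ""
	}
	return cert.URIs[0].String()
}

func main() {
	cert, err := issueDemoSVID("spiffe://prod.example.com/ns/payments/sa/checkout", 3600)
	if err != nil {
		panic(err)
	}
	fmt.Println("identity:", svidID(cert))
	fmt.Println("lifetime:", cert.NotAfter.Sub(cert.NotBefore))
}
```

Both peers in an mTLS handshake present a certificate like this, and each side reads the other's URI SAN to learn who it is talking to.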

SPIRE is the most common implementation: a server that acts as the certificate authority and policy brain, and an agent on every node that attests workloads and hands them SVIDs over a local Unix socket.

A minimal registration

You tell SPIRE how to recognize a workload with a registration entry. On Kubernetes you typically attest by namespace and service account:

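# Recent SPIRE releases deprecate -ttl in favor of -x509SVIDTTL (value in seconds).
# Run spire-server entry create -help to see which flag your version accepts.
spire-server entry create \
  -spiffeID spiffe://prod.example.com/ns/payments/sa/checkout \
  -parentID spiffe://prod.example.com/spire/agent/k8s_psat/prod/node-1 \
  -selector k8s:ns:payments \
  -selector k8s:sa:checkout \
  -ttl 3600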

Now any pod running as the checkout service account in the payments namespace — and nothing else — receives that identity. There’s no secret to mount, no key to rotate by hand, no imagePullSecret lookalike to leak.
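Because the SPIFFE ID and both selectors follow mechanically from the namespace and service account, entries like this are easy to generate rather than type. A minimal sketch, assuming the spiffe://<trust domain>/ns/<namespace>/sa/<service account> naming used in this post; entryArgs is a hypothetical helper, and it emits only flags that appear in the command above.

```go
package main

import (
	"fmt"
	"strings"
)

// entryArgs builds the argument list for `spire-server entry create` from a
// namespace and service account. Hypothetical helper: the ID layout is this
// post's naming convention, not something SPIRE mandates.
func entryArgs(trustDomain, parentID, namespace, serviceAccount string) []string {
	return []string{
		"entry", "create",
		"-spiffeID", fmt.Sprintf("spiffe://%s/ns/%s/sa/%s", trustDomain, namespace, serviceAccount),
		"-parentID", parentID,
		"-selector", "k8s:ns:" + namespace,
		"-selector", "k8s:sa:" + serviceAccount,
	}
}

func main() {
	args := entryArgs(
		"prod.example.com",
		"spiffe://prod.example.com/spire/agent/k8s_psat/prod/node-1",
		"payments",
		"checkout",
	)
	// Print the command instead of running it, so the output can be reviewed.
	fmt.Println("spire-server " + strings.Join(args, " "))
}
```

Emitting both selectors together matters: an entry with only the namespace selector would hand the same identity to every pod in that namespace.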

The workload fetches its SVID from the agent’s socket:

spire-agent api fetch x509 \
  -socketPath /run/spire/sockets/agent.sock \
  -write /tmp/svid

In real code you use the SPIFFE Workload API library so rotation is handled for you — the cert refreshes in the background and your TLS config picks it up.
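The mechanism behind "your TLS config picks it up" is worth seeing once. Go's tls.Config can ask for the certificate on every handshake through a callback, so a background refresher only has to swap a pointer. This is a standard-library sketch of that idea, not go-spiffe's implementation; rotatingCert and demoRotation are hypothetical, and the byte strings stand in for real DER-encoded certificates.

```go
package main

import (
	"crypto/tls"
	"errors"
	"fmt"
	"sync/atomic"
)

// rotatingCert holds the current certificate and lets a refresher replace it
// without rebuilding the tls.Config. Hypothetical type for illustration.
type rotatingCert struct {
	current atomic.Pointer[tls.Certificate]
}

// Set installs a freshly issued certificate.
func (r *rotatingCert) Set(c *tls.Certificate) { r.current.Store(c) }

// GetCertificate has the signature tls.Config.GetCertificate expects and is
// called once per handshake, so new connections always see the latest cert.
func (r *rotatingCert) GetCertificate(*tls.ClientHelloInfo) (*tls.Certificate, error) {
	c := r.current.Load()
	if c == nil {
		return nil, errors.New("no SVID loaded yet")
	}
	return c, nil
}

// serverConfig returns a TLS config that defers certificate choice to r.
func (r *rotatingCert) serverConfig() *tls.Config {
	return &tls.Config{GetCertificate: r.GetCertificate, MinVersion: tls.VersionTLS12}
}

// demoRotation shows that one config serves different certs before and after
// a rotation. The byte strings are placeholders, not real certificates.
func demoRotation() (string, string, error) {
	var rc rotatingCert
	cfg := rc.serverConfig()

	rc.Set(&tls.Certificate{Certificate: [][]byte{[]byte("svid-1")}})
	first, err := cfg.GetCertificate(nil)
	if err != nil {
		return "", "", err
	}

	rc.Set(&tls.Certificate{Certificate: [][]byte{[]byte("svid-2")}})
	second, err := cfg.GetCertificate(nil)
	if err != nil {
		return "", "", err
	}
	return string(first.Certificate[0]), string(second.Certificate[0]), nil
}

func main() {
	before, after, err := demoRotation()
	if err != nil {
		panic(err)
	}
	fmt.Println("before rotation:", before)
	fmt.Println("after rotation:", after)
}
```

Existing connections keep the certificate they negotiated with; only new handshakes pick up the rotated one. Clients doing mTLS use the matching GetClientCertificate callback.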

Enforcing mTLS that means something

Handing out certs is half the job. The other half is requiring them and authorizing on identity. In Go, the go-spiffe library lets a server demand a specific caller:

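// source is a *workloadapi.X509Source connected to the agent socket; it
// supplies both this server's SVID and the trust bundle used to verify callers.
// Only allow the checkout service to call this endpoint
authorizer := tlsconfig.AuthorizeID(
    spiffeid.RequireFromString("spiffe://prod.example.com/ns/payments/sa/checkout"),
)
tlsConfig := tlsconfig.MTLSServerConfig(source, source, authorizer)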

That AuthorizeID is the whole point. Plenty of teams turn on mTLS, feel secure, and never check the peer identity — so any workload with a valid cert can call any service. That’s encryption without authorization, which is theater. Always pin the allowed caller identities.
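If you are not using go-spiffe, the same check is a few lines against the verified peer certificate. A minimal sketch, assuming the certificate chain has already been verified against your trust bundle (this function only authorizes, it does not authenticate); authorizePeer and certWithID are hypothetical names.

```go
package main

import (
	"crypto/x509"
	"errors"
	"fmt"
	"net/url"
)

// authorizePeer accepts a peer only if its leaf certificate carries exactly
// one URI SAN and that URI is on the allow-list. It must run after chain
// verification; on its own it proves nothing about who issued the cert.
func authorizePeer(leaf *x509.Certificate, allowed map[string]bool) error {
	if len(leaf.URIs) != 1 {
		return errors.New("peer certificate must carry exactly one URI SAN")
	}
	id := leaf.URIs[0].String()
	if !allowed[id] {
		return fmt.Errorf("peer %q is not an allowed caller", id)
	}
	return nil
}

// certWithID builds an unsigned, in-memory certificate value for the demo.
func certWithID(raw string) *x509.Certificate {
	u, err := url.Parse(raw)
	if err != nil {
		panic(err)
	}
	return &x509.Certificate{URIs: []*url.URL{u}}
}

func main() {
	allowed := map[string]bool{
		"spiffe://prod.example.com/ns/payments/sa/checkout": true,
	}
	for _, id := range []string{
		"spiffe://prod.example.com/ns/payments/sa/checkout",
		"spiffe://prod.example.com/ns/payments/sa/reporting",
	} {
		fmt.Println(id, "->", authorizePeer(certWithID(id), allowed))
	}
}
```

The second caller holds a perfectly valid identity in the same trust domain and is still rejected, which is the difference between "has a cert" and "is allowed to call this".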

If you run a service mesh, this is mostly handled for you. Istio issues SPIFFE-format identities and you express policy declaratively:

apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: checkout-to-payments
  namespace: payments
spec:
  selector:
    matchLabels:
      app: payments
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/payments/sa/checkout"]

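The mesh refuses any connection whose verified identity isn’t on the list, regardless of source IP. Istio writes principals without the spiffe:// prefix, and cluster.local is its default trust domain, which is why this example doesn’t use the prod.example.com domain from the SPIRE examples.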

How this changes your threat model

Once identity is cryptographic and attested, several classes of attack get much harder:

Lateral movement. A compromised pod can’t impersonate another service. It only ever gets its own SVID, scoped to its own service account.
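Stolen credentials. There’s no long-lived API key to steal. With a one-hour TTL like the one above, a stolen SVID expires within the hour, and the private key is generated on the node rather than handed out from a central secret store.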
Spoofed traffic. An attacker on the network can’t pretend to be an internal service without a valid, attested SVID — which they can’t mint.

This pairs naturally with network segmentation. Identity answers “who are you,” segmentation answers “are you even allowed on this path,” and you want both. I wrote up the complementary side in the broader security hardening guides.

Rollout without a big bang

You do not flip mTLS to “strict” everywhere on day one — you’ll black out half your traffic. Phase it:

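Deploy SPIRE without enforcement. Issue identities, log them, change no policy. (SPIRE has no dedicated observe mode; issuing SVIDs simply doesn’t affect traffic until something requires them.)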
Run permissive mTLS. Accept both mTLS and plaintext while clients migrate. Watch the metrics for which callers still come in plaintext.
Tighten per-namespace. Move one namespace at a time to strict mTLS once its inbound traffic is fully identified.
Authorize, then deny-by-default. Add AuthorizationPolicy rules, confirm the allow-list is complete, then drop the implicit allow.

The permissive window is where you find the forgotten cron job, the legacy VM, the debugging sidecar that nobody documented. Skipping it is how you turn a security upgrade into an outage.
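The "fully identified" test for a namespace can be made mechanical. This sketch assumes you can export observed inbound connections with the caller's name and its verified SPIFFE ID (empty for plaintext); the connection type and both functions are hypothetical, and where that data comes from depends on your mesh or proxy telemetry.

```go
package main

import (
	"fmt"
	"sort"
)

// connection is one observed inbound connection. Hypothetical record:
// PeerID is the verified SPIFFE ID, or empty if the caller used plaintext.
type connection struct {
	Source string
	PeerID string
}

// plaintextCallers returns the distinct sources still connecting without mTLS.
func plaintextCallers(conns []connection) []string {
	seen := map[string]bool{}
	var out []string
	for _, c := range conns {
		if c.PeerID == "" && !seen[c.Source] {
			seen[c.Source] = true
			out = append(out, c.Source)
		}
	}
	sort.Strings(out)
	return out
}

// readyForStrict reports whether every observed caller presented an identity.
// An empty sample is treated as "not ready": no data is not evidence.
func readyForStrict(conns []connection) bool {
	return len(conns) > 0 && len(plaintextCallers(conns)) == 0
}

func main() {
	observed := []connection{
		{Source: "checkout", PeerID: "spiffe://prod.example.com/ns/payments/sa/checkout"},
		{Source: "nightly-report-cron", PeerID: ""},
		{Source: "nightly-report-cron", PeerID: ""},
	}
	fmt.Println("ready for strict:", readyForStrict(observed))
	fmt.Println("still plaintext:", plaintextCallers(observed))
}
```

Run a check like this over a window long enough to include weekly and monthly jobs, since those are exactly the callers a short sample misses.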

Operating it day to day

A few things that bite teams later:

Trust domain boundaries are real. Pick your trust domain (prod.example.com) deliberately; federating two domains later is more work than getting it right once.
Watch SVID rotation. Alert if a workload’s cert age approaches its TTL without renewing. That usually means the workload can’t reach the agent socket or the agent can’t reach the server, and an outage follows when the cert expires.
Keep the registration entries in code. Hand-created entries rot. Generate them from the same manifests that define your workloads, and review identity changes the way you’d review any access change — a quick pass through something like automated code review catches over-broad selectors before they ship.
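The rotation alert reduces to one ratio: how much of the certificate's lifetime has been used. A minimal sketch with the Go standard library; the function names are hypothetical and the 0.8 threshold is an assumption, so set yours above the point where your agent normally renews.

```go
package main

import (
	"crypto/x509"
	"fmt"
	"time"
)

// lifetimeUsed returns the fraction of the validity window that has elapsed.
func lifetimeUsed(cert *x509.Certificate, now time.Time) float64 {
	lifetime := cert.NotAfter.Sub(cert.NotBefore)
	if lifetime <= 0 {
		return 1 // malformed validity window: treat as fully used
	}
	return float64(now.Sub(cert.NotBefore)) / float64(lifetime)
}

// rotationOverdue reports whether the cert is past the alert threshold,
// meaning a renewal that should have happened by now has not.
func rotationOverdue(cert *x509.Certificate, now time.Time, threshold float64) bool {
	return lifetimeUsed(cert, now) >= threshold
}

// overdueAfterMinutes checks a demo certificate with a one-hour lifetime at
// the given age. The unsigned in-memory cert is for illustration only.
func overdueAfterMinutes(minutes int, threshold float64) bool {
	issued := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	cert := &x509.Certificate{NotBefore: issued, NotAfter: issued.Add(time.Hour)}
	return rotationOverdue(cert, issued.Add(time.Duration(minutes)*time.Minute), threshold)
}

func main() {
	for _, age := range []int{30, 55} {
		fmt.Printf("age %d min, overdue: %v\n", age, overdueAfterMinutes(age, 0.8))
	}
}
```

Using a fraction rather than a fixed number of minutes keeps the alert meaningful if you later change the TTL.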

Service identity feels like a lot of moving parts the first week. But the payoff is durable: you stop trusting the network and start trusting cryptographically proven names, and that’s the foundation everything else in zero trust sits on.

Identity and policy examples here are starting points. Validate attestation selectors and authorization rules against your own environment before enforcing them in production.
