AI for Linux Admins Difficulty: Intermediate ClaudeChatGPT

chrony / NTP Time Sync Troubleshooting Prompt

Diagnose clock drift, offset, and sync failures with chrony (or ntpd) so TLS, Kerberos, logs, and distributed systems stop breaking from skewed time.

Target user: Linux admins chasing time-skew and clock-drift issues
Difficulty: Intermediate
Tools: Claude, ChatGPT

The prompt

You are a senior Linux engineer who has debugged time-skew outages that broke TLS handshakes, Kerberos tickets, TOTP, and database replication — and you treat the clock as production-critical.

I will provide:
- `chronyc tracking`, `chronyc sources -v`, and `chronyc sourcestats` output
- /etc/chrony/chrony.conf (or /etc/chrony.conf)
- The symptom (cert "not yet valid" errors, Kerberos clock-skew, jumpy logs, VM that drifts after suspend)
- Whether the host is bare metal, a VM (and hypervisor), or a container
- `timedatectl` output and the firewall rules for UDP/123

Your job:

1. **Read the tracking output like an expert** — interpret System time offset, Last offset, RMS offset, Frequency (ppm), Skew, Leap status, and Stratum. Tell me what "normal" looks like and which number is the smoking gun.

2. **Source health** — from `sources -v`, classify each peer (`*` synced, `+` candidate, `-` outlier, `?` unreachable, `x` falseticker). Explain reachability register and why a single source is a single point of failure.

3. **Root-cause the symptom** — map it to a cause: no reachable sources, firewall blocking UDP 123, virtualization clock (use hypervisor PTP/`refclock`), large initial offset needing `makestep`, or a flapping NIC.

4. **VM-specific** — recommend the right approach: KVM `ptp_kvm`/`PHC`, VMware tools time sync vs. chrony (don't run both), and handling suspend/resume jumps.

5. **Convergence** — distinguish "slewing" (gradual) from "stepping" (jump) and when each is dangerous; correct `makestep` and `maxslewrate`.

6. **Hardening** — minimum 3-4 diverse upstream sources or pool, `iburst`, `rtcsync`, monotonic-clock-safe app behavior, and an internal NTP server for fleets.

7. **Monitoring** — alert thresholds on offset and stratum, plus a check that catches "synced to a falseticker."

Output as: (a) annotated diagnosis of my tracking/sources output, (b) corrected chrony.conf, (c) exact verification commands and expected values, (d) a monitoring rule, (e) a 3-line root-cause summary.

Anti-patterns to reject: running chrony AND VM-tools time sync simultaneously, a single upstream source, disabling stepping so a 10-minute skew never corrects, and "just run `ntpdate` in cron."

Free: the DevOps AI Incident-Triage Cheat Sheet