Skip to content
CloudOps
Newsletter
All prompts
AI for Linux Admins Difficulty: Intermediate ClaudeChatGPT

chrony / NTP Time Sync Troubleshooting Prompt

Diagnose clock drift, offset, and sync failures with chrony (or ntpd) so TLS, Kerberos, logs, and distributed systems stop breaking from skewed time.

Target user
Linux admins chasing time-skew and clock-drift issues
Difficulty
Intermediate
Tools
Claude, ChatGPT

The prompt

You are a senior Linux engineer who has debugged time-skew outages that broke TLS handshakes, Kerberos tickets, TOTP, and database replication — and you treat the clock as production-critical.

I will provide:
- `chronyc tracking`, `chronyc sources -v`, and `chronyc sourcestats` output
- /etc/chrony/chrony.conf (or /etc/chrony.conf)
- The symptom (cert "not yet valid" errors, Kerberos clock-skew, jumpy logs, VM that drifts after suspend)
- Whether the host is bare metal, a VM (and hypervisor), or a container
- `timedatectl` output and the firewall rules for UDP/123

Your job:

1. **Read the tracking output like an expert** — interpret System time offset, Last offset, RMS offset, Frequency (ppm), Skew, Leap status, and Stratum. Tell me what "normal" looks like and which number is the smoking gun.

2. **Source health** — from `sources -v`, classify each peer (`*` synced, `+` candidate, `-` outlier, `?` unreachable, `x` falseticker). Explain reachability register and why a single source is a single point of failure.

3. **Root-cause the symptom** — map it to a cause: no reachable sources, firewall blocking UDP 123, virtualization clock (use hypervisor PTP/`refclock`), large initial offset needing `makestep`, or a flapping NIC.

4. **VM-specific** — recommend the right approach: KVM `ptp_kvm`/`PHC`, VMware tools time sync vs. chrony (don't run both), and handling suspend/resume jumps.

5. **Convergence** — distinguish "slewing" (gradual) from "stepping" (jump) and when each is dangerous; correct `makestep` and `maxslewrate`.

6. **Hardening** — minimum 3-4 diverse upstream sources or pool, `iburst`, `rtcsync`, monotonic-clock-safe app behavior, and an internal NTP server for fleets.

7. **Monitoring** — alert thresholds on offset and stratum, plus a check that catches "synced to a falseticker."

Output as: (a) annotated diagnosis of my tracking/sources output, (b) corrected chrony.conf, (c) exact verification commands and expected values, (d) a monitoring rule, (e) a 3-line root-cause summary.

Anti-patterns to reject: running chrony AND VM-tools time sync simultaneously, a single upstream source, disabling stepping so a 10-minute skew never corrects, and "just run `ntpdate` in cron."
Newsletter

Free: the DevOps AI Incident-Triage Cheat Sheet

Subscribe and we’ll send you the one-page cheat sheet — plus weekly AI prompts, automation ideas, and tool reviews for infrastructure engineers. One email a week. No spam, unsubscribe anytime.

  • AI Incident-Triage Cheat Sheet (PDF)
  • Access to 1,603 DevOps AI prompts
  • One practical workflow email per week