Cloud CDN & Cloud DNS Caching/Resolution Debug Prompt
Diagnose Cloud CDN cache-miss storms, stale content, and Cloud DNS resolution failures by reasoning from cache keys, TTLs, and response headers instead of flushing the cache and hoping.
- Target user: SRE, platform, and web engineers running Cloud CDN and Cloud DNS on GCP
- Difficulty: Intermediate
- Tools: Claude, ChatGPT, Cursor
The prompt
You are a senior edge/SRE engineer who has tuned Cloud CDN and debugged Cloud DNS in production. You reason from cache keys, cache-control headers, the configured cache mode, and DNS record/TTL state, not from "just invalidate everything," which only masks the real cause and stampedes the origin.

I will provide:
- The CDN config: [backend service/bucket cacheMode (CACHE_ALL_STATIC / USE_ORIGIN_HEADERS / FORCE_CACHE_ALL), default/max/client TTLs, cache key policy (host, query string, headers included), negative caching]
- The symptom: [LOW HIT RATE / STALE CONTENT SERVED / CONTENT NOT CACHING / WRONG VARIANT SERVED / DNS NXDOMAIN OR SLOW RESOLUTION / PROPAGATION DELAY]
- Evidence: [response headers showing `Age`, `Cache-Control`, and a cache hit/miss status (Cloud CDN does not send `X-Cache` by default; a custom response header built from `{cdn_cache_status}` can expose it), and CDN monitoring hit-rate; for DNS, the zone records, TTLs, and `dig` output]
- The goal: [RAISE HIT RATE / SERVE FRESH CONTENT / FIX RESOLUTION]

Your job (pick CDN or DNS based on the symptom):

CDN:
1. **Cacheability**: check whether the origin's Cache-Control/Expires headers actually permit caching and match the cacheMode. A low hit rate is often `no-store`/`private` from the origin or a cacheMode mismatch. State the specific header or mode causing it.
2. **Cache key**: review what's in the cache key (query strings, headers, cookies). Including a high-cardinality query param or a cookie fragments the cache and tanks the hit rate; excluding a param that changes content serves the wrong variant. Recommend the right key policy.
3. **TTL & freshness**: reconcile the TTLs with how fresh content must be. For stale content, recommend the right TTL plus targeted invalidation by path (not a full flush) and explain the origin-stampede risk of invalidating broadly.

DNS:
4. **Resolution path**: for NXDOMAIN or wrong answers, check the record set, the zone's NS delegation, and whether public vs. private zones (or split-horizon) are resolving the query. For slow propagation, reconcile the TTL with the change timing.
5. **Health-checked routing**: if using routing policies (geo/weighted/failover), confirm the policy and health checks are returning the intended records.

Output: (a) the root cause with the header/record evidence, (b) the specific config change (cacheMode, cache key, TTL, or DNS record), (c) the gcloud command, (d) for stale content, a targeted path invalidation rather than a full flush, (e) what metric/`dig` result to recheck.

Prefer the smallest fix that addresses the cause. For invalidation, recommend the narrowest path set and warn about origin load from broad invalidation. Flag DNS TTL changes that won't take effect until the old TTL expires.
Why this prompt works
Caching and DNS problems share a failure pattern: the reflexive fix — flush the cache, or assume a DNS change should be instant — masks the cause and often makes things worse. A broad CDN invalidation triggers a cache-miss storm that hammers the origin; a DNS change that “didn’t work” is usually just waiting out the old record’s TTL. This prompt refuses both reflexes, forcing the diagnosis to start from concrete evidence: the Age and Cache-Control response headers and hit-rate metrics for CDN, the zone records and dig output for DNS.
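As an illustration of starting from header evidence, the sketch below reads a response's `Cache-Control` and `Expires` headers and lists the reasons a shared cache would likely refuse to store it. It is a deliberately simplified model, not Cloud CDN's full rule set: the function names are hypothetical, only the `no-store` and `private` directives named in the prompt are treated as blockers, and the handling of the three cache modes is an approximation of their documented intent.

```python
def parse_cache_control(value):
    """Split a Cache-Control header into a {directive: argument-or-None} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def cacheability_blockers(headers, cache_mode):
    """Return reasons the response is probably not cacheable (simplified model)."""
    if cache_mode == "FORCE_CACHE_ALL":
        # Assumption: this mode overrides origin directives, so nothing blocks.
        return []

    lowered = {key.lower(): value for key, value in headers.items()}
    directives = parse_cache_control(lowered.get("cache-control", ""))

    reasons = []
    for directive in ("no-store", "private"):
        if directive in directives:
            reasons.append(f"Cache-Control: {directive}")

    has_freshness = (
        "max-age" in directives or "s-maxage" in directives or "expires" in lowered
    )
    # Assumption: USE_ORIGIN_HEADERS needs explicit freshness information,
    # while CACHE_ALL_STATIC may still cache static content without it.
    if cache_mode == "USE_ORIGIN_HEADERS" and not has_freshness:
        reasons.append("no max-age, s-maxage, or Expires")
    return reasons
```

Calling `cacheability_blockers({"Cache-Control": "private, max-age=0"}, "CACHE_ALL_STATIC")` names the `private` directive as the cause, which is the kind of specific, header-level finding the prompt asks the model to state.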
For CDN, the prompt targets the two root causes behind most hit-rate complaints. Either the origin’s headers or the cacheMode don’t actually permit caching, or the cache key includes something high-cardinality (a tracking query param or a cookie) that fragments the cache so badly that almost nothing gets reused. Both are invisible until you read the key policy and the headers together, which is exactly what the prompt makes the model do. The TTL step then handles staleness with targeted path invalidation instead of a full flush, sparing the origin.
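The fragmentation effect can be made concrete with a toy model of a cache key. The sketch below assumes the key is simply scheme, host, path, and whichever query parameters are not excluded; the function names, the `utm_id` and `v` parameters, and the sample URLs are all hypothetical, and a real cache key policy has more inputs (headers, cookies) than this.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def cache_key(url, excluded_params=frozenset()):
    """Build a toy cache key from the URL, dropping excluded query params."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in excluded_params
    ]
    return f"{parts.scheme}://{parts.netloc}{parts.path}?{urlencode(kept)}"


def distinct_keys(urls, excluded_params=frozenset()):
    """Count how many separate cache entries a set of requests would create."""
    return len({cache_key(url, excluded_params) for url in urls})


# Hypothetical traffic: two real versions of app.js, plus a tracking param.
sample_requests = [
    "https://example.com/app.js?v=3&utm_id=a1",
    "https://example.com/app.js?v=3&utm_id=b2",
    "https://example.com/app.js?v=3&utm_id=c3",
    "https://example.com/app.js?v=4&utm_id=a1",
]

fragmented = distinct_keys(sample_requests)                      # tracking param in key
right_sized = distinct_keys(sample_requests, {"utm_id"})         # tracking param excluded
over_merged = distinct_keys(sample_requests, {"utm_id", "v"})    # content param excluded too
```

With the tracking parameter in the key, the four requests produce four entries. Excluding it leaves two, one per real version. Excluding `v` as well collapses everything to one entry, which is the wrong-variant failure the prompt warns about.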
For DNS, the prompt reasons about the resolution path — delegation, public-versus-private zones, routing policies — rather than guessing, and it bakes in the TTL-timing reality that trips up most “DNS is broken” incidents. The guardrails make the two latent hazards explicit: invalidation storms and TTL-gated propagation. By preferring the smallest fix that addresses the cause, the prompt produces changes that resolve the symptom without trading it for an origin overload or a confusing propagation delay.
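The TTL-gated propagation point comes down to simple arithmetic: a resolver that cached the old record just before the change may keep serving it for up to the old TTL. The sketch below works that out and also pulls the remaining TTL from a `dig` answer line. It assumes the usual five-column answer format (name, TTL, class, type, data); the function names and sample values are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def parse_dig_answer(line):
    """Parse one line of dig's ANSWER section into its fields."""
    name, ttl, rr_class, rr_type, rdata = line.split(None, 4)
    return {
        "name": name,
        "ttl": int(ttl),  # seconds the answering resolver will keep this record
        "class": rr_class,
        "type": rr_type,
        "rdata": rdata,
    }


def stale_until(changed_at, old_ttl_seconds):
    """Latest time a resolver could still serve the pre-change record."""
    return changed_at + timedelta(seconds=old_ttl_seconds)


# Hypothetical change: record updated at 12:00 UTC, old TTL was one hour.
changed_at = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
deadline = stale_until(changed_at, 3600)

answer = parse_dig_answer("www.example.com.\t287\tIN\tA\t203.0.113.10")
```

Here `deadline` is 13:00 UTC, so a `dig` result before then that still shows the old address is expected behavior rather than a failed change. The `ttl` value in the parsed answer shows how long that particular resolver will keep its current copy.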