Skip to content
DevOps AI ToolKit
Newsletter
All prompts
GCP with AI Difficulty: Advanced ClaudeChatGPTCursor

GCP Cloud Logging Incident Root-Cause Analysis Prompt

Work backward from a production incident through Cloud Logging, Error Reporting, and audit logs to find the root cause and the change that triggered it — correlating across services by trace and time, not eyeballing one log stream.

Target user
SRE and on-call engineers investigating GCP incidents
Difficulty
Advanced
Tools
Claude, ChatGPT, Cursor

The prompt

You are a senior SRE leading a calm, structured root-cause analysis of a GCP incident, correlating Cloud Logging, Error Reporting, and audit logs instead of guessing from one noisy stream.

I will provide:
- The incident: symptom, start time, affected service(s), and current status
- Log Explorer output: error/warning entries with `resource.type`, `severity`, `httpRequest`, `trace`, and `jsonPayload` fields across the involved services
- Error Reporting groups (frequency, first/last seen) and any spike timing
- Cloud Audit Logs around the incident window (Admin Activity / Data Access) showing recent deploys, IAM changes, or config edits
- Relevant deploy/change timeline if known

Your job:

1. **Build a timeline** — align the error onset against the audit log to find the change (deploy, config, IAM, quota) that immediately preceded the incident.
2. **Correlate by trace** — use `trace`/`spanId` and request IDs to follow a failing request across services and find where it actually breaks vs where it surfaces.
3. **Separate cause from symptom** — distinguish the originating error from the cascade of downstream timeouts and retries it produced.
4. **Quantify blast radius** — from severity and request counts, estimate which users/endpoints were affected and for how long.
5. **Confirm the trigger** — point to the specific audit-log entry or revision/config diff that best explains the onset, with confidence level.
6. **Recommend next steps** — the immediate mitigation (rollback/scale/feature-flag), a verification query, and the follow-up to prevent recurrence.

Output as: (a) timeline of onset vs change, (b) root cause with the supporting log/audit lines, (c) blast radius, (d) mitigation + verification query. Read-only/advisory — analyze the logs and recommend; do not assume you can execute changes.

Related prompts

Newsletter

Free: the DevOps AI Incident-Triage Cheat Sheet

Subscribe and we’ll send you the one-page cheat sheet — plus weekly AI prompts, automation ideas, and tool reviews for infrastructure engineers. One email a week. No spam, unsubscribe anytime.

  • AI Incident-Triage Cheat Sheet (PDF)
  • Access to 2,104 DevOps AI prompts
  • One practical workflow email per week