Writing Cloud Monitoring MQL and Log Explorer Queries With

At 2am during an incident, I needed the 99th-percentile latency of a Cloud Run service grouped by revision, as a rate over five-minute windows. I know that’s expressible in MQL. I did not, at 2am, remember the exact align, group_by, and percentile syntax to write it. I burned ten minutes on documentation while the incident burned. That’s the recurring problem with GCP’s query languages: Monitoring Query Language and the Log Explorer filter syntax are both expressive enough to answer almost anything and arcane enough that nobody writes them fluently from memory. So I stopped trying to. I describe the question in English, let AI draft the query, and I verify it runs and returns what I expect.

MQL: describe the metric, get the query

MQL’s pipeline model — fetch, then align, then group_by, then a reducer — is logical once you know it and opaque until you do. The model knows the grammar cold. I give it the metric type and the shape of answer I want:

Prompt: “Write a GCP Monitoring Query Language (MQL) query for the metric run.googleapis.com/request_latencies on resource type cloud_run_revision. I want the 99th percentile latency, grouped by the revision_name label, aligned to 5-minute windows over the last hour. Explain each pipeline stage in one line.”

fetch cloud_run_revision
| metric 'run.googleapis.com/request_latencies'
| align delta(5m)
| group_by [resource.revision_name], [value_p99: percentile(value.request_latencies, 99)]
| every 5m

I always verify it against the real metric explorer before trusting a number from it, because the model can pick a plausible-but-wrong metric type or a reducer that double-counts. The grammar it gets right; the semantics — is this metric a distribution, a gauge, a delta? — I confirm. A latency metric is a distribution, so percentile is valid; if the model had reached for mean on a counter, I’d catch it by sanity-checking the magnitude.

Build an alert policy from the query

Once the MQL is right, the model drafts the full alert policy, which is otherwise a wall of nested JSON nobody enjoys writing:

Prompt: “Turn that MQL query into a GCP alerting policy condition. Fire when p99 latency exceeds 2000ms for 5 minutes. Output the JSON for gcloud alpha monitoring policies create --policy-from-file. Include a clear display name and documentation field, no notification channels (I’ll attach those).”

gcloud alpha monitoring policies create --policy-from-file=p99-latency-alert.json

I review the threshold and the duration by hand — those are judgment calls about my SLO, not syntax the model can know. Leaving notification channels out of the AI’s scope is deliberate: I attach those myself so a generated policy can never page the wrong on-call.

Log Explorer: the filter language is the other half

The Logging query language has its own syntax, and combining resource type, severity, and a jsonPayload field filter trips people up. I describe the hunt:

Prompt: “Write a GCP Log Explorer query that finds all logs from Cloud Run service checkout-api with severity ERROR or higher, where jsonPayload.message contains ‘timeout’, in the last 6 hours. Then give me the equivalent gcloud logging read command with the right --freshness flag.”

gcloud logging read \
  'resource.type="cloud_run_revision"
   AND resource.labels.service_name="checkout-api"
   AND severity>=ERROR
   AND jsonPayload.message:"timeout"' \
  --freshness=6h --limit=100 --format=json

The : operator for substring vs = for exact match is a constant source of empty result sets, and AI picks the right one when I tell it “contains.” Small thing, but it’s the difference between zero rows and the rows I needed.

Let AI read the results, not just write the query

The query is half the job; reading 100 JSON log lines is the other half. I hand the output straight back:

Prompt: “Here are 100 Cloud Run error logs (JSON). Cluster them by root cause — group similar jsonPayload.message values, ignoring request IDs and timestamps. Give me a table of cause, count, and one example, sorted by count. Tell me which cluster started most recently.”

That “started most recently” question is the gold during an incident: it points at what changed. The model is genuinely good at fuzzy-clustering messages that differ only in IDs, which is miserable to do by eye.

Log-based metrics and dashboards

To make a recurring error trackable, I have AI draft a log-based metric so I can alert on its rate later:

gcloud logging metrics create checkout_timeouts \
  --description="Count of checkout-api timeout errors" \
  --log-filter='resource.type="cloud_run_revision"
    AND resource.labels.service_name="checkout-api"
    AND jsonPayload.message:"timeout"'

Prompt: “Given that log-based metric, write the MQL to chart its rate per minute, and a one-line description of what a healthy baseline looks like so I know what threshold to alert on.”

Where the human stays in the loop

The pattern across MQL, log filters, alert policies, and log-based metrics is the same: the syntax is hard and the model knows it, so I delegate the grammar entirely. But which metric, which threshold, which severity, and what counts as healthy — those are judgment, and I own them. I verify every generated query returns sane numbers against the real backend before I build an alert on it, because a query that runs cleanly can still measure the wrong thing.

The reusable versions of these prompts are in my prompts collection, and the rest of the GCP with AI series covers the services these queries tend to watch. You don’t need to memorize MQL. You need to verify it.

Writing Cloud Monitoring MQL and Log Explorer Queries With AI