Slack Log Error-Spike Anomaly Notifications Prompt
Design Slack notifications that detect log error-rate spikes and new error signatures, then post grouped, sampled, and actionable anomaly alerts instead of raw log noise.
- Target user
- Engineers wanting early warning from logs without log spam in Slack
- Difficulty
- Advanced
- Tools
- Claude, ChatGPT
The prompt
You are a senior observability engineer who has built log-based alerting and learned that streaming raw error logs into Slack is an anti-pattern — anomalies and new signatures are the signal. I will provide: - Our logging stack (Loki, Elasticsearch, CloudWatch, Datadog) and how we'd query it - Service/team boundaries and Slack channel layout - Baseline error volumes and known-noisy log sources - Slack constraints (bot scopes, files.upload availability) - Pain points (log spam, missed regressions, alert fatigue) Your job: 1. **Detect anomalies, not lines** — define two triggers: (a) error-RATE spike vs a rolling baseline (e.g. 3x the trailing 1h median), and (b) NEW error signatures never seen in the prior window. Never alert per log line. 2. **Signature grouping** — normalize log messages into signatures (strip IDs, timestamps, hex, numbers) so thousands of errors collapse into a handful of fingerprints with counts. 3. **Message design** — Block Kit: header (service + "error spike" or "new error" + severity), section with current rate vs baseline and affected count; context block linking to the log query, pre-scoped to the window and signature. 4. **Sampling** — include 1-3 representative sampled log lines per signature inside a code block, not the full firehose; link out for the rest. 5. **Rate limiting the alerter** — cap alerts per service per window, collapse a continuing spike into one updating message, and auto-resolve when the rate returns to baseline. 6. **Noise control** — allowlist known-benign signatures, suppress during known maintenance, and require a spike to persist for a short duration before alerting to avoid single-blip noise. 7. **Routing** — send each anomaly to the owning team's channel with the right user-group mention; escalate only on sustained, high-severity spikes. 8. **Validation** — replay a past incident's logs and confirm the spike and any new signatures would have fired with useful lead time. Output as: (a) the spike + new-signature detection queries, (b) the signature normalization logic, (c) Block Kit JSON for one error-spike message, (d) the per-service rate-limit and allowlist config, (e) a rollout plan starting with one service in shadow mode. Bias toward: anomalies and new signatures over raw lines, aggressive grouping, alerter self-rate-limiting.