AI workflows that survive contact with production
Battle-tested prompts, free in-browser tools for incident triage and alert rules, and deep guides for the stack you actually run — Linux, OpenStack, Kubernetes, Terraform, Prometheus. Safety and back-out steps baked into every prompt.
What are you trying to do?
Choose your path and get to the right DevOps workflow faster.
I'm troubleshooting an incident
Most urgent
Use the AI Incident Response Assistant to triage production issues, organize evidence, and generate next-step commands.
I need prompts
Browse battle-tested DevOps prompts for Kubernetes, Terraform, OpenStack, Linux, GitLab CI, and more.
Browse prompts
I need validators
Validate YAML, JSON, Terraform, GitLab CI, Kubernetes, Helm, Ansible, NGINX, and Dockerfile configs.
Open validators
I need OpenStack help
Troubleshoot OpenStack 504s, Cinder timeouts, RabbitMQ RPC issues, Neutron agents, Keystone, Horizon, and Kolla-Ansible.
View OpenStack guides
I want an audit
Get a senior DevOps review of your incident workflow, infrastructure automation, monitoring, or cloud operations.
Request an audit
Free, no signup to start · For Linux admins, SREs & platform teams · New here? Start here →
- Linux
- OpenStack
- Kubernetes
- GitLab CI
- Prometheus
- Terraform
- Grafana
Download the Free 500-Prompt DevOps AI Toolkit
500 battle-tested, copy-paste AI prompts engineered by a senior systems engineer — every one with fill-in placeholders and safety/back-out notes. Drop your email and it's yours.
- 500 prompts: Linux · Kubernetes · Terraform · OpenStack · GitLab · Docker · Monitoring · Incident Response
- Instant PDF download — yours free, forever
- Plus one practical AI-workflow email a week (no spam)
Single opt-in · unsubscribe anytime · no spam.
From "what's broken?" to a safe next step
Pick your task
Incident triage, a Prometheus alert rule, a Terraform plan review, a stuck OpenStack VM — find the prompt or free tool that matches.
Fill in your specifics
Every prompt ships with fill-in placeholders, a real worked example, and safety + back-out notes. Paste into Claude, ChatGPT, or Cursor.
Save, export & run it
Upgrade to Pro to run the dashboards unlimited, save your work across devices, and export to YAML / JSON / PDF — or use the free library forever.
Featured categories
Pick a stack. Get prompts, guides, and reviews tuned for it.
- AI for Linux Admins
Diagnose, automate, and harden Linux servers using AI assistants. Ubuntu, RHEL, Debian, Rocky.
- AI for OpenStack
Troubleshoot Nova, Neutron, Cinder, RabbitMQ, and Keystone with AI-assisted workflows.
- AI for Prometheus & Monitoring
Write better alert rules, PromQL queries, and Grafana dashboards with AI.
- AI for Grafana
Design dashboards, panels, and alerts, wire data sources (Prometheus, Loki, Tempo, SQL), and debug Grafana errors with AI-assisted workflows.
- AI for GitLab CI/CD
Debug pipelines, generate jobs, and review .gitlab-ci.yml with AI.
- AI for Bash & Python Automation
Generate, review, and harden automation scripts. Idempotent, safe, production-ready.
- AI for Incident Response
Faster RCAs, postmortems, runbooks, and on-call workflows powered by AI.
- AI for Kubernetes & Helm
Troubleshoot clusters, review manifests, generate Helm charts, debug pods, and harden Kubernetes workloads with AI-assisted workflows.
- AI for Infrastructure as Code
Generate, review, refactor, and secure Ansible, Helm, and cloud infrastructure code with AI.
- AI for Terraform
Design state, modules, providers, and workflows. Plan reviews, drift detection, large-state refactors, and policy-as-code with AI.
- AI for DevOps Security & Hardening
Use AI to review infrastructure security, harden Linux servers, detect risky commands, audit CI/CD pipelines, and improve production safety.
- AI for Slack
Build smarter Slack workflows: ChatOps bots, alert routing, incident channels, on-call handoffs, message summarization, and webhook security.
- AI for Microsoft Teams
AI-powered Teams workflows: adaptive cards, webhook routing, Bot Framework ChatOps, Power Automate flows, meeting transcripts to postmortems, Graph API automation.
- AI for Automation
Automate runbooks, toil, and event-driven workflows with AI: intelligent runbook selection, self-healing, ChatOps automation, and orchestration across your stack.
- AI for Ansible
Write, refactor, and debug Ansible playbooks, roles, and inventories with AI: idempotent tasks, Jinja2 templates, Vault secrets, and safe rolling changes.
- AI for NGINX
Configure, debug, and harden NGINX with AI: reverse proxy, TLS, rate limiting, caching, location-block precedence, and performance tuning.
- AI for Postgres
Tune, debug, and design PostgreSQL with AI: slow queries and EXPLAIN plans, indexing, vacuum/bloat, replication, and safe schema migrations.
- AI for MySQL
Optimize and troubleshoot MySQL and MariaDB with AI: query tuning, InnoDB internals, indexing, replication, deadlocks, and zero-downtime migrations.
- AI for RabbitMQ
Design and debug RabbitMQ with AI: exchanges and routing, queue backpressure, dead-lettering, clustering and quorum queues, and consumer reliability.
- Reduce MTTR with AI
Cut mean time to resolution with AI: faster detection and triage, alert correlation, instant runbooks, quicker root-cause analysis, and tighter postmortem-to-fix loops.
- Post Mortems with AI
Write better postmortems with AI: draft timelines from chat and alerts, keep the language blameless, surface contributing factors, and turn findings into action items that ship.
- AWS with AI
Build, debug, and secure AWS with AI: IAM and least privilege, VPC and networking, EC2/ECS/EKS, Lambda, S3, CloudFormation and CDK, and cost control.
- Azure with AI
Design, troubleshoot, and harden Azure with AI: RBAC and Entra ID, VNets and NSGs, AKS, App Service, Functions, Bicep and ARM, and cost management.
- GCP with AI
Operate and secure Google Cloud with AI: IAM, VPC and firewall rules, GKE, Cloud Run, Cloud Functions, Terraform, and billing and cost optimization.
- Docker with AI
Build, debug, and harden Docker with AI: Dockerfiles, image builds, registries, networking, volumes, the daemon, and container runtime errors.
- AI for Kafka
Operate and debug Apache Kafka with AI: brokers and controllers, partitions and ISR, producers and consumers, KRaft and ZooKeeper, rebalances, retention, and throughput tuning.
- AI for Redis
Operate and debug Redis with AI: data structures and memory, persistence (RDB/AOF), replication and Sentinel, Cluster sharding, eviction and TTLs, latency, and reliability.
Jump straight to your stack
Top errors, best prompts, a free tool, and a runbook — tailored to the stack you actually run.
- Kubernetes
- OpenStack
- Terraform
- GitLab CI
- Prometheus
- Linux
- RabbitMQ
- Docker
- Ansible
- Helm
- NGINX
- Kafka
- PostgreSQL
- Grafana
- Redis
- MySQL
Start here — the most useful reads
1. How AI Reduces DevOps Incident Response Time (MTTR Guide) · Reduce MTTR with AI · 16 min read
2. The Most Common Linux Server Problems (and How to Fix Them) · AI for Linux Admins · 18 min read
3. How to Use AI to Troubleshoot Kubernetes Clusters Faster · AI for Kubernetes & Helm · 16 min read
4. The Best Way to Learn Terraform for Real Infrastructure · AI for Terraform · 18 min read
5. How AI Helps DevOps Engineers Write Better Terraform Code · AI for Terraform · 15 min read
6. Top 25 GitLab CI/CD Pipeline Mistakes (and How to Avoid Them) · AI for GitLab CI/CD · 20 min read
7. How to Build a Production-Ready OpenStack Cloud (2026 Guide) · AI for OpenStack · 20 min read
8. The Best AI Prompts for Linux System Administrators · AI for Linux Admins · 16 min read
9. How DevOps Teams Use AI to Reduce Cloud Costs (FinOps) · AI for Automation · 16 min read
10. What Does a Senior DevOps Engineer Do Every Day? · AI for Automation · 15 min read
The prompts engineers reach for most
1. Prometheus Alert Rule Generator · Prometheus & Monitoring · Intermediate
2. Kubernetes Node NotReady Diagnosis · Kubernetes & Helm · Advanced
3. Linux Host Network Connectivity Debug · Linux Admins · Intermediate
4. Dockerfile Security Review · DevOps Security & Hardening · Beginner
5. Terraform Remote Backend Migration · Terraform · Advanced
6. Ansible Playbook Generator · Bash & Python Automation · Intermediate
7. GitLab CI/CD `rules:` Debugging · GitLab CI/CD · Intermediate
8. NGINX 502/504 Bad Gateway Triage · NGINX · Intermediate
9. Alert-Storm Correlation and Triage · Incident Response · Beginner
10. Nova Instance Stuck-State Recovery · OpenStack · Intermediate
11. Postgres Slow Query EXPLAIN Triage · Postgres · Intermediate
12. MySQL Slow Query Log + EXPLAIN Tuning · MySQL · Intermediate
13. RabbitMQ Queue Investigation · RabbitMQ · Advanced
Featured AI prompts for cloud engineers
- AI for Infrastructure as Code · Intermediate
Ansible Vault Secrets Management Prompt
Use Ansible Vault: encrypted secrets, vault IDs, multi-vault setups, and integration with external secret managers.
  - Claude
  - ChatGPT
Open prompt
- AI for GitLab CI/CD · Intermediate
GitLab CI/CD → Kubernetes Deploy Patterns Prompt
Design GitLab CI/CD pipelines that deploy to Kubernetes: kubectl vs Helm vs Kustomize, secrets handling, multi-environment promotion, GitOps comparison.
  - Claude
  - ChatGPT
Open prompt
- AI for GitLab CI/CD · Intermediate
GitLab CI/CD Pipeline Optimization Prompt
Speed up slow GitLab pipelines: DAG with `needs:`, cache vs artifacts, parallel jobs, image pre-builds, dependency proxy, and shallow clones.
  - Claude
  - ChatGPT
Open prompt
- AI for Prometheus & Monitoring · Advanced
Grafana Loki + Prometheus Correlation Prompt
Correlate metrics and logs in Grafana: exemplars from Prometheus to traces, derived fields from Loki, and jumping from a spike to the matching log line.
  - Claude
  - ChatGPT
Open prompt
- AI for Kubernetes & Helm · Intermediate
Helm Chart Review Prompt
Get a senior-engineer review of a Helm chart: values hygiene, template correctness, security defaults, upgrade safety.
  - Claude
  - ChatGPT
  - Cursor
Open prompt
- AI for Infrastructure as Code · Intermediate
Infrastructure as Code Security Review Prompt
AI security review of Terraform, CloudFormation, or Helm charts: surface dangerous defaults, missing encryption, overly permissive IAM, and exposed services.
  - Claude
  - ChatGPT
Open prompt
DevOps & AI guides
- AI for Automation · 10 min read
Infrastructure Monitoring Explained for Cloud Engineers
Learn how infrastructure monitoring improves system health and performance, with key strategies for proactive incident prevention.
Read guide
- AI for Grafana · 8 min read
Grafana Error Guide: 'Access denied' (Dashboard & Folder Permissions)
Fix 'Access denied' on a Grafana dashboard or folder: check org role, folder/dashboard permissions, team membership, RBAC roles, and provisioned permission rules to restore access.
Read guide
- AI for Grafana · 10 min read
Grafana Error Guide: 'failed to evaluate rule' (fixing the unified alerting rule Error state)
Fix 'failed to evaluate rule' in Grafana unified alerting: check datasource UID, query timeouts, NoData/Error handling, expressions, and evaluation_timeout.
Read guide
- AI for Grafana · 8 min read
Grafana Error Guide: '502 Bad Gateway' from the Datasource Proxy (Fix Unreachable Backends)
Fix Grafana datasource proxy 502 Bad Gateway: diagnose unreachable backend, wrong datasource URL, TLS handshake failures, DNS errors, and connection-refused issues.
Read guide
- AI for Grafana · 8 min read
Grafana Error Guide: 'Dashboard cannot be deleted because it was provisioned'
Fix 'Dashboard cannot be deleted because it was provisioned' in Grafana: remove the source JSON, set disableDeletion, or unprovision the provider, then reload provisioning to delete it cleanly.
Read guide
AI tools we actually use
- ChatGPT
by OpenAI
4.6
The broadest AI ecosystem with deep plugin support and the largest user community.
Best for: Ansible/Terraform generation, fast scaffolding, plugin-heavy workflows
Pricing: Free tier; Plus $20/mo; Team & Enterprise tiers
Read review
- Claude
by Anthropic
4.8
The most cautious and context-aware AI assistant for infrastructure work.
Best for: Production troubleshooting, postmortems, IaC review
Pricing: Free tier; Pro $20/mo; Team & Enterprise tiers
Read review
- Cursor
by Anysphere
4.7
The AI-first code editor that understands your whole repo.
Best for: Editing real IaC repos (Helm charts, Terraform modules, K8s operators)
Pricing: Free tier (limited); Pro $20/mo; Business $40/seat/mo
Read review
- Amazon Q Developer
by Amazon Web Services
4.3
AWS's AI assistant for building and operating on AWS: IaC, CLI, and resource Q&A grounded in your account.
Best for: Building & operating on AWS (CloudFormation/CDK/Terraform, CLI, and AWS resource troubleshooting)
Pricing: Free tier (generous); Pro $19/user/mo
Read review
The following are representative examples of the types of feedback we aim to earn from clients. They are illustrative until public customer references become available.
“We were dealing with recurring OpenStack issues that took hours to diagnose. The structured troubleshooting process quickly narrowed the problem down and gave us a repeatable workflow our team still uses.”
“The Terraform review highlighted issues our internal reviews had missed and provided practical recommendations we could implement immediately. The guidance was concise, actionable, and clearly based on real operational experience.”
Senior infrastructure audits, fixed price
Get a senior set of eyes on your stack — fixed-price OpenStack/Kolla-Ansible, Kubernetes, Terraform, and observability audits from $250. Rare private-cloud expertise most consultants simply don't have.
James Joyner IV
Sr. Systems Software Engineer · San Jose, CA
I build and run large-scale, widely distributed Linux systems — AWS, CentOS/Ubuntu, and private cloud on OpenStack with Kolla-Ansible — and I live in the observability and on-call that comes with them (Prometheus, VictoriaMetrics, Grafana). I started DevOps AI ToolKit to share the AI workflows, prompts, and runbooks that actually survive contact with production.