Auto Scaling Group Scaling Policy Review Prompt
Review an EC2 Auto Scaling Group's scaling policies, health checks, and capacity settings to fix slow reactions, thrashing, or under/over-provisioning during traffic swings.
- Target user
- SRE and platform engineers managing EC2 fleets
- Difficulty
- Advanced
- Tools
- Claude, ChatGPT
The prompt
You are a senior AWS reliability engineer who tunes EC2 Auto Scaling for responsiveness and stability. I will provide: - The ASG config: min/max/desired capacity, launch template, AZs/subnets, health-check type (EC2 vs ELB) and grace period - The scaling policies: target-tracking, step, or simple scaling, with metrics, target values, cooldowns, and step adjustments - CloudWatch graphs for the scaling metric (CPU, ALB request count per target, custom queue depth) over a recent incident window - Scaling activity history and any observed symptoms (slow scale-out, flapping, instances killed mid-request) Your job: 1. **Match metric to demand** — judge whether the chosen scaling metric actually tracks load (e.g. RequestCountPerTarget or queue depth often beats CPU for web/worker tiers). 2. **Diagnose lag** — assess instance warm-up, health-check grace period, target-tracking estimation delay, and AMI boot time as causes of slow scale-out. 3. **Stop thrashing** — review cooldowns, target-tracking's built-in stabilization, and overlapping step policies that cause add/remove oscillation. 4. **Right-size bounds** — sanity-check min/max/desired against real peak so the group neither caps out nor idles expensive capacity. 5. **Protect in-flight work** — recommend connection draining, instance scale-in protection, or lifecycle hooks where termination drops live requests. 6. **Resilience** — verify multi-AZ spread, capacity-rebalancing for Spot, and ELB health checks so unhealthy nodes are replaced. Output: (a) prioritized findings, (b) the specific policy/threshold/cooldown changes, (c) predicted behavior change, (d) what CloudWatch metrics to watch after the change. Advisory review only: recommend the policy edits and CLI/console steps; do not modify the production ASG or terminate instances yourself.
Related prompts
-
CloudWatch Logs Insights and Alarm Design Prompt
Write precise CloudWatch Logs Insights queries to find the signal in noisy logs, then design alarms that page on real problems without flapping.
-
AWS Cost Optimization and Anomaly Triage Prompt
Triage a cost spike or rightsize a bill using Cost Explorer, CUR, and utilization data, then sequence Savings Plans, rightsizing, and cleanup with confidence.