AWS Auto Scaling Cost Optimization: Stop Paying for Idle Capacity
Most startups configure auto scaling once during launch and never revisit it. The result: EC2 fleets running at 10–20% CPU because scaling targets are set too conservatively. Here’s how target tracking, mixed instances, and scheduled scaling fix this.
Why Wrong Scaling Configs Are So Expensive
Auto scaling configuration tends to be set once and forgotten. But unlike a one-off over-provisioning mistake, a conservative scaling policy compounds - you pay for the excess capacity every hour, on every instance, indefinitely. A fleet held at 30–40% utilization means 60–70% of every instance-hour is paid for and thrown away.
| Common mistake | Consequence |
|---|---|
| Target CPU at 20–30% | Paying for 70–80% idle capacity on every instance |
| Static min/max with no dynamic policy | Fleet stays at max capacity around the clock regardless of traffic |
| 100% On-Demand in the ASG | No Spot discount, even on stateless, fault-tolerant workloads |
| Dev/test ASGs running 24/7 | Non-prod environments billed around the clock despite being used only during business hours |
Target Tracking: The Right CPU Targets
Target tracking is the correct default for most workloads - AWS adjusts capacity automatically to maintain your target metric, adding instances as the metric rises and removing them as traffic falls. The critical question is what target to set.
Why teams set targets too low
Fear of spikes. A 30% CPU target feels safe because there’s 70% headroom. But a target is not a ceiling: target tracking scales out as soon as the metric exceeds the target for a few datapoints, and the gap between your target and 100% absorbs the spike while new instances launch. A 60% target still leaves 40% of capacity to soak up bursts during scale-out.
| Strategy | CPU Target | Best for | Trade-off |
|---|---|---|---|
| Optimize for availability | 40% CPU | Unpredictable, bursty traffic where latency spikes are costly | Highest cost - maximum headroom maintained |
| Balance availability and cost | 50% CPU | Standard API or web workloads with moderate traffic variance | Good balance - AWS recommended default starting point |
| Optimize for cost | 60–70% CPU | Stable, predictable workloads with gradual ramp-up patterns | Lowest cost - less headroom, requires tuning warmup times |
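As a sketch, the 50% starting point expressed as a Terraform target tracking policy (the ASG reference and resource names here are hypothetical):

```hcl
# Target tracking against average fleet CPU. The referenced ASG
# (aws_autoscaling_group.app) is a hypothetical example resource.
resource "aws_autoscaling_policy" "cpu_target" {
  name                   = "cpu-target-50"
  autoscaling_group_name = aws_autoscaling_group.app.name
  policy_type            = "TargetTrackingScaling"

  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ASGAverageCPUUtilization"
    }
    # Start at 50% and raise toward 60–70% once warmup is tuned.
    target_value = 50.0
  }
}
```

AWS creates and manages the scale-out and scale-in CloudWatch alarms for this policy automatically; you only choose the metric and target.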
Target tracking vs step scaling for cost
Step scaling requires manually defined CloudWatch alarms and fixed scaling increments. A misconfigured step scaling policy adds too many instances at once and then leaves them running through the full cooldown period. Target tracking is proportional - it scales to the smallest fleet that keeps the metric at target. AWS recommends target tracking over step scaling for CPU and request-count metrics.
Where step scaling still makes sense: workloads with extreme spike characteristics (e.g. flash sales, cron job bursts) where you need to add 10+ instances instantly at a specific threshold rather than gradually.
Instance warmup time: the hidden over-provisioning cause
If your application takes 3 minutes to pull a container image and initialize, but your instance warmup is set to 30 seconds, AWS will keep launching new instances because it thinks the first ones haven’t started contributing. Set warmup time to match your actual application startup - including Docker pull, JVM init, or Node module load. Thrashing from incorrect warmup settings is one of the most common over-provisioning causes we find in audits.
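A minimal Terraform sketch of matching warmup to a ~3-minute startup (the resource names, sizes, and subnet variable are hypothetical):

```hcl
resource "aws_autoscaling_group" "app" {
  name                = "app-asg"            # hypothetical name
  min_size            = 2
  max_size            = 20
  vpc_zone_identifier = var.private_subnets  # assumed variable

  # Container pull + app init takes ~3 minutes for this workload, so
  # tell auto scaling not to count new instances toward the metric
  # for 180 seconds - this is what prevents launch thrashing.
  default_instance_warmup = 180

  launch_template {
    id      = aws_launch_template.app.id     # assumed to exist
    version = "$Latest"
  }
}
```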
Mixed Instances Policy: RI Baseline + Spot Burst
A mixed instances policy lets a single ASG run On-Demand (or RI-covered) instances as a stable baseline while using Spot for burst capacity. This is how you get Spot savings without accepting Spot risk on your entire fleet.
| Configuration | Monthly (20-instance average fleet) | Risk profile | Savings |
|---|---|---|---|
| 100% On-Demand, single type | $2,800 | Full price, no flexibility | - |
| On-Demand only, multi-type | $2,800 | Same cost, slightly better availability | ~0% |
| RI baseline (30%) + On-Demand burst (70%) | $2,510 | Commitment risk on baseline only | ~10% |
| RI baseline (30%) + Spot burst (70%) | $1,135 | Spot interruptions on burst capacity | ~60% |
Illustrative figures based on us-east-1 m5.xlarge On-Demand ($0.192/hr, ~730 hrs/month), a 1-yr no-upfront RI (~35% off), and a Spot average of ~70% off On-Demand. Actual Spot savings vary by instance family and AZ.
Allocation strategy
Use price-capacity-optimized
This is AWS’s recommended default for Spot in mixed instances groups. It balances lowest price with pool capacity availability, reducing interruption risk vs the lowest-price strategy while keeping costs near the floor.
Instance type diversification
Specify 6–10 compatible types
Spot interruption risk drops as pool diversity grows. m5.xlarge, m5a.xlarge, m5n.xlarge, m6i.xlarge, and m6a.xlarge all provide 4 vCPUs and 16 GiB on x86_64 - mix them freely. The more pools, the lower the chance all of them are reclaimed simultaneously.
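Putting the baseline, allocation strategy, and diversification together, a hedged Terraform sketch (names, sizes, and the launch template are hypothetical; the base capacity of 6 is the 30% RI-covered slice of a ~20-instance fleet):

```hcl
resource "aws_autoscaling_group" "mixed" {
  name                = "mixed-asg"          # hypothetical name
  min_size            = 6
  max_size            = 40
  vpc_zone_identifier = var.private_subnets  # assumed variable

  mixed_instances_policy {
    instances_distribution {
      on_demand_base_capacity                  = 6   # RI-covered baseline
      on_demand_percentage_above_base_capacity = 0   # all burst on Spot
      spot_allocation_strategy                 = "price-capacity-optimized"
    }

    launch_template {
      launch_template_specification {
        launch_template_id = aws_launch_template.app.id  # assumed to exist
        version            = "$Latest"
      }
      # Five compatible 4 vCPU / 16 GiB pools per AZ.
      override { instance_type = "m5.xlarge" }
      override { instance_type = "m5a.xlarge" }
      override { instance_type = "m5n.xlarge" }
      override { instance_type = "m6i.xlarge" }
      override { instance_type = "m6a.xlarge" }
    }
  }
}
```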
Scheduled Scaling for Dev/Test: The Easiest 65% Saving
Scheduled scaling pre-sets desired capacity at specific times. For non-production environments that don’t need 24/7 availability, it’s the single highest-ROI change available with zero performance trade-off.
| Strategy | Schedule | Saving |
|---|---|---|
| Business hours only (8am–6pm weekdays) | 50 hrs/week vs 168 hrs/week | ~70% compute reduction |
| Scale to 0 overnight + weekends | min 0 off-hours, min 1 during the day; idle instances terminated | 65–70% cost reduction |
| Prod-like scale 9am–5pm, minimal outside | min=0 overnight, min=2 daytime | 50–60% reduction |
What to schedule
- Set desired=0, min=0 at 6pm weekdays and all weekend
- Set desired=2, min=1 at 8am weekdays
- Cron expressions default to UTC - either coordinate across timezones or set an explicit time zone on the scheduled action
- Pair with RDS stop/start schedules for full non-prod savings
- Tag the ASG with Environment=staging to exclude from production monitoring alerts
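The first three bullets can be sketched as two Terraform scheduled actions (the ASG reference, sizes, and timezone are assumptions for illustration):

```hcl
# Scale staging to zero at 6pm on weekdays; nothing re-scales it over
# the weekend, so it stays at zero until Monday morning.
resource "aws_autoscaling_schedule" "staging_down" {
  scheduled_action_name  = "staging-scale-down"
  autoscaling_group_name = aws_autoscaling_group.staging.name  # assumed
  recurrence             = "0 18 * * MON-FRI"
  time_zone              = "America/New_York"  # assumed team timezone
  min_size               = 0
  max_size               = 0
  desired_capacity       = 0
}

# Bring it back at 8am on weekdays.
resource "aws_autoscaling_schedule" "staging_up" {
  scheduled_action_name  = "staging-scale-up"
  autoscaling_group_name = aws_autoscaling_group.staging.name
  recurrence             = "0 8 * * MON-FRI"
  time_zone              = "America/New_York"
  min_size               = 1
  max_size               = 4
  desired_capacity       = 2
}
```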
Most startups we audit have staging environments running 24/7 at full capacity. A $5,000 staging bill becomes $1,500 with a scheduled scale-to-zero policy. This change takes under an hour to implement in Terraform.
Karpenter for EKS: Replace Cluster Autoscaler for 30–60% Node Savings
Cluster Autoscaler scales node groups up and down. Karpenter goes further - it consolidates underutilized nodes, selects the cheapest available instance type per workload, and integrates Spot natively without separate node groups. On a 50-node EKS cluster, the difference is typically $8K–$15K per month.
Bin-packing consolidation
Karpenter continuously replaces underutilized nodes with fewer, larger ones. Cluster Autoscaler only removes idle nodes - it doesn't consolidate partially-used ones. This alone can deliver 30–60% node cost reduction on typical clusters.
800+ instance types via EC2 Fleet
Rather than choosing instance types per node group, Karpenter dynamically selects the cheapest available type that fits pending pods. This results in 20–40% better price-performance than fixed node groups.
Native Spot + On-Demand mixing
A single NodePool can mix Spot and On-Demand instances, falling back to On-Demand automatically when Spot capacity is unavailable. Cluster Autoscaler requires separate node groups per purchase type, creating configuration sprawl.
Node provisioning in 55 seconds vs 3–4 minutes
Faster provisioning means less over-provisioning buffer needed. When scale-out takes 4 minutes, teams pad min capacity. Karpenter's speed lets you run leaner.
Karpenter v1.0+ - stable for production
Karpenter v1.0 stabilized the NodePool and EC2NodeClass APIs in 2024. It’s now the recommended approach for any AWS-heavy EKS deployment focused on cost optimization. Migration from Cluster Autoscaler typically takes 1–2 days for a standard cluster and involves no application changes - only node group configuration changes in Terraform.
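A sketch of a v1 NodePool that mixes Spot and On-Demand with consolidation enabled, applied here via Terraform's kubernetes_manifest resource (the NodePool name is hypothetical, and the referenced EC2NodeClass is assumed to already exist):

```hcl
resource "kubernetes_manifest" "default_nodepool" {
  manifest = {
    apiVersion = "karpenter.sh/v1"
    kind       = "NodePool"
    metadata   = { name = "default" }  # hypothetical name
    spec = {
      template = {
        spec = {
          # Assumed to be defined elsewhere (AMI, subnets, security groups).
          nodeClassRef = {
            group = "karpenter.k8s.aws"
            kind  = "EC2NodeClass"
            name  = "default"
          }
          requirements = [
            {
              # Prefer Spot, fall back to On-Demand when capacity is short.
              key      = "karpenter.sh/capacity-type"
              operator = "In"
              values   = ["spot", "on-demand"]
            },
            {
              key      = "kubernetes.io/arch"
              operator = "In"
              values   = ["amd64"]
            },
          ]
        }
      }
      # Bin-packing: replace underutilized nodes, not just empty ones.
      disruption = {
        consolidationPolicy = "WhenEmptyOrUnderutilized"
        consolidateAfter    = "1m"
      }
    }
  }
}
```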