AWS Auto Scaling Cost Optimization: Stop Paying for Idle Capacity
Most startups configure auto scaling once during launch and never revisit it. The result: EC2 fleets running at 10–20% CPU because scaling targets are set too conservatively. Here’s how target tracking, mixed instances, and scheduled scaling fix this.
Why Wrong Scaling Configs Are So Expensive
Auto scaling configuration tends to be set once and forgotten. But unlike a one-off over-provisioning mistake, a conservative scaling policy compounds - you pay for the excess capacity every hour, on every instance, indefinitely. A fleet held at 30–40% utilization means 60–70% of every instance-hour is paid for and thrown away.
| Common mistake | Consequence |
|---|---|
| Target CPU at 20–30% | Paying for 70–80% idle capacity on every instance |
| Static min/max with no dynamic policy | Fleet stays at max capacity around the clock regardless of traffic |
| 100% On-Demand in the ASG | No Spot discount, even on stateless, fault-tolerant workloads |
| Dev/test ASGs running 24/7 | Non-prod environments billed around the clock despite being used only during business hours |
Target Tracking: The Right CPU Targets
Target tracking is the correct default for most workloads - AWS adjusts capacity automatically to maintain your target metric, adding instances as the metric rises and removing them as traffic falls. The critical question is what target to set.
Why teams set targets too low
Fear of spikes. A 30% CPU target feels safe because there’s 70% headroom. But a target is not a ceiling: target tracking scales out as soon as the metric exceeds the target for a few datapoints, and the gap between your target and 100% absorbs the spike while new instances launch. A 60% target still leaves 40% of capacity to soak up bursts during scale-out.
| Strategy | CPU Target | Best for | Trade-off |
|---|---|---|---|
| Optimize for availability | 40% CPU | Unpredictable, bursty traffic where latency spikes are costly | Highest cost - maximum headroom maintained |
| Balance availability and cost | 50% CPU | Standard API or web workloads with moderate traffic variance | Good balance - AWS recommended default starting point |
| Optimize for cost | 60–70% CPU | Stable, predictable workloads with gradual ramp-up patterns | Lowest cost - less headroom, requires tuning warmup times |
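As a sketch, the 50% starting point expressed as a Terraform target tracking policy (the ASG reference and resource names here are hypothetical):

```hcl
# Target tracking against average fleet CPU. The referenced ASG
# (aws_autoscaling_group.app) is a hypothetical example resource.
resource "aws_autoscaling_policy" "cpu_target" {
  name                   = "cpu-target-50"
  autoscaling_group_name = aws_autoscaling_group.app.name
  policy_type            = "TargetTrackingScaling"

  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ASGAverageCPUUtilization"
    }
    # Start at 50% and raise toward 60–70% once warmup is tuned.
    target_value = 50.0
  }
}
```

AWS creates and manages the scale-out and scale-in CloudWatch alarms for this policy automatically; you only choose the metric and target.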
Target tracking vs step scaling for cost
Step scaling requires manually defined CloudWatch alarms and fixed scaling increments. A misconfigured step scaling policy adds too many instances at once and then leaves them running through the full cooldown period. Target tracking is proportional - it scales to the smallest fleet that keeps the metric at target. AWS recommends target tracking over step scaling for CPU and request-count metrics.
Where step scaling still makes sense: workloads with extreme spike characteristics (e.g. flash sales, cron job bursts) where you need to add 10+ instances instantly at a specific threshold rather than gradually.
Instance warmup time: the hidden over-provisioning cause
If your application takes 3 minutes to pull a container image and initialize, but your instance warmup is set to 30 seconds, AWS will keep launching new instances because it thinks the first ones haven’t started contributing. Set warmup time to match your actual application startup - including Docker pull, JVM init, or Node module load. Thrashing from incorrect warmup settings is one of the most common over-provisioning causes we find in audits.
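A minimal Terraform sketch of matching warmup to a ~3-minute startup (the resource names, sizes, and subnet variable are hypothetical):

```hcl
resource "aws_autoscaling_group" "app" {
  name                = "app-asg"            # hypothetical name
  min_size            = 2
  max_size            = 20
  vpc_zone_identifier = var.private_subnets  # assumed variable

  # Container pull + app init takes ~3 minutes for this workload, so
  # tell auto scaling not to count new instances toward the metric
  # for 180 seconds - this is what prevents launch thrashing.
  default_instance_warmup = 180

  launch_template {
    id      = aws_launch_template.app.id     # assumed to exist
    version = "$Latest"
  }
}
```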
Mixed Instances Policy: RI Baseline + Spot Burst
A mixed instances policy lets a single ASG run On-Demand (or RI-covered) instances as a stable baseline while using Spot for burst capacity. This is how you get Spot savings without accepting Spot risk on your entire fleet.
| Configuration | Monthly (20-instance average fleet) | Risk profile | Savings |
|---|---|---|---|
| 100% On-Demand, single type | $2,800 | Full price, no flexibility | - |
| On-Demand only, multi-type | $2,800 | Same cost, slightly better availability | ~0% |
| RI baseline (30%) + On-Demand burst (70%) | $2,510 | Commitment risk on baseline only | ~10% |
| RI baseline (30%) + Spot burst (70%) | $1,135 | Spot interruptions on burst capacity | ~60% |
Illustrative figures based on us-east-1 m5.xlarge On-Demand ($0.192/hr, ~730 hrs/month), a 1-yr no-upfront RI (~35% off), and a Spot average of ~70% off On-Demand. Actual Spot savings vary by instance family and AZ.
Allocation strategy
Use price-capacity-optimized
This is AWS’s recommended default for Spot in mixed instances groups. It balances lowest price with pool capacity availability, reducing interruption risk vs the lowest-price strategy while keeping costs near the floor.
Instance type diversification
Specify 6–10 compatible types
Spot interruption risk drops as pool diversity grows. m5.xlarge, m5a.xlarge, m5n.xlarge, m6i.xlarge, and m6a.xlarge all provide 4 vCPUs and 16 GiB on x86_64 - mix them freely. The more pools, the lower the chance all of them are reclaimed simultaneously.
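Putting the baseline, allocation strategy, and diversification together, a hedged Terraform sketch (names, sizes, and the launch template are hypothetical; the base capacity of 6 is the 30% RI-covered slice of a ~20-instance fleet):

```hcl
resource "aws_autoscaling_group" "mixed" {
  name                = "mixed-asg"          # hypothetical name
  min_size            = 6
  max_size            = 40
  vpc_zone_identifier = var.private_subnets  # assumed variable

  mixed_instances_policy {
    instances_distribution {
      on_demand_base_capacity                  = 6   # RI-covered baseline
      on_demand_percentage_above_base_capacity = 0   # all burst on Spot
      spot_allocation_strategy                 = "price-capacity-optimized"
    }

    launch_template {
      launch_template_specification {
        launch_template_id = aws_launch_template.app.id  # assumed to exist
        version            = "$Latest"
      }
      # Five compatible 4 vCPU / 16 GiB pools per AZ.
      override { instance_type = "m5.xlarge" }
      override { instance_type = "m5a.xlarge" }
      override { instance_type = "m5n.xlarge" }
      override { instance_type = "m6i.xlarge" }
      override { instance_type = "m6a.xlarge" }
    }
  }
}
```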
Scheduled Scaling for Dev/Test: The Easiest 65% Saving
Scheduled scaling pre-sets desired capacity at specific times. For non-production environments that don’t need 24/7 availability, it’s the single highest-ROI change available with zero performance trade-off.
| Strategy | Schedule | Saving |
|---|---|---|
| Business hours only (8am–6pm weekdays) | 50 hrs/week vs 168 hrs/week | ~70% compute reduction |
| Scale to 0 overnight + weekends | min 0 off-hours, min 1 during the day; idle instances terminated | 65–70% cost reduction |
| Prod-like scale 9am–5pm, minimal outside | min=0 overnight, min=2 daytime | 50–60% reduction |
What to schedule
- Set desired=0, min=0 at 6pm weekdays and all weekend
- Set desired=2, min=1 at 8am weekdays
- Cron expressions default to UTC - either coordinate across timezones or set an explicit time zone on the scheduled action
- Pair with RDS stop/start schedules for full non-prod savings
- Tag the ASG with Environment=staging to exclude from production monitoring alerts
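The first three bullets can be sketched as two Terraform scheduled actions (the ASG reference, sizes, and timezone are assumptions for illustration):

```hcl
# Scale staging to zero at 6pm on weekdays; nothing re-scales it over
# the weekend, so it stays at zero until Monday morning.
resource "aws_autoscaling_schedule" "staging_down" {
  scheduled_action_name  = "staging-scale-down"
  autoscaling_group_name = aws_autoscaling_group.staging.name  # assumed
  recurrence             = "0 18 * * MON-FRI"
  time_zone              = "America/New_York"  # assumed team timezone
  min_size               = 0
  max_size               = 0
  desired_capacity       = 0
}

# Bring it back at 8am on weekdays.
resource "aws_autoscaling_schedule" "staging_up" {
  scheduled_action_name  = "staging-scale-up"
  autoscaling_group_name = aws_autoscaling_group.staging.name
  recurrence             = "0 8 * * MON-FRI"
  time_zone              = "America/New_York"
  min_size               = 1
  max_size               = 4
  desired_capacity       = 2
}
```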
Most startups we audit have staging environments running 24/7 at full capacity. A $5,000 staging bill becomes $1,500 with a scheduled scale-to-zero policy. This change takes under an hour to implement in Terraform.
Karpenter for EKS: Replace Cluster Autoscaler for 30–60% Node Savings
Cluster Autoscaler scales node groups up and down. Karpenter goes further - it consolidates underutilized nodes, selects the cheapest available instance type per workload, and integrates Spot natively without separate node groups. On a 50-node EKS cluster, the difference is typically $8K–$15K per month.
Bin-packing consolidation
Karpenter continuously replaces underutilized nodes with fewer, larger ones. Cluster Autoscaler only removes idle nodes - it doesn't consolidate partially-used ones. This alone can deliver 30–60% node cost reduction on typical clusters.
800+ instance types via EC2 Fleet
Rather than choosing instance types per node group, Karpenter dynamically selects the cheapest available type that fits pending pods. This results in 20–40% better price-performance than fixed node groups.
Native Spot + On-Demand mixing
A single NodePool can mix Spot and On-Demand instances, falling back to On-Demand automatically when Spot capacity is unavailable. Cluster Autoscaler requires separate node groups per purchase type, creating configuration sprawl.
Node provisioning in 55 seconds vs 3–4 minutes
Faster provisioning means less over-provisioning buffer needed. When scale-out takes 4 minutes, teams pad min capacity. Karpenter's speed lets you run leaner.
Karpenter v1.0+ - stable for production
Karpenter v1.0 stabilized the NodePool and EC2NodeClass APIs in 2024. It’s now the recommended approach for any AWS-heavy EKS deployment focused on cost optimization. Migration from Cluster Autoscaler typically takes 1–2 days for a standard cluster and involves no application changes - only node group configuration changes in Terraform.
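A sketch of a v1 NodePool that mixes Spot and On-Demand with consolidation enabled, applied here via Terraform's kubernetes_manifest resource (the NodePool name is hypothetical, and the referenced EC2NodeClass is assumed to already exist):

```hcl
resource "kubernetes_manifest" "default_nodepool" {
  manifest = {
    apiVersion = "karpenter.sh/v1"
    kind       = "NodePool"
    metadata   = { name = "default" }  # hypothetical name
    spec = {
      template = {
        spec = {
          # Assumed to be defined elsewhere (AMI, subnets, security groups).
          nodeClassRef = {
            group = "karpenter.k8s.aws"
            kind  = "EC2NodeClass"
            name  = "default"
          }
          requirements = [
            {
              # Prefer Spot, fall back to On-Demand when capacity is short.
              key      = "karpenter.sh/capacity-type"
              operator = "In"
              values   = ["spot", "on-demand"]
            },
            {
              key      = "kubernetes.io/arch"
              operator = "In"
              values   = ["amd64"]
            },
          ]
        }
      }
      # Bin-packing: replace underutilized nodes, not just empty ones.
      disruption = {
        consolidationPolicy = "WhenEmptyOrUnderutilized"
        consolidateAfter    = "1m"
      }
    }
  }
}
```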