AWS Compute Optimization · IaC Implementation
EC2 Rightsizing: Cut Compute Costs 30–40% Without Downtime
Over-provisioned EC2 instances are often the largest avoidable AWS cost in startup accounts. The fix isn't complicated, but doing it safely takes more than CloudWatch CPU metrics.
What Rightsizing Actually Means
The common mistake
Most teams look at CPU utilization in CloudWatch, see an 8% average, and still hesitate to downsize: "what if there's a spike?" So they keep the m5.2xlarge that costs about $280/month On-Demand to serve light background jobs.
CloudWatch CPU alone is not enough. Without memory, network, and disk I/O data, you're flying blind.
The right approach
Analyze all four dimensions:
- CPU utilization (p95, not average)
- Memory utilization (requires CloudWatch Agent)
- Network I/O (Mbps in/out)
- EBS I/O (read/write IOPS and throughput)
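To see why p95 matters more than the average, here is a minimal sketch using made-up 5-minute CPU samples. The sample values and the nearest-rank percentile helper are assumptions for illustration; in practice the samples would come from CloudWatch.

```python
import math
from statistics import mean


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical 5-minute CPU samples (%): mostly idle, with a recurring batch spike.
cpu_samples = [4.0] * 94 + [70.0] * 6

avg_cpu = mean(cpu_samples)
p95_cpu = percentile(cpu_samples, 95)
print(f"average={avg_cpu:.1f}%  p95={p95_cpu:.1f}%")
```

With these samples the average sits near 8% while the p95 is 70%, so sizing from the average alone would leave no room for the spike.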
Why CloudWatch alone isn't enough
The CloudWatch Agent must be installed on each instance to collect memory and disk-usage metrics; EC2 does not publish them by default. AWS Compute Optimizer bases its recommendations on a 14-day lookback window by default and only factors in memory when the agent is reporting it, so let the agent run for at least 14 days before acting on a recommendation. Most startups skip this step, which leads to either missed savings (from being too conservative) or OOM incidents (from being too aggressive).
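As a rough sketch, the agent configuration needed for memory and disk usage is small. The snippet below builds it in Python and prints the JSON. The metric names `mem_used_percent` and `used_percent` are the agent's standard names; the 60-second interval and the `/` mount point are assumptions to adjust for your fleet.

```python
import json

# Minimal CloudWatch Agent metrics section: memory and root-volume usage only.
agent_config = {
    "metrics": {
        "namespace": "CWAgent",
        # Tag every datapoint with the instance ID so metrics can be queried per instance.
        "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
        "metrics_collected": {
            "mem": {
                "measurement": ["mem_used_percent"],
                "metrics_collection_interval": 60,  # assumed interval, in seconds
            },
            "disk": {
                "measurement": ["used_percent"],
                "resources": ["/"],  # assumed mount point
                "metrics_collection_interval": 60,
            },
        },
    }
}

config_json = json.dumps(agent_config, indent=2)
print(config_json)
```

The printed JSON can be saved as the agent's configuration file or baked into an AMI or launch template, whichever your provisioning flow already uses.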
The Rightsizing Process
Analyze all dimensions - not just CPU
A memory-hungry process can run at 10% CPU while using 90% RAM, so a CPU-only view would recommend a downsize that fails in production. Rightsize against CPU, memory, network I/O, and EBS I/O together.
Memory metrics require the CloudWatch Agent, which most startups haven't installed. Without memory data, you risk downsizing to an instance that OOMs under load.
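The check described here can be sketched as a small function that converts observed utilization into absolute demand and compares it with a candidate instance. This is an illustrative sketch, not a complete sizing tool: it covers only CPU and memory, the 30% headroom is an assumed safety margin, and `InstanceSpec`, `ObservedLoad`, and `blocking_dimensions` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceSpec:
    name: str
    vcpus: int
    memory_gib: float


@dataclass(frozen=True)
class ObservedLoad:
    p95_cpu_pct: float   # p95 CPU utilization on the current instance
    peak_mem_pct: float  # peak memory utilization on the current instance


def blocking_dimensions(current, candidate, load, headroom=0.30):
    """Return the dimensions that make the candidate unsafe; an empty list means it fits."""
    cpu_needed = current.vcpus * load.p95_cpu_pct / 100
    mem_needed = current.memory_gib * load.peak_mem_pct / 100
    usable = 1 - headroom  # fraction of the candidate we are willing to fill
    problems = []
    if cpu_needed > candidate.vcpus * usable:
        problems.append("cpu")
    if mem_needed > candidate.memory_gib * usable:
        problems.append("memory")
    return problems


m5_2xlarge = InstanceSpec("m5.2xlarge", vcpus=8, memory_gib=32)
m5_xlarge = InstanceSpec("m5.xlarge", vcpus=4, memory_gib=16)
m5_large = InstanceSpec("m5.large", vcpus=2, memory_gib=8)

# The 10% CPU / 90% RAM case: CPU says "downsize", memory says "don't".
memory_hungry = ObservedLoad(p95_cpu_pct=10, peak_mem_pct=90)
print(blocking_dimensions(m5_2xlarge, m5_large, memory_hungry))

# A genuinely light workload fits one size down.
light = ObservedLoad(p95_cpu_pct=25, peak_mem_pct=20)
print(blocking_dimensions(m5_2xlarge, m5_xlarge, light))
```

Network and EBS limits could be added as further fields in the same way, using the per-instance baselines AWS publishes.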
Recommend the right instance family
Different workloads need different instance types. General purpose (m7g, or burstable t3/t4g), compute-optimized (c6g), memory-optimized (r6g), and storage-optimized (i3en) all have distinct cost profiles.
Graviton (arm64) instances like t4g, c7g, and m7g are typically 15–20% cheaper per hour than comparable x86 instance types, with equivalent or better performance for most workloads.
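One rough way to pick a family is the ratio of memory to vCPUs the workload actually needs. The sketch below uses the memory-per-vCPU ratios of the c7g, m7g, and r7g families (2, 4, and 8 GiB); the function name and the rule of choosing the leanest family that covers the ratio are assumptions for illustration, and storage-optimized or burstable types are out of scope.

```python
# GiB of memory per vCPU for three Graviton families.
GIB_PER_VCPU = {"c7g": 2, "m7g": 4, "r7g": 8}


def suggest_family(vcpus_needed, mem_gib_needed):
    """Pick the leanest family whose memory-per-vCPU ratio covers the workload."""
    if vcpus_needed <= 0:
        raise ValueError("vcpus_needed must be positive")
    ratio = mem_gib_needed / vcpus_needed
    for family, gib in sorted(GIB_PER_VCPU.items(), key=lambda item: item[1]):
        if ratio <= gib:
            return family
    # Above 8 GiB per vCPU the workload is memory-bound: stay on r7g and size by memory.
    return "r7g"


print(suggest_family(4, 6))    # CPU-heavy: 1.5 GiB per vCPU
print(suggest_family(2, 7))    # balanced: 3.5 GiB per vCPU
print(suggest_family(2, 14))   # memory-heavy: 7 GiB per vCPU
```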
Implement via IaC - not console changes
Instance type changes delivered as Terraform or CDK pull requests allow your engineering team to review, test in staging, and merge on their schedule - with a full audit trail.
Console-based resizing creates configuration drift. IaC changes are reproducible, reviewable, and revertible. Every recommendation comes as a ready-to-merge PR.
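As a sketch of what such a change can look like, the script below edits a `terraform.tfvars.json` file and prints the diff a reviewer would see in the pull request. It assumes instance types live in a map variable named `instance_types`, which is a hypothetical layout; adapt the key names to your own modules.

```python
import difflib
import json


def apply_recommendation(tfvars_text, service, new_type):
    """Return updated tfvars JSON text with one service's instance type changed."""
    tfvars = json.loads(tfvars_text)
    instance_types = tfvars["instance_types"]  # hypothetical map variable
    if service not in instance_types:
        raise KeyError(f"unknown service: {service}")
    instance_types[service] = new_type
    return json.dumps(tfvars, indent=2, sort_keys=True) + "\n"


# Stand-in for the file contents; a real script would read the file from the repo.
before = json.dumps(
    {"instance_types": {"api": "m5.xlarge", "worker": "m5.2xlarge"}},
    indent=2,
    sort_keys=True,
) + "\n"
after = apply_recommendation(before, "worker", "m5.xlarge")

diff_text = "".join(
    difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile="a/terraform.tfvars.json",
        tofile="b/terraform.tfvars.json",
    )
)
print(diff_text)
```

Keeping the change to a single line of a variables file makes the review trivial and the revert a one-line change as well.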
Validate with canary or rolling deployment
After a resize, monitor the new instance for 48–72 hours before rolling out broadly. CloudWatch alarms on CPU, memory, and error rates can act as automatic rollback triggers once they are wired into your deployment tooling.
For ECS and EKS workloads, this is easier - new task definitions or node group configs can be validated with a small percentage of traffic before full rollout.
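The validation rule can be sketched as a small decision function: roll back as soon as any threshold is breached, keep watching until the minimum observation window has passed, then promote. The threshold values and the names `WindowStats`, `breaches`, and `decide` are assumptions for illustration, not recommended defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowStats:
    p95_cpu_pct: float
    peak_mem_pct: float
    error_rate_pct: float


# Assumed alarm thresholds; tune these per workload.
THRESHOLDS = WindowStats(p95_cpu_pct=80.0, peak_mem_pct=85.0, error_rate_pct=1.0)
CHECKED_FIELDS = ("p95_cpu_pct", "peak_mem_pct", "error_rate_pct")


def breaches(stats, limits=THRESHOLDS):
    """Return the names of the metrics that exceed their threshold."""
    return [name for name in CHECKED_FIELDS if getattr(stats, name) > getattr(limits, name)]


def decide(hours_observed, stats, min_hours=48):
    """Return 'rollback', 'keep-watching', or 'promote' for the resized instance."""
    if breaches(stats):
        return "rollback"
    if hours_observed < min_hours:
        return "keep-watching"
    return "promote"


healthy = WindowStats(p95_cpu_pct=55.0, peak_mem_pct=60.0, error_rate_pct=0.1)
starved = WindowStats(p95_cpu_pct=40.0, peak_mem_pct=93.0, error_rate_pct=0.2)

print(decide(12, healthy))
print(decide(50, healthy))
print(decide(3, starved), breaches(starved))
```

Checking for breaches before checking elapsed time means a bad resize is reverted immediately instead of waiting out the observation window.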
Graviton Migration: Rightsizing Accelerator
AWS Graviton (arm64) instances deliver up to 40% better price-performance than comparable x86 instance types, according to AWS. For most web application and API workloads, migrating to Graviton is often the single highest-ROI infrastructure change available.
| Current (x86) | Graviton equivalent | x86 price | Graviton price | Saving |
|---|---|---|---|---|
| m5.xlarge | m7g.xlarge | $0.192/hr | $0.154/hr | 20% |
| c5.2xlarge | c7g.2xlarge | $0.340/hr | $0.290/hr | 15% |
| r5.xlarge | r7g.xlarge | $0.252/hr | $0.203/hr | 19% |
| t3.large | t4g.large | $0.0832/hr | $0.0672/hr | 19% |
Prices are us-east-1 On-Demand Linux. Graviton savings stack on top of Savings Plans discounts. Most workloads targeting Linux containers (ECS, EKS) or modern runtimes (Node.js, Python, Go, Java 11+) are Graviton-compatible with minimal changes.
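The savings column can be reproduced from the hourly prices, and the same arithmetic gives a monthly figure per instance. The sketch below uses the prices from the table as given; the 730 hours per month figure and the helper names are assumptions.

```python
HOURS_PER_MONTH = 730  # assumed average month length in hours

# (x86 type, x86 $/hr, Graviton type, Graviton $/hr), copied from the table above.
PRICES = [
    ("m5.xlarge", 0.192, "m7g.xlarge", 0.154),
    ("c5.2xlarge", 0.340, "c7g.2xlarge", 0.290),
    ("r5.xlarge", 0.252, "r7g.xlarge", 0.203),
    ("t3.large", 0.0832, "t4g.large", 0.0672),
]


def saving_pct(x86_price, graviton_price):
    """Percentage saved per hour by moving from the x86 type to the Graviton type."""
    return (x86_price - graviton_price) / x86_price * 100


def monthly_saving(x86_price, graviton_price, count=1):
    """Dollars saved per month across `count` always-on instances."""
    return (x86_price - graviton_price) * HOURS_PER_MONTH * count


for x86, x86_price, graviton, graviton_price in PRICES:
    pct = saving_pct(x86_price, graviton_price)
    dollars = monthly_saving(x86_price, graviton_price)
    print(f"{x86} -> {graviton}: {pct:.0f}% (${dollars:.2f}/month per instance)")
```

For the m5.xlarge row this works out to roughly $28 per instance per month before any Savings Plans discount.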