10 AWS Cost Mistakes Startups Make Every Month
These are not edge cases. They are the same patterns found in account after account - over-provisioned compute, no commitments, silent NAT Gateway charges, and no one looking at the bill. Together they typically account for 30–40% of total AWS spend.
Why the Same Mistakes Keep Showing Up
AWS makes provisioning fast and cheap to start. It charges nothing upfront and everything ongoing. The default settings are generous at small scale and expensive at Series A–C scale. Every team that moves fast in their first 18 months accumulates the same set of inefficiencies.
Right-sizing gets deferred because "prod is stable, don't touch it." Savings Plans get deferred because "we're not sure what we'll need next year." Tagging never gets retrofitted because it feels low-priority. None of these are engineering failures - they are the predictable result of a team focused on shipping product.
Published benchmark: a Series A SaaS startup ($47,200/month AWS) cut its bill to $12,100/month - a 74% reduction - by addressing the same 10 mistake categories below. Source: ZeonEdge case study.
Over-provisioned EC2 instances
$2,000–8,000/month
Why it happens
Teams provision for peak load that arrives 2% of the time. Lift-and-shift migrations land on the same instance sizes as on-prem - and nobody revisits. AWS Compute Optimizer reports go unread.
How to fix
Enable Compute Optimizer and review CPU and memory utilisation over 14 days. Most startups average 8–15% CPU - drop one instance size (e.g. m5.2xlarge → m5.xlarge), or move to a cheaper family Compute Optimizer recommends. Right-sizing alone cuts compute costs 30–60% with zero user impact.
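The 8–15% figure makes the decision rule easy to automate. A minimal sketch, assuming 14-day average utilisation figures pulled from Compute Optimizer or CloudWatch (CPUUtilization, plus mem_used_percent from the CloudWatch agent); the thresholds are illustrative, not official Compute Optimizer logic:

```python
def downsize_candidate(avg_cpu_pct: float, avg_mem_pct: float,
                       cpu_max: float = 20.0, mem_max: float = 40.0) -> bool:
    """True when both CPU and memory averages sit well below capacity
    for the whole observation window (thresholds are assumptions)."""
    return avg_cpu_pct < cpu_max and avg_mem_pct < mem_max

print(downsize_candidate(12.0, 25.0))  # typical startup instance -> True
print(downsize_candidate(65.0, 70.0))  # genuinely busy -> False
```

Requiring both metrics to be low avoids downsizing memory-bound services that merely look idle on CPU.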
Paying on-demand for workloads that never turn off
$3,000–10,000/month
Why it happens
"We'll sort commitments later" becomes the permanent state. Confusion between Reserved Instances and Savings Plans leads to inaction. Meanwhile, EC2, RDS, and Fargate run 24/7 on full on-demand pricing.
How to fix
After rightsizing, buy 1-year no-upfront Compute Savings Plans for your steady-state baseline. A 36% discount on on-demand pricing applies automatically across EC2, Lambda, and Fargate. One startup saved $8,600/month from this alone.
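The sizing maths can be sketched in a few lines. The 70% baseline fraction and the 36% discount below are assumptions to tune against your own Cost Explorer data, not guaranteed rates:

```python
HOURS_PER_MONTH = 730  # AWS billing convention

def savings_plan_estimate(monthly_on_demand: float,
                          baseline_fraction: float = 0.70,
                          discount: float = 0.36) -> dict:
    """Rough hourly commitment and monthly saving for a 1-year no-upfront
    Compute Savings Plan covering only the steady-state baseline."""
    baseline = monthly_on_demand * baseline_fraction
    return {
        "hourly_commitment": round(baseline * (1 - discount) / HOURS_PER_MONTH, 2),
        "monthly_savings": round(baseline * discount, 2),
    }

print(savings_plan_estimate(10_000))
# {'hourly_commitment': 6.14, 'monthly_savings': 2520.0}
```

Committing only to the baseline (not peak) means bursty traffic still runs on-demand, so you never pay for unused commitment.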
Non-production environments running 24/7
$800–3,000/month
Why it happens
Dev and staging environments are provisioned at production-equivalent size for convenience, then left running nights, weekends, and bank holidays. Nobody owns the cost and nobody notices until a budget review.
How to fix
Use AWS Instance Scheduler or a Lambda cron to stop non-production EC2 and RDS outside business hours (8am–8pm weekdays). This cuts non-production compute by ~65%. One startup went from $1,200 to $250/month on staging alone.
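The schedule logic itself is tiny. A sketch of the decision function a Lambda cron could use (the `Environment=staging` tag key in the comment is an assumption - use whatever convention you enforce):

```python
from datetime import datetime

def in_business_hours(now: datetime) -> bool:
    # 8am-8pm local, Monday (0) through Friday (4); weekends always off
    return now.weekday() < 5 and 8 <= now.hour < 20

# An hourly EventBridge rule can invoke a Lambda that calls this, then
# ec2.stop_instances(...) / rds.stop_db_instance(...) for resources
# tagged Environment=staging.
print(in_business_hours(datetime(2024, 6, 3, 10)))  # Monday 10:00 -> True
print(in_business_hours(datetime(2024, 6, 1, 10)))  # Saturday -> False
```

Running only 60 of 168 weekly hours is where the ~65% figure comes from.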
NAT Gateway used as a free pipe
$1,500–5,000/month
Why it happens
Default VPC architectures route all outbound traffic through a NAT Gateway. S3 and DynamoDB calls from private subnets incur $0.045/GB in data-processing charges on top of the gateway's per-hour baseline. At any scale, this compounds fast.
How to fix
Add free S3 and DynamoDB Gateway Endpoints - these route traffic directly from your VPC without traversing NAT. For ECR and CloudWatch Logs, add Interface Endpoints (these carry a small hourly charge but typically cost far less than the NAT processing they replace). One team saved $3,200/month from this single change.
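The size of the waste is easy to estimate before touching any infrastructure. A quick sketch, using the $0.045/GB us-east-1 data-processing rate (an assumption - rates vary by region):

```python
def nat_processing_cost(gb_per_month: float, rate_per_gb: float = 0.045) -> float:
    """Monthly NAT data-processing charge for traffic that a free
    S3/DynamoDB Gateway Endpoint would bypass entirely."""
    return round(gb_per_month * rate_per_gb, 2)

print(nat_processing_cost(20_000))  # 20 TB/month of S3 traffic -> 900.0
```

Compare that figure against VPC Flow Logs for your NAT ENIs to see what share of NAT traffic is actually S3 or DynamoDB.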
Orphaned and zombie resources quietly billing
$500–4,000/month
Why it happens
Cloud provisioning takes seconds; cleanup is nobody's job. Test EC2 instances from sprints, unattached EBS volumes from terminated instances, load balancers pointing at nothing, and Elastic IPs reserved for instances that no longer exist - all keep billing indefinitely.
How to fix
List unattached EBS volumes with `aws ec2 describe-volumes --filters Name=status,Values=available`. Check ALB RequestCount in CloudWatch over 30 days - zero-traffic ALBs at $16–30/month each should be deleted. Enforce Owner and Environment tags to prevent future accumulation.
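To put a number on what that describe-volumes query surfaces, a small illustrative calculation ($0.08/GB-month is the us-east-1 gp3 rate - an assumption; adjust for your region and volume types):

```python
def orphaned_ebs_cost(volume_sizes_gb, rate_per_gb: float = 0.08) -> float:
    """Monthly bill for a list of unattached volume sizes, in GB."""
    return round(sum(volume_sizes_gb) * rate_per_gb, 2)

print(orphaned_ebs_cost([100, 500, 50]))  # 650 GB forgotten -> 52.0/month
```

Even modest leftovers add up: a dozen forgotten 100 GB volumes is roughly $100/month, every month, forever.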
S3 stored entirely in Standard class
$300–2,000/month
Why it happens
"S3 is cheap" is true at small scale. At scale, 80% of objects are typically not accessed in 90+ days. Without lifecycle policies, years of backups, logs, and media accumulate at $0.023/GB/month with no automated tiering.
How to fix
Enable S3 Intelligent-Tiering for files accessed unpredictably. Add a lifecycle rule to move files older than 90 days to Glacier Instant Retrieval (67% cheaper). Objects older than 180 days move to Glacier Deep Archive (96% cheaper).
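The lifecycle rule described above, expressed in the shape boto3's `put_bucket_lifecycle_configuration` expects. The rule ID and bucket name are placeholders:

```python
lifecycle = {
    "Rules": [{
        "ID": "tier-old-objects",
        "Status": "Enabled",
        "Filter": {"Prefix": ""},  # apply to the whole bucket
        "Transitions": [
            {"Days": 90, "StorageClass": "GLACIER_IR"},
            {"Days": 180, "StorageClass": "DEEP_ARCHIVE"},
        ],
    }]
}

# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-app-backups", LifecycleConfiguration=lifecycle)
print(lifecycle["Rules"][0]["Transitions"][0]["StorageClass"])
```

Before enabling the Deep Archive transition, confirm nothing needs those objects quickly - retrieval takes hours, not milliseconds.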
Oversized RDS instances never revisited
$1,000–4,000/month
Why it happens
Databases are provisioned conservatively - a db.r5.2xlarge "just in case" - and never touched again for fear of downtime. Performance Insights goes unchecked. Read replicas sit at near-zero query load.
How to fix
Check RDS Performance Insights for CPU and memory utilisation over 14 days. A db.r5.2xlarge running at 6% CPU and 18% memory can be safely downgraded to a db.r6g.xlarge (Graviton) - which is both cheaper and faster. Migrate storage from gp2 to gp3 for a free 20% saving.
No cost visibility - no tags, no budgets, no alerts
Costs 20–40% more than necessary
Why it happens
Teams skip tagging in the early days and never retrofit it. Without Owner, Environment, and Service tags, there's no cost attribution by team or product. Nobody sets AWS Budgets alerts, so the first signal is a shocking invoice.
How to fix
Implement 3–5 mandatory tags: Owner, Environment, Service, CostCentre. Activate them as cost allocation tags in the Billing console. Set budget alerts at 80% actual and 100% forecasted. Enable AWS Cost Anomaly Detection - free, catches spikes within 24 hours.
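The 80%-actual / 100%-forecast alerts above, shaped for boto3's `budgets.create_budget` call. The limit amount and email address are placeholders:

```python
budget = {
    "BudgetName": "monthly-aws-spend",
    "BudgetLimit": {"Amount": "10000", "Unit": "USD"},
    "TimeUnit": "MONTHLY",
    "BudgetType": "COST",
}
notifications_with_subscribers = [
    {"Notification": {"NotificationType": "ACTUAL",
                      "ComparisonOperator": "GREATER_THAN",
                      "Threshold": 80.0, "ThresholdType": "PERCENTAGE"},
     "Subscribers": [{"SubscriptionType": "EMAIL",
                      "Address": "team@example.com"}]},
    {"Notification": {"NotificationType": "FORECASTED",
                      "ComparisonOperator": "GREATER_THAN",
                      "Threshold": 100.0, "ThresholdType": "PERCENTAGE"},
     "Subscribers": [{"SubscriptionType": "EMAIL",
                      "Address": "team@example.com"}]},
]
print([n["Notification"]["Threshold"] for n in notifications_with_subscribers])
```

The FORECASTED notification is the valuable one: it fires mid-month, while there is still time to react.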
CloudWatch logs growing forever
$300–2,000/month
Why it happens
CloudWatch log groups default to "never expire". Verbose DEBUG-level logging in production generates gigabytes daily at $0.50/GB ingestion. ALB access logs routed to CloudWatch cost 20× more than routing them to S3.
How to fix
Set log retention on every log group - 30 days for production, 7 days for non-production. Switch logging level from DEBUG to INFO. Route ALB and VPC Flow logs to S3, not CloudWatch. Move custom metrics to self-hosted Prometheus on a t3.medium ($30/month).
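The retention policy is simple enough to enforce in a loop. A sketch - the `env_for()` helper in the comment, mapping a log group to its environment via tags or a naming convention, is hypothetical and left to you:

```python
def retention_days(environment: str) -> int:
    # Policy from the text: 30 days for production, 7 for everything else.
    return 30 if environment == "production" else 7

# Applied account-wide with boto3:
# logs = boto3.client("logs")
# for group in logs.describe_log_groups()["logGroups"]:
#     logs.put_retention_policy(logGroupName=group["logGroupName"],
#                               retentionInDays=retention_days(env_for(group)))
print(retention_days("production"), retention_days("staging"))
```

Both 7 and 30 are values CloudWatch Logs accepts for `retentionInDays`, so the loop never needs to round.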
Treating cost optimisation as a one-time project
Savings erode within 3–6 months
Why it happens
A one-time cost sprint saves 30–40%, then costs drift back as new resources are provisioned without cost awareness. Reserved Instances expire silently and revert to on-demand. No FinOps process means no one catches the creep until the bill is already up 40%.
How to fix
Set up monthly cost reviews using Cost Explorer by team or service. Schedule automated alerts for any spend increase greater than 20% month-over-month. Track Savings Plans coverage - below 70% means you're leaving money on the table.
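The month-over-month check is one comparison, suitable for a scheduled job fed by Cost Explorer data. The 20% threshold comes from the text:

```python
def spend_spike(previous: float, current: float,
                threshold: float = 0.20) -> bool:
    """Flag a month-over-month increase above the threshold."""
    return previous > 0 and (current - previous) / previous > threshold

print(spend_spike(10_000, 12_500))  # +25% -> True
print(spend_spike(10_000, 10_800))  # +8%  -> False
```

Run it per team or per service tag rather than on the account total, so one team's growth can't mask another's waste.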
What to Fix First
Not all mistakes are equal. This is the order that maximises savings per hour of engineering time.
1. Right-size EC2 and RDS using Compute Optimizer data - 16% of total bill on average
2. Buy Compute Savings Plans for your rightsized baseline - additional 18% on steady-state compute
3. Add S3 and DynamoDB Gateway Endpoints - eliminates most NAT Gateway data processing charges
4. Schedule non-production environment shutdowns - 65% reduction on dev and staging costs
5. Implement tagging, budgets, and anomaly detection - prevents 20–40% drift within 6 months