AWS Cost Optimization · Infrastructure Waste

AWS Zombie Resources: Find and Kill Orphaned Cloud Waste

Test EC2 instances, unattached EBS volumes, unused load balancers, idle Elastic IPs - resources nobody owns but AWS keeps billing for. Every startup with 20+ engineers has them. Most have far more than they think.

23 test instances found = $3,400/month
Non-prod 24/7 = 5–8% of total cloud waste
$8,000/month in unused Reserved Instances

Why Cloud Never Forgets to Bill

Cloud provisioning takes seconds. Cloud cleanup is nobody's job. The result is an accumulation of resources created for a specific purpose: the purpose ended, but the resources kept running.

AWS doesn't send a notification when a resource has been idle for 6 months. It doesn't ask if you still need that test instance from the Q2 sprint. It just invoices at the end of the month, every month, until someone explicitly terminates the resource.

The problem scales with team size. With 5 engineers, zombie waste is manageable. With 30 engineers all spinning up infrastructure independently, it compounds into thousands per month without anyone noticing the pattern.

The 7 Most Common Zombie Resource Types

Each type has a detection method you can run today and a fix that typically takes under an hour once the resources are identified.

🖥️

Test and development EC2 instances

Spun up for a sprint, a demo, or a load test. Engineer moves on. Instance runs indefinitely.

Detection

Check CloudWatch for EC2 CPU utilisation < 5% over 14+ days. Sort by instance launch time to surface long-running candidates.

Real example

23 test instances found during an audit: $3,400/month for infrastructure nobody was using.

Fix

Tag all instances with Owner and Purpose at launch. Set up a weekly Trusted Advisor check for low-utilisation instances. Delete anything under 5% CPU with no production traffic.
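As a sketch, the idle-instance rule above can be expressed as a small Python check. The datapoints and instance names here are hypothetical; in practice the daily averages come from the CloudWatch CPUUtilization metric (e.g. via `get_metric_statistics` with a one-day period):

```python
def is_idle(daily_avg_cpu, threshold=5.0, min_days=14):
    """Flag an instance as idle when every daily CPU average over the
    lookback window sits below the threshold (here, 5%)."""
    if len(daily_avg_cpu) < min_days:
        return False  # not enough data to judge safely
    return all(avg < threshold for avg in daily_avg_cpu[-min_days:])

# Hypothetical datapoints - in practice these come from CloudWatch.
forgotten_test_box = [1.2, 0.8, 1.1, 0.9, 1.3, 1.0, 0.7,
                      1.1, 0.9, 1.2, 0.8, 1.0, 1.1, 0.9]
busy_prod_box = [42.0, 55.3, 61.2, 48.9, 70.1, 52.4, 44.8,
                 39.5, 58.2, 63.0, 47.7, 51.9, 60.4, 45.6]

print(is_idle(forgotten_test_box))  # True  - cleanup candidate
print(is_idle(busy_prod_box))       # False - leave it alone
```

The `min_days` guard matters: an instance with only a few days of metrics might simply be new, not idle.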

💾

Unattached EBS volumes

Left behind when EC2 instances are terminated. Root volumes are deleted on termination by default, but additional attached volumes are not unless DeleteOnTermination is enabled on them.

Detection

aws ec2 describe-volumes --filters Name=status,Values=available

Real example

A client had 18 months of EC2 churn leaving behind $2,100/month in unattached gp2 volumes from instances that hadn't existed for a year.

Fix

Set DeleteOnTermination=true for EBS volumes by default. Audit all volumes with 'available' status - by definition, these are not attached to any instance. Snapshot before deleting if uncertain.
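To put a dollar figure on the audit, you can total the sizes from the `describe-volumes` output. The JSON below is a hypothetical sample, and the price constant assumes the us-east-1 gp2 list price of roughly $0.10/GB-month - adjust for your region and volume types:

```python
import json

GP2_PRICE_PER_GB_MONTH = 0.10  # assumed us-east-1 list price

# Hypothetical output from:
#   aws ec2 describe-volumes --filters Name=status,Values=available
sample = json.loads("""
{"Volumes": [
  {"VolumeId": "vol-0aaa", "Size": 100, "VolumeType": "gp2", "State": "available"},
  {"VolumeId": "vol-0bbb", "Size": 500, "VolumeType": "gp2", "State": "available"}
]}
""")

# Sum the sizes of every unattached volume and price them out.
monthly_waste = sum(v["Size"] for v in sample["Volumes"]) * GP2_PRICE_PER_GB_MONTH
print(f"${monthly_waste:.2f}/month")  # $60.00/month
```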

📸

Accumulated EBS snapshots

Automated backup policies create snapshots daily. Without a retention policy, they accumulate indefinitely. At $0.05/GB/month, 3 years of snapshots adds up.

Detection

aws ec2 describe-snapshots --owner-ids self - filter by creation date and cross-reference with existing AMIs and volumes.

Real example

Three years of daily snapshots on a 500GB volume = $900/month in snapshot storage, for data that was accessible from the live volume.

Fix

Set Data Lifecycle Manager policies: retain 7 daily, 4 weekly, 3 monthly. Delete orphaned snapshots not referenced by any AMI or volume.
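The orphan check is a set difference: any snapshot not referenced by an AMI block-device mapping or an existing volume is a candidate. A minimal sketch with hypothetical IDs - in practice, `referenced` is built from `describe-images` and `describe-volumes` output:

```python
def orphaned_snapshots(snapshots, referenced_ids):
    """Return snapshots not referenced by any AMI block-device mapping
    or existing volume - candidates for deletion."""
    return [s for s in snapshots if s["SnapshotId"] not in referenced_ids]

# Hypothetical snapshot inventory and reference set.
snaps = [{"SnapshotId": "snap-001"},
         {"SnapshotId": "snap-002"},
         {"SnapshotId": "snap-003"}]
referenced = {"snap-002"}  # e.g. backing an AMI that's still in use

print([s["SnapshotId"] for s in orphaned_snapshots(snaps, referenced)])
# ['snap-001', 'snap-003']
```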

⚖️

Unused load balancers

ALBs persist after the services they front are decommissioned. Staging environment teardowns often skip the load balancer. Costs $16–30/month each, regardless of traffic.

Detection

Check ALB RequestCount metric in CloudWatch over 30 days. Any ALB with < 100 requests/day and no active target group registrations is a candidate.

Real example

Seven ALBs from forgotten staging environments at roughly $25/month each = $175/month for load balancers routing traffic to nothing.

Fix

List all load balancers. Check RequestCount and healthy host count. Delete any with zero healthy targets and near-zero traffic. Remove associated security groups and DNS records.
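The two-signal rule - near-zero traffic and zero healthy targets - can be sketched as a single predicate. The request counts here are hypothetical; in practice they come from the CloudWatch RequestCount metric and the healthy host count from the target group's health checks:

```python
def alb_is_zombie(daily_requests, healthy_targets):
    """An ALB is a deletion candidate when it averages under 100
    requests/day over the window AND has no healthy targets behind it."""
    avg = sum(daily_requests) / len(daily_requests) if daily_requests else 0
    return healthy_targets == 0 and avg < 100

print(alb_is_zombie([0, 3, 1, 0], 0))    # True  - stale staging ALB
print(alb_is_zombie([12000, 9800], 4))   # False - live ALB, leave it
```

Requiring both signals avoids deleting a freshly deployed ALB that simply hasn't received traffic yet but has healthy targets registered.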

🌐

Unattached Elastic IPs

Elastic IPs reserved for an instance but not currently attached to a running instance. AWS charges $0.005/hour - small individually, but they accumulate across an organisation.

Detection

aws ec2 describe-addresses --filters Name=domain,Values=vpc - look for addresses with no InstanceId or NetworkInterfaceId.

Real example

An audit found 22 unattached Elastic IPs from 2 years of infrastructure churn - roughly $80/month for IP addresses reserved for instances that no longer existed.

Fix

Release all unattached Elastic IPs immediately. They are re-allocatable; holding them costs money. If you need a static IP, allocate it when the resource is ready.
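Filtering the `describe-addresses` output comes down to checking for a missing InstanceId and NetworkInterfaceId. A sketch over a hypothetical sample of that JSON:

```python
import json

# Hypothetical output from:
#   aws ec2 describe-addresses --filters Name=domain,Values=vpc
addresses = json.loads("""
{"Addresses": [
  {"AllocationId": "eipalloc-01", "PublicIp": "203.0.113.10", "InstanceId": "i-0abc"},
  {"AllocationId": "eipalloc-02", "PublicIp": "203.0.113.11"}
]}
""")["Addresses"]

# An EIP with neither an InstanceId nor a NetworkInterfaceId is unattached.
unattached = [a["AllocationId"] for a in addresses
              if not a.get("InstanceId") and not a.get("NetworkInterfaceId")]

print(unattached)  # ['eipalloc-02'] - release these with ec2 release-address
```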

Non-production environments running 24/7

Development and staging environments provisioned at production-equivalent size, running continuously including nights, weekends, and bank holidays.

Detection

Tag-based filtering in Cost Explorer with Environment=staging/dev. Check CloudWatch for EC2 instance hours during off-hours.

Real example

A startup's staging environment ran 24/7 at full production size: one RDS db.r5.xlarge + 4 EC2 instances = $1,200/month for an environment used 8 hours per weekday.

Fix

Use AWS Instance Scheduler or a Lambda cron to stop non-production EC2 and RDS instances outside business hours. Savings: ~75% on non-production compute costs.
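The ~75% figure falls straight out of the schedule arithmetic. Assuming the environment is only needed 8 hours per weekday, as in the example above:

```python
# Staging is used ~8 hours per weekday; the scheduler stops it otherwise.
hours_per_week = 24 * 7    # 168 billable hours when running 24/7
running_hours = 8 * 5      # 40 hours actually needed

savings = 1 - running_hours / hours_per_week
print(f"compute savings: {savings:.0%}")  # compute savings: 76%
```

A slightly wider 10-hour window still saves about 70%, which is why scheduling is usually the highest-leverage non-production fix.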

λ

Idle Lambda functions and API Gateway endpoints

Functions created for experiments, internal tools, or deprecated integrations. API Gateway stages left provisioned after the API is no longer used.

Detection

Check Lambda Invocations CloudWatch metric for zero invocations over 30 days. List API Gateway stages with zero request counts.

Real example

14 Lambda functions with zero invocations over 90 days - not a billing problem individually, but they added complexity, maintained IAM permissions, and represented a security surface area.

Fix

Delete Lambda functions with zero invocations for 30+ days (after confirming they're not event-driven with rare triggers). Remove associated API Gateway stages, IAM roles, and CloudWatch log groups.
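The filter itself is trivial once you have per-function invocation totals - the function names and counts below are hypothetical; in practice the totals come from summing the CloudWatch Invocations metric per function over the window:

```python
def idle_functions(invocation_counts):
    """Return function names with zero recorded invocations over the
    lookback window. Event-driven functions with rare triggers still
    need a manual check before deletion."""
    return [name for name, total in invocation_counts.items() if total == 0]

# Hypothetical 30-day invocation totals per function.
counts = {"etl-nightly": 30, "old-slack-hook": 0, "demo-2022": 0}

print(idle_functions(counts))  # ['old-slack-hook', 'demo-2022']
```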

Systematic Cleanup: The Right Order

Zombie cleanup done in the wrong order creates risk. This sequence minimises the chance of accidentally deleting something active.

1

Tag before you clean

Implement a tagging strategy first - Owner, Environment, Purpose, and CostCentre at minimum. Resources without these tags are candidates for cleanup review. Read-only tagging audit takes 1–2 days.

AWS tagging strategy guide
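The tagging audit reduces to a compliance check per resource: which required keys are missing? A minimal sketch, using the four minimum tags named above (the example tag values are hypothetical):

```python
REQUIRED_TAGS = {"Owner", "Environment", "Purpose", "CostCentre"}

def missing_tags(resource_tags):
    """Return the required tag keys a resource is missing - resources
    with gaps go on the cleanup-review list."""
    return sorted(REQUIRED_TAGS - set(resource_tags))

print(missing_tags({"Owner": "dana", "Environment": "staging"}))
# ['CostCentre', 'Purpose']
```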
2

Identify with data, not assumptions

Use CloudWatch metrics (CPU, NetworkIn, RequestCount) for 14-day lookback. Cross-reference with deployment records. Never delete based on name alone - 'test-instance-01' might be critical.

3

Stop before you terminate

For EC2 instances, stop them first and observe for 48–72 hours. If nothing breaks and nobody asks about them, terminate. This is especially important for instances without obvious owners.

4

Snapshot before you delete storage

Take a final snapshot of EBS volumes before deletion. Snapshots cost $0.05/GB/month and give you a 30-day safety net. The snapshot can be deleted once you're confident the volume was truly orphaned.

EBS cost optimization guide
5

Automate prevention going forward

Set up AWS Config rules to flag unattached EBS volumes and EIPs after 7 days. Create a budget alert for any service spend increase > 20% month-over-month. Prevention is cheaper than cleanup.
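The month-over-month alert rule is a one-line comparison. Sketched here with hypothetical spend figures - the real inputs would come from Cost Explorer or AWS Budgets:

```python
def spend_alert(previous, current, threshold=0.20):
    """Flag a service whose month-over-month spend grew by more than
    the threshold (20% by default)."""
    if previous == 0:
        return current > 0  # any new spend on a previously unused service
    return (current - previous) / previous > threshold

print(spend_alert(1000.0, 1350.0))  # True  - 35% jump, investigate
print(spend_alert(1000.0, 1100.0))  # False - within normal variance
```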

Common questions

How long does a zombie resource cleanup take?

Identification takes 2–4 hours with the CLI commands above. Actual cleanup - stopping, waiting, and terminating - typically spans 1–2 weeks to safely work through the full inventory. Automation setup (tags, Config rules, scheduler) adds another day.

What if an engineer owns a resource and is on holiday?

This is why tagging with Owner is critical before cleanup starts. If a resource is tagged with an owner, contact them before acting. If a resource has no owner tag and shows no activity for 30+ days, the risk of it being critical is very low.

Can I automate zombie detection?

Yes. AWS Config rules, Trusted Advisor checks, and third-party tools like CloudHealth or custom Lambda functions can flag idle resources automatically. However, automated deletion is risky without human review - the goal is automated detection with human-approved cleanup.

Fixed-price · Risk-free · 3× ROI guarantee

Get a complete zombie resource audit in 7 days

The AWS Cost Audit systematically identifies all orphaned resources across every service and account - prioritised by monthly dollar impact. Fixed €5K price, results in 1 week.

Start the Audit →

No call needed · Accept agreements · Run one script · Done

Prefer to talk first? Free 30-min call available →