AWS EC2 Spot Instances Guide: Save Up to 90% on Compute
Spot instances offer up to 90% savings on EC2 in exchange for the risk that AWS reclaims an instance on 2 minutes' notice. With proper diversification and interruption handling, most stateless and batch workloads run safely and cheaply on Spot.
4 Spot Instance Strategies
Start with batch workloads - they’re the easiest Spot migration with the highest savings impact.
Diversify instance types to minimize interruption risk
Spot interruption rates vary by instance type and AZ - diversifying across multiple instance families reduces the chance that all your Spot capacity is interrupted simultaneously. AWS recommends selecting 5–10 instance types per Spot Fleet or Auto Scaling Group.
How to implement
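- Select instances of similar size across different families, and group them by memory ratio and CPU architecture: c5, c5a, c5n provide about 2 GiB per vCPU and m5, m5a about 4 GiB per vCPU on x86, while c6g and m6g are Graviton (arm64) and need an arm64 AMI and arm64-compatible binaries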
- Use the Spot Instance Advisor (aws.amazon.com/ec2/spot/instance-advisor) to find instances with < 5% interruption frequency
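- Configure the capacity-optimized allocation strategy in your Spot Fleet or Auto Scaling Group (or the newer price-capacity-optimized strategy, which AWS now recommends for most workloads) rather than lowest-price, which tends to pick shallow pools with higher interruption rates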
- Spread across at least 3 AZs to avoid AZ-level capacity events
- Use attribute-based instance type selection (ABIS) to automatically include instance types meeting your vCPU and memory requirements
Note: The capacity-optimized allocation strategy selects instances from the deepest Spot capacity pools, minimizing interruptions even at the cost of a slightly higher price than the absolute cheapest option.
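As a rough illustration of the steps above, the sketch below builds the mixed instances policy for an Auto Scaling Group as a plain dictionary. The key names follow the Auto Scaling MixedInstancesPolicy structure; the launch template name, the instance type list, and the 5-type minimum check are assumptions made for this example. It sticks to x86 types so that one AMI works for every override.

```python
import json


def build_mixed_instances_policy(template_name, instance_types,
                                 on_demand_base=0,
                                 on_demand_pct_above_base=0,
                                 spot_strategy="capacity-optimized"):
    """Return a MixedInstancesPolicy dict for an Auto Scaling Group."""
    # Guard rail for this sketch only: insist on some diversification.
    if len(set(instance_types)) < 5:
        raise ValueError("diversify across at least 5 instance types")
    return {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": template_name,
                "Version": "$Latest",
            },
            # One override per instance type the group may launch.
            "Overrides": [{"InstanceType": t} for t in instance_types],
        },
        "InstancesDistribution": {
            # 0 / 0 means the whole group runs on Spot.
            "OnDemandBaseCapacity": on_demand_base,
            "OnDemandPercentageAboveBaseCapacity": on_demand_pct_above_base,
            "SpotAllocationStrategy": spot_strategy,
        },
    }


policy = build_mixed_instances_policy(
    "web-workers",  # hypothetical launch template name
    ["c5.xlarge", "c5a.xlarge", "c5n.xlarge", "c5d.xlarge", "c6i.xlarge"],
)
print(json.dumps(policy, indent=2))
```

The resulting dictionary is meant to be passed as the MixedInstancesPolicy argument when creating the group (for example with boto3's create_auto_scaling_group), together with subnets in at least three AZs.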
Handle Spot interruptions gracefully
AWS provides a 2-minute warning before Spot interruption via the EC2 instance metadata service and EventBridge. Properly handling this warning allows your application to finish current work, drain connections, and checkpoint state before the instance is reclaimed.
How to implement
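- Poll the interruption notice endpoint from your application every few seconds: http://169.254.169.254/latest/meta-data/spot/termination-time (it returns 404 until an interruption is scheduled; with IMDSv2, request a session token first)
- On detection: stop accepting new requests, complete in-flight work, and release unfinished SQS messages back to the queue
- Deregister the instance from its ALB target group on interruption notice, and keep the target group's deregistration delay well under 120 seconds so connections drain in time
- For ECS: enable Spot instance draining (ECS_ENABLE_SPOT_INSTANCE_DRAINING=true in the agent config) - the instance is set to DRAINING on interruption, and service tasks are stopped and rescheduled on other instances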
- Subscribe to EC2 Spot interruption EventBridge events for centralized handling across your fleet
Note: Most web applications and API servers can handle graceful shutdown in under 90 seconds - well within the 2-minute window. The key is catching the signal and stopping new work immediately.
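A minimal sketch of the polling approach, assuming a worker that checks for the notice before taking each job. It reads the spot/instance-action metadata path (a JSON companion to termination-time) through IMDSv2. The function names and the run_worker loop are hypothetical, and the fetch function is injectable so the logic can be exercised off EC2.

```python
import json
import urllib.error
import urllib.request

IMDS = "http://169.254.169.254/latest"


def imds_fetch(path, timeout=1.0):
    """GET an IMDS path with an IMDSv2 token; return the body, or None on 404."""
    token_req = urllib.request.Request(
        f"{IMDS}/api/token", method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"})
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()
    req = urllib.request.Request(
        f"{IMDS}/{path}", headers={"X-aws-ec2-metadata-token": token})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:  # no interruption scheduled
            return None
        raise


def interruption_notice(fetch=imds_fetch):
    """Return the parsed notice (action and time) or None if none is pending."""
    body = fetch("meta-data/spot/instance-action")
    return json.loads(body) if body else None


def run_worker(get_job, handle_job, fetch=imds_fetch):
    """Process jobs until the source is empty or an interruption notice arrives."""
    done = 0
    while True:
        # Check before taking new work so nothing new starts after the notice.
        if interruption_notice(fetch) is not None:
            return done, "interrupted"
        job = get_job()
        if job is None:
            return done, "drained"
        handle_job(job)
        done += 1
```

Checking before each job suits short jobs; for longer jobs, a background thread polling every few seconds and setting a shutdown flag is the more usual shape.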
Use Spot for batch and async workloads first
Batch processing, data pipelines, ML training, image/video processing, and async job queues are ideal Spot workloads - they can be interrupted and restarted with minimal impact. Start here before attempting to run stateless services on Spot.
How to implement
- Migrate AWS Batch workloads to Spot compute environments: set computeResources type: SPOT
- Configure checkpointing for long-running jobs: save progress to S3 every 5–10 minutes
- For SQS-driven workers: set message visibility timeout to 2× the average job duration - messages held by an interrupted worker reappear in the queue once the timeout expires
- Use EMR Spot for data processing: keep the primary (master) and core nodes on-demand and run task nodes on Spot
- For Lambda: nothing to migrate - Lambda functions do not run on Spot capacity and have no Spot pricing, so this strategy applies only to EC2-backed compute
Note: AWS Batch replaces reclaimed Spot capacity automatically, and interrupted jobs are retried on new capacity when the job definition has a retry strategy with more than one attempt. This makes Batch ideal for workloads that can tolerate retry overhead.
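The checkpointing step can be sketched as follows. This is an illustration under stated assumptions, not a prescribed implementation: it checkpoints every N items rather than on a 5–10 minute timer so the behaviour is deterministic, and it writes to a local file as a stand-in for S3. All function names are hypothetical, and stop_after exists only to simulate an interruption.

```python
import json
import os


def load_checkpoint(path):
    """Return the index of the next unprocessed item (0 if no checkpoint)."""
    try:
        with open(path) as f:
            return json.load(f)["next_index"]
    except FileNotFoundError:
        return 0


def save_checkpoint(path, next_index):
    """Write to a temp file and rename, so a partial write never replaces a good checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"next_index": next_index}, f)
    os.replace(tmp, path)


def process(items, path, work, every=100, stop_after=None):
    """Resume from the last checkpoint and save progress every `every` items."""
    start = load_checkpoint(path)
    for i in range(start, len(items)):
        if stop_after is not None and i - start >= stop_after:
            return i  # simulated Spot interruption: exit without saving
        work(items[i])
        if (i + 1) % every == 0:
            save_checkpoint(path, i + 1)
    save_checkpoint(path, len(items))
    return len(items)
```

Items processed after the last checkpoint are processed again on restart, so the work function should be idempotent or the checkpoint interval small enough that the repeated work is acceptable.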
Implement Karpenter for Kubernetes Spot management
Karpenter's NodePool with Spot capacity type automatically selects the best Spot instance from across instance families, handles interruption draining, and replaces interrupted nodes with new capacity. It eliminates the need to manually manage Spot node groups.
How to implement
- Define a NodePool whose requirements set karpenter.sh/capacity-type to spot and list multiple instance families
- Add on-demand fallback NodePool with lower weight - Karpenter tries Spot first, falls back to on-demand if unavailable
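- Steer Spot-eligible workloads with a nodeSelector or node affinity on the label karpenter.sh/capacity-type: spot, which Karpenter sets on the nodes it launches; tolerations are only needed if the NodePool also taints its nodes
- Karpenter handles interruption draining automatically once its interruption queue (an SQS queue fed by EventBridge) is configured: cordon, drain, terminate on 2-minute warning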
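- Monitor Spot interruptions through Kubernetes events, for example: kubectl get events -A --field-selector reason=SpotInterrupted (the reason string depends on the tool and version, so confirm it against the events your cluster actually emits)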
Note: Karpenter + Spot + Graviton is the standard cost-optimization stack for EKS. Combined, they typically reduce EKS node costs by 70–85% compared to on-demand x86 managed node groups.
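To make the NodePool setup concrete, here is a sketch that generates a Spot pool and a lower-weight on-demand fallback as JSON, which kubectl accepts alongside YAML. The field names are assumed to follow the karpenter.sh/v1 NodePool API and should be checked against the Karpenter version you run. The pool names, weights, family list, and the EC2NodeClass named "default" are placeholders.

```python
import json


def node_pool(name, capacity_type, weight, families, arch="arm64"):
    """Build a Karpenter NodePool manifest as a plain dict (v1 schema assumed)."""
    return {
        "apiVersion": "karpenter.sh/v1",
        "kind": "NodePool",
        "metadata": {"name": name},
        "spec": {
            "weight": weight,  # higher weight is tried first
            "template": {"spec": {
                # Hypothetical EC2NodeClass; it must already exist in the cluster.
                "nodeClassRef": {"group": "karpenter.k8s.aws",
                                 "kind": "EC2NodeClass",
                                 "name": "default"},
                "requirements": [
                    {"key": "karpenter.sh/capacity-type",
                     "operator": "In", "values": [capacity_type]},
                    {"key": "kubernetes.io/arch",
                     "operator": "In", "values": [arch]},
                    {"key": "karpenter.k8s.aws/instance-family",
                     "operator": "In", "values": families},
                ],
            }},
        },
    }


families = ["c6g", "c7g", "m6g", "m7g", "r6g"]  # Graviton families (arm64)
pools = [
    node_pool("spot", "spot", 100, families),
    node_pool("on-demand-fallback", "on-demand", 10, families),
]
manifest = {"apiVersion": "v1", "kind": "List", "items": pools}
print(json.dumps(manifest, indent=2))
```

Saving the output to a file and running kubectl apply -f on it would create both pools. Because the families here are Graviton, container images scheduled onto these nodes need arm64 builds.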
Frequently Asked Questions
What is an EC2 Spot instance?
A Spot instance is unused EC2 capacity that AWS offers at up to 90% discount compared to on-demand pricing. In exchange, AWS can reclaim the instance with a 2-minute warning when it needs the capacity back. Spot instances use the same hardware as on-demand - there is no performance difference.
How often are Spot instances interrupted?
According to AWS Spot Instance Advisor, many instance types in popular regions fall in the lowest interruption-frequency bucket, < 5% per month. The Advisor reports frequency in ranges (< 5%, 5–10%, and so on) based on trailing data, so it does not give exact per-type rates, and the rates change over time. Diversifying across 5+ instance types and 3+ AZs reduces the chance that a large share of your capacity is interrupted at once.
Can I run production workloads on Spot instances?
Yes - with proper architecture. Stateless services with multiple replicas, proper pod disruption budgets, and graceful shutdown handling run successfully on Spot in production. Stateful workloads (databases, single-instance services) should not run on Spot without careful checkpointing.
What is the difference between Spot Fleet and Auto Scaling Group with Spot?
Spot Fleet is a standalone construct that manages a mix of Spot and on-demand instances across instance types and AZs; AWS now treats it as a legacy API and points new workloads to Auto Scaling Groups or EC2 Fleet. Auto Scaling Groups with mixed instance policies are more common for EC2-backed applications and integrate better with ALB, ECS, and other services. For Kubernetes: Karpenter launches and replaces Spot nodes itself, without either construct.
How does Spot pricing work?
AWS sets Spot prices based on longer-term supply and demand trends for each instance type in each AZ. Prices fluctuate but are typically stable for hours or days at a time. You no longer bid - you simply request Spot capacity and pay the current Spot price for the instance type and AZ, optionally setting a maximum price you are willing to pay. Spot prices are visible in the EC2 console under Spot Requests → Pricing History.
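To compare a Spot price from the pricing history against on-demand, the arithmetic is simple enough to sketch. The hourly prices below are made-up placeholders rather than real AWS prices, and the 730-hour month is an assumption.

```python
def spot_savings_pct(on_demand_hourly, spot_hourly):
    """Percentage saved by paying the Spot price instead of on-demand."""
    if on_demand_hourly <= 0:
        raise ValueError("on-demand price must be positive")
    return round((on_demand_hourly - spot_hourly) / on_demand_hourly * 100, 1)


def monthly_cost(hourly, instances, hours=730):
    """Approximate monthly cost for a fleet running continuously."""
    return round(hourly * instances * hours, 2)


# Hypothetical prices: 0.20 USD/h on-demand, 0.05 USD/h Spot.
print(spot_savings_pct(0.20, 0.05))
print(monthly_cost(0.20, 10), monthly_cost(0.05, 10))
```

Because Spot prices differ by instance type and AZ, the comparison is worth repeating for each pool you plan to use.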