AWS EC2 Spot Instances Guide: Save Up to 90% on Compute

Q: What is an EC2 Spot instance?

A Spot instance is unused EC2 capacity that AWS offers at up to 90% discount compared to on-demand pricing. In exchange, AWS can reclaim the instance with a 2-minute warning when it needs the capacity back. Spot instances use the same hardware as on-demand - there is no performance difference.

Q: How often are Spot instances interrupted?

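According to the AWS Spot Instance Advisor, most instance types in popular Regions fall in the lowest interruption-frequency band (< 5% per month). The Advisor reports frequency in bands (< 5%, 5–10%, 10–15%, 15–20%, > 20%) based on the trailing month, and ratings vary by Region and change over time, so check the current rating for the types you plan to use (for example c5.xlarge or m5.large) rather than assuming a fixed rate. Diversifying across 5+ instance types and 3+ AZs further reduces the chance of losing all your Spot capacity at once.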

Q: Can I run production workloads on Spot instances?

Yes - with proper architecture. Stateless services with multiple replicas, proper pod disruption budgets, and graceful shutdown handling run successfully on Spot in production. Stateful workloads (databases, single-instance services) should not run on Spot without careful checkpointing.

Q: What is the difference between Spot Fleet and Auto Scaling Group with Spot?

Spot Fleet is a standalone construct that manages a mix of Spot and on-demand instances across instance types and AZs. Auto Scaling Groups with mixed instance policies are more common for EC2-backed applications and integrate better with ALB, ECS, and other services. For Kubernetes: Karpenter handles Spot automatically without either construct.

Q: How does Spot pricing work?

AWS sets Spot prices based on long-term supply and demand trends for each instance type in each AZ. Prices change gradually and are typically stable for hours or days at a time. You no longer bid: you request Spot capacity and pay the current Spot price for the instance type and AZ (you can optionally set a maximum price, which defaults to the on-demand price). Spot prices are visible in the EC2 console under Spot Requests → Pricing History.

Spot instances offer up to 90% savings on EC2 in exchange for potential 2-minute interruptions. With proper diversification and interruption handling, most stateless and batch workloads run safely and cheaply on Spot.

Up to 90% cheaper than on-demand

< 5% monthly interruption rate (most types)

2-minute termination notice

Karpenter automates Spot for Kubernetes

4 Spot Instance Strategies

Start with batch workloads - they’re the easiest Spot migration with the highest savings impact.

Diversify instance types to minimize interruption risk

1–2 hours · Fleet configuration · 60–90% vs. on-demand

Spot interruption rates vary by instance type and AZ - diversifying across multiple instance families reduces the chance that all your Spot capacity is interrupted simultaneously. AWS recommends selecting 5–10 instance types per Spot Fleet or Auto Scaling Group.

How to implement

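Select instances of similar size across different families, keeping the memory-to-vCPU ratio in mind: c5, c5a, c5n, and c6g provide roughly 2 GiB of RAM per vCPU, while m5 and m6g provide about 4 GiB. c6g and m6g are Graviton (arm64) instances, so they need an arm64 AMI and arm64-compatible binaries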
Use the Spot Instance Advisor (aws.amazon.com/ec2/spot/instance-advisor) to find instances with < 5% interruption frequency
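Configure the capacity-optimized allocation strategy in your Spot Fleet or Auto Scaling Group (or the newer price-capacity-optimized strategy, which AWS now recommends for most workloads) rather than lowest-price, which leads to higher interruption rates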
Spread across at least 3 AZs to avoid AZ-level capacity events
Use attribute-based instance type selection (ABIS) to automatically include instance types meeting your CPU/RAM requirements

Note: The capacity-optimized allocation strategy selects instances from the deepest Spot capacity pools, minimizing interruptions even at the cost of a slightly higher price than the absolute cheapest option.
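
As a rough sketch of the settings above, the following Python builds the MixedInstancesPolicy structure that the Auto Scaling CreateAutoScalingGroup API accepts. The launch template name `spot-workers`, the instance type list, and the zero on-demand share are illustrative assumptions rather than recommendations from this guide, and the helper only builds a dictionary; it makes no AWS calls.

```python
def build_mixed_instances_policy(launch_template_name, instance_types,
                                 spot_strategy="capacity-optimized",
                                 on_demand_base=0,
                                 on_demand_percent_above_base=0):
    """Build a MixedInstancesPolicy dict for an Auto Scaling group."""
    # Enforce the guide's "5+ instance types" rule of thumb.
    if len(set(instance_types)) < 5:
        raise ValueError("diversify across at least 5 instance types")
    return {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": launch_template_name,
                "Version": "$Latest",
            },
            # One override per instance type the group may launch.
            "Overrides": [{"InstanceType": t} for t in instance_types],
        },
        "InstancesDistribution": {
            "OnDemandBaseCapacity": on_demand_base,
            "OnDemandPercentageAboveBaseCapacity": on_demand_percent_above_base,
            "SpotAllocationStrategy": spot_strategy,
        },
    }


# Hypothetical example: six x86 compute-optimized types of the same size.
policy = build_mixed_instances_policy(
    "spot-workers",
    ["c5.xlarge", "c5a.xlarge", "c5n.xlarge",
     "c5d.xlarge", "c6i.xlarge", "c6a.xlarge"],
)
```

The resulting dictionary could be passed as the `MixedInstancesPolicy` argument to boto3's `create_auto_scaling_group`, together with subnets in at least three AZs.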

Handle Spot interruptions gracefully

2–8 hours · Application change · Enables Spot use for production workloads

AWS provides a 2-minute warning before Spot interruption via the EC2 instance metadata service and EventBridge. Properly handling this warning allows your application to finish current work, drain connections, and checkpoint state before the instance is reclaimed.

How to implement

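Poll the instance metadata service for the interruption notice: http://169.254.169.254/latest/meta-data/spot/instance-action returns HTTP 404 until an interruption is scheduled (the older spot/termination-time endpoint still works for backward compatibility); with IMDSv2, request a session token first
On detection: stop accepting new requests, complete in-flight work, and return unfinished SQS messages to the queue (for example by setting their visibility timeout to 0)
Deregister the instance from its ALB target groups on the interruption notice, and set the deregistration delay (connection draining) shorter than the 2-minute window
For ECS on EC2: enable Spot instance draining (ECS_ENABLE_SPOT_INSTANCE_DRAINING=true in the ECS agent configuration) - the container instance is set to DRAINING on interruption and service tasks are rescheduled on other instances
Subscribe to "EC2 Spot Instance Interruption Warning" events in EventBridge for centralized handling across your fleet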

Note: Most web applications and API servers can handle graceful shutdown in under 90 seconds - well within the 2-minute window. The key is catching the signal and stopping new work immediately.
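
The polling approach can be sketched in Python using only the standard library. This is a minimal illustration, assuming IMDSv2 is enabled on the instance; the function names, the 5-second poll interval, and the `stop_accepting_work` and `drain` callbacks are hypothetical placeholders for your own shutdown logic.

```python
import json
import time
import urllib.error
import urllib.request

IMDS_BASE = "http://169.254.169.254/latest"


def fetch_instance_action(timeout=1.0):
    """Return the raw spot/instance-action body, or None if nothing is scheduled."""
    # IMDSv2: obtain a short-lived session token first.
    token_request = urllib.request.Request(
        f"{IMDS_BASE}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_request, timeout=timeout) as response:
        token = response.read().decode()
    action_request = urllib.request.Request(
        f"{IMDS_BASE}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        with urllib.request.urlopen(action_request, timeout=timeout) as response:
            return response.read().decode()
    except urllib.error.HTTPError as error:
        if error.code == 404:  # no interruption notice yet
            return None
        raise


def run_until_interrupted(stop_accepting_work, drain,
                          fetch=fetch_instance_action,
                          poll_seconds=5, sleep=time.sleep):
    """Poll until a notice appears, then run the shutdown callbacks in order."""
    while True:
        body = fetch()
        if body is not None:
            notice = json.loads(body)  # JSON with "action" and "time" fields
            stop_accepting_work()      # stop taking new requests immediately
            drain()                    # finish in-flight work, checkpoint state
            return notice
        sleep(poll_seconds)
```

Passing `fetch` and `sleep` as parameters keeps the loop testable off EC2; in a real service this would typically run in a background thread alongside the request handlers.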

Use Spot for batch and async workloads first

4–8 hours · Workload migration · 60–90% on batch compute

Batch processing, data pipelines, ML training, image/video processing, and async job queues are ideal Spot workloads - they can be interrupted and restarted with minimal impact. Start here before attempting to run stateless services on Spot.

How to implement

Migrate AWS Batch workloads to Spot compute environments: set computeResources type: SPOT
Configure checkpointing for long-running jobs: save progress to S3 every 5–10 minutes
For SQS-driven workers: set the message visibility timeout longer than your longest expected job (2× the average job duration is a common starting point) - messages from interrupted jobs become visible again once the timeout expires
Use EMR Spot for data processing: mix 1 on-demand primary (master) node with Spot task nodes
For Lambda: functions do not run on EC2 capacity that you manage, so Spot does not apply - no changes needed

Note: AWS Batch can retry jobs that fail because of a Spot interruption on new capacity, provided the job's retry strategy allows more than one attempt (the default is a single attempt). This makes Batch ideal for workloads that can tolerate retry overhead.
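
The checkpointing step can be illustrated with a small resume loop. This sketch makes several assumptions: a local JSON file stands in for an S3 object, checkpoints are taken every N items rather than every few minutes, and `handle_item` is a hypothetical callback that should be idempotent, because items processed after the last checkpoint are repeated on restart.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the index of the next unprocessed item (0 if no checkpoint exists)."""
    try:
        with open(path) as checkpoint_file:
            return json.load(checkpoint_file)["next_index"]
    except FileNotFoundError:
        return 0


def save_checkpoint(path, next_index):
    """Write the checkpoint atomically so an interruption cannot leave a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    with tempfile.NamedTemporaryFile("w", dir=directory, delete=False) as temp_file:
        json.dump({"next_index": next_index}, temp_file)
        temp_name = temp_file.name
    os.replace(temp_name, path)


def process_with_checkpoints(items, path, handle_item, checkpoint_every=100):
    """Process items in order, resuming from the last checkpoint after a restart."""
    start = load_checkpoint(path)
    for index in range(start, len(items)):
        handle_item(items[index])
        if (index + 1) % checkpoint_every == 0:
            save_checkpoint(path, index + 1)
    save_checkpoint(path, len(items))
    return len(items) - start  # number of items handled in this run
```

When a job is retried on new capacity, it calls `process_with_checkpoints` again with the same checkpoint location and repeats at most `checkpoint_every` items.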

Implement Karpenter for Kubernetes Spot management

1–2 days · Cluster change · 60–90% on EKS node costs

Karpenter's NodePool with Spot capacity type automatically selects the best Spot instance from across instance families, handles interruption draining, and replaces interrupted nodes with new capacity. It eliminates the need to manually manage Spot node groups.

How to implement

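Define a NodePool whose requirements set karpenter.sh/capacity-type to spot and list multiple instance families
Add an on-demand fallback NodePool with a lower weight - Karpenter tries the higher-weight Spot NodePool first and falls back to on-demand if Spot capacity is unavailable
Steer Spot-eligible workloads with a nodeSelector or node affinity on the karpenter.sh/capacity-type: spot label; add tolerations only if you taint your Spot nodes
Karpenter handles interruption draining automatically (cordon, drain, terminate on the 2-minute warning) once its interruption queue, an SQS queue fed by EventBridge, is configured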
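Monitor Spot interruptions through Kubernetes events, for example: kubectl get events --field-selector reason=SpotInterruption (the exact reason string depends on your Karpenter version, so confirm it against your cluster's event output)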

Note: Karpenter + Spot + Graviton is the standard cost-optimization stack for EKS. Combined, they typically reduce EKS node costs by 70–85% compared to on-demand x86 managed node groups.
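
The Spot-first, on-demand-fallback setup can be sketched by generating the two NodePool manifests in Python. This assumes the karpenter.sh/v1 API (older releases use a different apiVersion and nodeClassRef shape) and an existing EC2NodeClass named `default`; the pool names, weights, and family list are illustrative.

```python
import json


def node_pool(name, capacity_type, instance_families, weight, node_class="default"):
    """Build a Karpenter NodePool manifest as a plain dict."""
    return {
        "apiVersion": "karpenter.sh/v1",
        "kind": "NodePool",
        "metadata": {"name": name},
        "spec": {
            "weight": weight,  # pools with a higher weight are tried first
            "template": {
                "spec": {
                    "nodeClassRef": {
                        "group": "karpenter.k8s.aws",
                        "kind": "EC2NodeClass",
                        "name": node_class,  # assumed to exist already
                    },
                    "requirements": [
                        {
                            "key": "karpenter.sh/capacity-type",
                            "operator": "In",
                            "values": [capacity_type],
                        },
                        {
                            "key": "karpenter.k8s.aws/instance-family",
                            "operator": "In",
                            "values": list(instance_families),
                        },
                    ],
                }
            },
        },
    }


FAMILIES = ["c5", "c5a", "c5n", "m5"]  # x86 families mentioned earlier in this guide
manifests = [
    node_pool("spot", "spot", FAMILIES, weight=100),
    node_pool("on-demand-fallback", "on-demand", FAMILIES, weight=10),
]
# kubectl accepts JSON as well as YAML, so this string can be written to a file and applied.
manifest_json = json.dumps(
    {"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2
)
```

Writing `manifest_json` to a file and running `kubectl apply -f` on it would create both pools; check the field names against the Karpenter version installed in your cluster before applying.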
