
EKS Cost Optimization: Reduce Kubernetes Costs 40–60%

EKS clusters accumulate waste at every layer: over-provisioned nodes, idle dev environments, cross-AZ traffic, and over-requested pods. Karpenter, Spot integration, and topology hints fix most of it without changing application code.

Spot saves up to 90% on node costs
Karpenter: 20–40% better bin-packing
Cross-AZ transfer: $0.02/GB round-trip
Dev clusters often run 70% idle

Where EKS Costs Hide

The hidden cost layers

  • Node cost: EC2 instances (worker nodes) - usually 70–80% of total EKS spend
  • Control plane: $0.10/hr per cluster - small but adds up with multiple clusters
  • Data transfer: Cross-AZ traffic between pods and services
  • Load balancers: ALBs per service add up ($0.0225/hr per ALB plus $0.008/LCU-hour)
  • EBS volumes: Persistent volumes often over-provisioned

How to diagnose

  1. Enable CloudWatch Container Insights on your cluster
  2. Cost Explorer → Tag by cluster name → see cost per cluster
  3. kubectl top nodes - compare allocated vs. actual usage
  4. kubectl top pods -A - find over-provisioned namespaces
  5. VPC Flow Logs → Athena → query cross-AZ traffic volume

5 EKS Cost Optimizations Ranked by Impact

Apply these in order - highest ROI first.

1. Replace Cluster Autoscaler with Karpenter

1–2 days · Helm chart deployment · Saves 20–40% on node costs

Karpenter provisions nodes in response to unschedulable pods - selecting the optimal instance type, size, and purchase option for each workload. Unlike Cluster Autoscaler (which scales fixed node groups), Karpenter can provision a single c6g.large for a small job rather than a full c6g.4xlarge node group. Combined with Spot instances and Graviton selection, it typically reduces node costs 20–40%.

How to implement

  1. Install Karpenter via Helm: helm install karpenter oci://public.ecr.aws/karpenter/karpenter
  2. Create a NodePool with weight-based instance type selection (prefer Spot, fall back to on-demand)
  3. Define resource limits per NodePool to prevent runaway scaling
  4. Set ttlSecondsAfterEmpty (replaced in newer Karpenter versions by consolidationPolicy: WhenEmpty with consolidateAfter) to reclaim idle nodes quickly (e.g., 30 seconds)
  5. Migrate workloads namespace by namespace, monitoring cost per namespace in CloudWatch Container Insights
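The steps above can be sketched as a single NodePool manifest (Karpenter v1 API; the name general-purpose, the default EC2NodeClass, and the CPU limit are illustrative assumptions — tune them to your cluster):

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general-purpose        # hypothetical pool name
spec:
  template:
    spec:
      requirements:
        # Prefer Spot; Karpenter falls back to on-demand when Spot is unavailable
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        # Allow Graviton and x86 so Karpenter can pick the cheapest fit
        - key: kubernetes.io/arch
          operator: In
          values: ["arm64", "amd64"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default            # assumed pre-existing EC2NodeClass
  limits:
    cpu: "200"                   # cap total vCPU to prevent runaway scaling
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 30s        # reclaim idle nodes after 30 seconds
```

The disruption block is the v1 replacement for ttlSecondsAfterEmpty; on pre-v1 Karpenter the field names differ, so check the version you installed.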

Note: Karpenter's bin-packing algorithm selects instance types that match actual pod resource requests. If pods are right-sized, nodes are right-sized. If pods are over-provisioned, Karpenter still over-provisions - fix pod requests first.

2. Integrate EC2 Spot instances for stateless workloads

4–8 hours · NodePool configuration · Saves 60–90% on eligible node costs

EC2 Spot instances offer up to 90% discount compared to on-demand with a 2-minute interruption notice. Most stateless Kubernetes workloads (web servers, API handlers, batch jobs, background workers) handle interruption gracefully with proper pod disruption budgets. Karpenter handles Spot instance selection and diversification automatically.

How to implement

  1. Define a Karpenter NodePool with the karpenter.sh/capacity-type: spot requirement for stateless workloads
  2. Diversify across instance families and sizes: c5, c6g, m5, m6g, r5, r6g
  3. Set pod disruption budgets (PDB) on all stateless deployments: minAvailable: 1
  4. Add a karpenter.sh/capacity-type: spot nodeSelector (plus a matching toleration if you taint Spot nodes) to workloads that can use Spot
  5. Keep on-demand NodePool for stateful workloads (databases, queues, anything with persistent storage)
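A minimal sketch of the PDB and workload-steering steps, assuming a hypothetical stateless app labeled app: web (names and image are placeholders):

```yaml
# Keep at least 1 replica available during Spot reclaims
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: web
---
# Steer the Deployment onto Spot-backed Karpenter nodes
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3                  # 3+ replicas make interruptions transparent
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      nodeSelector:
        karpenter.sh/capacity-type: spot
      containers:
        - name: web
          image: nginx:1.27    # placeholder image
```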

Note: Spot interruption rate for diversified instance families is typically 1–5% per instance-month. With 3+ replicas and proper PDBs, this is transparent to users.

3. Right-size pod resource requests and limits

4–8 hours · Metrics analysis + YAML changes · Saves 20–50% on cluster node cost

Kubernetes schedulers use resource requests (not actual usage) for bin-packing. Over-provisioned requests waste node capacity and force the cluster to provision additional nodes. VPA (Vertical Pod Autoscaler) in recommendation mode analyzes actual usage and suggests right-sized requests.

How to implement

  1. Install VPA in recommendation mode (no auto-apply): clone kubernetes/autoscaler and run ./hack/vpa-up.sh from vertical-pod-autoscaler/
  2. Enable VPA for top workloads: kubectl apply -f vpa-deployment.yaml for each namespace
  3. After 7 days, run: kubectl describe vpa <name> to see recommendations
  4. Update resource requests to match VPA recommendations (start with lower-priority workloads)
  5. Use Goldilocks (Fairwinds) as a Kubernetes dashboard for VPA recommendations across all namespaces
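A recommendation-only VPA object looks like this (the Deployment name web is a hypothetical target):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web            # hypothetical workload to analyze
  updatePolicy:
    updateMode: "Off"    # recommend only - never evict or resize pods
```

With updateMode: "Off", kubectl describe vpa web-vpa shows lower-bound, target, and upper-bound request recommendations you can copy into the Deployment by hand.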

Note: Right-sizing pod requests is the prerequisite for effective bin-packing. If every pod requests 2 CPU when it uses 0.2, Karpenter will still provision large nodes.

4. Eliminate cross-AZ data transfer within the cluster

2–4 hours · Service mesh or topology hints · Saves $200–2,000/month

Inter-AZ traffic within EKS costs $0.01/GB each way. Kubernetes Services by default route traffic to any healthy pod across all AZs. A service receiving 10TB/month of inter-pod traffic generates $200/month in cross-AZ transfer charges that most teams don't see in their bill.

How to implement

  1. Enable Topology Aware Routing on Services: set service.kubernetes.io/topology-mode: Auto annotation
  2. Kubernetes will prefer routing to pods in the same AZ, falling back to other AZs if no local endpoints exist
  3. Deploy at least 1 replica per AZ for services that use topology hints
  4. Monitor cross-AZ traffic reduction in VPC Flow Logs after enabling
  5. For high-traffic service meshes (Istio, Linkerd), configure locality-weighted load balancing
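The one-annotation change from step 1 looks like this (the Service name api and ports are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: api                # hypothetical service
  annotations:
    # Prefer same-AZ endpoints; fall back across AZs if none are local
    service.kubernetes.io/topology-mode: Auto
spec:
  selector:
    app: api
  ports:
    - port: 80
      targetPort: 8080
```

Note that kube-proxy only honors the hints when endpoints are spread reasonably across AZs, which is why step 3 (at least one replica per AZ) matters.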

Note: Topology Aware Routing is a 1-annotation change that can save hundreds per month on high-traffic clusters. It's available in Kubernetes 1.23+ and requires EndpointSlices (default in EKS 1.21+).

5. Reclaim idle namespaces and dev/staging environments

2–4 hours · Policy enforcement · Saves $500–5,000/month

Development and staging EKS namespaces often run full-time even though they're used only ~8 hours a day on weekdays. Scaling them to zero on nights and weekends (roughly 70% of the hours in a week) cuts their node cost by about 70%.

How to implement

  1. Identify idle namespaces: kubectl top pods -A --sort-by=cpu (or --sort-by=memory)
  2. Scale dev namespaces to zero on a schedule (e.g., a CronJob that patches replicas to 0 at 6pm weekdays and on weekends); Karpenter then reclaims the empty nodes
  3. Alternatively, use KEDA (Kubernetes Event-Driven Autoscaler) to scale deployments to zero on schedule
  4. For shared dev clusters, enforce namespace resource quotas to prevent runaway resource requests
  5. Move CI/CD workloads to Fargate or AWS CodeBuild for burst compute instead of keeping nodes warm
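The KEDA option from step 3 can be sketched with its cron scaler (namespace, target Deployment, timezone, and replica count are all assumptions to adapt):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: dev-hours-only
  namespace: dev              # hypothetical dev namespace
spec:
  scaleTargetRef:
    name: web                 # Deployment to scale
  minReplicaCount: 0          # zero replicas outside the cron window
  triggers:
    - type: cron
      metadata:
        timezone: America/New_York
        start: 0 8 * * 1-5    # scale up at 8am, Mon-Fri
        end: 0 18 * * 1-5     # scale down at 6pm
        desiredReplicas: "2"
```

Once all deployments in the namespace hit zero replicas, a Karpenter pool with empty-node consolidation releases the underlying EC2 capacity automatically.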

Note: A 4-node dev cluster at m6g.xlarge costs ~$500/month. Scaling to zero 70% of the time saves ~$350/month per cluster. Most companies have 2–5 such environments.

Frequently Asked Questions

How much does EKS cost?

EKS itself costs $0.10/hour per cluster (~$73/month) - just for the control plane. The real cost is EC2 node groups. A typical Series A startup runs 3–10 nodes ranging from m6g.large ($0.077/hr) to m6g.4xlarge ($0.616/hr). With Spot instances, node costs drop 60–90%.

What is Karpenter and why is it better than Cluster Autoscaler?

Karpenter provisions nodes dynamically from the full EC2 instance catalog, picking the optimal instance type for each pending pod. Cluster Autoscaler scales fixed node groups - if you defined c6g.xlarge node groups, it can only provision c6g.xlarge nodes. Karpenter's flexibility results in 20–40% better bin-packing efficiency.

Is it safe to run production EKS workloads on Spot instances?

Yes, for stateless workloads with proper configuration: multiple replicas, pod disruption budgets, and instance type diversification. Spot interruption rates for diversified instance families are typically 1–5% per instance-month. Stateful workloads (databases, Kafka, Elasticsearch) should stay on on-demand.

What does cross-AZ data transfer cost in EKS?

$0.01/GB each way - so $0.02/GB for a request/response pair. A microservice handling 500GB/day of inter-service traffic across AZs incurs $10/day = $300/month in data transfer costs. Topology Aware Routing eliminates most of this.

How do I see which pods are wasting the most resources?

Enable CloudWatch Container Insights on your EKS cluster. It provides per-pod and per-namespace CPU and memory utilization. Alternatively, use kubectl top pods -A for a quick snapshot, or deploy VPA in recommendation mode for per-workload rightsizing suggestions.

Fixed-price · Risk-free · 3× ROI guarantee

Running EKS? There’s almost certainly waste we can find.

The audit covers EKS node costs, Spot integration, cross-AZ transfer, and pod rightsizing. Prioritized findings in 1 week - no call needed to start.

Start the Audit →

No call needed · Accept agreements · Run one script · Done

Prefer to talk first? Free 30-min call available →