IaC Cost Optimization · Terraform Guide
Terraform AWS Cost Optimization: Codify Cost-Efficient Infrastructure
Manual Console tweaks don’t survive the next terraform apply. This guide covers how to bake cost optimization directly into your Terraform codebase - environment conditionals, Spot configs, dev teardown, tag enforcement, and policy guardrails.
Using AWS CDK instead? The same patterns apply - environment-conditional sizing, Spot configs, and tag enforcement are all achievable with CDK constructs. Implementation on retainer covers both.
Why Console-based optimization fails
Industry surveys estimate that organizations waste roughly a third of cloud spend on over-provisioned and idle resources. Much of that waste originates in Console-first workflows that bypass code review entirely.
Console-based
Console changes
Invisible to version control. One engineer right-sizes an instance; the next Terraform apply reverts it.
No review gate
Expensive resources (db.r5.2xlarge, io2 volumes) get created without PR review or cost awareness.
Non-reproducible environments
Dev and staging drift to production-size resources because nobody manually maintains the differences.
Tagging gaps
Manual Console creation skips tags entirely. 30–50% of spend becomes unattributable.
IaC-based
Environment-conditional sizing
One locals block encodes all environment differences. Dev gets t3.small, prod gets m6i.xlarge - enforced by code.
Spot as the default
mixed_instances_policy in aws_autoscaling_group codifies 70% Spot / 30% on-demand as the team-wide standard.
Teardown automation in code
aws_lambda_function + aws_cloudwatch_event_rule deploy the scheduler alongside the dev environment itself.
Default tags at provider level
provider default_tags propagate to every resource. Tag compliance becomes structural, not aspirational.
7 Terraform cost optimization patterns
Each pattern references the actual Terraform resource types. Ordered from highest savings-to-effort ratio.
Environment-conditional resource sizing
Encode all environment-specific sizing decisions in a single locals map keyed by var.environment. Every resource references local.config rather than hardcoded values.
Resources: aws_instance, aws_db_instance, aws_elasticache_replication_group

locals {
  sizing = {
    dev = {
      instance_type = "t3.small"
      db_class      = "db.t3.medium"
      multi_az      = false
      desired_count = 1
    }
    staging = {
      instance_type = "t3.medium"
      db_class      = "db.r6g.large"
      multi_az      = false
      desired_count = 2
    }
    production = {
      instance_type = "m6i.xlarge"
      db_class      = "db.r6g.xlarge"
      multi_az      = true
      desired_count = 3
    }
  }
  config = local.sizing[var.environment]
}

Note: Combine with count = var.environment == "production" ? 1 : 0 to skip expensive resources (NAT Gateways, multi-AZ RDS replicas) entirely in non-prod.
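A minimal sketch of that count technique, assuming a hypothetical VPC layout with an aws_eip and public subnet already defined; downstream references (route table entries, outputs) must index the resource and stay conditional too:

```hcl
# Skip the NAT Gateway outside production - roughly $32/month per gateway
# before data processing charges. EIP and subnet references are illustrative.
resource "aws_nat_gateway" "main" {
  count         = var.environment == "production" ? 1 : 0
  allocation_id = aws_eip.nat[0].id
  subnet_id     = aws_subnet.public[0].id
}

# Any route that points at it must be conditional as well.
resource "aws_route" "private_nat" {
  count                  = var.environment == "production" ? 1 : 0
  route_table_id         = aws_route_table.private.id
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id         = aws_nat_gateway.main[0].id
}
```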
Codify Spot via mixed_instances_policy
Replace on-demand-only launch configurations with an aws_autoscaling_group mixed_instances_policy. Diversify across five or more instance types so Spot interruptions don't drain your fleet.

Resources: aws_autoscaling_group, aws_launch_template, aws_eks_node_group

resource "aws_autoscaling_group" "app" {
  mixed_instances_policy {
    instances_distribution {
      on_demand_base_capacity                  = 1
      on_demand_percentage_above_base_capacity = 20
      spot_allocation_strategy                 = "capacity-optimized"
    }
    launch_template {
      launch_template_specification {
        launch_template_id = aws_launch_template.app.id
        version            = "$Latest"
      }
      override { instance_type = "m6i.large" }
      override { instance_type = "m6a.large" }
      override { instance_type = "m5.large" }
      override { instance_type = "m5a.large" }
      override { instance_type = "c6i.large" }
    }
  }
}

Note: For EKS, use aws_eks_node_group with capacity_type = "SPOT" and a list of 5–8 instance types. Add a taint so only fault-tolerant workloads land on Spot nodes.
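The EKS variant mentioned in the note can be sketched as follows, assuming an existing cluster, node IAM role, and subnet variable (all names here are illustrative):

```hcl
# Spot-backed EKS node group: diversified instance types plus a taint
# so only interruption-tolerant workloads schedule onto these nodes.
resource "aws_eks_node_group" "spot" {
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = "spot-workers"
  node_role_arn   = aws_iam_role.node.arn
  subnet_ids      = var.private_subnet_ids

  capacity_type  = "SPOT"
  instance_types = ["m6i.large", "m6a.large", "m5.large", "m5a.large", "c6i.large", "c5.large"]

  scaling_config {
    min_size     = 1
    desired_size = 3
    max_size     = 10
  }

  # Pods need a matching toleration to land here.
  taint {
    key    = "spot"
    value  = "true"
    effect = "NO_SCHEDULE"
  }
}
```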
Fargate Spot for ECS workloads
ECS supports FARGATE_SPOT as a built-in capacity provider. No ASG management needed. Set base = 1 on-demand task minimum, weight the rest toward Spot.
Resources: aws_ecs_cluster_capacity_providers, aws_ecs_service

resource "aws_ecs_cluster_capacity_providers" "app" {
  cluster_name       = aws_ecs_cluster.app.name
  capacity_providers = ["FARGATE", "FARGATE_SPOT"]

  default_capacity_provider_strategy {
    base              = 1
    weight            = 1
    capacity_provider = "FARGATE"
  }
  default_capacity_provider_strategy {
    base              = 0
    weight            = 4
    capacity_provider = "FARGATE_SPOT"
  }
}

Note: Fargate Spot tasks receive a SIGTERM and a two-minute warning before reclamation; set stopTimeout in the container definition and handle SIGTERM so tasks drain gracefully. (The ECS_ENABLE_SPOT_INSTANCE_DRAINING agent variable applies to EC2-backed Spot container instances, not Fargate.)
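To make the reclamation window useful, the task definition can claim the full drain period via stopTimeout. A sketch assuming a hypothetical single-container app task (family, image, and sizes are illustrative):

```hcl
# Give containers the full Fargate Spot reclamation window to finish
# in-flight work: ECS sends SIGTERM, then SIGKILL after stopTimeout.
resource "aws_ecs_task_definition" "app" {
  family                   = "app"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = 256
  memory                   = 512

  container_definitions = jsonencode([{
    name        = "app"
    image       = "my-app:latest"
    essential   = true
    stopTimeout = 120 # seconds; 120 is the Fargate maximum
  }])
}
```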
Dev environment teardown via EventBridge + Lambda
Deploy a Lambda-backed scheduler alongside your dev/staging environments. Tag resources with AutoShutdown = "true", then stop them at 7 PM and restart at 7 AM on weekdays. Dev environments that run 24/7 waste 70% of their cost on idle time.
Resources: aws_lambda_function, aws_cloudwatch_event_rule, aws_cloudwatch_event_target, aws_iam_role

resource "aws_cloudwatch_event_rule" "stop_dev" {
  count               = var.environment != "production" ? 1 : 0
  name                = "stop-dev-${var.environment}"
  schedule_expression = "cron(0 0 ? * TUE-SAT *)" # 00:00 UTC Tue–Sat = 7 PM EST Mon–Fri
}

resource "aws_cloudwatch_event_target" "stop_dev" {
  count = var.environment != "production" ? 1 : 0
  rule  = aws_cloudwatch_event_rule.stop_dev[0].name
  arn   = aws_lambda_function.scheduler.arn
  input = jsonencode({ action = "stop", environment = var.environment })
}

resource "aws_cloudwatch_event_rule" "start_dev" {
  count               = var.environment != "production" ? 1 : 0
  name                = "start-dev-${var.environment}"
  schedule_expression = "cron(0 12 ? * MON-FRI *)" # 12:00 UTC = 7 AM EST
}

Note: Use the AWS Instance Scheduler CloudFormation solution as an alternative if you want tag-based scheduling without maintaining Lambda code yourself. Deploy it via aws_cloudformation_stack in Terraform.
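The rules above also assume EventBridge is allowed to invoke the scheduler Lambda, which requires an explicit resource-based permission. A sketch of that wiring, one permission per rule:

```hcl
# EventBridge cannot invoke a Lambda function without a resource-based
# permission on the function itself.
resource "aws_lambda_permission" "allow_stop_rule" {
  count         = var.environment != "production" ? 1 : 0
  statement_id  = "AllowEventBridgeStopDev"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.scheduler.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.stop_dev[0].arn
}
```

A second aws_lambda_permission with the start_dev rule's ARN covers the morning schedule.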
Enforce tagging via provider default_tags and variable validation
Set default_tags in your provider block so every supported aws_* resource inherits cost allocation tags automatically. Pair with variable validation to fail fast on missing or invalid tag values at plan time, not after apply.
Resources: provider "aws" default_tags, variable validation blocks

provider "aws" {
  region = var.region

  default_tags {
    tags = {
      Environment = var.environment
      Project     = var.project
      Team        = var.team
      CostCenter  = var.cost_center
      ManagedBy   = "terraform"
    }
  }
}

variable "environment" {
  type = string
  validation {
    condition     = contains(["dev", "staging", "production"], var.environment)
    error_message = "environment must be dev, staging, or production."
  }
}

Note: Resource-level tags merge with provider default_tags, with resource-level taking precedence on conflicts. This means your provider block establishes the floor - teams can still add custom tags per resource.
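The merge behavior from the note, sketched on a hypothetical worker instance (AMI variable and tag values are illustrative):

```hcl
# Resource-level tags merge with provider default_tags; on a key
# conflict, the resource-level value wins for this resource only.
resource "aws_instance" "worker" {
  ami           = var.ami_id
  instance_type = "t3.small"

  tags = {
    Name = "worker"          # added on top of the inherited defaults
    Team = "data-platform"   # overrides the provider-level Team tag
  }
}
```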
Scheduled Auto Scaling for predictable traffic
If your traffic has predictable peaks and troughs (business hours, weekend drops), codify scale-down schedules via aws_autoscaling_schedule. Avoids running peak capacity 24/7 when off-hours load is a fraction.
Resources: aws_autoscaling_schedule, aws_appautoscaling_scheduled_action

resource "aws_autoscaling_schedule" "scale_down_nights" {
  scheduled_action_name  = "scale-down-nights"
  autoscaling_group_name = aws_autoscaling_group.app.name
  min_size               = 1
  max_size               = 3
  desired_capacity       = 1
  recurrence             = "0 22 * * MON-FRI" # 10 PM UTC
}

resource "aws_autoscaling_schedule" "scale_up_mornings" {
  scheduled_action_name  = "scale-up-mornings"
  autoscaling_group_name = aws_autoscaling_group.app.name
  min_size               = 3
  max_size               = 20
  desired_capacity       = 5
  recurrence             = "0 7 * * MON-FRI" # 7 AM UTC
}

Note: Layer scheduled scaling with target tracking for best results: scheduled scaling handles the predictable baseline, target tracking handles unexpected spikes within the scheduled window.
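The target tracking layer from the note can be sketched like this; the 60% CPU target is an illustrative starting point, not a recommendation:

```hcl
# Target tracking on top of the schedules: the scheduled actions move the
# min/max window, and this policy absorbs spikes within that window.
resource "aws_autoscaling_policy" "cpu_target" {
  name                   = "cpu-target-tracking"
  autoscaling_group_name = aws_autoscaling_group.app.name
  policy_type            = "TargetTrackingScaling"

  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ASGAverageCPUUtilization"
    }
    target_value = 60.0
  }
}
```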
Policy-as-code cost guardrails with OPA or Sentinel
OPA (open source, any Terraform setup) and Sentinel (HCP Terraform) both evaluate terraform plan output before apply. Write policies that deny expensive instance types in dev, enforce maximum monthly cost deltas, or require budget tags on all resources.
Tools: OPA with conftest, HCP Terraform Sentinel policies, aws_budgets_budget

# OPA policy (Rego) - deny oversized instances in dev
package main

deny[msg] {
  resource := input.resource_changes[_]
  resource.type == "aws_instance"
  resource.change.after.tags.Environment == "dev"

  expensive := {
    "m5.4xlarge", "m5.8xlarge", "r5.2xlarge",
    "r5.4xlarge", "c5.4xlarge"
  }
  expensive[resource.change.after.instance_type]

  msg := sprintf(
    "Instance type %v is not allowed in dev environment",
    [resource.change.after.instance_type]
  )
}

Note: Pair OPA with Infracost in CI to get cost delta estimates on every PR. Use aws_budgets_budget with notification blocks to alert teams when monthly spend exceeds a defined ceiling, all codified in Terraform.
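The budget alert mentioned in the note can be codified as below; the name, limit, and subscriber address are illustrative:

```hcl
# Monthly cost budget with a forecast-based alert at 80% of the limit.
resource "aws_budgets_budget" "monthly" {
  name         = "team-monthly-budget"
  budget_type  = "COST"
  limit_amount = "5000"
  limit_unit   = "USD"
  time_unit    = "MONTHLY"

  notification {
    comparison_operator        = "GREATER_THAN"
    threshold                  = 80
    threshold_type             = "PERCENTAGE"
    notification_type          = "FORECASTED" # alert before the overrun, not after
    subscriber_email_addresses = ["team@example.com"]
  }
}
```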
Wiring it into your CI/CD workflow
Cost-aware IaC only works when cost feedback is part of the PR review loop. Two integrations cover most of it:
Infracost on every PR
Run infracost breakdown --path plan.json on each terraform plan output and post the monthly cost delta as a PR comment. Engineers see cost impact before merge, not after billing.
OPA/Sentinel guardrails
Policy checks run during terraform plan via conftest (OPA) or Sentinel enforcement levels in HCP Terraform. Violations fail the plan before any resource is created.
Where to start if your codebase isn’t cost-aware yet
Most Terraform codebases have hardcoded instance types, no environment conditionals, and no tagging strategy. Start here:
1. Add provider default_tags
Zero cost, zero risk, near-complete tag coverage (a handful of resource types don't support default_tags). One PR, ten minutes.
2. Refactor sizing into a locals map
Move all instance_type, db_instance_class, and desired_count values into a locals block keyed by environment. Immediately shrinks non-prod.
3. Migrate ASGs to mixed_instances_policy
Replace on-demand-only ASGs with mixed instance policies. Start with batch or async workloads - the safest first Spot migration.
4. Deploy a dev teardown scheduler
Add an aws_cloudwatch_event_rule + Lambda module to dev/staging workspaces. Immediate 60–70% non-prod savings.
5. Add Infracost to CI
Cost deltas on every PR. Engineers start making cost-conscious decisions without being asked.