IaC Cost Optimization · Terraform Guide
Terraform AWS Cost Optimization: Codify Cost-Efficient Infrastructure
Manual Console tweaks don’t survive the next terraform apply. This guide covers how to bake cost optimization directly into your Terraform codebase - environment conditionals, Spot configs, dev teardown, tag enforcement, and policy guardrails.
Using AWS CDK instead? The same patterns apply - environment-conditional sizing, Spot configs, and tag enforcement are all achievable with CDK constructs. Implementation on retainer covers both.
Why Console-based optimization fails
Industry surveys estimate that organizations waste roughly a third of cloud spend on over-provisioned and idle resources. Much of that waste originates in Console-first workflows that bypass code review entirely.
Console-based
Console changes
Invisible to version control. One engineer right-sizes an instance; the next Terraform apply reverts it.
No review gate
Expensive resources (db.r5.2xlarge, io2 volumes) get created without PR review or cost awareness.
Non-reproducible environments
Dev and staging drift to production-size resources because nobody manually maintains the differences.
Tagging gaps
Manual Console creation skips tags entirely. 30–50% of spend becomes unattributable.
IaC-based
Environment-conditional sizing
One locals block encodes all environment differences. Dev gets t3.small, prod gets m6i.xlarge - enforced by code.
Spot as the default
mixed_instances_policy in aws_autoscaling_group codifies 70% Spot / 30% on-demand as the team-wide standard.
Teardown automation in code
aws_lambda_function + aws_cloudwatch_event_rule deploy the scheduler alongside the dev environment itself.
Default tags at provider level
provider default_tags propagate to every resource. Tag compliance becomes structural, not aspirational.
7 Terraform cost optimization patterns
Each pattern references the actual Terraform resource types. Ordered from highest savings-to-effort ratio.
Environment-conditional resource sizing
Encode all environment-specific sizing decisions in a single locals map keyed by var.environment. Every resource references local.config rather than hardcoded values.
Resources: aws_instance, aws_db_instance, aws_elasticache_replication_group

locals {
  sizing = {
    dev = {
      instance_type = "t3.small"
      db_class      = "db.t3.medium"
      multi_az      = false
      desired_count = 1
    }
    staging = {
      instance_type = "t3.medium"
      db_class      = "db.r6g.large"
      multi_az      = false
      desired_count = 2
    }
    production = {
      instance_type = "m6i.xlarge"
      db_class      = "db.r6g.xlarge"
      multi_az      = true
      desired_count = 3
    }
  }
  config = local.sizing[var.environment]
}

Note: Combine with count = var.environment == "production" ? 1 : 0 to skip expensive resources (NAT Gateways, multi-AZ RDS replicas) entirely in non-prod.
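A minimal sketch of that count technique, assuming a hypothetical VPC layout with an aws_eip and public subnet already defined; downstream references (route table entries, outputs) must index the resource and stay conditional too:

```hcl
# Skip the NAT Gateway outside production - roughly $32/month per gateway
# before data processing charges. EIP and subnet references are illustrative.
resource "aws_nat_gateway" "main" {
  count         = var.environment == "production" ? 1 : 0
  allocation_id = aws_eip.nat[0].id
  subnet_id     = aws_subnet.public[0].id
}

# Any route that points at it must be conditional as well.
resource "aws_route" "private_nat" {
  count                  = var.environment == "production" ? 1 : 0
  route_table_id         = aws_route_table.private.id
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id         = aws_nat_gateway.main[0].id
}
```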
Codify Spot via mixed_instances_policy
Replace on-demand-only launch configurations with an aws_autoscaling_group mixed_instances_policy. Diversify across five or more instance types so Spot interruptions don't drain your fleet.

Resources: aws_autoscaling_group, aws_launch_template, aws_eks_node_group

resource "aws_autoscaling_group" "app" {
  mixed_instances_policy {
    instances_distribution {
      on_demand_base_capacity                  = 1
      on_demand_percentage_above_base_capacity = 20
      spot_allocation_strategy                 = "capacity-optimized"
    }
    launch_template {
      launch_template_specification {
        launch_template_id = aws_launch_template.app.id
        version            = "$Latest"
      }
      override { instance_type = "m6i.large" }
      override { instance_type = "m6a.large" }
      override { instance_type = "m5.large" }
      override { instance_type = "m5a.large" }
      override { instance_type = "c6i.large" }
    }
  }
}

Note: For EKS, use aws_eks_node_group with capacity_type = "SPOT" and a list of 5–8 instance types. Add a taint so only fault-tolerant workloads land on Spot nodes.
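The EKS variant mentioned in the note can be sketched as follows, assuming an existing cluster, node IAM role, and subnet variable (all names here are illustrative):

```hcl
# Spot-backed EKS node group: diversified instance types plus a taint
# so only interruption-tolerant workloads schedule onto these nodes.
resource "aws_eks_node_group" "spot" {
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = "spot-workers"
  node_role_arn   = aws_iam_role.node.arn
  subnet_ids      = var.private_subnet_ids

  capacity_type  = "SPOT"
  instance_types = ["m6i.large", "m6a.large", "m5.large", "m5a.large", "c6i.large", "c5.large"]

  scaling_config {
    min_size     = 1
    desired_size = 3
    max_size     = 10
  }

  # Pods need a matching toleration to land here.
  taint {
    key    = "spot"
    value  = "true"
    effect = "NO_SCHEDULE"
  }
}
```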
Fargate Spot for ECS workloads
ECS supports FARGATE_SPOT as a built-in capacity provider. No ASG management needed. Set base = 1 on-demand task minimum, weight the rest toward Spot.
Resources: aws_ecs_cluster_capacity_providers, aws_ecs_service

resource "aws_ecs_cluster_capacity_providers" "app" {
  cluster_name       = aws_ecs_cluster.app.name
  capacity_providers = ["FARGATE", "FARGATE_SPOT"]

  default_capacity_provider_strategy {
    base              = 1
    weight            = 1
    capacity_provider = "FARGATE"
  }
  default_capacity_provider_strategy {
    base              = 0
    weight            = 4
    capacity_provider = "FARGATE_SPOT"
  }
}

Note: Fargate Spot tasks receive a SIGTERM and a two-minute warning before reclamation; set stopTimeout in the container definition and handle SIGTERM so tasks drain gracefully. (The ECS_ENABLE_SPOT_INSTANCE_DRAINING agent variable applies to EC2-backed Spot container instances, not Fargate.)
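To make the reclamation window useful, the task definition can claim the full drain period via stopTimeout. A sketch assuming a hypothetical single-container app task (family, image, and sizes are illustrative):

```hcl
# Give containers the full Fargate Spot reclamation window to finish
# in-flight work: ECS sends SIGTERM, then SIGKILL after stopTimeout.
resource "aws_ecs_task_definition" "app" {
  family                   = "app"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = 256
  memory                   = 512

  container_definitions = jsonencode([{
    name        = "app"
    image       = "my-app:latest"
    essential   = true
    stopTimeout = 120 # seconds; 120 is the Fargate maximum
  }])
}
```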
Dev environment teardown via EventBridge + Lambda
Deploy a Lambda-backed scheduler alongside your dev/staging environments. Tag resources with AutoShutdown = "true", then stop them at 7 PM and restart at 7 AM on weekdays. Dev environments that run 24/7 waste 70% of their cost on idle time.
Resources: aws_lambda_function, aws_cloudwatch_event_rule, aws_cloudwatch_event_target, aws_iam_role

resource "aws_cloudwatch_event_rule" "stop_dev" {
  count               = var.environment != "production" ? 1 : 0
  name                = "stop-dev-${var.environment}"
  schedule_expression = "cron(0 0 ? * TUE-SAT *)" # 00:00 UTC Tue–Sat = 7 PM EST Mon–Fri
}

resource "aws_cloudwatch_event_target" "stop_dev" {
  count = var.environment != "production" ? 1 : 0
  rule  = aws_cloudwatch_event_rule.stop_dev[0].name
  arn   = aws_lambda_function.scheduler.arn
  input = jsonencode({ action = "stop", environment = var.environment })
}

resource "aws_cloudwatch_event_rule" "start_dev" {
  count               = var.environment != "production" ? 1 : 0
  name                = "start-dev-${var.environment}"
  schedule_expression = "cron(0 12 ? * MON-FRI *)" # 12:00 UTC = 7 AM EST
}

Note: Use the AWS Instance Scheduler CloudFormation solution as an alternative if you want tag-based scheduling without maintaining Lambda code yourself. Deploy it via aws_cloudformation_stack in Terraform.
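The rules above also assume EventBridge is allowed to invoke the scheduler Lambda, which requires an explicit resource-based permission. A sketch of that wiring, one permission per rule:

```hcl
# EventBridge cannot invoke a Lambda function without a resource-based
# permission on the function itself.
resource "aws_lambda_permission" "allow_stop_rule" {
  count         = var.environment != "production" ? 1 : 0
  statement_id  = "AllowEventBridgeStopDev"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.scheduler.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.stop_dev[0].arn
}
```

A second aws_lambda_permission with the start_dev rule's ARN covers the morning schedule.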
Enforce tagging via provider default_tags and variable validation
Set default_tags in your provider block so every supported aws_* resource inherits cost allocation tags automatically. Pair with variable validation to fail fast on missing or invalid tag values at plan time, not after apply.
Resources: provider "aws" default_tags, variable validation blocks

provider "aws" {
  region = var.region

  default_tags {
    tags = {
      Environment = var.environment
      Project     = var.project
      Team        = var.team
      CostCenter  = var.cost_center
      ManagedBy   = "terraform"
    }
  }
}

variable "environment" {
  type = string
  validation {
    condition     = contains(["dev", "staging", "production"], var.environment)
    error_message = "environment must be dev, staging, or production."
  }
}

Note: Resource-level tags merge with provider default_tags, with resource-level taking precedence on conflicts. This means your provider block establishes the floor - teams can still add custom tags per resource.
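The merge behavior from the note, sketched on a hypothetical worker instance (AMI variable and tag values are illustrative):

```hcl
# Resource-level tags merge with provider default_tags; on a key
# conflict, the resource-level value wins for this resource only.
resource "aws_instance" "worker" {
  ami           = var.ami_id
  instance_type = "t3.small"

  tags = {
    Name = "worker"          # added on top of the inherited defaults
    Team = "data-platform"   # overrides the provider-level Team tag
  }
}
```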
Scheduled Auto Scaling for predictable traffic
If your traffic has predictable peaks and troughs (business hours, weekend drops), codify scale-down schedules via aws_autoscaling_schedule. Avoids running peak capacity 24/7 when off-hours load is a fraction.
Resources: aws_autoscaling_schedule, aws_appautoscaling_scheduled_action

resource "aws_autoscaling_schedule" "scale_down_nights" {
  scheduled_action_name  = "scale-down-nights"
  autoscaling_group_name = aws_autoscaling_group.app.name
  min_size               = 1
  max_size               = 3
  desired_capacity       = 1
  recurrence             = "0 22 * * MON-FRI" # 10 PM UTC
}

resource "aws_autoscaling_schedule" "scale_up_mornings" {
  scheduled_action_name  = "scale-up-mornings"
  autoscaling_group_name = aws_autoscaling_group.app.name
  min_size               = 3
  max_size               = 20
  desired_capacity       = 5
  recurrence             = "0 7 * * MON-FRI" # 7 AM UTC
}

Note: Layer scheduled scaling with target tracking for best results: scheduled scaling handles the predictable baseline, target tracking handles unexpected spikes within the scheduled window.
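The target tracking layer from the note can be sketched like this; the 60% CPU target is an illustrative starting point, not a recommendation:

```hcl
# Target tracking on top of the schedules: the scheduled actions move the
# min/max window, and this policy absorbs spikes within that window.
resource "aws_autoscaling_policy" "cpu_target" {
  name                   = "cpu-target-tracking"
  autoscaling_group_name = aws_autoscaling_group.app.name
  policy_type            = "TargetTrackingScaling"

  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ASGAverageCPUUtilization"
    }
    target_value = 60.0
  }
}
```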
Policy-as-code cost guardrails with OPA or Sentinel
OPA (open source, any Terraform setup) and Sentinel (HCP Terraform) both evaluate terraform plan output before apply. Write policies that deny expensive instance types in dev, enforce maximum monthly cost deltas, or require budget tags on all resources.
Tools: OPA with conftest, HCP Terraform Sentinel policies, aws_budgets_budget

# OPA policy (Rego) - deny oversized instances in dev
package main

deny[msg] {
  resource := input.resource_changes[_]
  resource.type == "aws_instance"
  resource.change.after.tags.Environment == "dev"

  expensive := {
    "m5.4xlarge", "m5.8xlarge", "r5.2xlarge",
    "r5.4xlarge", "c5.4xlarge"
  }
  expensive[resource.change.after.instance_type]

  msg := sprintf(
    "Instance type %v is not allowed in dev environment",
    [resource.change.after.instance_type]
  )
}

Note: Pair OPA with Infracost in CI to get cost delta estimates on every PR. Use aws_budgets_budget with notification blocks to alert teams when monthly spend exceeds a defined ceiling, all codified in Terraform.
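The budget alert mentioned in the note can be codified as below; the name, limit, and subscriber address are illustrative:

```hcl
# Monthly cost budget with a forecast-based alert at 80% of the limit.
resource "aws_budgets_budget" "monthly" {
  name         = "team-monthly-budget"
  budget_type  = "COST"
  limit_amount = "5000"
  limit_unit   = "USD"
  time_unit    = "MONTHLY"

  notification {
    comparison_operator        = "GREATER_THAN"
    threshold                  = 80
    threshold_type             = "PERCENTAGE"
    notification_type          = "FORECASTED" # alert before the overrun, not after
    subscriber_email_addresses = ["team@example.com"]
  }
}
```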
Wiring it into your CI/CD workflow
Cost-aware IaC only works when cost feedback is part of the PR review loop. Two integrations cover most of it:
Infracost on every PR
Run infracost breakdown --path plan.json on each terraform plan output and post the monthly cost delta as a PR comment. Engineers see cost impact before merge, not after billing.
OPA/Sentinel guardrails
Policy checks run during terraform plan via conftest (OPA) or Sentinel enforcement levels in HCP Terraform. Violations fail the plan before any resource is created.
Where to start if your codebase isn’t cost-aware yet
Most Terraform codebases have hardcoded instance types, no environment conditionals, and no tagging strategy. Start here:
1. Add provider default_tags
Zero cost, zero risk, near-complete tag coverage (a handful of resource types don't support default_tags). One PR, ten minutes.
2. Refactor sizing into a locals map
Move all instance_type, db_instance_class, and desired_count values into a locals block keyed by environment. Immediately shrinks non-prod.
3. Migrate ASGs to mixed_instances_policy
Replace on-demand-only ASGs with mixed instance policies. Start with batch or async workloads - the safest first Spot migration.
4. Deploy a dev teardown scheduler
Add an aws_cloudwatch_event_rule + Lambda module to dev/staging workspaces. Immediate 60–70% non-prod savings.
5. Add Infracost to CI
Cost deltas on every PR. Engineers start making cost-conscious decisions without being asked.