
Production Terraform Disaster Recovery Lab
Lab Goal
Build a “production-ish” AWS stack with Terraform, then simulate an accidental terraform apply that deletes or changes networking and breaks traffic. You will:
- Detect the outage quickly
- Identify what changed and why
- Restore service safely
- Repair Terraform state (imports, or state surgery if needed)
- Add guardrails so it can’t happen again
- Explain state drift with real examples

Architecture
- VPC: public and private subnets across 2 AZs, with a NAT gateway and an internet gateway (IGW)
- EKS: private worker nodes; the cluster endpoint can be public or private (your choice)
- ALB: created by the AWS Load Balancer Controller from a Kubernetes Ingress
- RDS: MySQL in private subnets (not publicly accessible)
- Terraform remote state: S3 backend with DynamoDB state locking
- Optional: a CI/CD gate (GitHub Actions or Jenkins) that prevents apply on main without approvals

Important note: In real production, the ALB should have exactly one owner. Either the Kubernetes ingress controller creates it (not Terraform), or Terraform manages it consistently. This lab teaches both approaches and shows what happens when you mix ownership.
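Two of the lab's goals, identifying what a bad apply would change and adding guardrails so it can't recur, can both start from Terraform's machine-readable plan. The following is a minimal Python sketch, assuming the JSON produced by `terraform show -json <planfile>`, whose `resource_changes` entries carry `address`, `type`, and `change.actions`. The protected resource types, the script name `check_plan.py`, and the exit-code convention are hypothetical choices for illustration, not part of the original lab.

```python
"""Hypothetical guardrail: fail a CI step when a Terraform plan would delete
or replace networking or database resources.

Assumed workflow (file names are illustrative):
    terraform plan -out=tfplan
    terraform show -json tfplan > plan.json
    python check_plan.py plan.json
"""
import json
import sys

# Resource types this sketch treats as high blast radius; adjust for your stack.
PROTECTED_TYPES = {
    "aws_vpc",
    "aws_subnet",
    "aws_route_table",
    "aws_route",
    "aws_nat_gateway",
    "aws_internet_gateway",
    "aws_security_group",
    "aws_db_instance",
    "aws_eks_cluster",
}


def risky_changes(plan: dict) -> list[str]:
    """Return protected resource addresses that the plan deletes or replaces."""
    flagged = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        # A plain delete is ["delete"]; a replacement is ["delete", "create"]
        # or ["create", "delete"]. All of them contain "delete".
        if rc.get("type") in PROTECTED_TYPES and "delete" in actions:
            flagged.append(f'{rc["address"]} ({"/".join(actions)})')
    return flagged


def main(path: str) -> int:
    """Print each blocked change to stderr; return 1 if anything was flagged."""
    with open(path, encoding="utf-8") as f:
        plan = json.load(f)
    flagged = risky_changes(plan)
    for item in flagged:
        print(f"BLOCKED: {item}", file=sys.stderr)
    return 1 if flagged else 0


# Only run as a CLI when given exactly one plan file argument.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv[1]))
```

In a CI gate, a script like this would sit between plan and apply so that a nonzero exit stops the pipeline before anything runs. It deliberately flags replacements as well as deletions, because a replaced subnet or NAT gateway breaks traffic just as a deleted one does.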




