
Debugging ECS deployment failures: the complete playbook
Debugging ECS deployment failures: the complete playbook Step 1: Check service events aws ecs describe-services --cluster production --services my-service --query 'services[0].events[:10]' Step 2: Inspect stopped tasks aws ecs list-tasks --cluster production --family my-service --desired-status STOPPED aws ecs describe-tasks --cluster production --tasks TASK_ARN --query 'tasks[0].{stop:stopCode,reason:stoppedReason,containers:containers[*].{name:name,exit:lastStatus}}' Common patterns and fixes Exit 1 → App crash → check CloudWatch logs Exit 137 → OOM killed → increase task memory Exit 127 → Wrong CMD in Dockerfile Health check failures : resource "aws_ecs_service" "service" { health_check_grace_period_seconds = 120 # Default is 0 } resource "aws_lb_target_group" "service" { health_check { path = "/health" ; unhealthy_threshold = 5 } deregistration_delay = 30 } CannotPullContainerError : resource "aws_iam_role_policy_attachment" "ecr" { role = aws_iam_role . ecs_execution . name policy
Continue reading on Dev.to DevOps
Opens in a new tab




