
How We Fixed Intermittent ECS Image-Not-Found Errors in AWS CDK
At one point, our ECS deployments started failing in a way that felt random. Sometimes a deployment would work perfectly. Sometimes the service would try to roll forward and fail because the container image it expected was no longer available. Nothing was wrong with the application code. The problem was in the deployment asset flow.

We were using AWS CDK to deploy container-based workloads, and like many teams, we were relying on CDK’s default bootstrap ECR repository for Docker image assets. That was convenient at first, but it became a problem once repository retention rules were tightened for cost control. In environments with frequent deployments, older intermediate images were being cleaned up faster than our deployment flow could safely tolerate. The result was intermittent ECS deploy failures caused by missing images.

The Root Cause

AWS CDK Docker assets are published during the asset publishing phase, which happens before CloudFormation starts deploying stacks. That means two
Continue reading on Dev.to DevOps




