
Zero-Downtime Certificate Rotation: Building Resilient ACME Automation
At 2 AM on a Tuesday, your load balancer's TLS certificate expires and takes down an API serving 50,000 requests per second. The renewal cron job has been failing silently for weeks. Manual intervention takes 45 minutes, an eternity when every second costs thousands in revenue and customer trust. This scenario plays out more often than anyone admits.

The shift to 90-day certificate lifetimes with Let's Encrypt and ACME automation was supposed to make certificate management easier. Instead, it turned certificate rotation from a quarterly maintenance task into a continuous operational concern. What worked fine when certificates lived for a year (a cron job running certbot renew once a week) becomes a brittle house of cards when certificates expire every three months and you are managing hundreds of domains across multiple environments.

The math is unforgiving. With 90-day certificates, you renew roughly four times as often as with one-year certificates. That means four times the opportunities for DNS propagation delays, …
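To make that arithmetic concrete, here is a minimal Python sketch (not from the original article) that compares issuance frequency across certificate lifetimes and decides whether a certificate is due for renewal based on its notAfter date. The function names are hypothetical, and the 30-day threshold is an assumption chosen to mirror certbot's default of renewing once a certificate is within 30 days of expiry.

```python
import ssl
from datetime import datetime, timezone


def renewals_per_year(lifetime_days: int) -> float:
    """Approximate number of issuances per year for a given certificate lifetime."""
    return 365 / lifetime_days


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days remaining before a certificate's notAfter timestamp.

    not_after uses the format found in ssl.SSLSocket.getpeercert(),
    e.g. "Jan 31 00:00:00 2025 GMT". `now` must be timezone-aware.
    """
    # ssl.cert_time_to_seconds parses the GMT timestamp into epoch seconds.
    expiry = ssl.cert_time_to_seconds(not_after)
    return (expiry - now.timestamp()) / 86400


def needs_renewal(not_after: str, now: datetime, threshold_days: int = 30) -> bool:
    """Hypothetical policy: renew once the certificate is inside the threshold window."""
    return days_until_expiry(not_after, now) <= threshold_days


if __name__ == "__main__":
    ratio = renewals_per_year(90) / renewals_per_year(365)
    print(f"90-day certificates renew {ratio:.2f}x as often as one-year certificates")

    now = datetime(2025, 1, 1, tzinfo=timezone.utc)
    for not_after in ("Jan 31 00:00:00 2025 GMT", "Feb 15 00:00:00 2025 GMT"):
        print(not_after, "->", "renew" if needs_renewal(not_after, now) else "ok")
```

365 / 90 is about 4.06, which is where "roughly four times as often" comes from. A check like this only helps if its result reaches a human or an alerting system; a weekly job that computes "renew" and then fails quietly reproduces the outage described above.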
Continue reading on Dev.to



