Ordered Retries in Kafka: The Bugs You'll Find in Production

By Myroslav Vivcharyk, via Dev.to

In Part 1, we introduced a "middle ground" for Kafka retries: when a message fails, lock its key, send the message to a retry topic, and let the main partition continue processing other keys. If you need a recap, the Confluent blog covers the pattern.

Simple on a whiteboard. In production, things break in ways the diagrams don't show. This article covers implementation gaps that tutorials skip. The code examples are from kafka-resilience, simplified for clarity, but the edge cases are real.

The Architecture Recap

The goal: when a message fails, block only that key. Other keys keep flowing. To make this work across multiple consumer instances, we need three topics:

Main Topic: Your business events. The consumer checks if a key is locked before processing.

Retry Topic: Messages land here either because they failed or because a predecessor with the same key failed. A separate consumer processes them with backoff. When a message succeeds here, it releases the lock.

Lock Topic (compacte
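To make the routing rule concrete, here is a minimal in-memory sketch of the flow described above. It is not the kafka-resilience code and it does not touch a real Kafka client: `KeyLockRouter`, `on_main_message`, `drain_retries`, and `flaky_handler` are hypothetical names, plain Python lists stand in for topics, and a local set stands in for the lock state. One assumption goes beyond the text: the lock on a key is released only once no messages for that key remain in the retry queue.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Message:
    key: str
    value: str


@dataclass
class KeyLockRouter:
    """Hypothetical in-memory stand-in for the main and retry consumers."""
    handler: Callable[[Message], None]
    locked_keys: set = field(default_factory=set)    # stands in for lock state
    retry_topic: list = field(default_factory=list)  # stands in for the retry topic
    processed: list = field(default_factory=list)

    def on_main_message(self, msg: Message) -> None:
        # A locked key means a predecessor failed: queue behind it to keep order.
        if msg.key in self.locked_keys:
            self.retry_topic.append(msg)
            return
        try:
            self.handler(msg)
            self.processed.append(msg)
        except Exception:
            # Lock only this key; other keys keep flowing on the main topic.
            self.locked_keys.add(msg.key)
            self.retry_topic.append(msg)

    def drain_retries(self) -> None:
        # One in-order pass over the retry queue (backoff omitted for brevity).
        pending, self.retry_topic = self.retry_topic, []
        failed_keys = set()
        for msg in pending:
            if msg.key in failed_keys:
                self.retry_topic.append(msg)  # stay behind the failed predecessor
                continue
            try:
                self.handler(msg)
                self.processed.append(msg)
            except Exception:
                failed_keys.add(msg.key)
                self.retry_topic.append(msg)
        # Assumption: release a lock only when nothing is left queued for the key.
        self.locked_keys &= {m.key for m in self.retry_topic}


attempts = {}


def flaky_handler(msg: Message) -> None:
    # Fails the first attempt at "a1" only, to simulate a transient error.
    attempts[msg.value] = attempts.get(msg.value, 0) + 1
    if msg.value == "a1" and attempts[msg.value] == 1:
        raise RuntimeError("transient failure")


router = KeyLockRouter(handler=flaky_handler)
for m in [Message("A", "a1"), Message("B", "b1"),
          Message("A", "a2"), Message("B", "b2")]:
    router.on_main_message(m)

router.drain_retries()
```

In this run, key B is never blocked by the failure on key A, while `a2` waits behind `a1` so per-key order is preserved. A real deployment would need the lock state shared across consumer instances, which this sketch deliberately leaves out.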
