Backlog Black Hole: Engineering a Semantic Triage Engine at Scale
Our bug tracker holds more than 150 million issues, and that number is compounding at 20% per year. Roughly 25% of those issues are duplicates: about 37.5 million tickets today, and the count grows along with the backlog. Working through that many duplicates takes an enormous amount of time and causes a major loss of productivity. Last quarter alone, hundreds of duplicate issues were triaged separately even though they shared the same root cause, and engineers spent days re-investigating problems that had already been diagnosed elsewhere.

Keyword search helps, but most of the time it fails to surface issues that are semantically similar without being exact textual matches. As ticket volume increased, manual triage became slow, cumbersome, and error-prone. Different teams categorized incoming issues independently, with no reliable mechanism to detect semantic overlap. We observed multiple reports describing the same failure using different surface language, such as transport-layer ti
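To make the keyword-search limitation concrete, here is a minimal sketch that uses token-overlap (Jaccard) similarity as a stand-in for keyword matching. The ticket titles and the helper names `tokens` and `jaccard` are hypothetical illustrations, not part of the system the article describes.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric word tokens (a crude stand-in for a keyword analyzer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity: |A & B| / |A | B|, or 0.0 if both are empty."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# Hypothetical ticket titles. The paraphrase describes the same failure as
# `existing`, but shares no exact keyword with it ("payment" != "payments").
existing = "Checkout service hangs waiting for payment gateway response"
near_verbatim = "Checkout service hangs waiting for payment gateway"
paraphrase = "Payments API calls never return; orders stuck at pay step"
unrelated = "Typo in footer copyright year"

for label, title in [
    ("near-verbatim", near_verbatim),
    ("paraphrase", paraphrase),
    ("unrelated", unrelated),
]:
    print(f"{label:>13}: {jaccard(existing, title):.2f}")
```

The near-verbatim duplicate scores high, but the paraphrased duplicate scores exactly the same as the unrelated ticket (zero overlap), so a keyword-based matcher has no signal to separate them. That is the gap a semantic similarity measure is meant to close.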
Continue reading on DZone