
Failing to Train DeBERTa to Detect Patent Antecedent Basis Errors
Patent claims have a simple rule: introduce "a thing" before referring to "the thing." I fine-tuned DeBERTa-v3 on synthetic antecedent basis errors and hit 90% F1 on my test set. Then I evaluated on real USPTO examiner rejections from the PEDANTIC dataset and watched that number collapse to 14.5% F1, with 8% recall. The model catches 8 out of 100 real errors.

This writeup covers what I built, why it failed, and what the failure reveals about the gap between synthetic and real patent data.

The problem

Antecedent basis errors are one of the most common reasons for 112(b) rejections. They're also one of the most annoying: purely mechanical mistakes that slip through because patent claims get long, dependencies get tangled, and things get edited over time. You introduce "a sensor" in claim 1, then three claims later you write "the detector" meaning the same thing. Or you delete a clause during revision and forget that it was the antecedent for something downstream.

"A device comprising a process
Continue reading on Dev.to Python


