Failing to Train DeBERTa to Detect Patent Antecedent Basis Errors

Patent claims have a simple rule: introduce "a thing" before referring to "the thing." I fine-tuned DeBERTa-v3 on synthetic antecedent basis errors and hit 90% F1 on my test set. Then I evaluated on real USPTO examiner rejections from the PEDANTIC dataset and watched that number collapse to 14.5% F1 with 8% recall. The model catches 8 out of 100 real errors. This writeup covers what I built, why it failed, and what the failure reveals about the gap between synthetic and real patent data.

The problem

Antecedent basis errors are one of the most common reasons for 112(b) rejections. They're also one of the most annoying: purely mechanical mistakes that slip through because patent claims get long, dependencies get tangled, and things get edited over time. You introduce "a sensor" in claim 1, then three claims later you write "the detector" meaning the same thing. Or you delete a clause during revision and forget that it was the antecedent for something downstream.

"A device comprising a process
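
To make the rule concrete, here is a minimal sketch of a naive rule-based check for the "a thing" before "the thing" pattern described above. It is not the DeBERTa model from this writeup. The function name `find_missing_antecedents` and the sample claims are hypothetical, it matches single-word nouns only, and it treats every earlier claim as in scope rather than following the actual claim dependency chain.

```python
import re

# Hypothetical baseline for illustration only, not the author's model.
# Matches an article followed by a single word, e.g. "a sensor", "the detector".
ARTICLE_NOUN = re.compile(r"\b(a|an|the|said)\s+([a-z]+)", re.IGNORECASE)


def find_missing_antecedents(claims):
    """Return (claim_index, noun) for each "the X" with no earlier "a X".

    Assumption: every earlier claim counts as context. A real checker
    would only look at the claims the current claim depends on.
    """
    introduced = set()
    problems = []
    for index, text in enumerate(claims):
        # finditer yields matches left to right, so order is preserved.
        for match in ARTICLE_NOUN.finditer(text):
            article = match.group(1).lower()
            noun = match.group(2).lower()
            if article in ("a", "an"):
                introduced.add(noun)
            elif noun not in introduced:
                problems.append((index, noun))
    return problems


claims = [
    "A device comprising a sensor.",
    "The device of claim 1, wherein the detector is waterproof.",
]
print(find_missing_antecedents(claims))
```

On the sensor/detector example this flags "detector" in the second claim (index 1), because nothing earlier introduced it. A check like this only compares surface strings, so it cannot tell that "the detector" was meant to refer to "a sensor", and it will misfire on multi-word terms.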
