The model looked great on validation until one real invoice broke four assumptions

via Dev.to (angu10)

An empirical note on what synthetic invoice data taught a Gemma fine-tune, what it hid, and how one real document exposed the gap.

I fine-tuned a small Gemma model to parse Indian invoices because I wanted a path that was cheaper, more private, and easier to deploy than calling a hosted API for every document. The training metrics looked excellent. Then I ran the model on one real invoice. It got the total right, the supplier right, and the address right, and it still failed in four ways that would make the output unusable in a real finance workflow. That one invoice was more useful than another few hundred synthetic examples.

None of the headline conclusions here are new to anyone with ML experience:

- synthetic data has a domain gap
- synthetic validation can be overly optimistic
- real data changes what you trust

What felt worth documenting was the concrete shape of the failure:

- which fields broke first
- which assumptions in the synthetic distribution caused it
- what the training curves looked like befo…
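The excerpt cuts off before the four failure modes are named, but the pattern it describes (individual fields extracted correctly, output still unusable downstream) points at structural validation of the parsed result rather than per-field accuracy alone. As a minimal sketch, with entirely hypothetical field names (`supplier`, `total`, `line_items`, `gstin`) standing in for whatever schema the fine-tune actually emits, a post-parse sanity check might look like:

```python
import re

# Hypothetical output schema: the article's actual field names are not
# shown in this excerpt, so everything below is an illustrative assumption.
REQUIRED_FIELDS = {"supplier", "address", "total", "line_items"}

# Commonly cited GSTIN shape: 2-digit state code, 10-char PAN, entity
# code, literal 'Z', checksum character. Treat as a plausibility filter,
# not an authoritative validator.
GSTIN_RE = re.compile(r"^\d{2}[A-Z]{5}\d{4}[A-Z][A-Z\d]Z[A-Z\d]$")

def validate_invoice(parsed: dict) -> list[str]:
    """Return human-readable problems; an empty list means 'plausible'."""
    problems = []

    missing = REQUIRED_FIELDS - parsed.keys()
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")

    # A field can be present but unusable: a total like 'N/A' or '1,180'
    # passes string-level checks yet breaks numeric consumers.
    total = None
    try:
        total = float(str(parsed.get("total", "")).replace(",", ""))
    except ValueError:
        problems.append(f"total is not numeric: {parsed.get('total')!r}")

    # Cross-field consistency: line items should roughly sum to the total.
    items = parsed.get("line_items") or []
    if total is not None and items:
        items_sum = sum(float(i.get("amount", 0)) for i in items)
        if abs(items_sum - total) > 0.01 * max(total, 1.0):
            problems.append(f"line items sum to {items_sum}, total says {total}")

    gstin = parsed.get("gstin")
    if gstin and not GSTIN_RE.match(gstin):
        problems.append(f"GSTIN fails format check: {gstin!r}")

    return problems
```

Checks like these catch the "fields look right, document is wrong" class of failure that per-field validation accuracy on synthetic data never exercises.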

Continue reading on Dev.to
