
OCR on Patent Figures with DeepSeek-OCR
This post compares 12 approaches to extracting text and reference numbers from patent figure sheets, tested against 8 sheets from US11423567B2 (a facial recognition depth mapping system). The sheets include flowcharts, dense instrument screenshots, and architectural diagrams with tiny, scattered reference numbers.

The figures

Patent figures are hard on OCR for several reasons: text appears at multiple orientations (some sheets are rotated 90 degrees); tiny reference numbers like "41" or "7025" are scattered among the drawings; dense data screens use white text on dark backgrounds; structural elements (boxes, arrows, lines) look like text to a machine; and "Figure X" labels are often printed sideways.

[Image: Sheet 01 from US11423567B2. The whole sheet is rotated 90 degrees, with labels like "BP", "DR", "1", and "D" scattered around the drawing.]

DeepSeek-OCR is a 3.3B parameter vision model that runs locally. It has a grounding mode that returns bounding boxes alongside text: the prompt <|grounding|>OCR this image. produces output like <|ref|>camera 110</ref><|det|>[[412, 8, 455, 63
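The grounding output described above is a flat string of tagged text followed by a box, so it has to be parsed before the labels can be placed on the sheet. Below is a minimal sketch of such a parser. It is an illustration, not the author's code: the names GroundedText and parse_grounding are hypothetical, the closing-tag spelling is assumed (the parser accepts both the </ref> form shown in the excerpt and a <|/ref|> form), each box is assumed to hold four integers, and the sample coordinates are invented. The excerpt does not say what units the coordinates use, so the sketch leaves them untouched.

```python
import re
from typing import List, NamedTuple, Tuple


class GroundedText(NamedTuple):
    """One recognized text span and its box (hypothetical container)."""
    text: str
    box: Tuple[int, int, int, int]


# Assumption: each record looks like <|ref|>TEXT(close tag)<|det|>[[a, b, c, d]].
# Both </ref> (as printed in the excerpt) and <|/ref|> are accepted as close tags.
# Records with more than one box per text span are not handled by this sketch.
RECORD = re.compile(
    r"<\|ref\|>(?P<text>.*?)(?:<\|/ref\|>|</ref>)\s*"
    r"<\|det\|>\[\[(?P<box>[^\]]*)\]\]",
    re.DOTALL,
)


def parse_grounding(raw: str) -> List[GroundedText]:
    """Extract (text, box) pairs from a grounding-mode output string."""
    items = []
    for match in RECORD.finditer(raw):
        numbers = [int(n) for n in re.findall(r"-?\d+", match.group("box"))]
        if len(numbers) != 4:
            continue  # skip malformed boxes rather than guess
        items.append(GroundedText(match.group("text").strip(), tuple(numbers)))
    return items


# Invented sample values, for illustration only.
sample = (
    "<|ref|>camera 110</ref><|det|>[[10, 20, 30, 40]]<|/det|>\n"
    "<|ref|>FIG. 3<|/ref|><|det|>[[5, 6, 7, 8]]<|/det|>"
)

for item in parse_grounding(sample):
    print(item.text, item.box)
```

A record whose box is cut off before the closing brackets, as happens when generation stops early, is skipped by this parser instead of being returned with partial coordinates.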
Continue reading on Dev.to Python

