
PDF Extraction Bugs, Broadcast Persistence, and a 42-Commit Sweep Day
Forty-two commits across six repos in one day. No new features. Just closing loops, killing bugs, and hardening things that almost worked. The most interesting work was in PDF extraction. The rest was necessary. Here's the breakdown.

PDF Entities Are Not DXF Entities

The comparison engine in cad-dxf-agent works great on native DXF files. Geometry normalization, spatial binning, quantized fingerprints: all tuned for how DXF stores entities. The problem: users also upload PDFs of engineering drawings. Scanned shop drawings, exported revision sets, vendor submittals as PDF. Converting PDF to comparison-ready entities revealed three bugs that don't exist in the DXF path.

Bug 1: Sub-Pixel Noise

PDF renderers leave artifacts. Tiny line segments, invisible rectangles, hairline paths: entities with dimensions under 0.001 inches that aren't real drawing content. They're rendering artifacts from how the PDF producer tessellated curves or clipped regions. In the DXF path, this never happens. DX
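
The sub-pixel artifacts described under Bug 1 could be screened out with a simple size threshold before comparison. The sketch below is one plausible way to do that, not the code from cad-dxf-agent: the `Entity` record, its bounding-box fields, and the `drop_subpixel_noise` helper are all hypothetical names invented for illustration. Only the 0.001 inch cutoff comes from the text above.

```python
from dataclasses import dataclass


# Hypothetical entity record. The real cad-dxf-agent types are not shown
# in the post, so this assumes each entity carries a bounding box in inches.
@dataclass(frozen=True)
class Entity:
    kind: str
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    @property
    def extent(self) -> float:
        """Largest bounding-box side, in inches."""
        return max(self.max_x - self.min_x, self.max_y - self.min_y)


MIN_EXTENT_IN = 0.001  # threshold mentioned in the post


def drop_subpixel_noise(entities, min_extent=MIN_EXTENT_IN):
    """Keep only entities whose largest dimension reaches min_extent."""
    return [e for e in entities if e.extent >= min_extent]


entities = [
    Entity("line", 0.0, 0.0, 4.0, 0.0),         # real 4 inch line
    Entity("line", 1.0, 1.0, 1.0004, 1.0),      # tessellation sliver
    Entity("rect", 2.0, 2.0, 2.0002, 2.0003),   # invisible clip rectangle
    Entity("rect", 3.0, 3.0, 5.0, 4.0),         # real rectangle
]
kept = drop_subpixel_noise(entities)
print([e.kind for e in kept])
```

The filter measures the largest side of the bounding box rather than its area, so a genuine horizontal or vertical line (zero height or zero width) is not mistaken for noise.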



