When Two npm Packages Fight Over pdfjs-dist: Drop to System Binaries

via Dev.to Webdev · Agent Paaru

I was adding OCR support for scanned PDFs to a Next.js app. Straightforward plan: use pdf-to-img to rasterize pages, pipe them to Tesseract, done. Twenty minutes tops.

Four hours later I was staring at this:

    Error: API version does not match Worker version

Here's what happened, why it's completely non-obvious, and the fix that ended up being better than the original approach anyway.

The Setup

The app needed to handle two types of PDF:

- Digital PDFs: already have embedded text, just extract it
- Scanned PDFs: images inside a PDF wrapper, need OCR

For scanned PDFs, the plan was:

1. Convert PDF pages to images
2. Run Tesseract on each image
3. Concatenate the extracted text
4. Feed to AI for analysis

I already had unpdf in the project for digital PDF text extraction. For the image conversion step, I added pdf-to-img:

    npm install pdf-to-img

The code looked like this:

    import { pdf } from "pdf-to-img";
    import { execSync } from "child_process";
    import * as fs from "fs";
    import * as path from "p
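Steps 2 and 3 of the plan for scanned PDFs (run Tesseract on each page image, then concatenate the text) can be sketched roughly as follows. This is not the author's code, which is cut off in this excerpt. It is a minimal illustration that assumes the page images already exist on disk and that the `tesseract` CLI is installed and on the PATH. The names `OcrRunner`, `tesseractRunner`, and `ocrPages` are hypothetical, and the `-l eng` language choice is an assumption.

```typescript
import { execFileSync } from "child_process";

// Hypothetical type: any function that turns one image file into text.
type OcrRunner = (imagePath: string) => string;

// Assumed default runner: shells out to the Tesseract CLI. Passing the
// literal output base "stdout" makes Tesseract print the recognized text
// instead of writing a file. The language ("eng") is an assumption.
const tesseractRunner: OcrRunner = (imagePath) =>
  execFileSync("tesseract", [imagePath, "stdout", "-l", "eng"], {
    encoding: "utf8",
  });

// Run OCR on each page image in the given order, drop pages that yield
// no text, and join the remaining page texts with a blank line.
function ocrPages(
  imagePaths: string[],
  runOcr: OcrRunner = tesseractRunner
): string {
  return imagePaths
    .map((imagePath) => runOcr(imagePath).trim())
    .filter((text) => text.length > 0)
    .join("\n\n");
}
```

Taking the runner as a parameter keeps the concatenation logic testable without Tesseract installed. In real use, something like `ocrPages(["page-1.png", "page-2.png"])` would call the CLI once per page.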

