
Editing PDFs Is Harder Than It Should Be and Here Is Why
PDF was designed as a final output format, not an editing format. Its internal structure stores text as positioned glyphs, not as flowing paragraphs. When you edit text in a PDF, you are repositioning individual character groups, not editing a document.

How PDF stores text

In HTML, "Hello World" is a text node that flows with the document. In PDF, the same text might be stored as:

    BT /F1 12 Tf 100 700 Td (Hello World) Tj ET

This says: begin text block, use font F1 at 12pt, move to position (100, 700), draw the string "Hello World", end text block.

Every text element has an absolute position. The content stream has no concept of "paragraphs" or "lines". A paragraph is just multiple positioned text chunks that happen to be near each other vertically.

Why editing is hard

When you delete a word from a PDF "paragraph," the remaining words do not reflow. Each word stays at its original position. The editor must recalculate positions for every subsequent word in the visual paragraph, w
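
To make both points concrete, here is a minimal Python sketch. It is not how any particular PDF editor works. `parse_text_block` reads only the five operators from the example above (BT, Tf, Td, Tj, ET) and treats each Td as a simple offset. `delete_word` shows the manual reflow an editor has to do. The names `Chunk`, `char_width`, `split_into_words`, and `delete_word` are hypothetical, and the fixed 0.5 em glyph width is an assumption standing in for real font metrics.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    """One positioned piece of text, as a content stream describes it."""
    text: str
    x: float
    y: float
    font: str
    size: float


# A parenthesised string literal, or any run of non-space, non-paren characters.
TOKEN = re.compile(r"\((?:\\.|[^\\)])*\)|/?[^\s()]+")


def parse_text_block(stream: str) -> List[Chunk]:
    """Interpret a tiny subset of text operators: BT, Tf, Td, Tj, ET."""
    chunks: List[Chunk] = []
    operands: List[str] = []
    font, size, x, y = "", 0.0, 0.0, 0.0
    for tok in TOKEN.findall(stream):
        if tok == "BT":        # begin text: reset the position
            x = y = 0.0
            operands.clear()
        elif tok == "Tf":      # operands: /FontName size
            font, size = operands[-2].lstrip("/"), float(operands[-1])
            operands.clear()
        elif tok == "Td":      # operands: tx ty (simplified to a plain offset)
            x += float(operands[-2])
            y += float(operands[-1])
            operands.clear()
        elif tok == "Tj":      # operand: (string) to draw at the current position
            chunks.append(Chunk(operands[-1][1:-1], x, y, font, size))
            operands.clear()
        elif tok == "ET":      # end text
            operands.clear()
        else:                  # anything else is an operand for a later operator
            operands.append(tok)
    return chunks


def char_width(size: float) -> float:
    # Assumption: every glyph is 0.5 em wide. Real widths come from the font.
    return 0.5 * size


def split_into_words(chunk: Chunk) -> List[Chunk]:
    """Give each word of a chunk its own absolute x position."""
    words: List[Chunk] = []
    x = chunk.x
    for word in chunk.text.split(" "):
        words.append(Chunk(word, x, chunk.y, chunk.font, chunk.size))
        x += (len(word) + 1) * char_width(chunk.size)  # word plus one space
    return words


def delete_word(words: List[Chunk], index: int) -> List[Chunk]:
    """Remove one word and shift every later word on the line to the left."""
    removed = words[index]
    shift = (len(removed.text) + 1) * char_width(removed.size)
    kept = list(words[:index])
    for w in words[index + 1:]:
        kept.append(Chunk(w.text, w.x - shift, w.y, w.font, w.size))
    return kept
```

Calling `parse_text_block("BT /F1 12 Tf 100 700 Td (Hello World) Tj ET")` yields a single `Chunk` at (100, 700) in font F1 at size 12. Nothing in that record says which paragraph or line it belongs to, which is why `delete_word` has to move the later words itself.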
Continue reading on Dev.to Webdev




