![How a Pushdown Automaton becomes a Parser [part 3]](/_next/image?url=https%3A%2F%2Fmedia2.dev.to%2Fdynamic%2Fimage%2Fwidth%3D800%252Cheight%3D%252Cfit%3Dscale-down%252Cgravity%3Dauto%252Cformat%3Dauto%2Fhttps%253A%252F%252Fdev-to-uploads.s3.amazonaws.com%252Fuploads%252Farticles%252Fyx2slvhbf7secutqlz23.png&w=1200&q=75)
How a Pushdown Automaton becomes a Parser [part 3]
From Tokens to Trees: Four Paths to a Full Parser

In part 2, we built a pushdown automaton transducer with 11 operations, 6 components, and one stack. Our PDA transducer turns nested `<div>` tags into a flat stream of tokens. In this post, we will explore what we need to add to get a simple but practical parser that outputs a DOM tree.

Q: Is the transducer enough to parse HTML?

No. The transducer takes `<div><div>hi</div></div>` and emits:

    [(OPEN, "div"), (OPEN, "div"), (TEXT, "hi"), (CLOSE, "div"), (CLOSE, "div")]

That's a flat list. It tells you what was in the input, but not how things relate to each other. A parser needs to produce a tree, a structure where the outer `<div>` is the parent and the inner `<div>` is its child:

    div
    └── div
        └── "hi"

Our transducer can't do this. Its only writable memory is the stack, and the stack is consumed during validation. Every PUSH is matched by a POP to check nesting. By the time the transducer is done, the stack is empty. There's nowhere to store …
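To make the gap between a flat token list and a tree concrete, here is a minimal Python sketch of one common way to turn the token stream above into a tree. It is an illustration only, not necessarily one of the four paths this post goes on to describe. The names `Node`, `build_tree`, and the synthetic `#root` wrapper are hypothetical and do not come from the series. The idea is to keep a stack of currently open nodes, as the transducer does, but to attach each new node to its parent before pushing it, so the structure survives after the stack is popped back down.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node: a tag name plus child nodes or text strings."""
    tag: str
    children: list = field(default_factory=list)


def build_tree(tokens):
    """Build a tree from (kind, value) tokens; kinds are OPEN, TEXT, CLOSE."""
    root = Node("#root")      # synthetic wrapper so every real node has a parent
    open_nodes = [root]       # stack of elements that are still open

    for kind, value in tokens:
        if kind == "OPEN":
            node = Node(value)
            open_nodes[-1].children.append(node)  # link to parent: this persists
            open_nodes.append(node)               # stack entry: this is temporary
        elif kind == "TEXT":
            open_nodes[-1].children.append(value)
        elif kind == "CLOSE":
            # Same nesting check a PDA performs: the closing tag must match
            # the most recently opened element.
            if len(open_nodes) == 1 or open_nodes[-1].tag != value:
                raise ValueError(f"unexpected closing tag: {value}")
            open_nodes.pop()
        else:
            raise ValueError(f"unknown token kind: {kind}")

    if len(open_nodes) != 1:
        raise ValueError(f"unclosed tag: {open_nodes[-1].tag}")
    return root


tokens = [
    ("OPEN", "div"), ("OPEN", "div"), ("TEXT", "hi"),
    ("CLOSE", "div"), ("CLOSE", "div"),
]
tree = build_tree(tokens)
outer = tree.children[0]
inner = outer.children[0]
print(outer.tag, inner.tag, inner.children)
```

The stack still ends up holding only the wrapper node, but the parent-to-child links were written to memory outside the stack, which is the kind of extra storage the transducer alone does not have.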




