Back to articles
How a Pushdown Automaton becomes a Parser [part 3]

How a Pushdown Automaton becomes a Parser [part 3]

via Dev.toJocer Franquiz

From Tokens to Trees: Four Paths to a Full Parser In part 2 , we built a pushdown automaton transducer , with 11 operations, 6 components, one stack. Our PDA Transducer turns nested <div> tags into a flat stream of tokens. In this post, we will explore what do we need to add to get a simple but practical parser that outputs a DOM tree. Q: Is the transducer enough to parse HTML? No. The transducer takes <div><div>hi</div></div> and emits: [(OPEN, "div"), (OPEN, "div"), (TEXT, "hi"), (CLOSE, "div"), (CLOSE, "div")] That's a flat list. It tells you what was in the input, but not how things relate to each other . A parser needs to produce a tree , a structure where the outer <div> is the parent and the inner <div> is its child: div └── div └── "hi" Our transducer can't do this. Its only writable memory is the stack, and the stack is consumed during validation . Every PUSH is matched by a POP to check nesting. By the time the transducer is done, the stack is empty. There's nowhere to store

Continue reading on Dev.to

Opens in a new tab

Read Full Article
3 views

Related Articles