
Understanding Attention Mechanisms – Part 2: Comparing Encoder and Decoder Outputs
In the previous article, we explored the main idea of attention and the modifications it requires in an encoder–decoder model. Now, we will explore that idea further.

An encoder–decoder model can be as simple as an embedding layer attached to a single LSTM. If we want a more advanced encoder, we can add additional LSTM cells. We initialize the long-term and short-term memory (the cell and hidden states) in the LSTMs of the encoder with zeros. If our input sentence, which we want to translate into Spanish, is "Let's go", we can feed the token ID for "Let's" (say, 1) into the embedding layer, unroll the network, and then feed the token ID for "go" into the embedding layer. The encoder's final cell and hidden states form the context vector, which we use to initialize a separate set of LSTM cells in the decoder.

All of the input is compressed into the context vector. But the idea of attention is that each step in the decoder should have direct access to the inputs. So, let's understand how attention connects the inputs to each step of the decoder.
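The flow described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a trainable model: the vocabulary, dimensions, and random weights are all hypothetical, and the LSTM cell is written out by hand so the zero-initialized states, the unrolling over "Let's" and "go", and the hand-off of the context vector to the decoder are all visible.

```python
import numpy as np

np.random.seed(0)

vocab = {"<sos>": 0, "Let's": 1, "go": 2}  # hypothetical toy vocabulary
embed_dim, hidden = 4, 3                   # small dimensions for illustration

# Embedding layer: one row of weights per token ID.
E = np.random.randn(len(vocab), embed_dim) * 0.1

def lstm_params():
    # Weights and biases for the four LSTM gates
    # (input, forget, candidate, output), acting on [x; h].
    W = np.random.randn(4, hidden, embed_dim + hidden) * 0.1
    b = np.zeros((4, hidden))
    return W, b

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, params):
    W, b = params
    z = np.concatenate([x, h])
    i = sigmoid(W[0] @ z + b[0])   # input gate
    f = sigmoid(W[1] @ z + b[1])   # forget gate
    g = np.tanh(W[2] @ z + b[2])   # candidate cell state
    o = sigmoid(W[3] @ z + b[3])   # output gate
    c = f * c + i * g              # long-term memory (cell state)
    h = o * np.tanh(c)             # short-term memory (hidden state)
    return h, c

enc, dec = lstm_params(), lstm_params()

# Encoder: start with zero states and unroll over the input tokens.
h, c = np.zeros(hidden), np.zeros(hidden)
for tok in ["Let's", "go"]:
    h, c = lstm_step(E[vocab[tok]], h, c, enc)

# The final (h, c) pair is the context vector: it initializes the
# decoder's LSTM states, so the whole input is squeezed into it.
dec_h, dec_c = h.copy(), c.copy()
dec_h, dec_c = lstm_step(E[vocab["<sos>"]], dec_h, dec_c, dec)
print(dec_h.shape)  # (3,)
```

Note that the decoder only ever sees the inputs through this single context vector; attention will later let each decoder step look back at the per-token encoder outputs directly.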




