
Understanding Attention Mechanisms – Part 2: Comparing Encoder and Decoder Outputs
In the previous article, we explored the main idea of attention and the modifications it requires in an encoder–decoder model. Now, we will explore that idea further.

An encoder–decoder model can be as simple as an embedding layer attached to a single LSTM. If we want a more advanced encoder, we can add additional LSTM cells. We initialize the long-term and short-term memory (the cell and hidden states) in the LSTMs of the encoder with zeros. If our input sentence, which we want to translate into Spanish, is "Let's go", we can feed the token ID for "Let's" (say, 1) into the embedding layer, unroll the network, and then feed the token ID for "go" into the embedding layer. The encoder's final cell and hidden states form the context vector, which we use to initialize a separate set of LSTM cells in the decoder.

All of the input is compressed into the context vector. But the idea of attention is that each step in the decoder should have direct access to the inputs. So, let's understand how attention connects the inputs to each step of the decoder.
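The flow described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a trainable model: the vocabulary, dimensions, and random weights are all hypothetical, and the LSTM cell is written out by hand so the zero-initialized states, the unrolling over "Let's" and "go", and the hand-off of the context vector to the decoder are all visible.

```python
import numpy as np

np.random.seed(0)

vocab = {"<sos>": 0, "Let's": 1, "go": 2}  # hypothetical toy vocabulary
embed_dim, hidden = 4, 3                   # small dimensions for illustration

# Embedding layer: one row of weights per token ID.
E = np.random.randn(len(vocab), embed_dim) * 0.1

def lstm_params():
    # Weights and biases for the four LSTM gates
    # (input, forget, candidate, output), acting on [x; h].
    W = np.random.randn(4, hidden, embed_dim + hidden) * 0.1
    b = np.zeros((4, hidden))
    return W, b

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, params):
    W, b = params
    z = np.concatenate([x, h])
    i = sigmoid(W[0] @ z + b[0])   # input gate
    f = sigmoid(W[1] @ z + b[1])   # forget gate
    g = np.tanh(W[2] @ z + b[2])   # candidate cell state
    o = sigmoid(W[3] @ z + b[3])   # output gate
    c = f * c + i * g              # long-term memory (cell state)
    h = o * np.tanh(c)             # short-term memory (hidden state)
    return h, c

enc, dec = lstm_params(), lstm_params()

# Encoder: start with zero states and unroll over the input tokens.
h, c = np.zeros(hidden), np.zeros(hidden)
for tok in ["Let's", "go"]:
    h, c = lstm_step(E[vocab[tok]], h, c, enc)

# The final (h, c) pair is the context vector: it initializes the
# decoder's LSTM states, so the whole input is squeezed into it.
dec_h, dec_c = h.copy(), c.copy()
dec_h, dec_c = lstm_step(E[vocab["<sos>"]], dec_h, dec_c, dec)
print(dec_h.shape)  # (3,)
```

Note that the decoder only ever sees the inputs through this single context vector; attention will later let each decoder step look back at the per-token encoder outputs directly.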




