Understanding Attention Mechanisms – Part 3: From Cosine Similarity to Dot Product

By Rijul Rajesh, via Dev.to

In the previous article, we compared the encoder and decoder outputs. In this article, we will check the math behind that comparison and see how it can be further simplified.

The output values of the two LSTM cells in the encoder for the word "Let’s" are -0.76 and 0.75. The output values of the two LSTM cells in the decoder for the <EOS> token are 0.91 and 0.38. We can represent this as:

|             | Cell #1 | Cell #2 |
| ----------- | ------- | ------- |
| A = Encoder | -0.76   | 0.75    |
| B = Decoder | 0.91    | 0.38    |

Now we plug these values into the cosine similarity equation:

similarity = (A · B) / (‖A‖ × ‖B‖), where A · B = (A₁ × B₁) + (A₂ × B₂)

This gives us a result of -0.39.

To simplify this further, a common approach is to compute only the numerator, A · B. The denominator only normalizes the result into the range -1 to 1, so in some cases we can ignore it. Since we are dealing with a fixed number of cells, this simplification works well. Computing only the numerator is known as the dot product. When we calculate just the dot product, we get:

(-0.76 × 0.91) + (0.75 × 0.38) ≈ -0.41

Both results are easy to verify in a few lines of code; see the sketches below.

We will explore…
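As a quick sanity check, here is a minimal NumPy sketch (mine, not from the article; the vector values are the ones above) of the full cosine-similarity calculation:

```python
import numpy as np

a = np.array([-0.76, 0.75])  # encoder LSTM cell outputs for "Let’s"
b = np.array([0.91, 0.38])   # decoder LSTM cell outputs for <EOS>

# Cosine similarity: dot product divided by the product of the vector norms.
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(round(cosine, 2))  # -0.39
```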

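And the simplified version, which keeps only the numerator, i.e. the dot product (same assumptions as above):

```python
import numpy as np

a = np.array([-0.76, 0.75])  # encoder LSTM cell outputs for "Let’s"
b = np.array([0.91, 0.38])   # decoder LSTM cell outputs for <EOS>

# Dropping the denominator leaves just the dot product of the two vectors.
score = np.dot(a, b)
print(round(score, 2))  # -0.41
```

The two scores differ only by the normalizing denominator, which is why the dot product works as a cheaper stand-in when the vectors being compared all have the same, fixed number of components.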