Understanding Attention Mechanisms – Part 3: From Cosine Similarity to Dot Product

By Rijul Rajesh, via Dev.to

In the previous article, we compared the encoder and decoder outputs. In this article, we will check the math behind that comparison and see how it can be further simplified.

The output values of the two LSTM cells in the encoder for the word "Let’s" are -0.76 and 0.75. The output values of the two LSTM cells in the decoder for the <EOS> token are 0.91 and 0.38. We can represent this as:

|             | Cell #1 | Cell #2 |
| ----------- | ------- | ------- |
| A = Encoder | -0.76   | 0.75    |
| B = Decoder | 0.91    | 0.38    |

Now we plug these values into the cosine similarity equation:

similarity = (A · B) / (‖A‖ × ‖B‖), where A · B = (A₁ × B₁) + (A₂ × B₂)

This gives us a result of -0.39.

To simplify this further, a common approach is to compute only the numerator, A · B. The denominator only normalizes the result into the range -1 to 1, so in some cases we can ignore it. Since we are dealing with a fixed number of cells, this simplification works well. Computing only the numerator is known as the dot product. When we calculate just the dot product, we get:

(-0.76 × 0.91) + (0.75 × 0.38) ≈ -0.41

Both results are easy to verify in a few lines of code; see the sketches below.

We will explore…
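As a quick sanity check, here is a minimal NumPy sketch (mine, not from the article; the vector values are the ones above) of the full cosine-similarity calculation:

```python
import numpy as np

a = np.array([-0.76, 0.75])  # encoder LSTM cell outputs for "Let’s"
b = np.array([0.91, 0.38])   # decoder LSTM cell outputs for <EOS>

# Cosine similarity: dot product divided by the product of the vector norms.
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(round(cosine, 2))  # -0.39
```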

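And the simplified version, which keeps only the numerator, i.e. the dot product (same assumptions as above):

```python
import numpy as np

a = np.array([-0.76, 0.75])  # encoder LSTM cell outputs for "Let’s"
b = np.array([0.91, 0.38])   # decoder LSTM cell outputs for <EOS>

# Dropping the denominator leaves just the dot product of the two vectors.
score = np.dot(a, b)
print(round(score, 2))  # -0.41
```

The two scores differ only by the normalizing denominator, which is why the dot product works as a cheaper stand-in when the vectors being compared all have the same, fixed number of components.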