
Understanding Attention Mechanisms – Part 1: Why Long Sentences Break Encoder–Decoders
In the previous articles, we covered Seq2Seq models. Now, on the path toward transformers, we need to understand one more concept before we get there: Attention.

The encoder in a basic encoder–decoder, by unrolling its LSTMs over the input, compresses the entire input sentence into a single context vector. This works fine for short phrases like "Let's go". But if we had a bigger input vocabulary with thousands of words, we could feed in longer and more complicated sentences, like "Don't eat the delicious-looking and smelling pasta". For longer phrases, even with LSTMs, words that were input early on can be forgotten. In this case, if we forget the first word, "Don't", the sentence becomes "eat the delicious-looking and smelling pasta": the opposite of what was meant. So sometimes it is crucial to remember the first word.

Basic RNNs struggled with long-term memory because they ran both long- and short-term information through a single path. The main idea of Long Short-Term Memory (LSTM) units is that they solve this problem by splitting memory into two paths: a cell state that carries long-term information and a hidden state that carries short-term information.
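To make the bottleneck concrete, here is a minimal sketch (not code from this series) of a toy RNN encoder that squeezes the whole example sentence into one fixed-size context vector. The weights, sizes, and embeddings are all illustrative assumptions; a real encoder would use trained LSTM parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size, embed_size = 4, 3

# Hypothetical embeddings for the 7-token example sentence
tokens = ["Don't", "eat", "the", "delicious-looking",
          "and", "smelling", "pasta"]
embeddings = rng.normal(size=(len(tokens), embed_size))

# Random weights stand in for trained parameters (illustration only)
W_xh = rng.normal(size=(embed_size, hidden_size))
W_hh = rng.normal(size=(hidden_size, hidden_size))

def encode(seq):
    """Run a plain tanh-RNN over seq and return only the FINAL state."""
    h = np.zeros(hidden_size)
    for x in seq:
        # Each step overwrites h, so early words ("Don't") fade away
        h = np.tanh(x @ W_xh + h @ W_hh)
    return h  # the single context vector handed to the decoder

context = encode(embeddings)
print(context.shape)  # 7 words squeezed into a vector of 4 numbers
```

However long the sentence grows, `context` stays the same size; that fixed capacity is exactly what attention will later relax by letting the decoder look back at every encoder state.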