[Learning notes] reading "Attention is all you need" paper
**Abstract**

- **Encoder**: the part we studied with RNNs that reads the input sequence and "digests" it; it is responsible for creating and updating the hidden state. In a way it's like a person reading something and keeping the "gist" of it in mind.
- **Decoder**: the part responsible for taking that "gist" (the hidden state, the mathematical vector) and using it to produce an output.
- **Attention mechanism**: plays a role similar to the hidden state in an RNN, I guess.
- **"Dispensing with"**: getting rid of.
- **Pros of transformers**:
  - Better results
  - Parallelization
  - Less time to train
- **"Speedometer" reading for AI translation ability, BLEU (Bilingual Evaluation Understudy)**: a mathematical formula used as a metric to grade a machine's translation by comparing it with a translation written by a professional (a human, ofc).
  - 0.0 means there was no match and the model produced a horribly wrong result.
  - 100.0 (or 1.0) would mean a perfect match, but ofc this can never really be the case, since we can say the exact same thing using lots of different wordings.
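The encoder-decoder idea in the notes above can be sketched as a toy scalar RNN. Everything here is made up for illustration (the weights, the scalar hidden state); the real architecture uses vectors and learned matrices, but the loop structure is the same: the encoder folds the whole input into one hidden state, and the decoder unrolls that state into outputs.

```python
import math

# Made-up scalar "weights" for a toy RNN; a real model learns matrices.
W_in, W_h, W_out = 0.7, 0.3, 1.2

def encode(sequence):
    """Read the input step by step and 'digest' it into one hidden state."""
    h = 0.0
    for x in sequence:
        # New hidden state = function of previous hidden state + current input.
        h = math.tanh(W_in * x + W_h * h)
    return h  # the "gist" of the whole sequence

def decode(h, steps):
    """Unroll the digested 'gist' into an output sequence."""
    outputs = []
    for _ in range(steps):
        y = math.tanh(W_out * h)
        outputs.append(y)
        h = math.tanh(W_h * h + W_in * y)  # feed the output back in
    return outputs
```

Note how the single number `h` is the only thing the decoder ever sees: that bottleneck is exactly what the attention mechanism was invented to relieve.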
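The BLEU score described above can be sketched in a few lines. This is a simplified single-sentence, single-reference version (the paper reports corpus-level BLEU, and real implementations add smoothing), but it shows the two ingredients: clipped n-gram precision and a brevity penalty.

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Simplified single-reference BLEU; returns a score in [0, 1]."""
    precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(candidate[i:i + n])
                              for i in range(len(candidate) - n + 1))
        ref_ngrams = Counter(tuple(reference[i:i + n])
                             for i in range(len(reference) - n + 1))
        # "Clipped" counts: a candidate n-gram only gets credit up to the
        # number of times it appears in the reference.
        overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        total = max(sum(cand_ngrams.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions...
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # ...scaled by a brevity penalty that punishes too-short candidates.
    if len(candidate) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    return bp * geo_mean
```

A candidate identical to the reference scores 1.0 (i.e. 100.0 on the 0-100 scale); any paraphrase, even a perfectly good one, scores less, which is exactly why a perfect score never happens in practice.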



