
Understanding Transformers Part 2: Positional Encoding with Sine and Cosine
In the previous article, we converted words into embeddings. Now let’s see how transformers add position information to those numbers.

The values that represent word order in a transformer come from a family of sine and cosine waves. Each curve is responsible for generating the position values for one specific dimension of the word embedding.

Understanding the Idea

Think of each embedding dimension as getting its value from a different wave. For example:

- The green curve provides the positional values for the first embedding dimension of every word. For the first word in the sentence, which lies at the far left of the graph (position 0 on the x-axis), the value taken from the green curve is 0 (the y-axis value at that position).
- The orange curve provides the positional values for the second embedding dimension. At the same position (the first word), the value from the orange curve is 1.
- The blue curve provides the positional values for the third embedding dimension. For the first word, the value is 0.

This alternating 0, 1, 0 pattern at position 0 is no accident: even-numbered dimensions draw their values from sine curves (and sin(0) = 0), while odd-numbered dimensions draw theirs from cosine curves (and cos(0) = 1). A minimal sketch of this computation follows below.
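To make the idea concrete, here is a short NumPy sketch of the standard sinusoidal positional encoding from the "Attention Is All You Need" paper, which is the scheme the curves above illustrate. The function name positional_encoding and the parameters max_len and d_model are illustrative choices, not names from this article:

```python
import numpy as np

def positional_encoding(max_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal positional encoding ("Attention Is All You Need").

    Returns an array of shape (max_len, d_model) where row `pos` holds
    the position values added to the embedding of the word at position `pos`.
    """
    positions = np.arange(max_len)[:, np.newaxis]          # (max_len, 1)
    dims = np.arange(0, d_model, 2)[np.newaxis, :]         # even dimension indices
    angle_rates = 1.0 / np.power(10000.0, dims / d_model)  # one frequency per curve
    angles = positions * angle_rates                       # (max_len, d_model / 2)

    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dimensions: sine curves, so 0 at position 0
    pe[:, 1::2] = np.cos(angles)  # odd dimensions: cosine curves, so 1 at position 0
    return pe

# First word (position 0): the alternating 0, 1, 0, ... values described above.
print(positional_encoding(max_len=4, d_model=6)[0])  # -> [0. 1. 0. 1. 0. 1.]
```

Each column of the returned array corresponds to one of the colored curves: column 0 is the green (sine) curve, column 1 the orange (cosine) curve, and so on, with the frequency decreasing as the dimension index grows.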

