Understanding LSTMs – Part 4: How LSTM Decides What to Forget

In the previous article, we completed the first part of the LSTM and obtained the result from the calculation for an input of 1. Let us continue.

Now, if we change the input to a relatively large negative number, such as −10, then after calculating the x-axis value, the output of the sigmoid activation function will be close to 0. The long-term memory will be almost completely forgotten, because anything multiplied by a value close to 0 is itself close to 0.

Since the sigmoid activation function converts any input into a value between 0 and 1, its output determines what percentage of the long-term memory is retained. So, the first stage of the LSTM determines what percentage of the long-term memory is remembered. This part is called the forget gate.

Now that we understand what the first stage does, let us explore the second stage. In the second stage, the block on the right combines the short-term memory and the input to create a potential long-term memory. The block on the lef
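
The forget gate described above can be sketched in a few lines of Python. This is only a minimal illustration: the weights, the bias, and the starting memory values below are hypothetical placeholders chosen for readability, not the values used in this series, and the `forget_gate` helper is a name introduced here for the example.

```python
import math


def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def forget_gate(short_term, x, w_short, w_input, bias):
    """Return the fraction of long-term memory to keep.

    The weighted sum is the "x-axis value" fed into the sigmoid.
    """
    return sigmoid(w_short * short_term + w_input * x + bias)


# Hypothetical parameters and state, assumed only for this sketch.
W_SHORT, W_INPUT, BIAS = 1.0, 1.0, 0.0
long_term, short_term = 2.0, 1.0

for x in (1.0, -10.0):
    keep = forget_gate(short_term, x, W_SHORT, W_INPUT, BIAS)
    # Multiplying by the gate output scales the long-term memory.
    print(f"input={x:6.1f}  keep={keep:.4f}  long-term after gate={long_term * keep:.4f}")
```

With these assumed values, the input of 1 keeps most of the long-term memory, while the input of −10 drives the gate output close to 0 and scales the long-term memory down to almost nothing.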
