An LSTM network contains three gates: an input gate, a forget gate and an output gate. The structure of the LSTM gates is shown below:
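In one common formulation (a reference sketch; $x_t$ is the input at step $t$, $h_{t-1}$ the previous hidden state, $c_t$ the cell state, $\sigma$ the sigmoid function and $\odot$ element-wise multiplication), the gates are:

\[
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
\]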
This raises two questions: what is the effect of each gate in an LSTM, and which gate is the most important for the network's performance?
In this tutorial, we will discuss this topic.
In the paper An Empirical Exploration of Recurrent Network Architectures (Jozefowicz et al., 2015), we can find:
Forget gate:
The forget gate turns out to be of the greatest importance. When the forget gate is removed, the LSTM exhibits drastically inferior performance on the ARITH and the XML problems, although it is relatively unimportant in language modelling, consistent with Mikolov et al. (2014)
This means the forget gate is the most important gate in an LSTM network: removing it greatly degrades the performance of the LSTM on most tasks.
The paper also reports:

adding a positive bias to the forget gate greatly improves the performance of the LSTM.

This means we should add a positive bias (the paper suggests 1.0) to the forget gate of an LSTM.
For a concrete walkthrough, see: Add forget_bias for Your Custom LSTM Using TensorFlow: A Beginner Guide
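As a minimal sketch of this idea (assuming TensorFlow 2.x; the names x, h_prev, w_f and b_f in the custom function are illustrative, not taken from the tutorial above):

```python
import tensorflow as tf

# Built-in option: Keras LSTM adds 1.0 to the forget-gate bias at
# initialization when unit_forget_bias=True (the default), following
# the recommendation in Jozefowicz et al. (2015).
lstm = tf.keras.layers.LSTM(units=128, unit_forget_bias=True)

# Custom-cell option: add a constant positive bias to the
# forget-gate pre-activation before the sigmoid.
def forget_gate(x, h_prev, w_f, b_f, forget_bias=1.0):
    # Pre-activation for the concatenated input [x, h_prev].
    z = tf.matmul(tf.concat([x, h_prev], axis=1), w_f) + b_f
    # The added forget_bias keeps the gate close to 1 early in
    # training, so the cell state is remembered by default.
    return tf.sigmoid(z + forget_bias)
```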
Input gate:
The second most significant gate turns out to be the input gate.
Output gate:
The output gate was the least important for the performance of the LSTM.