There are many models that improve on the LSTM; the GRU (Gated Recurrent Unit) is one of them. In this tutorial, we will introduce the GRU and compare it with the LSTM.
What is GRU?
The structure of a GRU cell looks like this:
Compare GRU with LSTM
The formulas of the LSTM and the GRU are listed below:
| LSTM | GRU |
| --- | --- |
| \(f_t = \sigma(W_{fx}x_t + W_{fh}h_{t-1} + b_f)\) | \(z_t = \sigma(W_{zx}x_t + W_{zh}h_{t-1} + b_z)\) |
| \(i_t = \sigma(W_{ix}x_t + W_{ih}h_{t-1} + b_i)\) | \(r_t = \sigma(W_{rx}x_t + W_{rh}h_{t-1} + b_r)\) |
| \(o_t = \sigma(W_{ox}x_t + W_{oh}h_{t-1} + b_o)\) | \(\tilde{h}_t = \tanh(W_{hx}x_t + W_{hh}(r_t \odot h_{t-1}) + b_h)\) |
| \(g_t = \tanh(W_{gx}x_t + W_{gh}h_{t-1} + b_g)\) | \(h_t = (1-z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t\) |
| \(c_t = f_t \odot c_{t-1} + i_t \odot g_t\) | |
| \(h_t = o_t \odot \tanh(c_t)\) | |
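To make the two sets of equations concrete, here is a minimal NumPy sketch of one forward step for each cell. This is my own illustration, not code from any particular library: the parameter names `W_f`/`U_f` etc. stand for the input and recurrent weight matrices \(W_{fx}\) and \(W_{fh}\) from the table, and `p` is assumed to be a plain dictionary of correctly shaped NumPy arrays.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM forward step following the equations in the table above."""
    f_t = sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev + p["b_f"])  # forget gate
    i_t = sigmoid(p["W_i"] @ x_t + p["U_i"] @ h_prev + p["b_i"])  # input gate
    o_t = sigmoid(p["W_o"] @ x_t + p["U_o"] @ h_prev + p["b_o"])  # output gate
    g_t = np.tanh(p["W_g"] @ x_t + p["U_g"] @ h_prev + p["b_g"])  # candidate input
    c_t = f_t * c_prev + i_t * g_t   # new cell state
    h_t = o_t * np.tanh(c_t)         # cell output
    return h_t, c_t

def gru_step(x_t, h_prev, p):
    """One GRU forward step following the equations in the table above."""
    z_t = sigmoid(p["W_z"] @ x_t + p["U_z"] @ h_prev + p["b_z"])              # update gate
    r_t = sigmoid(p["W_r"] @ x_t + p["U_r"] @ h_prev + p["b_r"])              # reset gate
    h_tilde = np.tanh(p["W_h"] @ x_t + p["U_h"] @ (r_t * h_prev) + p["b_h"])  # candidate state
    h_t = (1.0 - z_t) * h_prev + z_t * h_tilde                                # new hidden state
    return h_t
```

Notice that the GRU has no separate cell state \(c_t\) and no output gate on its readout; its reset gate \(r_t\) acts on \(h_{t-1}\) inside the candidate computation, which is exactly the correspondence developed below.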
We can see that a GRU is essentially an LSTM; it only computes its output differently.
We will explain this conclusion step by step.
1. Look at the output equation of the LSTM.
\(h_t = o_t \odot \tanh(c_t)\)
\(h_t\) is the LSTM cell output. However, what if we use \(\tanh(c_t)\) as the output of the LSTM cell instead?
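Concretely, this hypothetical variant would keep the standard state update and simply drop the output gate from the readout:
\(c_t = f_t \odot c_{t-1} + i_t \odot g_t\), \(h_t = \tanh(c_t)\)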
2. Move the gate \(o_t\) to the input of the next LSTM cell
The candidate input of an LSTM cell is computed as:
\(g_t = \tanh(W_{gx}x_t + W_{gh}h_{t-1} + b_g)\)
We now use the output gate \(o_t\) to control \(h_{t-1}\). The modified candidate input becomes:
\(g_t = \tanh(W_{gx}x_t + W_{gh}(o_t \odot h_{t-1}) + b_g)\)
This is exactly how the GRU computes its candidate state.
However, the GRU only uses this gate on the input side; it does not apply it to the output. In my view, if we replace \(h_{t-1}\) with \(o_t \odot h_{t-1}\) in the LSTM input and add a \(\tanh\) to the output of the GRU, the GRU is essentially an LSTM.
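Putting the two modifications together, here is a minimal NumPy sketch of this hypothetical "LSTM with the output gate moved to the input", using the same assumed parameter layout as the sketch above. It is an illustration of the argument, not a standard cell:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def modified_lstm_step(x_t, h_prev, c_prev, p):
    """Hypothetical LSTM variant from the argument above: the gate o_t now acts
    on h_{t-1} inside the candidate input (playing the role of the GRU reset
    gate r_t), and the readout is tanh(c_t) with no output gate."""
    f_t = sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev + p["b_f"])          # forget gate
    i_t = sigmoid(p["W_i"] @ x_t + p["U_i"] @ h_prev + p["b_i"])          # input gate
    o_t = sigmoid(p["W_o"] @ x_t + p["U_o"] @ h_prev + p["b_o"])          # acts like r_t
    g_t = np.tanh(p["W_g"] @ x_t + p["U_g"] @ (o_t * h_prev) + p["b_g"])  # gated candidate
    c_t = f_t * c_prev + i_t * g_t
    h_t = np.tanh(c_t)  # output without multiplying by o_t
    return h_t, c_t
```

Compared with `gru_step`, `g_t` plays the role of \(\tilde{h}_t\) and `o_t` plays the role of \(r_t\); the remaining difference is that the GRU ties the forget and input gates together into a single update gate \(z_t\) and keeps no separate cell state.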
One question remains: if we do not use \(o_t\) (which is the reset gate \(r_t\) in the GRU), will the performance of the GRU decrease?
The answer is no; you can read this tutorial:
Can We Remove Reset Gate in GRU? Can It Decrease the Performance of GRU? – Deep Learning Tutorial