A GRU (gated recurrent unit) contains a reset gate.
Here r_t is the reset gate of the GRU. The update gate z_t already controls how much of the previous state is forgotten and how much new information passes into the current GRU cell. Can we therefore remove the reset gate r_t?
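For reference, here is a standard formulation of the GRU. It assumes the common convention in which z_t weights the new candidate state. Some libraries, such as PyTorch, swap the roles of z_t and 1 − z_t. W, U and b are input weights, recurrent weights and biases, σ is the logistic sigmoid and ⊙ is element-wise multiplication.

```latex
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z) \\
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) \\
\tilde{h}_t &= \tanh\big(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h\big) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
```

Note that r_t only appears inside the candidate state h̃_t. How much of h_{t-1} is carried over into h_t is decided by z_t alone.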
The answer is yes.
Why can we remove the reset gate from a GRU?
Jos van der Westhuizen and Joan Lasenby proposed JANET in the paper ‘The unreasonable effectiveness of the forget gate’. JANET is a forget-gate-only variant of the LSTM, and its structure is essentially that of a GRU with the reset gate r_t removed.
Compare GRU and JANET
[Table: GRU equations vs. JANET equations]
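The comparison can be sketched as follows. This assumes the GRU convention h_t = (1 − z_t) ⊙ h_{t-1} + z_t ⊙ h̃_t and JANET's update as we read it from the paper. In JANET, s_t is the forget-gate pre-activation, β is a fixed shift applied on the input side, and the cell state c_t also serves as the output h_t. Removing the reset gate from the GRU, which is the same as fixing r_t = 1, gives the first set of equations:

```latex
\begin{aligned}
&\textbf{GRU without } r_t: \\
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z) \\
\tilde{h}_t &= \tanh(W_h x_t + U_h h_{t-1} + b_h) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t \\[6pt]
&\textbf{JANET:} \\
s_t &= U_f h_{t-1} + W_f x_t + b_f \\
\tilde{c}_t &= \tanh(U_c h_{t-1} + W_c x_t + b_c) \\
c_t &= \sigma(s_t) \odot c_{t-1} + \big(1 - \sigma(s_t - \beta)\big) \odot \tilde{c}_t \\
h_t &= c_t
\end{aligned}
```

If β = 0 and σ(s_t) plays the role of 1 − z_t, the two update rules have exactly the same form: one gate decides how much of the previous state to keep, and its complement decides how much of the candidate state to write.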
Comparing the formulas, we can see that removing r_t from the GRU leaves essentially the same structure as JANET.
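This structural claim can be checked numerically. The following is a minimal, dependency-free Python sketch that rests on a few assumptions. First, the GRU uses the convention h_t = (1 − z_t) ⊙ h_{t-1} + z_t ⊙ h̃_t. Second, JANET follows c_t = σ(s_t) ⊙ c_{t-1} + (1 − σ(s_t − β)) ⊙ c̃_t with h_t = c_t, as we read it from the paper. Third, all names (`gru_step`, `janet_step`, the parameter keys) are hypothetical and chosen only for illustration.

The check sets β = 0 and gives JANET the negated GRU update-gate weights, because σ(−a) = 1 − σ(a). Under those conditions, a GRU without a reset gate should produce the same output as JANET.

```python
import math
import random

# Minimal list-based linear algebra so the sketch needs no third-party packages.
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def vadd(*vectors):
    """Element-wise sum of equally sized vectors."""
    return [sum(xs) for xs in zip(*vectors)]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def tanh(v):
    return [math.tanh(a) for a in v]

def gru_step(x, h, p, use_reset_gate=True):
    """One GRU step, assuming h_t = (1 - z_t) * h_{t-1} + z_t * h_tilde.

    With use_reset_gate=False the reset gate is removed (same as r_t = 1),
    so the candidate state sees h_{t-1} unmodified.
    """
    z = sigmoid(vadd(matvec(p["Wz"], x), matvec(p["Uz"], h), p["bz"]))
    if use_reset_gate:
        r = sigmoid(vadd(matvec(p["Wr"], x), matvec(p["Ur"], h), p["br"]))
        h_in = [ri * hi for ri, hi in zip(r, h)]
    else:
        h_in = h
    h_tilde = tanh(vadd(matvec(p["Wh"], x), matvec(p["Uh"], h_in), p["bh"]))
    return [(1 - zi) * hi + zi * ti for zi, hi, ti in zip(z, h, h_tilde)]

def janet_step(x, c, q, beta=0.0):
    """One JANET step (assumed form): keep = sigma(s), write = 1 - sigma(s - beta); h_t = c_t."""
    s = vadd(matvec(q["Wf"], x), matvec(q["Uf"], c), q["bf"])
    c_tilde = tanh(vadd(matvec(q["Wc"], x), matvec(q["Uc"], c), q["bc"]))
    keep = sigmoid(s)
    write = [1 - g for g in sigmoid([si - beta for si in s])]
    return [k * ci + w * ti for k, w, ci, ti in zip(keep, write, c, c_tilde)]

# Random GRU parameters (hypothetical sizes: 3 inputs, 4 hidden units).
rng = random.Random(0)
n_in, n_h = 3, 4

def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

def negate(m):
    return [[-a for a in row] for row in m]

p = {name: rand_matrix(n_h, n_in) for name in ("Wz", "Wr", "Wh")}
p.update({name: rand_matrix(n_h, n_h) for name in ("Uz", "Ur", "Uh")})
p.update({name: [rng.uniform(-1, 1) for _ in range(n_h)] for name in ("bz", "br", "bh")})

# JANET parameters: negated update-gate weights (sigma(-a) = 1 - sigma(a)),
# candidate weights shared with the GRU.
q = {"Wf": negate(p["Wz"]), "Uf": negate(p["Uz"]), "bf": [-b for b in p["bz"]],
     "Wc": p["Wh"], "Uc": p["Uh"], "bc": p["bh"]}

x = [rng.uniform(-1, 1) for _ in range(n_in)]
h = [rng.uniform(-1, 1) for _ in range(n_h)]

h_gru_no_reset = gru_step(x, h, p, use_reset_gate=False)
h_janet = janet_step(x, h, q, beta=0.0)
print(all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(h_gru_no_reset, h_janet)))  # True
```

For β ≠ 0, the keep and write weights no longer sum to one. In that case the correspondence is structural rather than exact.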
The paper reports that JANET performs slightly better than the LSTM on synthetic memory tasks and on the MNIST, pMNIST, and MIT-BIH arrhythmia datasets.
This suggests that removing the reset gate from a GRU need not degrade its performance, and may even improve it on some tasks. Note, however, that the results quoted above are comparisons against the LSTM.