Advanced LSTM is a variant of LSTM proposed in the paper ADVANCED LSTM: A STUDY ABOUT BETTER TIME DEPENDENCY MODELING IN EMOTION RECOGNITION. In this tutorial, we compare it with the conventional LSTM, which makes it easier to understand.
Advanced LSTM
The structure of the advanced LSTM is:
The structure shows that the outputs \(C(t+1)\) and \(h(t+1)\) at step \(t+1\) are computed from the outputs of the previous 3 steps, which is the main difference between the advanced LSTM and the conventional LSTM.
Difference between advanced LSTM and conventional LSTM
The output of a conventional LSTM is computed from the previous step only. The advanced LSTM, by contrast, computes its output from the previous T steps, for example T = 3.
The equations of the advanced LSTM are as follows:
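A sketch of these equations, written out from the description in this tutorial: it assumes softmax attention scores computed with a shared parameter \(W\), and standard LSTM gates in which \(h'(t)\) and \(C'(t)\) take the place of the previous step's \(h(t)\) and \(C(t)\). The gate symbols \(f\), \(i\), \(o\), \(\tilde{C}\) and their weights and biases are the usual LSTM notation and are assumptions here, not names taken from this text. \(T\) runs over the selected previous steps.

\[
\begin{aligned}
W_{C_T} &= \frac{\exp\big(W \cdot C(T)\big)}{\sum_{T'} \exp\big(W \cdot C(T')\big)}, &\quad C'(t) &= \sum_{T} W_{C_T}\, C(T) \\
W_{h_T} &= \frac{\exp\big(W \cdot h(T)\big)}{\sum_{T'} \exp\big(W \cdot h(T')\big)}, &\quad h'(t) &= \sum_{T} W_{h_T}\, h(T) \\
f(t+1) &= \sigma\big(W_f\,[h'(t), x(t+1)] + b_f\big) \\
i(t+1) &= \sigma\big(W_i\,[h'(t), x(t+1)] + b_i\big) \\
\tilde{C}(t+1) &= \tanh\big(W_{\tilde{C}}\,[h'(t), x(t+1)] + b_{\tilde{C}}\big) \\
C(t+1) &= f(t+1) \odot C'(t) + i(t+1) \odot \tilde{C}(t+1) \\
o(t+1) &= \sigma\big(W_o\,[h'(t), x(t+1)] + b_o\big) \\
h(t+1) &= o(t+1) \odot \tanh\big(C(t+1)\big)
\end{aligned}
\]

With a single previous step the attention weights reduce to 1, so \(C'(t) = C(t)\) and \(h'(t) = h(t)\), and the gates above become those of the conventional LSTM.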
Because the advanced LSTM uses the previous T steps to compute the current output, it has to assign a weight to each of those steps. It does this with two attention layers, which compute \(C'\) and \(h'\).
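The weighting of previous steps can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: the dot-product scoring, the helper names `softmax` and `attention_pool`, and the toy state values are all assumptions made for the example. The same scoring vector `w` is used for both layers.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(states, w):
    """Return the attention-weighted sum of `states` and the weights used.

    states: list of T state vectors (lists of floats of equal length)
    w:      scoring vector (assumed form: score = dot(w, state))
    """
    scores = [sum(wi * si for wi, si in zip(w, s)) for s in states]
    weights = softmax(scores)
    dim = len(states[0])
    pooled = [sum(a * s[k] for a, s in zip(weights, states)) for k in range(dim)]
    return pooled, weights


# Hypothetical 2-dimensional states from the previous T = 3 steps.
prev_h = [[0.1, 0.2], [0.0, 0.5], [0.3, -0.1]]
prev_C = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
w = [0.5, -0.5]  # one scoring vector, reused by both attention layers

h_prime, h_weights = attention_pool(prev_h, w)
C_prime, C_weights = attention_pool(prev_C, w)
```

Here `h_prime` and `C_prime` play the role of \(h'\) and \(C'\): they would be fed to the LSTM gates in place of the single previous hidden and cell state.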
Warning
If you plan to build a model with the advanced LSTM, note the following:
- The weight \(W\) is shared between \(W_{h_T}\) and \(W_{C_T}\).
- Do not compute \(C'\) and \(h'\) at every step. Compute them every 3 or 4 steps instead. Computing them at every step may give a worse result, because the advanced LSTM then captures more duplicated context.
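One way to picture the second point is to list, for each time step, which earlier steps feed it. The sketch below is only an illustration under assumed conventions: the function name `plan_steps`, the rule "aggregate when the step index is a multiple of `stride`", and the handling of the first few steps are hypothetical choices, not taken from the paper.

```python
def plan_steps(seq_len, T=3, stride=3):
    """For each step, list the earlier steps whose states it reads.

    Every `stride`-th step (once T earlier steps exist) reads the previous
    T steps through attention; every other step reads only the step before
    it, as in a conventional LSTM.
    """
    plan = []
    for t in range(seq_len):
        if t >= T and t % stride == 0:
            sources = list(range(t - T, t))  # attention over T previous steps
        elif t > 0:
            sources = [t - 1]  # conventional one-step dependency
        else:
            sources = []  # first step starts from the initial state
        plan.append(sources)
    return plan


plan = plan_steps(8)
for t, sources in enumerate(plan):
    print(t, sources)
```

With `stride` equal to `T`, consecutive attention windows do not overlap. Setting `stride=1` would make almost every step re-read two of the same states as its predecessor, which is the duplicated context the warning describes.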