Nested LSTM is an improved variant of the LSTM model that is reported to perform better than the classic LSTM. In this tutorial, we will introduce it for LSTM beginners.
The classic LSTM network
The classic LSTM network is defined as:
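A common peephole-free formulation is sketched below. The weight names \(W\), \(U\), \(b\) and the equation numbering are assumptions, chosen so that (3) is the cell-state update this tutorial refers to later; in the classic LSTM both \(\sigma_c\) and \(\sigma_h\) are usually tanh, and \(\odot\) is element-wise multiplication.

\[
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(1)} \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(2)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \sigma_c(W_c x_t + U_c h_{t-1} + b_c) && \text{(3)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(4)} \\
h_t &= o_t \odot \sigma_h(c_t) && \text{(5)}
\end{aligned}
\]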
Note that the LSTM equations above do not include peephole connections. For an LSTM with peephole connections, see:
Understand LSTM Peephole Connections: A Beginner Guide – LSTM Networks Tutorial
To improve LSTM, we can modify its equations.
Nested LSTM modifies equation (3), the update of \(c_t\).
Equation (3) of LSTM can be regarded as:
\(c_t = f(c_{t-1}, x_t, h_{t-1})\)
In the classic LSTM, f is the addition function.
How does Nested LSTM improve the classic LSTM?
Looking at equation (3) in the classic LSTM, we can split it as below:
It means:
If f is the add() function, \(c_t\) will be:
which is the same as the cell state of the classic LSTM network.
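Written out, the split can be sketched as follows. The names \(\tilde{h}_{t-1}\) and \(\tilde{x}_t\) follow the Nested LSTM paper; the weight symbols \(W_c\), \(U_c\), \(b_c\) are assumptions for the candidate-value parameters of the cell.

\[
\begin{aligned}
\tilde{h}_{t-1} &= f_t \odot c_{t-1} \\
\tilde{x}_t &= i_t \odot \sigma_c(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f(\tilde{h}_{t-1}, \tilde{x}_t)
\end{aligned}
\]

When f is addition, this gives

\[
c_t = \tilde{h}_{t-1} + \tilde{x}_t = f_t \odot c_{t-1} + i_t \odot \sigma_c(W_c x_t + U_c h_{t-1} + b_c)
\]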
However, what if f is another function, such as a GRU, a stacked LSTM or an LSTM?
If f is an LSTM cell, the classic LSTM becomes a Nested LSTM.
As we know, an LSTM cell receives three inputs (\(h_{t-1}\), \(x_t\) and \(c_{t-1}\)), where the first hidden state is 0, and returns two outputs (\(h_t\), \(c_t\)).
For Nested LSTM, we can set:
We can set the first hidden state of the Nested LSTM to zero
and
Then we can get the final output \(h_t\) of the Nested LSTM.
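The steps above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the helper names (`init_params`, `gates`, `lstm_step`, `nested_lstm_step`), the weight layout and the random initialization are all hypothetical. The inner LSTM receives \(f_t \odot c_{t-1}\) as its hidden state and the gated candidate as its input, and its hidden output becomes the outer \(c_t\). The outer \(\sigma_c\) is taken as the identity, following the note in this tutorial, and the output activation \(\sigma_h\) is assumed to be tanh.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(input_size, hidden_size, rng):
    """Small random weights for one LSTM-style cell (hypothetical helper)."""
    params = {}
    for gate in ("i", "f", "o", "c"):
        params["W_" + gate] = rng.standard_normal((hidden_size, input_size)) * 0.1
        params["U_" + gate] = rng.standard_normal((hidden_size, hidden_size)) * 0.1
        params["b_" + gate] = np.zeros(hidden_size)
    return params


def gates(p, x, h_prev):
    """Return the input, forget and output gates and the candidate pre-activation."""
    i = sigmoid(p["W_i"] @ x + p["U_i"] @ h_prev + p["b_i"])
    f = sigmoid(p["W_f"] @ x + p["U_f"] @ h_prev + p["b_f"])
    o = sigmoid(p["W_o"] @ x + p["U_o"] @ h_prev + p["b_o"])
    g = p["W_c"] @ x + p["U_c"] @ h_prev + p["b_c"]
    return i, f, o, g


def lstm_step(p, x, h_prev, c_prev):
    """Classic LSTM cell: three inputs, two outputs (h_t, c_t)."""
    i, f, o, g = gates(p, x, h_prev)
    c = f * c_prev + i * np.tanh(g)  # memory function is addition
    h = o * np.tanh(c)
    return h, c


def nested_lstm_step(outer, inner, x, h_prev, c_prev, inner_c_prev):
    """One Nested LSTM step: the addition is replaced by an inner LSTM cell."""
    i, f, o, g = gates(outer, x, h_prev)
    h_tilde_prev = f * c_prev  # hidden-state input of the inner cell
    x_tilde = i * g            # input of the inner cell (sigma_c assumed identity)
    # The inner cell's hidden output is used as the outer cell state c_t.
    c, inner_c = lstm_step(inner, x_tilde, h_tilde_prev, inner_c_prev)
    h = o * np.tanh(c)         # sigma_h assumed to be tanh
    return h, c, inner_c


rng = np.random.default_rng(0)
input_size, hidden_size, steps = 3, 4, 5
outer = init_params(input_size, hidden_size, rng)
inner = init_params(hidden_size, hidden_size, rng)  # inner input has hidden_size dims

# All initial states are zero.
h = np.zeros(hidden_size)
c = np.zeros(hidden_size)
inner_c = np.zeros(hidden_size)
for x in rng.standard_normal((steps, input_size)):
    h, c, inner_c = nested_lstm_step(outer, inner, x, h, c, inner_c)
```

Note that the Nested LSTM carries three states between time steps (`h`, `c` and `inner_c`) instead of the two carried by a classic LSTM, because the inner cell keeps its own memory.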
Here is the structure of Nested LSTM.
Meanwhile, we can stack multiple inner LSTM layers to compute \(c_t\) in Nested LSTM.
Note that in the paper, \(\sigma_c\) is the identity function, not the tanh function.
https://arxiv.org/pdf/1801.10308.pdf