Understand Nested LSTM Network: A Beginner Guide

Nested LSTM is an improved variant of the LSTM model that is reported to perform better than the classic LSTM. In this tutorial, we will introduce it for LSTM beginners.

The classic LSTM network

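The classic LSTM network is defined as follows (written here in the notation of the Nested LSTM paper linked at the end of this tutorial):

\[i_t = \sigma_i(x_t W_{xi} + h_{t-1} W_{hi} + b_i) \tag{1}\]

\[f_t = \sigma_f(x_t W_{xf} + h_{t-1} W_{hf} + b_f) \tag{2}\]

\[c_t = f_t \odot c_{t-1} + i_t \odot \sigma_c(x_t W_{xc} + h_{t-1} W_{hc} + b_c) \tag{3}\]

\[o_t = \sigma_o(x_t W_{xo} + h_{t-1} W_{ho} + b_o) \tag{4}\]

\[h_t = o_t \odot \sigma_h(c_t) \tag{5}\]

Here \(\odot\) is element-wise multiplication, \(i_t\), \(f_t\) and \(o_t\) are the input, forget and output gates, \(c_t\) is the cell state and \(h_t\) is the hidden state.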

Note that the LSTM equations above do not include peephole connections. For an LSTM with peephole connections, see:

Understand LSTM Peephole Connections: A Beginner Guide – LSTM Networks Tutorial
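As a rough illustration, one step of the classic LSTM without peephole connections can be sketched in NumPy as below. The function name `lstm_step`, the packed weight layout `[i, f, g, o]` and the choice of tanh for the cell and output activations are assumptions of this sketch, not something fixed by the tutorial.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_x, W_h, b):
    """One step of a classic LSTM without peephole connections.

    W_x: (input_dim, 4 * hidden), W_h: (hidden, 4 * hidden), b: (4 * hidden,).
    The four blocks are packed in the order [i, f, g, o] (assumed layout),
    where g is the pre-activation of the candidate cell value.
    """
    z = x_t @ W_x + h_prev @ W_h + b
    i, f, g, o = np.split(z, 4, axis=-1)
    i_t, f_t, o_t = sigmoid(i), sigmoid(f), sigmoid(o)
    c_t = f_t * c_prev + i_t * np.tanh(g)   # equation (3): the cell state update
    h_t = o_t * np.tanh(c_t)                # hidden state
    return h_t, c_t

rng = np.random.default_rng(0)
input_dim, hidden = 3, 4
W_x = rng.normal(size=(input_dim, 4 * hidden)) * 0.1
W_h = rng.normal(size=(hidden, 4 * hidden)) * 0.1
b = np.zeros(4 * hidden)

# Start from zero states and run over a short random sequence.
h, c = np.zeros(hidden), np.zeros(hidden)
for x_t in rng.normal(size=(5, input_dim)):
    h, c = lstm_step(x_t, h, c, W_x, W_h, b)
print(h.shape, c.shape)
```

The line computing `c_t` is the part that Nested LSTM changes.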

To improve the LSTM, we can modify its equations.

Nested LSTM modifies equation (3), the update of \(c_t\).

Equation (3) of the LSTM can be regarded as:

\(c_t = f(c_{t-1}, x_t, h_{t-1})\)

In the classic LSTM, f is addition.

How does Nested LSTM improve the classic LSTM?

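Looking at equation (3) of the classic LSTM, we can split it into two parts:

\[\tilde{h}_{t-1} = f_t \odot c_{t-1}\]

\[\tilde{x}_t = i_t \odot \sigma_c(x_t W_{xc} + h_{t-1} W_{hc} + b_c)\]

where \(f_t\) and \(i_t\) are the forget and input gates. It means:

\[c_t = f(\tilde{h}_{t-1}, \tilde{x}_t)\]

Here f is the function that combines the two parts, not the forget gate \(f_t\).

If f is the add() function, c_t will be:

\[c_t = \tilde{h}_{t-1} + \tilde{x}_t\]

which is the same as the cell state of the classic LSTM network.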
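The split can be checked numerically. The sketch below writes the cell update with a pluggable function `memory_fn` playing the role of f, and compares it with equation (3) computed directly. The names `split_update`, `classic_update` and `memory_fn`, the packed layout `[i, f, g]` and the use of tanh for \(\sigma_c\) are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def split_update(x_t, h_prev, c_prev, W_x, W_h, b, memory_fn):
    """Cell update written as c_t = memory_fn(h_tilde, x_tilde).

    The weights pack three blocks in the order [i, f, g] (assumed layout),
    where g is the pre-activation of the candidate cell value.
    """
    i, f, g = np.split(x_t @ W_x + h_prev @ W_h + b, 3, axis=-1)
    h_tilde = sigmoid(f) * c_prev        # f_t * c_{t-1}
    x_tilde = sigmoid(i) * np.tanh(g)    # i_t * sigma_c(...), with sigma_c = tanh here
    return memory_fn(h_tilde, x_tilde)

def classic_update(x_t, h_prev, c_prev, W_x, W_h, b):
    """Equation (3) of the classic LSTM, computed directly."""
    i, f, g = np.split(x_t @ W_x + h_prev @ W_h + b, 3, axis=-1)
    return sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)

rng = np.random.default_rng(0)
input_dim, hidden = 3, 4
W_x = rng.normal(size=(input_dim, 3 * hidden))
W_h = rng.normal(size=(hidden, 3 * hidden))
b = rng.normal(size=3 * hidden)
x_t = rng.normal(size=input_dim)
h_prev = rng.normal(size=hidden)
c_prev = rng.normal(size=hidden)

# With addition as the memory function, the split form is the classic update.
c_split = split_update(x_t, h_prev, c_prev, W_x, W_h, b, memory_fn=np.add)
c_classic = classic_update(x_t, h_prev, c_prev, W_x, W_h, b)
print(np.allclose(c_split, c_classic))
```

Passing any other two-argument function as `memory_fn` changes only how the two parts are combined, which is exactly the degree of freedom Nested LSTM uses.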

But what if f is some other function, such as a GRU, a stacked LSTM or an LSTM?

If f is an LSTM cell, the classic LSTM becomes a Nested LSTM.

We know that an LSTM cell receives three inputs (\(h_{t-1}\), \(x_t\) and \(c_{t-1}\)), with the first hidden state set to 0, and returns two outputs (\(h_t\) and \(c_t\)).

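For Nested LSTM, we can set the hidden state and the input of the inner LSTM cell to:

\[\tilde{h}_{t-1} = f_t \odot c_{t-1}\]

\[\tilde{x}_t = i_t \odot \sigma_c(x_t W_{xc} + h_{t-1} W_{hc} + b_c)\]

We can set the first hidden state of the Nested LSTM to zero

and take the hidden state returned by the inner LSTM as the outer cell state:

\[c_t = \tilde{h}_t\]

Then we can compute the final output \(h_t = o_t \odot \sigma_h(c_t)\) of the Nested LSTM, as in the classic LSTM.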
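Putting these steps together, one step of a Nested LSTM might look like the following NumPy sketch. It is only an illustration under stated assumptions: the helper names (`init_params`, `lstm_step`, `nested_lstm_step`), the packed weight layout `[i, f, g, o]`, the random initialisation and the tanh activations inside the inner cell are all choices of this sketch. The outer \(\sigma_c\) is exposed as the `sigma_c` argument, so the identity function can be passed instead of tanh.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(rng, input_dim, hidden):
    """Random packed weights for the blocks [i, f, g, o] (hypothetical helper)."""
    return (rng.normal(size=(input_dim, 4 * hidden)) * 0.1,
            rng.normal(size=(hidden, 4 * hidden)) * 0.1,
            np.zeros(4 * hidden))

def lstm_step(x_t, h_prev, c_prev, params):
    """Classic LSTM step, used here as the inner cell."""
    W_x, W_h, b = params
    i, f, g, o = np.split(x_t @ W_x + h_prev @ W_h + b, 4, axis=-1)
    c_t = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
    h_t = sigmoid(o) * np.tanh(c_t)
    return h_t, c_t

def nested_lstm_step(x_t, h_prev, c_prev, c_inner_prev, outer, inner, sigma_c=np.tanh):
    """One Nested LSTM step: the inner LSTM replaces the addition in equation (3)."""
    W_x, W_h, b = outer
    i, f, g, o = np.split(x_t @ W_x + h_prev @ W_h + b, 4, axis=-1)
    h_tilde_prev = sigmoid(f) * c_prev     # hidden state fed to the inner LSTM
    x_tilde = sigmoid(i) * sigma_c(g)      # input fed to the inner LSTM
    h_tilde, c_inner = lstm_step(x_tilde, h_tilde_prev, c_inner_prev, inner)
    c_t = h_tilde                          # outer cell state = inner hidden state
    h_t = sigmoid(o) * np.tanh(c_t)        # final output
    return h_t, c_t, c_inner

rng = np.random.default_rng(0)
input_dim, hidden = 3, 4
outer = init_params(rng, input_dim, hidden)
inner = init_params(rng, hidden, hidden)   # the inner cell maps hidden -> hidden

h = c = c_inner = np.zeros(hidden)         # all initial states are zero
for x_t in rng.normal(size=(5, input_dim)):
    h, c, c_inner = nested_lstm_step(x_t, h, c, c_inner, outer, inner)
print(h.shape, c.shape, c_inner.shape)
```

Compared with a classic LSTM, the only extra state carried between steps is the inner cell state `c_inner`.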

[Figure: the structure of the Nested LSTM]

We can also stack multiple inner LSTM layers to compute \(c_t\) in a Nested LSTM.
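One possible reading of this is deeper nesting, where the memory function of each inner cell is again an LSTM cell and only the innermost cell uses plain addition. The recursive NumPy sketch below follows that reading. The function name `nested_step`, the list-based state layout and the packed weights `[i, f, g, o]` are assumptions of this sketch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nested_step(x_t, h_prev, states, params, level=0):
    """One step of an LSTM nested to depth len(params).

    states[k] is the cell state of level k (0 = outermost) and params[k]
    holds its packed weights (W_x, W_h, b) for the blocks [i, f, g, o].
    Returns this level's output h_t and the new cell states from this
    level inward.
    """
    W_x, W_h, b = params[level]
    i, f, g, o = np.split(x_t @ W_x + h_prev @ W_h + b, 4, axis=-1)
    h_tilde_prev = sigmoid(f) * states[level]
    x_tilde = sigmoid(i) * np.tanh(g)
    if level == len(params) - 1:
        # Innermost level: plain addition, i.e. a classic LSTM cell.
        c_t, inner_states = h_tilde_prev + x_tilde, []
    else:
        # Otherwise the memory function is the next LSTM cell inward.
        c_t, inner_states = nested_step(x_tilde, h_tilde_prev, states, params, level + 1)
    h_t = sigmoid(o) * np.tanh(c_t)
    return h_t, [c_t] + inner_states

rng = np.random.default_rng(0)
input_dim, hidden, depth = 3, 4, 3
params = []
for level in range(depth):
    in_dim = input_dim if level == 0 else hidden   # inner cells map hidden -> hidden
    params.append((rng.normal(size=(in_dim, 4 * hidden)) * 0.1,
                   rng.normal(size=(hidden, 4 * hidden)) * 0.1,
                   np.zeros(4 * hidden)))

h = np.zeros(hidden)
states = [np.zeros(hidden) for _ in range(depth)]
for x_t in rng.normal(size=(5, input_dim)):
    h, states = nested_step(x_t, h, states, params)
print(h.shape, len(states))
```

With `depth = 1` this reduces to the classic LSTM, and with `depth = 2` it is the Nested LSTM described above.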

Note that, in the paper, \(\sigma_c\) is the identity function rather than tanh.

https://arxiv.org/pdf/1801.10308.pdf
