Highway LSTM is a variant of LSTM that adds highway networks inside an LSTM cell. In this tutorial, we will introduce it for LSTM beginners.
Highway Networks
Highway LSTM integrates highway networks into an LSTM. To understand it, you should first learn what a highway network is. Here is a tutorial:
A Beginner Introduction to Highway Networks – Machine Learning Tutorial
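As a quick refresher, a highway layer mixes a transformed input with the unchanged input through a learned gate: \(y = T(x) \odot H(x) + (1 - T(x)) \odot x\). The minimal sketch below assumes \(H\) is a feedforward layer with tanh activation and \(T\) is a sigmoid gate; the function and parameter names are illustrative, not from any library.

```python
import math


def sigmoid(v):
    # Element-wise logistic sigmoid.
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def affine(W, b, x):
    # Computes W x + b for a weight matrix W (list of rows) and bias b.
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def highway(x, W_h, b_h, W_t, b_t):
    """Highway layer: y = T * H(x) + (1 - T) * x.

    H(x) = tanh(W_h x + b_h) is the transform (a feedforward layer),
    T = sigmoid(W_t x + b_t) is the transform gate, (1 - T) the carry gate.
    The output has the same size as x, so W_h and W_t must be square.
    """
    h = [math.tanh(v) for v in affine(W_h, b_h, x)]
    t = sigmoid(affine(W_t, b_t, x))
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]


x = [0.5, -0.3]
W_h = [[0.1, 0.2], [0.3, -0.4]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
# A very negative gate bias closes T, so the layer almost copies its input.
print(highway(x, W_h, [0.0, 0.0], zeros, [-20.0, -20.0]))  # approximately [0.5, -0.3]
```

When the gate is closed the input is carried through unchanged, and when it is fully open the layer behaves like a plain tanh feedforward layer.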
Highway LSTM
Highway LSTM is proposed in the paper: Language Modeling with Highway LSTM.
There are three kinds of highway LSTMs; we will introduce them one by one.
HW-LSTM-C
HW-LSTM-C is defined as:
It adds a highway network to the previous cell state \(c_{t-1}\) of the LSTM.
Here tanh() is a feedforward network.
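To make the idea concrete, here is a minimal pure-Python sketch of one HW-LSTM-C time step. It assumes a standard LSTM cell and the standard highway form \(T \odot \tanh(W_h c_{t-1} + b_h) + (1 - T) \odot c_{t-1}\), with the highway output replacing \(c_{t-1}\) in the cell update. The paper's exact parameterization may differ, and all names (`hw_lstm_c_step`, `make_params`, `P["hw_c"]`) are hypothetical.

```python
import math
import random


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def tanh(v):
    return [math.tanh(x) for x in v]


def affine(W, b, x):
    # W x + b, with W stored as a list of rows.
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def highway(x, W_h, b_h, W_t, b_t):
    # y = T * tanh(W_h x + b_h) + (1 - T) * x
    h = tanh(affine(W_h, b_h, x))
    t = sigmoid(affine(W_t, b_t, x))
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]


def make_params(n_in, n_hid, seed=0):
    # Hypothetical random initialization; a real model would learn these weights.
    rng = random.Random(seed)

    def lin(n_out, n_inp):
        W = [[rng.uniform(-0.1, 0.1) for _ in range(n_inp)] for _ in range(n_out)]
        return (W, [0.0] * n_out)

    P = {gate: lin(n_hid, n_in + n_hid) for gate in "ifog"}
    P["hw_c"] = lin(n_hid, n_hid) + lin(n_hid, n_hid)  # (W_h, b_h, W_t, b_t)
    return P


def hw_lstm_c_step(x, h_prev, c_prev, P):
    # Assumption: the highway network transforms c_{t-1} before the cell update.
    c_hw = highway(c_prev, *P["hw_c"])
    z = x + h_prev  # concatenation [x_t; h_{t-1}]
    i = sigmoid(affine(*P["i"], z))  # input gate
    f = sigmoid(affine(*P["f"], z))  # forget gate
    o = sigmoid(affine(*P["o"], z))  # output gate
    g = tanh(affine(*P["g"], z))     # candidate cell values
    c = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c_hw, i, g)]
    h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
    return h, c


P = make_params(n_in=3, n_hid=4)
h, c = [0.0] * 4, [0.0] * 4
for x in [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]:
    h, c = hw_lstm_c_step(x, h, c, P)
print(len(h), len(c))  # 4 4
```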
HW-LSTM-H
HW-LSTM-H is defined as:
Similar to HW-LSTM-C, it adds a highway network to the output \(h_{t}\) of the LSTM.
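A minimal pure-Python sketch of one HW-LSTM-H time step follows. It assumes a standard LSTM cell followed by the standard highway form \(T \odot \tanh(W_h h_t + b_h) + (1 - T) \odot h_t\), and it assumes the highway output replaces \(h_t\) both as the layer output and as the recurrent input. These are assumptions, not the paper's exact equations, and the names (`hw_lstm_h_step`, `make_params`, `P["hw_h"]`) are hypothetical.

```python
import math
import random


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def tanh(v):
    return [math.tanh(x) for x in v]


def affine(W, b, x):
    # W x + b, with W stored as a list of rows.
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def highway(x, W_h, b_h, W_t, b_t):
    # y = T * tanh(W_h x + b_h) + (1 - T) * x
    h = tanh(affine(W_h, b_h, x))
    t = sigmoid(affine(W_t, b_t, x))
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]


def make_params(n_in, n_hid, seed=0):
    # Hypothetical random initialization; a real model would learn these weights.
    rng = random.Random(seed)

    def lin(n_out, n_inp):
        W = [[rng.uniform(-0.1, 0.1) for _ in range(n_inp)] for _ in range(n_out)]
        return (W, [0.0] * n_out)

    P = {gate: lin(n_hid, n_in + n_hid) for gate in "ifog"}
    P["hw_h"] = lin(n_hid, n_hid) + lin(n_hid, n_hid)  # (W_h, b_h, W_t, b_t)
    return P


def hw_lstm_h_step(x, h_prev, c_prev, P):
    z = x + h_prev  # concatenation [x_t; h_{t-1}]
    i = sigmoid(affine(*P["i"], z))  # input gate
    f = sigmoid(affine(*P["f"], z))  # forget gate
    o = sigmoid(affine(*P["o"], z))  # output gate
    g = tanh(affine(*P["g"], z))     # candidate cell values
    c = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c_prev, i, g)]
    h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
    # Assumption: the highway output replaces h_t as both the layer output
    # and the recurrent input to the next step.
    return highway(h, *P["hw_h"]), c


P = make_params(n_in=3, n_hid=4)
h, c = [0.0] * 4, [0.0] * 4
for x in [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]:
    h, c = hw_lstm_h_step(x, h, c, P)
print(len(h), len(c))  # 4 4
```

Because the highway output is a gated mix of \(\tanh(\cdot)\) and \(h_t\), both bounded in \((-1, 1)\), the new output stays in the same range as a plain LSTM output.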
HW-LSTM-CH
HW-LSTM-CH combines HW-LSTM-C and HW-LSTM-H. It is defined as:
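The combination can be sketched as a single step function with two switches: one applies a highway network to \(c_{t-1}\), the other applies a highway network to \(h_t\). This minimal pure-Python sketch assumes a standard LSTM cell and the standard highway form \(T \odot \tanh(W_h x + b_h) + (1 - T) \odot x\), with each highway output replacing the vector it transforms. The names (`hw_lstm_step`, `use_c`, `use_h`, `make_params`) are hypothetical, and the paper's exact parameterization may differ.

```python
import math
import random


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def tanh(v):
    return [math.tanh(x) for x in v]


def affine(W, b, x):
    # W x + b, with W stored as a list of rows.
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def highway(x, W_h, b_h, W_t, b_t):
    # y = T * tanh(W_h x + b_h) + (1 - T) * x
    h = tanh(affine(W_h, b_h, x))
    t = sigmoid(affine(W_t, b_t, x))
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]


def make_params(n_in, n_hid, seed=0):
    # Hypothetical random initialization; a real model would learn these weights.
    rng = random.Random(seed)

    def lin(n_out, n_inp):
        W = [[rng.uniform(-0.1, 0.1) for _ in range(n_inp)] for _ in range(n_out)]
        return (W, [0.0] * n_out)

    P = {gate: lin(n_hid, n_in + n_hid) for gate in "ifog"}
    P["hw_c"] = lin(n_hid, n_hid) + lin(n_hid, n_hid)  # highway on c_{t-1}
    P["hw_h"] = lin(n_hid, n_hid) + lin(n_hid, n_hid)  # highway on h_t
    return P


def hw_lstm_step(x, h_prev, c_prev, P, use_c=True, use_h=True):
    # use_c and use_h both True -> HW-LSTM-CH; only one -> HW-LSTM-C or
    # HW-LSTM-H; neither -> a plain LSTM step.
    if use_c:
        c_prev = highway(c_prev, *P["hw_c"])
    z = x + h_prev  # concatenation [x_t; h_{t-1}]
    i = sigmoid(affine(*P["i"], z))
    f = sigmoid(affine(*P["f"], z))
    o = sigmoid(affine(*P["o"], z))
    g = tanh(affine(*P["g"], z))
    c = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c_prev, i, g)]
    h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
    if use_h:
        h = highway(h, *P["hw_h"])
    return h, c


P = make_params(n_in=3, n_hid=4)
variants = [(True, False, "HW-LSTM-C"), (False, True, "HW-LSTM-H"), (True, True, "HW-LSTM-CH")]
for use_c, use_h, name in variants:
    h, c = [0.0] * 4, [0.0] * 4
    for x in [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]:
        h, c = hw_lstm_step(x, h, c, P, use_c, use_h)
    print(name, [round(v, 4) for v in h])
```

Keeping all three variants behind two flags makes it easy to run the kind of side-by-side comparison discussed next.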
Which highway LSTM performs best?
From the experiments in this paper, we can find:
- HW-LSTM-C performs almost the same as the baseline LSTM
This suggests that adding a feedforward network to the cell state \(c\) of the LSTM is not useful.
- HW-LSTM-H has the best performance
This suggests that adding a feedforward network to the output \(h\) of the LSTM is useful.
However, this paper does not compare against LSTMP, so we cannot be sure whether the improvement of HW-LSTM-H comes from the highway network or from the tanh() projection.
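One way to see this confound is to put the candidate output transforms side by side. The sketch below assumes the usual LSTMP recurrent projection (a linear map \(r_t = W_r h_t\) with no bias or nonlinearity) and the standard highway form. The helper names are hypothetical, and `tanh_projection` stands for an ablation the paper does not report. When the transform gate is fully open, the highway output collapses to the plain tanh() projection, so only a comparison against that ablation, or against LSTMP, would show how much of the gain the gating itself contributes.

```python
import math


def affine(W, b, x):
    # W x + b, with W stored as a list of rows.
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def lstmp_projection(h, W_r):
    # LSTMP-style linear recurrent projection r_t = W_r h_t (no bias, no
    # nonlinearity); W_r may map h_t to a smaller dimension.
    return [sum(w * hi for w, hi in zip(row, h)) for row in W_r]


def tanh_projection(h, W, b):
    # tanh() projection alone, with no gate and no carry path (hypothetical ablation).
    return [math.tanh(v) for v in affine(W, b, h)]


def highway_output(h, W, b, W_t, b_t):
    # HW-LSTM-H style output: a gate mixes the tanh() projection with the unchanged h_t.
    p = tanh_projection(h, W, b)
    t = [1.0 / (1.0 + math.exp(-v)) for v in affine(W_t, b_t, h)]
    return [ti * pi + (1.0 - ti) * hi for ti, pi, hi in zip(t, p, h)]


h = [0.2, -0.7, 0.4]
W = [[0.5, 0.1, 0.0], [0.0, 0.5, 0.1], [0.1, 0.0, 0.5]]
b = [0.0, 0.0, 0.0]
W_t = [[0.0] * 3 for _ in range(3)]
print(lstmp_projection(h, W[:2]))  # a 2-dimensional linear projection of h
print(tanh_projection(h, W, b))
# With the gate pushed fully open, the highway output matches the tanh() projection.
print(highway_output(h, W, b, W_t, [20.0] * 3))
```

Note that the highway output must keep the size of \(h_t\) because of the carry path, while an LSTMP projection is free to shrink it.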
Understand LSTMP (LSTM with Recurrent Projection Layer): Comparing with LSTM