A Beginner Introduction to Highway LSTM – LSTM Notes

By | December 23, 2020

Highway LSTM is a variant of LSTM that adds highway networks inside an LSTM. This tutorial introduces it for LSTM beginners.

Highway Networks

Highway LSTM integrates highway networks into LSTM. To understand it, you should first learn what a highway network is. Here is a tutorial:

A Beginner Introduction to Highway Networks – Machine Learning Tutorial
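As a quick reminder, a highway layer mixes a transformed input with the untouched input through a learned gate: y = T(x) * H(x) + (1 - T(x)) * x. The following is a minimal sketch in plain Python, assuming H is a tanh layer and T is a sigmoid gate. The helper names (affine, highway_layer) and the weight values are made up for illustration.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def affine(W, b, x):
    """Compute W @ x + b for a list-of-lists matrix W and list vectors b, x."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def highway_layer(x, W_h, b_h, W_t, b_t):
    """y = T(x) * H(x) + (1 - T(x)) * x, with H = tanh layer, T = sigmoid gate."""
    h = [math.tanh(v) for v in affine(W_h, b_h, x)]  # candidate transform H(x)
    t = [sigmoid(v) for v in affine(W_t, b_t, x)]    # transform gate T(x)
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]

# Illustrative values: a strongly negative gate bias closes the gate,
# so the layer passes its input through almost unchanged.
x = [0.5, -1.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
y = highway_layer(x, identity, [0.0, 0.0], identity, [-20.0, -20.0])
```

Because the gate can fall back to copying the input, the input and output of a highway layer must have the same size.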

Highway LSTM

Highway LSTM was proposed in the paper: LANGUAGE MODELING WITH HIGHWAY LSTM

There are three kinds of highway LSTMs. We will introduce them one by one.

HW-LSTM-C

HW-LSTM-C is defined as:

[Figure: the equations of HW-LSTM-C]

It adds a highway network to the previous cell state \(c_{t-1}\) of the LSTM.

Here tanh() is a feedforward network.

HW-LSTM-H

HW-LSTM-H is defined as:

[Figure: the equations of HW-LSTM-H]

Similar to HW-LSTM-C, it adds a highway network, but to the output \(h_{t}\) of the LSTM.
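To make the idea concrete, here is a rough sketch of one standard LSTM step followed by a highway transform on its output \(h_{t}\). It assumes the generic highway form (tanh candidate, sigmoid gate) and is not a transcription of the paper's equations, which may parameterize the gate differently. All function names, parameter names, and weight values are hypothetical.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def affine(W, b, x):
    """Compute W @ x + b for a list-of-lists matrix W and list vectors b, x."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def lstm_step(x, h_prev, c_prev, p):
    """One standard LSTM step; p maps gate name -> (W, b) applied to [x; h_prev]."""
    z = x + h_prev  # list concatenation of input and previous output
    i = [sigmoid(v) for v in affine(*p["i"], z)]    # input gate
    f = [sigmoid(v) for v in affine(*p["f"], z)]    # forget gate
    o = [sigmoid(v) for v in affine(*p["o"], z)]    # output gate
    g = [math.tanh(v) for v in affine(*p["g"], z)]  # candidate cell value
    c = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c_prev, i, g)]
    h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
    return h, c

def highway_on_output(h, W_h, b_h, W_t, b_t):
    """Assumed generic highway form applied to the LSTM output h_t."""
    cand = [math.tanh(v) for v in affine(W_h, b_h, h)]  # tanh feedforward layer
    gate = [sigmoid(v) for v in affine(W_t, b_t, h)]    # highway gate
    return [gi * ci + (1.0 - gi) * hi for gi, ci, hi in zip(gate, cand, h)]

# Toy sizes and weights (illustrative only): input size 1, hidden size 2.
W = [[0.1, 0.2, -0.1], [0.0, -0.3, 0.2]]
params = {name: (W, [0.0, 0.0]) for name in "ifog"}
h, c = lstm_step([1.0], [0.0, 0.0], [0.0, 0.0], params)

W_hw = [[0.5, -0.2], [0.1, 0.3]]
h_hw = highway_on_output(h, W_hw, [0.0, 0.0], W_hw, [0.0, 0.0])
```

Whether the recurrent connection at the next step sees the raw output or the highway-transformed one is a detail this sketch does not settle; the paper's equations are the reference for that.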

HW-LSTM-CH

HW-LSTM-CH combines HW-LSTM-C and HW-LSTM-H. It is defined as:

[Figure: the equations of HW-LSTM-CH]

Which Highway LSTM performs best?

From the experiments in the paper, we can find:

[Figure: the performance of highway LSTM]

  • HW-LSTM-C is almost the same as the baseline LSTM

This suggests that adding a feedforward network to the cell state \(c\) of the LSTM is not useful.

  • HW-LSTM-H has the best performance

This suggests that adding a feedforward network to the output \(h\) of the LSTM is useful.

However, the paper does not compare against LSTMP, so we cannot be sure whether the improvement of HW-LSTM-H comes from the highway network or from the tanh() projection.

Understand LSTMP (LSTM with Recurrent Projection Layer): Comparing with LSTM
