Highway Networks were proposed in the paper: Highway Networks. The idea is based on LSTM. In this tutorial, we will introduce them for machine learning beginners.
First, let's compare feedforward and recurrent networks.
For example:
As the depth of a feedforward network increases, the gradient may vanish. To fix this problem, we can use a residual network.
However, as to RNNs, we can use an LSTM to solve the gradient vanishing problem. LSTM uses gates to achieve this. Based on this idea, we can also add gates to a deep feedforward network to solve its gradient vanishing problem.
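For instance, a residual connection simply adds the input back to the layer output, so the gradient always has an identity path to flow through. Here is a minimal sketch in PyTorch (the class name and the single linear layer are illustrative, not from any particular paper):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x):
        # y = x + F(x): the identity skip connection keeps gradients alive
        return x + torch.tanh(self.linear(x))
```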
The structure of LSTM
The structure of LSTM is below:
The most important gate in an LSTM is the forget gate.
It is:
\(f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)\)
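As a minimal sketch (assuming the standard LSTM formulation; the class name ForgetGate is only illustrative), the forget gate can be computed like this in PyTorch:

```python
import torch
import torch.nn as nn

class ForgetGate(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        # W_f acts on the concatenation [h_{t-1}, x_t]
        self.w_f = nn.Linear(input_size + hidden_size, hidden_size)

    def forward(self, h_prev, x_t):
        # f_t = sigmoid(W_f [h_{t-1}, x_t] + b_f), each entry in (0, 1)
        return torch.sigmoid(self.w_f(torch.cat([h_prev, x_t], dim=-1)))
```

Each entry of \(f_t\) is between 0 and 1, controlling how much of the previous cell state is kept.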
Can we add a forget gate to a feedforward network to save the previous hidden state or output?
The answer is Yes.
Highway Networks
Highway Networks add a forget gate to the feedforward network to save the previous hidden state or output. It looks like:
It is defined as:
\(g_T = \sigma(W_T x + b_T)\)
\(g_C = \sigma(W_C x + b_C)\)
\(y = x \odot g_C + \tanh(Wx + b) \odot g_T\)
or
\(g_T = \sigma(W_T x + b_T)\)
\(y = x \odot (1 - g_T) + \tanh(Wx + b) \odot g_T\)
Here \(\tanh(Wx+b)\) is a plain feedforward layer and \(x\) is the input.
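Based on the second (coupled-gate) formulation above, here is a minimal sketch of one highway layer in PyTorch. The class name HighwayLayer and the sizes are illustrative; the negative initial gate bias follows the paper's suggestion to bias layers toward carrying \(x\) through early in training.

```python
import torch
import torch.nn as nn

class HighwayLayer(nn.Module):
    """One highway layer: y = x * (1 - g_T) + tanh(Wx + b) * g_T."""
    def __init__(self, dim):
        super().__init__()
        self.h = nn.Linear(dim, dim)  # plain transform: tanh(Wx + b)
        self.t = nn.Linear(dim, dim)  # transform gate: sigmoid(W_T x + b_T)
        # The paper suggests a negative initial gate bias (e.g. -2) so that
        # early in training the layer mostly carries x through unchanged.
        nn.init.constant_(self.t.bias, -2.0)

    def forward(self, x):
        g_t = torch.sigmoid(self.t(x))  # g_T in the formulas above
        return x * (1.0 - g_t) + torch.tanh(self.h(x)) * g_t

# Usage: the output has the same shape as the input.
layer = HighwayLayer(64)
x = torch.randn(8, 64)
y = layer(x)
```

Because the layer mixes \(x\) and \(\tanh(Wx+b)\) element-wise, both must have the same dimension, which leads to the notice below.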
Notice
Highway Networks are useful for very deep networks whose layers share the same input and output size (so that \(x\) and \(\tanh(Wx+b)\) can be combined element-wise). Otherwise, they may perform worse than a plain feedforward network.