Why Not Use the ReLU Activation Function in an RNN or LSTM? – Machine Learning Tutorial

December 2, 2020

We often use the tanh activation function in an RNN or LSTM. However, we usually can not use ReLU in these models. Why? In this tutorial, we will explain it to you.

As to the RNN

The hidden state \(h_t\) can be defined as:

\[h_t = f(Wx_t+Uh_{t-1}+b)\]

where \(f\) is the activation function. If \(f\) is ReLU, we may get very large values in \(h_t\): because ReLU is unbounded, the hidden state is fed back through \(U\) at every time step and can keep growing without limit.
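Here is a minimal NumPy sketch of this effect. The toy sizes, random weights, and random inputs are arbitrary assumptions for illustration; it simply iterates the recurrence above with tanh and with ReLU and compares the size of the final hidden state.

```python
import numpy as np

np.random.seed(0)
hidden_size, input_size, steps = 32, 8, 30

# Random, not specially scaled, weights for a toy vanilla RNN.
W = np.random.randn(hidden_size, input_size)   # input-to-hidden weights
U = np.random.randn(hidden_size, hidden_size)  # hidden-to-hidden weights
b = np.zeros(hidden_size)

def run(activation):
    h = np.zeros(hidden_size)
    for t in range(steps):
        x_t = np.random.randn(input_size)       # random input at step t
        h = activation(W @ x_t + U @ h + b)     # h_t = f(W x_t + U h_{t-1} + b)
    return np.abs(h).max()

relu = lambda z: np.maximum(z, 0.0)
print("max |h_T| with tanh:", run(np.tanh))  # bounded by 1
print("max |h_T| with relu:", run(relu))     # grows enormously over time
```

With tanh the hidden state can never leave \([-1, 1]\), while with ReLU the repeated matrix multiplications make it explode.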

In the paper

A Simple Way to Initialize Recurrent Networks of Rectified Linear Units

we can also find this sentence:

At first sight, ReLUs seem inappropriate for RNNs because they can have very large outputs so they might be expected to be far more likely to explode than units that have bounded values.


However, if the recurrent weight matrix \(U\) is initialized to the identity matrix and the biases to zero, ReLU can work well in an RNN. This is exactly the initialization proposed in the paper above (the IRNN).
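Below is a minimal PyTorch sketch of this identity initialization; the layer sizes, input-weight scale, and dummy input are example assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

input_size, hidden_size = 8, 16
rnn = nn.RNN(input_size, hidden_size, nonlinearity='relu', batch_first=True)

with torch.no_grad():
    rnn.weight_hh_l0.copy_(torch.eye(hidden_size))  # recurrent matrix U = identity
    rnn.weight_ih_l0.normal_(0.0, 0.001)            # small random input weights
    rnn.bias_hh_l0.zero_()                          # biases start at zero
    rnn.bias_ih_l0.zero_()

x = torch.randn(4, 20, input_size)                  # (batch, time, features)
output, h_n = rnn(x)
print(output.shape, h_n.shape)
```

With this initialization, each hidden unit simply copies its previous value when there is no input, so the ReLU recurrence starts from stable dynamics instead of exploding from the first steps of training.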
