IndRNN is proposed in the paper: Independently Recurrent Neural Network (IndRNN): Building A Longer and Deeper RNN. It can handle longer sequences than LSTM.
In this tutorial, we will use some examples to introduce this model.
There is a TensorFlow implementation of IndRNN on GitHub:
https://github.com/batzner/indrnn
You can view this project to understand this model.
The core of IndRNN
IndRNN modifies how the hidden state is computed in a vanilla RNN:
\(h_t=\sigma(Wx_t+u\odot h_{t-1}+b)\)
Here \(\odot\) is the Hadamard (element-wise) product, \(u\) is a vector of per-unit recurrent weights, and \(\sigma\) is the activation function, for example relu().
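To make the formula concrete, here is a minimal NumPy sketch of one IndRNN step (the function and variable names are our own, not from the paper or the repository):

import numpy as np

def ind_rnn_step(x_t, h_prev, W, u, b):
    # W: (input_dim, num_units); u, b: (num_units,)
    # u * h_prev is element-wise, so each unit only sees its own previous state
    return np.maximum(0.0, x_t.dot(W) + u * h_prev + b)  # relu activation

rng = np.random.RandomState(0)
input_dim, num_units = 3, 4
W = rng.randn(input_dim, num_units) * 0.1
u = rng.uniform(0.0, 1.0, size=num_units)  # one recurrent weight per unit
b = np.zeros(num_units)

h = np.zeros(num_units)
for t in range(5):
    x_t = rng.randn(input_dim)
    h = ind_rnn_step(x_t, h, W, u, b)
print(h)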
How about variables in IndRNN?
To see how many trainable variables an IndRNN layer contains, we will walk through an example.
Look at the example below:
import tensorflow as tf
import numpy as np
from ind_rnn_cell import IndRNNCell

TIME_STEPS = 784
LAST_LAYER_LOWER_BOUND = pow(0.5, 1 / TIME_STEPS)
RECURRENT_MAX = pow(2, 1 / TIME_STEPS)

batch = 64
# inputs: batch_size * sequence_length * dim = 64 * 40 * 200
# (input data only, so it is not trainable)
inputs = tf.Variable(tf.truncated_normal([64, 40, 200], stddev=0.1), name="S", trainable=False)

input_init = tf.random_uniform_initializer(-0.001, 0.001)
recurrent_init = tf.random_uniform_initializer(LAST_LAYER_LOWER_BOUND, RECURRENT_MAX)

cell = IndRNNCell(100,
                  recurrent_max_abs=RECURRENT_MAX,
                  input_kernel_initializer=input_init,
                  recurrent_kernel_initializer=recurrent_init)

# Unroll the layer
layer_output, _ = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32, scope="rnn")

init = tf.global_variables_initializer()
init_local = tf.local_variables_initializer()

with tf.Session() as sess:
    sess.run([init, init_local])
    np.set_printoptions(precision=4, suppress=True)
    a = sess.run(layer_output)
    print(a.shape)
    for n in tf.trainable_variables():
        print(n)
In this example, the layer input is inputs, with shape batch_size * sequence_length * dim, here 64 * 40 * 200.
Run this code and you will find:
The shape of layer_output is (64, 40, 100), which is the same shape as an LSTM output (batch_size * sequence_length * num_units).
The trainable variables in IndRNN are:
<tf.Variable 'rnn/ind_rnn_cell/input_kernel:0' shape=(200, 100) dtype=float32_ref>
<tf.Variable 'rnn/ind_rnn_cell/recurrent_kernel:0' shape=(100,) dtype=float32_ref>
<tf.Variable 'rnn/ind_rnn_cell/bias:0' shape=(100,) dtype=float32_ref>
We can see that there are only three trainable variables in an IndRNN cell: the input kernel \(W\), the recurrent kernel \(u\), and the bias \(b\). In this example, the IndRNN processes a sequence of length 40, yet there are not 40*3 variables, because the same variables are reused at every time step.
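As a quick check (a sketch of our own, appended to the example above, not part of the repository), we can sum the parameters of the cell and confirm that the count depends only on dim and the number of units, not on the sequence length:

# Appended to the example above: count the cell's trainable parameters
rnn_vars = [v for v in tf.trainable_variables() if v.name.startswith("rnn/")]
total = sum(int(np.prod(v.get_shape().as_list())) for v in rnn_vars)
print(total)  # 200*100 (input_kernel) + 100 (recurrent_kernel) + 100 (bias) = 20200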
How about recurrent_max_abs?
recurrent_max_abs limits the maximum absolute value of the recurrent weight \(u\).
For example:
# Clip the absolute values of the recurrent weights to the specified maximum
if self._recurrent_max_abs:
    self._recurrent_kernel = clip_ops.clip_by_value(self._recurrent_kernel,
                                                    -self._recurrent_max_abs,
                                                    self._recurrent_max_abs)
We should notice: the longer the sequence, the smaller the maximum allowed value of \(u\).
Because:
LAST_LAYER_LOWER_BOUND = pow(0.5, 1 / TIME_STEPS)
RECURRENT_MAX = pow(2, 1 / TIME_STEPS)
Here TIME_STEPS is the sequence length. As TIME_STEPS grows, RECURRENT_MAX shrinks toward 1 and LAST_LAYER_LOWER_BOUND rises toward 1, so the recurrent weight \(u\) is confined to a narrow range around 1.
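A small sketch (our own) that prints these two bounds for a few sequence lengths shows how the allowed range narrows around 1:

# How the recurrent-weight range changes with the sequence length
for time_steps in (28, 784, 5000):
    lower = pow(0.5, 1 / time_steps)   # LAST_LAYER_LOWER_BOUND
    upper = pow(2, 1 / time_steps)     # RECURRENT_MAX
    print(time_steps, round(lower, 4), round(upper, 4))
# 28   -> 0.9755, 1.0251
# 784  -> 0.9991, 1.0009
# 5000 -> 0.9999, 1.0001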
How about kernel initialization when stacking multiple IndRNN layers?
Here is an example:
RECURRENT_MAX = pow(2, 1 / TIME_STEPS)

for layer in range(1, NUM_LAYERS + 1):
    # Init only the last layer's recurrent weights around 1
    recurrent_init_lower = 0 if layer < NUM_LAYERS else LAST_LAYER_LOWER_BOUND
    recurrent_init = tf.random_uniform_initializer(recurrent_init_lower, RECURRENT_MAX)
We can see that the recurrent kernel of the last IndRNN layer is initialized uniformly in [LAST_LAYER_LOWER_BOUND, RECURRENT_MAX], while the earlier layers use [0, RECURRENT_MAX].
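For completeness, here is a sketch of how these per-layer initializers could be plugged into a stacked IndRNN with tf.nn.rnn_cell.MultiRNNCell (NUM_LAYERS, NUM_UNITS, and the scope name are example values of our own, not taken from the repository):

NUM_LAYERS = 3
NUM_UNITS = 128

cells = []
for layer in range(1, NUM_LAYERS + 1):
    # Only the last layer initializes its recurrent weights close to 1
    recurrent_init_lower = 0 if layer < NUM_LAYERS else LAST_LAYER_LOWER_BOUND
    recurrent_init = tf.random_uniform_initializer(recurrent_init_lower, RECURRENT_MAX)
    cells.append(IndRNNCell(NUM_UNITS,
                            recurrent_max_abs=RECURRENT_MAX,
                            input_kernel_initializer=tf.random_uniform_initializer(-0.001, 0.001),
                            recurrent_kernel_initializer=recurrent_init))

multi_cell = tf.nn.rnn_cell.MultiRNNCell(cells)
# inputs: batch_size * sequence_length * dim, as in the earlier example
outputs, _ = tf.nn.dynamic_rnn(multi_cell, inputs, dtype=tf.float32, scope="multi_rnn")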
How about learning rate?
The default is:
LEARNING_RATE_INIT = 0.0002
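As a sketch of how this learning rate would typically be used (the optimizer choice is ours, and loss is a hypothetical scalar, e.g. a cross-entropy built on top of layer_output):

LEARNING_RATE_INIT = 0.0002

# loss: a hypothetical scalar, e.g. cross-entropy computed from layer_output
optimizer = tf.train.AdamOptimizer(learning_rate=LEARNING_RATE_INIT)
train_op = optimizer.minimize(loss)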
Compare LSTM with IndRNN
Here is a comparative result.