We have created our own custom LSTM network using TensorFlow, and we initialize all of its weights and biases like this:
# initialize matrix
def init_matrix(self, shape):
    return tf.random_normal(shape, stddev=0.1)

# initialize vector
def init_vector(self, shape):
    return tf.zeros(shape)
You can find the detailed content in this tutorial:
Build Your Own LSTM Model Using TensorFlow: Steps to Create a Customized LSTM – TensorFlow Tutorial
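As a hypothetical illustration of how such helpers are typically used (the names build_input_gate, Wi, Ui and bi below are illustrative, not from the original network), each gate's parameters are created from these two functions:

import tensorflow as tf

class CustomLSTM(object):
    def init_matrix(self, shape):
        return tf.random_normal(shape, stddev=0.1)

    def init_vector(self, shape):
        return tf.zeros(shape)

    def build_input_gate(self, input_dim, num_units):
        # weights for x_t and h_{t-1}, plus a bias, for the input gate
        self.Wi = tf.Variable(self.init_matrix([input_dim, num_units]))
        self.Ui = tf.Variable(self.init_matrix([num_units, num_units]))
        self.bi = tf.Variable(self.init_vector([num_units]))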
In order to make our custom LSTM network perform the same as tf.nn.rnn_cell.LSTMCell(), we should initialize its weights and biases the same way tf.nn.rnn_cell.LSTMCell() does.
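For comparison, we can build a standard LSTMCell and list its variables. A minimal sketch, assuming TensorFlow 1.x:

import tensorflow as tf

cell = tf.nn.rnn_cell.LSTMCell(num_units=64)
inputs = tf.placeholder(tf.float32, [None, 10, 32])  # batch x time x features
outputs, state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)

# after the cell is built, inspect its kernel and bias
for v in cell.variables:
    print(v.name, v.shape)
# kernel: [input_depth + num_units, 4 * num_units]
# bias:   [4 * num_units]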
LSTM Biases in TensorFlow
Checking the source code of LSTMCell in TensorFlow, we can find how LSTM biases are initialized.
The source code is here:
https://github.com/tensorflow/tensorflow/blob/r1.8/tensorflow/python/ops/rnn_cell_impl.py
Here is an example:
self._bias = self.add_variable(
    _BIAS_VARIABLE_NAME,
    shape=[4 * self._num_units],
    initializer=init_ops.zeros_initializer(dtype=self.dtype))
We can see that LSTM biases are initialized to zeros in TensorFlow.
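Our init_vector() above already returns tf.zeros(shape), so the bias initialization already matches LSTMCell(). Written in the initializer style instead, assuming TensorFlow 1.x:

def init_vector(self, shape):
    # zeros, same as the init_ops.zeros_initializer() used by LSTMCell
    return tf.zeros_initializer()(shape)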
LSTM Weights in TensorFlow
We can also find how LSTM weights are initialized in TensorFlow.
self._kernel = self.add_variable(
    _WEIGHTS_VARIABLE_NAME,
    shape=[input_depth + h_depth, 4 * self._num_units])
The LSTM weights (the LSTM kernel) are created by the self.add_variable() function, and no explicit initializer is passed.
The source code of the self.add_variable() function is here:
https://github.com/tensorflow/tensorflow/blob/r1.8/tensorflow/python/layers/base.py
It is defined as:
def add_variable(self, name, shape, dtype=None,
                 initializer=None, regularizer=None,
                 trainable=True, constraint=None,
                 partitioner=None):
In this function we can find that TensorFlow uses tf.get_variable() to create or return the weight tensors of the LSTM network.
When initializer=None, tf.get_variable() will use tf.glorot_uniform_initializer() to initialize the LSTM weights.
Here is the tutorial:
Understand How tf.get_variable() Initialize a Tensor When Initializer is None: A Beginner Guide
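We can also check this default directly. A minimal sketch, assuming TensorFlow 1.x: create a variable without an initializer and confirm its values stay inside the Glorot uniform limit sqrt(6 / (fan_in + fan_out)).

import numpy as np
import tensorflow as tf

fan_in, fan_out = 128, 256
w = tf.get_variable("w", shape=[fan_in, fan_out])  # initializer=None

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    value = sess.run(w)
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    print(value.min() >= -limit, value.max() <= limit)  # True True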
So we should modify init_matrix() in our custom LSTM to:
def init_matrix(self, shape):
    return tf.contrib.layers.xavier_initializer()(shape)
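Note that tf.contrib.layers.xavier_initializer() defaults to uniform=True, which is the same Glorot uniform scheme as tf.glorot_uniform_initializer(). A quick sanity check of the bound, assuming TensorFlow 1.x:

import numpy as np
import tensorflow as tf

shape = [100, 4 * 64]  # e.g. [input_depth + h_depth, 4 * num_units]
w = tf.contrib.layers.xavier_initializer()(shape)

with tf.Session() as sess:
    value = sess.run(w)
    limit = np.sqrt(6.0 / (shape[0] + shape[1]))
    print(np.abs(value).max() <= limit)  # True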
Notice: all code above is from TensorFlow 1.8.
As of TensorFlow 1.10, tensorflow/python/layers/base.py adds weights as follows:
variable = super(Layer, self).add_weight(
    name,
    shape,
    dtype=dtypes.as_dtype(dtype),
    initializer=initializer or scope.initializer,
    trainable=trainable,
    constraint=constraint,
    partitioner=partitioner,
    use_resource=use_resource,
    synchronization=synchronization,
    aggregation=aggregation,
    getter=vs.get_variable)
The parent class Layer comes from:
from tensorflow.python.keras.engine import base_layer
If we check the file tensorflow/python/keras/engine/base_layer.py, we can find the code below:
# Initialize variable when no initializer provided
if initializer is None:
  # If dtype is DT_FLOAT, provide a uniform unit scaling initializer
  if dtype.is_floating:
    initializer = initializers.glorot_uniform()
  # If dtype is DT_INT/DT_UINT, provide a default value `zero`
  # If dtype is DT_BOOL, provide a default value `FALSE`
  elif dtype.is_integer or dtype.is_unsigned or dtype.is_bool:
    initializer = initializers.zeros()
  # NOTES: Do we need to support for handling DT_STRING and DT_COMPLEX here?
If initializer is None, TensorFlow will use initializers.glorot_uniform() to initialize a weight.
However, if we look at the Keras LSTM, we can find:
__init__(
    units,
    activation='tanh',
    recurrent_activation='hard_sigmoid',
    use_bias=True,
    kernel_initializer='glorot_uniform',
    recurrent_initializer='orthogonal',
    bias_initializer='zeros',
    unit_forget_bias=True,
    kernel_regularizer=None,
    recurrent_regularizer=None,
    bias_regularizer=None,
    activity_regularizer=None,
    kernel_constraint=None,
    recurrent_constraint=None,
    bias_constraint=None,
    dropout=0.0,
    recurrent_dropout=0.0,
    implementation=1,
    return_sequences=False,
    return_state=False,
    go_backwards=False,
    stateful=False,
    unroll=False,
    **kwargs
)
The source code is here:
https://www.github.com/tensorflow/tensorflow/blob/r1.8/tensorflow/python/keras/_impl/keras/layers/recurrent.py
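We can also see these defaults on a constructed layer. A minimal sketch, assuming tf.keras in TensorFlow 1.x:

import tensorflow as tf

layer = tf.keras.layers.LSTM(64)
print(layer.kernel_initializer)     # glorot_uniform (input weights)
print(layer.recurrent_initializer)  # orthogonal (recurrent weights)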
In the Keras LSTM:

kernel_initializer='glorot_uniform',
recurrent_initializer='orthogonal',
kernel_initializer: Initializer for the kernel weights matrix, used for the linear transformation of the inputs
recurrent_initializer: Initializer for the recurrent_kernel weights matrix, used for the linear transformation of the recurrent state
It means Wx (the weights applied to x_t) should be initialized by tf.contrib.layers.xavier_initializer(), and Wh (the weights applied to h_{t-1}) should be initialized by tf.orthogonal_initializer() in TensorFlow.
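Putting this together, our custom LSTM's initializers could look like the following. A minimal sketch, assuming TensorFlow 1.x; init_recurrent_matrix is a hypothetical helper name for the Wh weights, not from the original network:

import tensorflow as tf

def init_matrix(self, shape):
    # input weights Wx: Glorot/Xavier uniform, like kernel_initializer
    return tf.contrib.layers.xavier_initializer()(shape)

def init_recurrent_matrix(self, shape):
    # recurrent weights Wh: orthogonal, like recurrent_initializer
    return tf.orthogonal_initializer()(shape)

def init_vector(self, shape):
    # biases: zeros, like bias_initializer
    return tf.zeros(shape)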