Understand LSTM Weight and Bias Initialization When Initializer is None in TensorFlow – TensorFlow Tutorial

July 22, 2020

We have created our own custom LSTM network using TensorFlow, and we initialize all of the LSTM weights and biases like this:

    # initialize a weight matrix with random normal values (stddev = 0.1)
    def init_matrix(self, shape):
        return tf.random_normal(shape, stddev=0.1)

    # initialize a bias vector with zeros
    def init_vector(self, shape):
        return tf.zeros(shape)

The following tutorial contains the details:

Build Your Own LSTM Model Using TensorFlow: Steps to Create a Customized LSTM – TensorFlow Tutorial

In order to make the performance of our custom LSTM network match tf.nn.rnn_cell.LSTMCell(), we should initialize the weights and biases in our custom LSTM the same way tf.nn.rnn_cell.LSTMCell() does.

LSTM Biases in TensorFlow

Checking the source code of RNN or LSTMCell in TensorFlow, we can find how LSTM biases are initialized.

The source code is here:

https://github.com/tensorflow/tensorflow/blob/r1.8/tensorflow/python/ops/rnn_cell_impl.py

Here is how the bias is created:

    self._bias = self.add_variable(
        _BIAS_VARIABLE_NAME,
        shape=[4 * self._num_units],
        initializer=init_ops.zeros_initializer(dtype=self.dtype))

We can see that LSTM biases are initialized to zero in TensorFlow, so our init_vector() above, which returns tf.zeros(shape), already matches this behavior.
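
As a quick sanity check, here is a small sketch (assuming TensorFlow 1.x) that builds an LSTMCell, calls it once so its variables are created, and prints the bias variable to confirm it is all zeros:

    import tensorflow as tf

    # Build an LSTMCell and call it once so its variables are created.
    cell = tf.nn.rnn_cell.LSTMCell(num_units=8)
    inputs = tf.zeros([2, 4])  # batch of 2, input depth 4
    state = cell.zero_state(batch_size=2, dtype=tf.float32)
    output, new_state = cell(inputs, state)

    # Find the bias variable and print it: it should contain 4 * num_units zeros.
    bias = [v for v in tf.trainable_variables() if "bias" in v.name][0]
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        print(sess.run(bias))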

LSTM Weights in TensorFlow

We can also find how LSTM weights are initialized in TensorFlow.

    self._kernel = self.add_variable(
        _WEIGHTS_VARIABLE_NAME,
        shape=[input_depth + h_depth, 4 * self._num_units])

The LSTM weights (the LSTM kernel) are created by the self.add_variable() function, and no initializer is passed in.

The source code of the self.add_variable() function is here:

https://github.com/tensorflow/tensorflow/blob/r1.8/tensorflow/python/layers/base.py

It is defined as:

    def add_variable(self, name, shape, dtype=None,
                     initializer=None, regularizer=None,
                     trainable=True, constraint=None,
                     partitioner=None):

In this function we can find how LSTM weights and biases are initialized when initializer is None: TensorFlow uses tf.get_variable() to create or return the weight tensors in the LSTM network.

When initializer = None, TensorFlow will use tf.glorot_uniform_initializer() to initialize the weights in the LSTM.
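
For example, here is a minimal sketch (assuming TensorFlow 1.x) showing that a float variable created with tf.get_variable() and no initializer is filled by the Glorot uniform initializer:

    import tensorflow as tf

    # No initializer is passed, so TensorFlow falls back to glorot_uniform
    # for floating-point variables.
    w = tf.get_variable("w", shape=[3, 4], dtype=tf.float32)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        print(sess.run(w))  # values drawn from a Glorot uniform distribution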

Here is the tutorial:

Understand How tf.get_variable() Initialize a Tensor When Initializer is None: A Beginner Guide

So, we should modify init_matrix() in our custom LSTM to:

    def init_matrix(self, shape):
        return tf.contrib.layers.xavier_initializer()(shape)
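
For reference, tf.contrib.layers.xavier_initializer() with its default uniform=True is the same Glorot uniform scheme as tf.glorot_uniform_initializer(), so this change makes our custom weights match the LSTMCell default. A small sketch (assuming TensorFlow 1.x):

    import tensorflow as tf

    xavier = tf.contrib.layers.xavier_initializer()   # uniform=True by default
    glorot = tf.glorot_uniform_initializer()

    # Both draw from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)).
    w1 = xavier([200, 128])
    w2 = glorot([200, 128])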

Notice: all of the code above is from TensorFlow 1.8.

As for TensorFlow 1.10, tensorflow/python/layers/base.py adds weights as follows:

    variable = super(Layer, self).add_weight(
        name,
        shape,
        dtype=dtypes.as_dtype(dtype),
        initializer=initializer or scope.initializer,
        trainable=trainable,
        constraint=constraint,
        partitioner=partitioner,
        use_resource=use_resource,
        synchronization=synchronization,
        aggregation=aggregation,
        getter=vs.get_variable)

The parent class Layer is imported from:

    from tensorflow.python.keras.engine import base_layer

Checking the file tensorflow/python/keras/engine/base_layer.py, we can find the code below:

    # Initialize variable when no initializer provided
    if initializer is None:
      # If dtype is DT_FLOAT, provide a uniform unit scaling initializer
      if dtype.is_floating:
        initializer = initializers.glorot_uniform()
      # If dtype is DT_INT/DT_UINT, provide a default value `zero`
      # If dtype is DT_BOOL, provide a default value `FALSE`
      elif dtype.is_integer or dtype.is_unsigned or dtype.is_bool:
        initializer = initializers.zeros()
      # NOTES:Do we need to support for handling DT_STRING and DT_COMPLEX here?

If initializer is None, TensorFlow will use initializers.glorot_uniform() to initialize a weight.

However, if we look at the Keras LSTM, we can find:

    __init__(
        units,
        activation='tanh',
        recurrent_activation='hard_sigmoid',
        use_bias=True,
        kernel_initializer='glorot_uniform',
        recurrent_initializer='orthogonal',
        bias_initializer='zeros',
        unit_forget_bias=True,
        kernel_regularizer=None,
        recurrent_regularizer=None,
        bias_regularizer=None,
        activity_regularizer=None,
        kernel_constraint=None,
        recurrent_constraint=None,
        bias_constraint=None,
        dropout=0.0,
        recurrent_dropout=0.0,
        implementation=1,
        return_sequences=False,
        return_state=False,
        go_backwards=False,
        stateful=False,
        unroll=False,
        **kwargs
    )

The source code is here:

https://www.github.com/tensorflow/tensorflow/blob/r1.8/tensorflow/python/keras/_impl/keras/layers/recurrent.py

In the Keras LSTM:

    kernel_initializer='glorot_uniform',
    recurrent_initializer='orthogonal',

kernel_initializer: Initializer for the kernel weights matrix, used for the linear transformation of the inputs

recurrent_initializer: Initializer for the recurrent_kernel weights matrix, used for the linear transformation of the recurrent state

It means Wxt (the input weights) should be initialized by tf.contrib.layers.xavier_initializer(), and Wht (the recurrent weights) should be initialized by tf.orthogonal_initializer() in TensorFlow.
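
Putting this together, here is a minimal sketch (assuming TensorFlow 1.x; the function names init_input_matrix() and init_recurrent_matrix() are our own) of how the init functions in our custom LSTM could be split so the input weights Wxt use Glorot uniform and the recurrent weights Wht use orthogonal initialization, matching the Keras defaults:

    import tensorflow as tf

    # Input weights Wxt: Glorot (Xavier) uniform, like kernel_initializer.
    def init_input_matrix(self, shape):
        return tf.contrib.layers.xavier_initializer()(shape)

    # Recurrent weights Wht: orthogonal, like recurrent_initializer.
    def init_recurrent_matrix(self, shape):
        return tf.orthogonal_initializer()(shape)

    # Biases: zeros, like bias_initializer.
    def init_vector(self, shape):
        return tf.zeros(shape)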
