We have built our own custom LSTM network in TensorFlow. We initialize all of its weights and biases like this:
# initialize matrix
def init_matrix(self, shape):
    return tf.random_normal(shape, stddev=0.1)

# initialize vector
def init_vector(self, shape):
    return tf.zeros(shape)
The full custom LSTM is described in this tutorial:
Build Your Own LSTM Model Using TensorFlow: Steps to Create a Customized LSTM – TensorFlow Tutorial
To make our custom LSTM perform the same as tf.nn.rnn_cell.LSTMCell(), we should initialize its weights and biases the same way tf.nn.rnn_cell.LSTMCell() does.
LSTM Biases in TensorFlow
By checking the source code of the RNN cells (such as LSTMCell) in TensorFlow, we can see how LSTM biases are initialized.
Here is the relevant code from the TensorFlow source:
self._bias = self.add_variable(
    _BIAS_VARIABLE_NAME,
    shape=[4 * self._num_units],
    initializer=init_ops.zeros_initializer(dtype=self.dtype))
So LSTM biases are initialized to zeros in TensorFlow.
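To mirror this in a custom LSTM, init_vector() only needs to return zeros of the right shape. The pure-Python sketch below (no TensorFlow; the helper names init_lstm_bias and split_gates are hypothetical) assumes the custom LSTM keeps one fused bias of length 4 * num_units, split in LSTMCell's gate order i, j, f, o. It also shows one detail that is not part of the initializer: LSTMCell has a forget_bias argument (default 1.0) that is added to the forget-gate pre-activation at run time, as sigmoid(f + forget_bias), rather than being stored in the bias variable.

```python
def init_lstm_bias(num_units):
    """Zero bias of length 4 * num_units, mirroring zeros_initializer."""
    return [0.0] * (4 * num_units)


def split_gates(vector, num_units):
    """Split a fused length-(4 * num_units) vector into the (i, j, f, o) slices."""
    return [vector[k * num_units:(k + 1) * num_units] for k in range(4)]


num_units = 3
bias = init_lstm_bias(num_units)
i_b, j_b, f_b, o_b = split_gates(bias, num_units)

# forget_bias is not stored in the variable; LSTMCell adds it to the forget
# gate when the cell runs, so the effective forget-gate bias starts at 1.0.
forget_bias = 1.0
effective_f_bias = [b + forget_bias for b in f_b]
```

So a custom LSTM that only copies the zero initializer, without adding forget_bias in its forward pass, will still behave differently from LSTMCell early in training.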
LSTM Weights in TensorFlow
We can also find how LSTM weights are initialized in TensorFlow:
self._kernel = self.add_variable(
    _WEIGHTS_VARIABLE_NAME,
    shape=[input_depth + h_depth, 4 * self._num_units])
The LSTM weights (the kernel) are created by the self.add_variable() function, and no initializer is passed.
The self.add_variable() function is defined as:
def add_variable(self, name, shape, dtype=None, initializer=None, regularizer=None, trainable=True, constraint=None, partitioner=None):
Inside this function, the variable is ultimately created by tf.get_variable(). In other words, TensorFlow uses tf.get_variable() to create or return the weight tensors of the LSTM network.
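When initializer is None, tf.get_variable() falls back to tf.glorot_uniform_initializer() for floating-point variables, so TensorFlow initializes the LSTM weights with Glorot (Xavier) uniform initialization.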
For more details, see this tutorial:
Understand How tf.get_variable() Initialize a Tensor When Initializer is None: A Beginner Guide
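To see what this default actually draws, here is a pure-Python sketch of Glorot uniform initialization. It is a stand-in for tf.glorot_uniform_initializer(), not TensorFlow's own code; the function names and the sizes input_depth=100, num_units=128 are illustrative assumptions, and we assume no projection, so h_depth equals num_units. Values come from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)), where for a 2-D kernel fan_in is the number of rows and fan_out the number of columns.

```python
import math
import random


def glorot_uniform_limit(shape):
    """Bound of the Glorot (Xavier) uniform distribution for a 2-D shape."""
    fan_in, fan_out = shape
    return math.sqrt(6.0 / (fan_in + fan_out))


def glorot_uniform(shape, seed=None):
    """Sample a fan_in x fan_out matrix (nested lists) from U(-limit, limit)."""
    fan_in, fan_out = shape
    limit = glorot_uniform_limit(shape)
    rng = random.Random(seed)
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]


# Kernel shape used by the LSTM cell: [input_depth + h_depth, 4 * num_units]
input_depth, num_units = 100, 128
kernel_shape = (input_depth + num_units, 4 * num_units)
kernel = glorot_uniform(kernel_shape, seed=42)
limit = glorot_uniform_limit(kernel_shape)
print(len(kernel), len(kernel[0]), limit)
```

Compared with the original init_matrix(), which uses a normal distribution with a fixed stddev of 0.1, the Glorot bound shrinks as the kernel gets larger.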
So we should change init_matrix() in our custom LSTM to:
def init_matrix(self, shape):
    # Glorot (Xavier) uniform; uniform=True is the default of xavier_initializer()
    return tf.contrib.layers.xavier_initializer()(shape)
Note: all TensorFlow source code quoted above is from TensorFlow 1.8.
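One caveat: xavier_initializer() computes fan_in and fan_out from the shape passed to init_matrix(), so the result matches LSTMCell only if init_matrix() receives the same fused shape [input_depth + num_units, 4 * num_units]. The pure-Python sketch below compares the Glorot bound for that fused kernel with a hypothetical custom layout that keeps a separate input matrix and a separate recurrent matrix per gate. The sizes and the per-gate layout are assumptions for illustration only.

```python
import math


def glorot_limit(fan_in, fan_out):
    """U(-limit, limit) bound used by Glorot/Xavier uniform initialization."""
    return math.sqrt(6.0 / (fan_in + fan_out))


input_depth, num_units = 100, 128  # example sizes (assumed)

# One fused kernel, as in LSTMCell: [input_depth + num_units, 4 * num_units]
fused_limit = glorot_limit(input_depth + num_units, 4 * num_units)

# Hypothetical custom layout: one input matrix per gate, [input_depth, num_units]
per_gate_input_limit = glorot_limit(input_depth, num_units)

# Hypothetical custom layout: one recurrent matrix per gate, [num_units, num_units]
per_gate_recurrent_limit = glorot_limit(num_units, num_units)

print(f"fused kernel:        {fused_limit:.4f}")
print(f"per-gate input:      {per_gate_input_limit:.4f}")
print(f"per-gate recurrent:  {per_gate_recurrent_limit:.4f}")
```

If the custom LSTM does use separate matrices, one way to keep the LSTMCell distribution is to initialize a single fused matrix with init_matrix() and slice it into the per-gate pieces.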
In TensorFlow 1.10, tensorflow/python/layers/base.py adds weights as follows:
variable = super(Layer, self).add_weight(
    name,
    shape,
    dtype=dtypes.as_dtype(dtype),
    initializer=initializer or scope.initializer,
    trainable=trainable,
    constraint=constraint,
    partitioner=partitioner,
    use_resource=use_resource,
    synchronization=synchronization,
    aggregation=aggregation,
    getter=vs.get_variable)
The parent class Layer is imported from:
from tensorflow.python.keras.engine import base_layer
Checking tensorflow/python/keras/engine/base_layer.py, we find the following code:
# Initialize variable when no initializer provided
if initializer is None:
    # If dtype is DT_FLOAT, provide a uniform unit scaling initializer
    if dtype.is_floating:
        initializer = initializers.glorot_uniform()
    # If dtype is DT_INT/DT_UINT, provide a default value `zero`
    # If dtype is DT_BOOL, provide a default value `FALSE`
    elif dtype.is_integer or dtype.is_unsigned or dtype.is_bool:
        initializer = initializers.zeros()
    # NOTES:Do we need to support for handling DT_STRING and DT_COMPLEX here?
So if initializer is None, TensorFlow uses initializers.glorot_uniform() to initialize a floating-point weight.
However, if we look at the Keras LSTM layer, we find:
__init__(
    units,
    activation='tanh',
    recurrent_activation='hard_sigmoid',
    use_bias=True,
    kernel_initializer='glorot_uniform',
    recurrent_initializer='orthogonal',
    bias_initializer='zeros',
    unit_forget_bias=True,
    kernel_regularizer=None,
    recurrent_regularizer=None,
    bias_regularizer=None,
    activity_regularizer=None,
    kernel_constraint=None,
    recurrent_constraint=None,
    bias_constraint=None,
    dropout=0.0,
    recurrent_dropout=0.0,
    implementation=1,
    return_sequences=False,
    return_state=False,
    go_backwards=False,
    stateful=False,
    unroll=False,
    **kwargs
)
In the Keras LSTM, the two weight initializers are:
kernel_initializer='glorot_uniform', recurrent_initializer='orthogonal',
kernel_initializer: Initializer for the kernel weights matrix, used for the linear transformation of the inputs
recurrent_initializer: Initializer for the recurrent_kernel weights matrix, used for the linear transformation of the recurrent state
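This means that, to match the Keras LSTM, Wxt (the weights applied to the input) should be initialized with tf.contrib.layers.xavier_initializer(), and Wht (the recurrent weights applied to the hidden state) should be initialized with tf.orthogonal_initializer().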
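Putting the Keras defaults together, a custom LSTM needs three initializers. The following pure-Python sketch (hypothetical helper names, illustrative sizes, no TensorFlow) assumes the Keras layout: a kernel of shape (input_dim, 4 * units), a separate recurrent kernel of shape (units, 4 * units), and a fused bias in gate order i, f, c, o. The orthogonal helper orthonormalizes Gaussian vectors with Gram-Schmidt, which gives the same kind of matrix as the sign-corrected QR decomposition used by tf.orthogonal_initializer(). The bias helper reflects unit_forget_bias=True from the Keras signature, which sets the forget-gate slice of the bias to 1 at initialization.

```python
import math
import random


def glorot_uniform(shape, rng):
    """Xavier/Glorot uniform: U(-limit, limit), limit = sqrt(6 / (fan_in + fan_out))."""
    fan_in, fan_out = shape
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)] for _ in range(fan_in)]


def orthogonal(shape, rng, gain=1.0):
    """Orthogonal init via Gram-Schmidt on Gaussian vectors.

    If rows >= cols the columns are orthonormal, otherwise the rows are.
    """
    rows, cols = shape
    n, k = max(rows, cols), min(rows, cols)
    basis = []  # k orthonormal vectors of length n
    while len(basis) < k:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for b in basis:  # remove the components along earlier basis vectors
            dot = sum(x * y for x, y in zip(v, b))
            v = [x - dot * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-8:  # discard a (practically impossible) degenerate draw
            basis.append([x / norm for x in v])
    if rows >= cols:
        # basis vectors become the columns
        return [[gain * basis[j][i] for j in range(cols)] for i in range(rows)]
    # basis vectors become the rows
    return [[gain * x for x in b] for b in basis]


def keras_style_bias(units):
    """Zero bias with the forget-gate slice set to 1 (gate order i, f, c, o)."""
    return [0.0] * units + [1.0] * units + [0.0] * (2 * units)


rng = random.Random(0)
input_dim, units = 8, 4
W_x = glorot_uniform((input_dim, 4 * units), rng)  # kernel
W_h = orthogonal((units, 4 * units), rng)          # recurrent_kernel
b = keras_style_bias(units)
```

Because the recurrent kernel has more columns than rows here, it is the rows of W_h that come out orthonormal.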