LSTM and GRU are both RNNs. In this tutorial, we will introduce how to build a custom RNN cell by inheriting RNNCell, which is the approach to take if you plan to build a new type of RNN.
We can also build a custom RNN without using RNNCell; see these tutorials:
Build Your Own LSTM Model Using TensorFlow: Steps to Create a Customized LSTM – TensorFlow Tutorial
Build a Custom BiLSTM Model Using TensorFlow: A Step Guide – TensorFlow Tutorial
How to inherit RNNCell?
The structure of a custom RNN cell inheriting RNNCell is as follows:
import tensorflow as tf

class CustomCell(tf.nn.rnn_cell.RNNCell):
    def __init__(self, num_units, reuse=None, name="custom_cell"):
        super(CustomCell, self).__init__(_reuse=reuse, name=name)
        self._num_units = num_units  # the dimension of the rnn cell

    @property
    def state_size(self):
        return self._num_units

    @property
    def output_size(self):
        return self._num_units

    def build(self, inputs_shape):
        # inputs_shape is [batch_size, dim]
        # for example: if the inputs are batch_size * timestep * dim,
        # then inputs_shape = [batch_size, dim]
        # we can create some variables in this method, and they can be used in call()
        self.built = True

    def call(self, inputs, state):
        # use the current input and the previous state
        # to generate the new output and the new state
        new_h = inputs
        new_c = state
        return new_h, new_c
We should notice two important methods:
build(): its parameter is inputs_shape, and we should create our variables in this function. It is run before call() is run.
At the end of build(), we should set self.built = True.
call(): its parameters are inputs and state, which are the input at the current time step and the previous cell state. We use them to generate the current output and state.
In other words, call() takes two parameters and also returns two values: the new output and the new state.
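To make the execution order concrete, here is a minimal sketch (not part of the original tutorial) that steps the cell once by hand; the first invocation triggers build(), and every invocation runs call():

cell = CustomCell(100)
x_t = tf.zeros([3, 100])  # input for one time step: batch_size * dim
state = cell.zero_state(batch_size=3, dtype=tf.float32)  # initial state: batch_size * state_size
output, new_state = cell(x_t, state)  # build() runs once here, then call()
# with the pass-through call() above, output and new_state both have shape (3, 100)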
We can also evaluate this custom RNN as follows:
import numpy as np
import tensorflow as tf

size = 100
inputs = tf.Variable(tf.truncated_normal([3, 20, 100], stddev=0.1), name="inputs")
_fw_cell = CustomCell(size, name='encoder_fw')
_bw_cell = CustomCell(size, name='encoder_bw')
with tf.variable_scope("Custom_BiLSTM"):
    outputs, (fw_state, bw_state) = tf.nn.bidirectional_dynamic_rnn(
        _fw_cell,
        _bw_cell,
        inputs,
        sequence_length=None,
        dtype=tf.float32,
        swap_memory=True)
    # concat and return forward + backward outputs
    outputs = tf.concat(outputs, axis=2)

init = tf.global_variables_initializer()
init_local = tf.local_variables_initializer()
with tf.Session() as sess:
    sess.run([init, init_local])
    np.set_printoptions(precision=4, suppress=True)
    f = sess.run([inputs, outputs])
    print("f shape=", f[0].shape, f[1].shape)
    print(f)
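A side note (an assumption on our part, not shown in the tutorial): if you want to mask padded time steps, the sequence_length argument of tf.nn.bidirectional_dynamic_rnn expects an int32/int64 vector of shape [batch_size], not a float tensor; for example:

# hypothetical lengths: all three sequences use the full 20 time steps
input_lengths = tf.constant([20, 20, 20], dtype=tf.int32)
# pass sequence_length=input_lengths instead of None to enable masking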
Run this code and we will see the result below. The visible values of inputs and outputs are identical because call() returns its inputs unchanged, so both the forward and backward halves of outputs simply copy the inputs:
f shape= (3, 20, 100) (3, 20, 200)
[array([[[ 0.1194, -0.1482, -0.0366, ...,  0.0245,  0.0814, -0.0542],
        [ 0.0706,  0.0103,  0.14  , ..., -0.1292,  0.1306,  0.0329],
        [ 0.0911,  0.0186,  0.0709, ..., -0.0629, -0.1679,  0.0624],
        ...,
        [ 0.0242,  0.1694, -0.1566, ...,  0.0322,  0.0864, -0.0159],
        [ 0.1084,  0.0702,  0.0162, ...,  0.0331,  0.0174,  0.0541],
        [-0.103 ,  0.006 , -0.0532, ...,  0.0865, -0.0875, -0.0121]]],
      dtype=float32),
 array([[[ 0.1194, -0.1482, -0.0366, ...,  0.0245,  0.0814, -0.0542],
        [ 0.0706,  0.0103,  0.14  , ..., -0.1292,  0.1306,  0.0329],
        [ 0.0911,  0.0186,  0.0709, ..., -0.0629, -0.1679,  0.0624],
        ...,
        [ 0.0242,  0.1694, -0.1566, ...,  0.0322,  0.0864, -0.0159],
        [ 0.1084,  0.0702,  0.0162, ...,  0.0331,  0.0174,  0.0541],
        [-0.103 ,  0.006 , -0.0532, ...,  0.0865, -0.0875, -0.0121]]],
      dtype=float32)]
Improve Custom RNN
We can create some variables in the build() function to improve the custom RNN, for example:
def build(self, inputs_shape):
    print(inputs_shape)
    print(type(inputs_shape))
    # inputs_shape is [batch_size, dim]
    # for example: if the inputs are batch_size * timestep * dim,
    # then inputs_shape = [batch_size, dim]
    # we can create some variables in this method, and they can be used in call()
    inputs_dim = inputs_shape[-1].value
    self._w = self.add_variable(name="weight",
                                shape=[inputs_dim, self._num_units],
                                initializer=tf.glorot_normal_initializer(),
                                dtype=tf.float32)
    self._b = self.add_variable(name="bias",
                                shape=[self._num_units],
                                initializer=tf.glorot_normal_initializer(),
                                dtype=tf.float32)
    self.built = True
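The tutorial does not show a matching call() for these variables; here is a minimal sketch (a simple nonlinear transform, our assumption rather than the author's code) of how call() could use them:

def call(self, inputs, state):
    # hypothetical: transform the current input with the weight and bias from build();
    # a real cell would usually mix the previous state in as well
    new_h = tf.nn.tanh(tf.matmul(inputs, self._w) + self._b)
    new_c = state
    return new_h, new_c

With a call() like this, the outputs no longer simply copy the inputs.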
You should notice: we cannot use tf.Variable() in build() to create TensorFlow variables; otherwise, you will get a ValueError. To fix this error, use self.add_variable() as shown above.
Then we can evaluate it and will get the result:
f shape= (3, 20, 100) (3, 20, 200)
[array([[[ 0.1409,  0.107 ,  0.0258, ...,  0.0281, -0.0612,  0.0525],
        [ 0.0652, -0.0202,  0.0169, ..., -0.1956,  0.0543, -0.0334],
        [-0.0559,  0.1613,  0.0257, ...,  0.0858,  0.1105, -0.0963],
        ...,
        [-0.0716, -0.0563, -0.0451, ...,  0.1238, -0.0111, -0.0465],
        [-0.0484,  0.0344, -0.0566, ..., -0.1707, -0.0705,  0.01  ],
        [-0.0037, -0.0209, -0.0565, ...,  0.0233,  0.0548,  0.1174]]],
      dtype=float32),
 array([[[-0.0311, -0.246 , -0.0412, ..., -0.0077, -0.0249,  0.0986],
        [-0.1012, -0.1396, -0.0219, ..., -0.0893,  0.128 ,  0.1197],
        [ 0.0678, -0.2054, -0.0608, ..., -0.0068, -0.0627,  0.1589],
        ...,
        [ 0.0128, -0.1721,  0.0594, ...,  0.1566,  0.1048, -0.104 ],
        [-0.0187, -0.2965,  0.0667, ..., -0.0013,  0.0132,  0.046 ],
        [ 0.0082, -0.1131,  0.0417, ...,  0.068 ,  0.2191,  0.0546]]],
      dtype=float32)]