In TensorFlow, we can use tf.nn.dynamic_rnn() and tf.nn.bidirectional_dynamic_rnn() to build LSTM and BiLSTM models for training. However, if you want to improve on LSTM or BiLSTM, you need to implement them in your own TensorFlow code. Here are some examples:
Custom LSTM
Build Your Own LSTM Model Using TensorFlow: Steps to Create a Customized LSTM
Use Your Own Customized LSTM to Classify MNIST Handwritten Digits
Custom BiLSTM
Build a Custom BiLSTM Model Using TensorFlow: A Step Guide
However, you may run into this problem: your custom LSTM or BiLSTM model performs worse than tf.nn.dynamic_rnn() and tf.nn.bidirectional_dynamic_rnn(). Why does this happen, and how can you fix it? In this tutorial, we will discuss this topic.
Why is your custom LSTM or BiLSTM model worse than tf.nn.dynamic_rnn() and tf.nn.bidirectional_dynamic_rnn()?
When you create a custom LSTM or BiLSTM, there are some points you must pay attention to:
1. You should initialize the weights and biases of the LSTM or BiLSTM correctly.
You should initialize the weights and biases of your LSTM or BiLSTM the same way tf.nn.dynamic_rnn() and tf.nn.bidirectional_dynamic_rnn() do. To understand how to do this, you can refer to this tutorial:
Understand LSTM Weight and Bias Initialization When Initializer is None in TensorFlow
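As a rough illustration of the point above, here is a minimal pure-Python sketch of one common default scheme. It assumes that, when no initializer is given, the kernel is drawn from a Glorot (Xavier) uniform distribution and the bias starts at zero, and that the four gates share one fused kernel of shape [input_dim + num_units, 4 * num_units]. The function names glorot_uniform and init_lstm_params are hypothetical helpers, not TensorFlow APIs; check the linked tutorial for the exact behavior of your TensorFlow version.

```python
import math
import random


def glorot_uniform(fan_in, fan_out, rng):
    """Return a fan_in x fan_out matrix drawn from U(-limit, limit)."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]


def init_lstm_params(input_dim, num_units, seed=0):
    """Hypothetical helper: build a fused gate kernel and a zero bias."""
    rng = random.Random(seed)
    # Assumption: one fused kernel holds the weights of all four gates.
    kernel = glorot_uniform(input_dim + num_units, 4 * num_units, rng)
    # Assumption: the bias starts at zero (forget_bias is applied separately).
    bias = [0.0] * (4 * num_units)
    return kernel, bias


kernel, bias = init_lstm_params(input_dim=28, num_units=128)
limit = math.sqrt(6.0 / ((28 + 128) + 4 * 128))
```

If your custom model draws its weights from a very different distribution, for example a wide normal distribution, that difference alone may explain part of the performance gap.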
2. You should add a forget bias for the forget gate.
Adding a forget bias to the forget gate can speed up the training of an LSTM or BiLSTM. If you have not added one, performance may decrease. To understand how to add a forget bias to an LSTM or BiLSTM, you can read this tutorial:
Add forget_bias for Your Custom LSTM Using TensorFlow: A Beginner Guide – TensorFlow Tutorial
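To make the effect of the forget bias concrete, here is a small scalar sketch of one LSTM step. It assumes the bias is added to the forget-gate pre-activation before the sigmoid and uses a default of 1.0. The function lstm_step and its argument names are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(i, j, f, o, c_prev, forget_bias=1.0):
    """One scalar LSTM step from gate pre-activations (i, j, f, o)."""
    # The forget bias is added to the forget-gate pre-activation only.
    c_new = sigmoid(f + forget_bias) * c_prev + sigmoid(i) * math.tanh(j)
    h_new = sigmoid(o) * math.tanh(c_new)
    return c_new, h_new


# With all pre-activations at zero (as at the start of training),
# compare how much of the previous cell state survives one step.
c_without, _ = lstm_step(0.0, 0.0, 0.0, 0.0, c_prev=1.0, forget_bias=0.0)
c_with, _ = lstm_step(0.0, 0.0, 0.0, 0.0, c_prev=1.0, forget_bias=1.0)
```

With the bias, the gate starts closer to "remember", so more of the previous cell state is kept early in training.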
3. You should process variable-length sequences correctly.
In a batch of variable-length sequences, the padded time steps are invalid inputs. They can hurt the performance of an LSTM or BiLSTM if the model treats them as real data, so your LSTM and BiLSTM should handle them correctly.
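One way to handle padded steps can be sketched in plain Python as follows. This is an illustration under assumptions, not the TensorFlow implementation: for time steps beyond a sequence's length, the sketch emits a zero output and carries the previous state forward unchanged. The names run_rnn and toy_cell are hypothetical.

```python
def run_rnn(cell, inputs, lengths, init_state, zero_output):
    """Run cell over padded sequences; inputs[b][t] is example b at step t."""
    outputs, final_states = [], []
    for seq, length in zip(inputs, lengths):
        state, seq_out = init_state, []
        for t, x in enumerate(seq):
            if t < length:
                out, state = cell(x, state)  # valid step: update the state
            else:
                out = zero_output            # padded step: keep the state
            seq_out.append(out)
        outputs.append(seq_out)
        final_states.append(state)
    return outputs, final_states


def toy_cell(x, state):
    """Toy stand-in for an LSTM cell: a running sum of the inputs."""
    new_state = state + x
    return new_state, new_state


inputs = [[1, 2, 3], [4, 9, 9]]  # 9 marks padding in the second example
lengths = [3, 1]
outputs, states = run_rnn(toy_cell, inputs, lengths,
                          init_state=0, zero_output=0)
```

Here the padding values never reach the cell, so the final state of the second example depends only on its single valid input.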
4. You should return the output of the backward cell in BiLSTM correctly.
For the backward cell of a BiLSTM, we reverse the input sequence by its length. However, note that we must also reverse its output before returning it. Here is an example:
def output(self):
    if self.revers:
        if self.sequence_length is not None:
            # it is a tensor
            self.outputs = tf.reverse_sequence(self.outputs, seq_lengths=self.sequence_length, seq_axis=1, batch_axis=0)
        else:
            self.outputs = tf.reverse(self.outputs, axis=[1])
    return self.outputs
We should reverse self.outputs and then return it, so that each output lines up with its original time step.
To know how to reverse a tensor, you can read these tutorials:
Understand TensorFlow tf.reverse_sequence(): Reverse a Tensor by Length
Understand TensorFlow tf.reverse(): Reverse a Tensor Based on Axis
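The length-aware reversal used for the backward cell can be mimicked in plain Python to see why the second reversal is needed. This sketch only imitates the semantics on nested lists, reversing the first n items of each row and leaving the padding in place. The function reverse_by_length is a hypothetical helper, not a TensorFlow API.

```python
def reverse_by_length(batch, seq_lengths):
    """Reverse the first n items of each row; keep the padded tail as is."""
    return [seq[:n][::-1] + seq[n:] for seq, n in zip(batch, seq_lengths)]


batch = [[1, 2, 3, 0], [4, 5, 0, 0]]  # 0 marks padding
lengths = [3, 2]

# Step 1: reverse the inputs before feeding the backward cell.
reversed_inputs = reverse_by_length(batch, lengths)

# Step 2: whatever the backward cell emits is in reversed time order,
# so reverse again by length to align it with the original time steps.
# An identity "cell" is used here, so the outputs equal the inputs.
backward_outputs = reversed_inputs
aligned_outputs = reverse_by_length(backward_outputs, lengths)
```

If the second reversal is skipped, the backward output at time step t is paired with the forward output of a different time step when the two are concatenated.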