Why Your Custom LSTM or BiLSTM is Worse than tf.nn.dynamic_rnn() and tf.nn.bidirectional_dynamic_rnn()

In TensorFlow, we can use tf.nn.dynamic_rnn() and tf.nn.bidirectional_dynamic_rnn() to build LSTM and BiLSTM models for training. However, if you want to modify or improve the LSTM or BiLSTM itself, you have to implement it in your own TensorFlow code. Here are some examples:

Custom LSTM

Build Your Own LSTM Model Using TensorFlow: Steps to Create a Customized LSTM

Use Your Own Customized LSTM to Classify MNIST Handwritten Digits

Custom BiLSTM

Build a Custom BiLSTM Model Using TensorFlow: A Step Guide

However, you may run into this problem: your custom LSTM or BiLSTM model performs worse than tf.nn.dynamic_rnn() and tf.nn.bidirectional_dynamic_rnn(). Why does this happen, and how can you fix it? In this tutorial, we will discuss this topic.

Why is your custom LSTM or BiLSTM model worse than tf.nn.dynamic_rnn() and tf.nn.bidirectional_dynamic_rnn()?

When you create a custom LSTM or BiLSTM, there are some points you must pay attention to:

1. You should initialize the weights and biases of the LSTM or BiLSTM correctly

You should initialize the weights and biases of your LSTM or BiLSTM the same way tf.nn.dynamic_rnn() and tf.nn.bidirectional_dynamic_rnn() do. To understand how to do it, you can refer to this tutorial:

Understand LSTM Weight and Bias Initialization When Initializer is None in TensorFlow
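
As a rough illustration of what matching the built-in initialization can look like: to the best of my knowledge, a TensorFlow 1.x LSTM cell created without an explicit initializer ends up with a Glorot (Xavier) uniform kernel and a zero bias. The plain-Python sketch below mirrors that scheme so the numbers are easy to inspect. The helper name glorot_uniform_kernel and the sizes 28 and 64 are hypothetical, and the single fused kernel for all four gates is an assumption about the layout; the tutorial above remains the reference for the exact behavior.

```python
import math
import random

def glorot_uniform_kernel(input_size, num_units, seed=0):
    """Return (kernel, bias, limit) for one LSTM cell.

    Assumes the four gates share one fused kernel of shape
    (input_size + num_units) x (4 * num_units).
    """
    rng = random.Random(seed)
    fan_in = input_size + num_units      # rows: [x_t, h_{t-1}] concatenated
    fan_out = 4 * num_units              # columns: the four gates side by side
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    kernel = [[rng.uniform(-limit, limit) for _ in range(fan_out)]
              for _ in range(fan_in)]
    bias = [0.0] * fan_out               # biases start at zero
    return kernel, bias, limit

kernel, bias, limit = glorot_uniform_kernel(input_size=28, num_units=64)
print(len(kernel), len(kernel[0]), limit)
```

If your custom cell draws its weights from a much wider or narrower range than this, that difference alone may explain part of the performance gap.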

2. You should add a forget bias to the forget gate

Adding a forget bias to the forget gate can speed up the training of an LSTM or BiLSTM. If you have not added one, the performance may decrease. To understand how to add a forget bias to an LSTM or BiLSTM, you can read this tutorial:

Add forget_bias for Your Custom LSTM Using TensorFlow: A Beginner Guide – TensorFlow Tutorial
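
To see why the forget bias matters, here is a minimal sketch of the cell-state update in plain Python. It assumes the usual formulation in which the bias is added to the forget-gate pre-activation before the sigmoid, with a default of 1.0; the function name next_cell_state and the scalar inputs are only for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def next_cell_state(c_prev, i, j, f, forget_bias=1.0):
    """One LSTM cell-state update from the gate pre-activations i, j, f."""
    forget_gate = sigmoid(f + forget_bias)   # bias is added before the sigmoid
    return c_prev * forget_gate + sigmoid(i) * math.tanh(j)

# Right after initialization the pre-activations are close to zero.
# Compare how much of the old state survives with and without the bias.
print(next_cell_state(1.0, 0.0, 0.0, 0.0, forget_bias=0.0))
print(next_cell_state(1.0, 0.0, 0.0, 0.0, forget_bias=1.0))
```

With the bias, the forget gate starts out more open, so the cell keeps more of its previous state during the early training steps.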

3. You should process variable length sequences correctly

Padded (invalid) time steps in a variable length sequence can hurt the performance of an LSTM or BiLSTM, so your model must skip them correctly. You can learn how to process variable length sequences from this tutorial:

An Introduction to How TensorFlow Bidirectional Dynamic RNN Process Variable Length Sequence – LSTM Tutorial
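
The idea can be sketched in plain Python: on a padded time step the cell is not applied, the state is carried over unchanged, and the output is set to zero. This is my understanding of how tf.nn.dynamic_rnn() treats steps beyond sequence_length. The names run_with_lengths and running_sum are hypothetical, and the toy cell (a running sum) only stands in for a real LSTM cell.

```python
def run_with_lengths(inputs, lengths, step, init_state=0.0):
    """Run `step` over padded sequences, ignoring padded time steps.

    inputs:  list of sequences, all padded to the same length
    lengths: number of valid steps in each sequence
    step:    function (x, state) -> (output, new_state); stands in for a cell
    """
    all_outputs, final_states = [], []
    for seq, length in zip(inputs, lengths):
        state, outputs = init_state, []
        for t, x in enumerate(seq):
            if t < length:
                out, state = step(x, state)   # valid step: update as usual
            else:
                out = 0.0                     # padded step: emit zero, keep state
            outputs.append(out)
        all_outputs.append(outputs)
        final_states.append(state)
    return all_outputs, final_states

# Toy cell: a running sum, so the effect of padding is easy to see.
def running_sum(x, state):
    new_state = state + x
    return new_state, new_state

padded = [[1.0, 2.0, 3.0], [4.0, 9.0, 9.0]]   # second sequence has 1 valid step
outputs, states = run_with_lengths(padded, [3, 1], running_sum)
print(outputs)
print(states)
```

If the padded values 9.0 leaked into the state of the second sequence, its final state would no longer reflect the real input, which is exactly the kind of error that degrades a custom model.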

4. You should return the output of the backward cell in BiLSTM correctly

For the backward cell of a BiLSTM, we reverse each input sequence according to its length. However, we should notice that we must also reverse its output before returning it, so that each time step lines up with the forward output. Here is an example:

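    def output(self):
        # The backward cell produces outputs in reversed time order; undo that.
        if self.revers:
            if self.sequence_length is not None:  # a tensor of per-example lengths
                # Reverse only the valid steps of each sequence; padding stays put.
                self.outputs = tf.reverse_sequence(
                    self.outputs, seq_lengths=self.sequence_length,
                    seq_axis=1, batch_axis=0)
            else:
                # No lengths given: reverse the whole time axis.
                self.outputs = tf.reverse(self.outputs, axis=[1])
        return self.outputs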

We should reverse self.outputs and then return it.

To learn how to reverse a tensor, you can read these tutorials:

Understand TensorFlow tf.reverse_sequence(): Reverse a Tensor by Length

Understand TensorFlow tf.reverse():Reverse a Tensor Based on Axis
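
To make the double reversal concrete, here is a small sketch in plain Python. The reverse_sequence helper below only imitates what tf.reverse_sequence() does on a [batch, time] input (it is not the TensorFlow function), and the "backward cell" is a stand-in that multiplies each input by 10.

```python
def reverse_sequence(batch, seq_lengths):
    """Reverse the first seq_lengths[b] steps of each sequence; padding stays in place."""
    return [seq[:n][::-1] + seq[n:] for seq, n in zip(batch, seq_lengths)]

batch = [[1, 2, 3, 0], [4, 5, 0, 0]]   # 0 marks padding
lengths = [3, 2]

# Step 1: reverse the inputs before feeding the backward cell.
backward_inputs = reverse_sequence(batch, lengths)

# Stand-in for the backward cell: its outputs are still in reversed time order.
backward_outputs = [[x * 10 for x in seq] for seq in backward_inputs]

# Step 2: reverse again so step t lines up with step t of the forward outputs.
aligned = reverse_sequence(backward_outputs, lengths)
print(backward_inputs)
print(aligned)
```

If step 2 is skipped, the backward output at position t actually belongs to a different time step, and concatenating it with the forward output mixes up unrelated positions.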
