We often use an RNN/GRU/LSTM/BiLSTM to encode a sequence. To get a fixed-size representation from these models, we can average their outputs or combine them with attention. In this tutorial, we will introduce how to average their outputs.
Preliminary
Look at the RNN model below:
The outputs are \([h_0, h_1, \dots, h_t]\).
How to get these outputs?
For a BiLSTM, we can use the code below:
outputs, state = tf.nn.bidirectional_dynamic_rnn(
    cell_fw=tf.nn.rnn_cell.LSTMCell(self.hidden_size, forget_bias=1.0),
    cell_bw=tf.nn.rnn_cell.LSTMCell(self.hidden_size, forget_bias=1.0),
    inputs=inputs,
    sequence_length=word_in_sen_len,
    dtype=tf.float32,
    scope='bilstm_doc_word'
)
outputs = tf.concat(outputs, 2)
In this code, inputs is a tensor with the shape [batch_size, time_step, dim]. After tf.concat(outputs, 2), which joins the forward and backward outputs, outputs has the shape [batch_size, time_step, 2 * self.hidden_size].
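For completeness, here is a minimal, self-contained sketch of how this code can be called. It assumes TensorFlow 1.x, and the shapes (batch_size = 4, time_step = 10, dim = 8, hidden_size = 16) and example lengths are made up for illustration:

import numpy as np
import tensorflow as tf  # TensorFlow 1.x

batch_size, time_step, dim, hidden_size = 4, 10, 8, 16
inputs = tf.constant(np.random.randn(batch_size, time_step, dim), dtype=tf.float32)
word_in_sen_len = tf.constant([10, 7, 3, 5], dtype=tf.int32)  # valid length of each sequence

outputs, state = tf.nn.bidirectional_dynamic_rnn(
    cell_fw=tf.nn.rnn_cell.LSTMCell(hidden_size, forget_bias=1.0),
    cell_bw=tf.nn.rnn_cell.LSTMCell(hidden_size, forget_bias=1.0),
    inputs=inputs,
    sequence_length=word_in_sen_len,
    dtype=tf.float32,
    scope='bilstm_doc_word'
)
outputs = tf.concat(outputs, 2)  # [batch_size, time_step, 2 * hidden_size]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(tf.shape(outputs)))  # [ 4 10 32]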
Here is a tutorial on how to use tf.nn.bidirectional_dynamic_rnn().
How to average the outputs?
To average the outputs, we should notice that not all of them are valid.
In the example above, suppose \(t = 50\) but the true sequence length is 3; then only \([h_0, h_1, h_2]\) are valid. We should average the outputs as:
\(\frac{h_0+h_1+h_2}{3}\)
This means we should use sequence_length to create a mask to hide the invalid outputs.
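To make the idea concrete, here is a small NumPy sketch (the numbers are made up): with time_step = 5 and a valid length of 3, masking and then summing reproduces \(\frac{h_0+h_1+h_2}{3}\).

import numpy as np

# toy outputs: time_step = 5, dim = 2; only the first 3 steps are valid
h = np.array([[1., 2.], [3., 4.], [5., 6.], [9., 9.], [9., 9.]])
valid_len = 3
mask = (np.arange(h.shape[0]) < valid_len).astype(np.float32)[:, None]
avg = (h * mask).sum(axis=0) / valid_len
print(avg)  # [3. 4.], which is (h0 + h1 + h2) / 3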
We will create a function to implement it.
def avg_lstm(inputs, length):
    # inputs: [batch_size, time_step, dim], length: [batch_size]
    inputs = tf.cast(inputs, tf.float32)
    time_step = tf.shape(inputs)[1]
    length = tf.reshape(length, [-1])
    # mask: [batch_size, time_step, 1], 1.0 for valid steps and 0.0 for padding
    mask = tf.expand_dims(tf.cast(tf.sequence_mask(length, time_step), tf.float32), -1)
    inputs *= mask  # zero out the invalid outputs
    _sum = tf.reduce_sum(inputs, axis=1)  # [batch_size, dim]
    length = tf.reshape(tf.cast(length, tf.float32), [-1, 1]) + 1e-9  # avoid dividing by zero
    return _sum / length
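Two details in this function are worth noticing: tf.expand_dims() gives the mask the shape [batch_size, time_step, 1], so it broadcasts over the feature dimension when multiplied with inputs, and the small constant 1e-9 prevents a division by zero when a sequence length is 0.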
Then, we can get the average output easily.
output = avg_lstm(outputs, word_in_sen_len)
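As a quick sanity check (reusing the made-up tensors from the earlier sketch), the averaged output should have the shape [batch_size, 2 * hidden_size]:

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(tf.shape(output)))  # [ 4 32]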
There is a separate tutorial on how to use tf.sequence_mask(); here is a minimal example of what it returns:
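import tensorflow as tf  # TensorFlow 1.x

mask = tf.sequence_mask([1, 3], maxlen=5)
with tf.Session() as sess:
    print(sess.run(mask))
# [[ True False False False False]
#  [ True  True  True False False]]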