Smoothing normalization was proposed in the paper Attention-Based Models for Speech Recognition. In this tutorial, we will introduce how to implement it in TensorFlow.
Smoothing Normalization
It is defined as:
\(a_{i,j} = \frac{\mathrm{sigmoid}(e_{i,j})}{\sum_{j} \mathrm{sigmoid}(e_{i,j})}\)
Here \(e_{i,j}\) is the attention score (energy) for decoder step \(i\) and memory time step \(j\).
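To make the definition concrete, here is a minimal worked sketch in plain NumPy (the score values are made up) that evaluates the formula for a single row of attention scores:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Made-up attention scores e_{i, j} for a single decoder step i.
e = np.array([2.0, 0.5, -1.0])

# a_{i, j} = sigmoid(e_{i, j}) / sum_j(sigmoid(e_{i, j}))
a = sigmoid(e) / sigmoid(e).sum()

print(a)        # approximately [0.497, 0.351, 0.152]
print(a.sum())  # 1.0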
How to implement smoothing normalization in TensorFlow?
It is easy to implement; here is an example:
import tensorflow as tf

def _smoothing_normalization(e):
    """Applies a smoothing normalization function instead of softmax.

    Introduced in:
        J. K. Chorowski, D. Bahdanau, D. Serdyuk, K. Cho, and Y. Bengio,
        "Attention-based models for speech recognition," in Advances in
        Neural Information Processing Systems, 2015, pp. 577-585.

    Smoothing normalization function:
        a_{i, j} = sigmoid(e_{i, j}) / sum_j(sigmoid(e_{i, j}))

    Args:
        e: matrix [batch_size, max_time(memory_time)]: expected to be
            energy (score) values of an attention mechanism

    Returns:
        matrix [batch_size, max_time]: [0, 1] normalized alignments with
            possible attendance to multiple memory time steps.
    """
    # Compute the sigmoid once, then normalize over the memory time axis.
    sigmoid_e = tf.nn.sigmoid(e)
    return sigmoid_e / tf.reduce_sum(sigmoid_e, axis=-1, keepdims=True)
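As a quick check, here is a usage sketch (the batch of scores is made up; it assumes TensorFlow 2.x eager execution) showing that each row of the output lies in [0, 1] and sums to 1:

import tensorflow as tf

# Made-up batch of attention scores: [batch_size=2, max_time=3].
e = tf.constant([[2.0, 0.5, -1.0],
                 [0.0, 1.0,  3.0]])

alignments = _smoothing_normalization(e)

print(alignments)                          # values in [0, 1]
print(tf.reduce_sum(alignments, axis=-1))  # each row sums to 1.0

Because each sigmoid is computed independently before normalization, the resulting alignments can attend to multiple memory time steps, which is the intended difference from softmax.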