An Introduction to Bahdanau Attention for Beginners – Deep Learning Tutorial

March 30, 2022

In this tutorial, we will introduce what the Bahdanau attention mechanism is and how to implement it. This attention was proposed in the paper: Neural Machine Translation by Jointly Learning to Align and Translate

What is Bahdanau Attention?

In order to understand Bahdanau attention, we should first know how the output of a seq2seq decoder is computed.

Figure: an example of a seq2seq model

To get the decoder output \(y_t\), we can do as follows:

\(y_t = g(y_{t-1}, s_t, c_t)\)

Here \(g\) can be implemented with a GRU cell or an LSTM cell, and \(s_t\) is the decoder RNN hidden state at time \(t\).

\(s_t\) can be computed as follows:

\(s_t = f(s_{t-1}, y_{t-1}, c_t)\)

Here the \(f\) function can be a gated unit, such as a GRU or LSTM cell. Meanwhile, to compute the context vector \(c_t\), we can use Bahdanau attention.
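
Before turning to the attention mechanism itself, here is a minimal sketch of one decoder step in PyTorch, treating \(c_t\) as given. It assumes \(f\) is a GRU cell fed the concatenation of the embedded previous output \(y_{t-1}\) and the context vector \(c_t\), and \(g\) is a plain linear projection over the vocabulary; the names and sizes (decoder_step, embed_dim, hidden_dim, vocab_size) are illustrative assumptions, not from the paper.

```python
import torch
import torch.nn as nn

embed_dim, hidden_dim, vocab_size = 32, 64, 1000

embedding = nn.Embedding(vocab_size, embed_dim)         # embeds the previous output y_{t-1}
f = nn.GRUCell(embed_dim + hidden_dim, hidden_dim)      # f: computes s_t from s_{t-1}, y_{t-1}, c_t
g = nn.Linear(embed_dim + hidden_dim + hidden_dim, vocab_size)  # g: scores for y_t

def decoder_step(y_prev, s_prev, c_t):
    """One step: y_prev (batch,), s_prev (batch, hidden_dim), c_t (batch, hidden_dim)."""
    y_emb = embedding(y_prev)                           # (batch, embed_dim)
    s_t = f(torch.cat([y_emb, c_t], dim=-1), s_prev)    # s_t = f(s_{t-1}, y_{t-1}, c_t)
    logits = g(torch.cat([y_emb, s_t, c_t], dim=-1))    # y_t = g(y_{t-1}, s_t, c_t)
    y_t = logits.argmax(dim=-1)                         # greedy pick, for illustration only
    return y_t, s_t

# Example usage with random inputs:
y_prev = torch.randint(0, vocab_size, (2,))
s_prev = torch.zeros(2, hidden_dim)
c_t = torch.randn(2, hidden_dim)
y_t, s_t = decoder_step(y_prev, s_prev, c_t)
print(y_t.shape, s_t.shape)  # torch.Size([2]) torch.Size([2, 64])
```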

Bahdanau attention determines how important each word encoded by the seq2seq encoder is for the current output \(y_t\).

How to compute \(c_t\)?

We can use the equations below (from the source paper) to compute \(c_t\):

\(c_i = \sum_{j=1}^{T_x} \alpha_{ij} h_j\)

\(\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})}\)

\(e_{ij} = a(s_{i-1}, h_j)\)

Here \(T_x\) is the length of the source sentence and \(h_j\) is the encoder hidden state for the \(j\)-th source word.

We should notice that \(e_{ij}\) is computed from the seq2seq decoder's previous hidden state \(s_{i-1}\) and the encoder hidden state \(h_j\).

Here the \(a\) function can be:

\(e_{ij} = V^T \tanh(W \cdot s_{i-1} + U \cdot h_j + b)\)
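
As an illustration, here is a minimal sketch of this attention in PyTorch: it applies the additive scoring function above, normalizes the scores \(e_{ij}\) with a softmax to get \(\alpha_{ij}\), and returns the context vector \(c_t\) as the weighted sum of encoder states. The class name BahdanauAttention, the hidden size, and the random tensors in the usage example are assumptions of this sketch, not code from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BahdanauAttention(nn.Module):
    def __init__(self, hidden_dim):
        super().__init__()
        self.W = nn.Linear(hidden_dim, hidden_dim, bias=False)  # projects s_{i-1}
        self.U = nn.Linear(hidden_dim, hidden_dim, bias=True)   # projects h_j (its bias plays the role of b)
        self.V = nn.Linear(hidden_dim, 1, bias=False)           # V^T

    def forward(self, s_prev, encoder_outputs):
        # s_prev: (batch, hidden_dim); encoder_outputs: (batch, src_len, hidden_dim)
        # e_{ij} = V^T tanh(W s_{i-1} + U h_j + b), broadcast over all source positions j
        scores = self.V(torch.tanh(self.W(s_prev).unsqueeze(1) + self.U(encoder_outputs)))
        alpha = F.softmax(scores, dim=1)            # (batch, src_len, 1), attention weights
        c_t = (alpha * encoder_outputs).sum(dim=1)  # (batch, hidden_dim), context vector
        return c_t, alpha.squeeze(-1)

# Example usage with random tensors:
attn = BahdanauAttention(hidden_dim=64)
s_prev = torch.randn(2, 64)      # previous decoder hidden state
h = torch.randn(2, 10, 64)       # encoder hidden states for 10 source words
c_t, alpha = attn(s_prev, h)
print(c_t.shape, alpha.shape)    # torch.Size([2, 64]) torch.Size([2, 10])
```

Note that \(U \cdot h_j\) depends only on the encoder outputs, so in practice it can be precomputed once per source sentence instead of at every decoder step.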
