An Introduction to Bahdanau Attention for Beginners – Deep Learning Tutorial

March 30, 2022

In this tutorial, we will introduce what the Bahdanau attention mechanism is and how to implement it. This attention was proposed in the paper: Neural Machine Translation by Jointly Learning to Align and Translate

What is Bahdanau Attention?

In order to understand Bahdanau attention, we should first know how the output of a seq2seq decoder is computed.

Figure: an example of a seq2seq model

To get the decoder output \(y_t\), we can do as follows:

\(y_t = g(y_{t-1}, s_t, c_t)\)

Here \(g\) can be implemented with a GRU cell or an LSTM cell, and \(s_t\) is the decoder RNN hidden state at time \(t\).

\(s_t\) can be computed as follows:

\(s_t = f(s_{t-1}, y_{t-1}, c_t)\)

Here the \(f\) function can be a gated unit, such as a GRU or LSTM cell. Meanwhile, to compute the context vector \(c_t\), we can use Bahdanau attention.
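
Before turning to the attention mechanism itself, here is a minimal sketch of one decoder step in PyTorch, treating \(c_t\) as given. It assumes \(f\) is a GRU cell fed the concatenation of the embedded previous output \(y_{t-1}\) and the context vector \(c_t\), and \(g\) is a plain linear projection over the vocabulary; the names and sizes (decoder_step, embed_dim, hidden_dim, vocab_size) are illustrative assumptions, not from the paper.

```python
import torch
import torch.nn as nn

embed_dim, hidden_dim, vocab_size = 32, 64, 1000

embedding = nn.Embedding(vocab_size, embed_dim)         # embeds the previous output y_{t-1}
f = nn.GRUCell(embed_dim + hidden_dim, hidden_dim)      # f: computes s_t from s_{t-1}, y_{t-1}, c_t
g = nn.Linear(embed_dim + hidden_dim + hidden_dim, vocab_size)  # g: scores for y_t

def decoder_step(y_prev, s_prev, c_t):
    """One step: y_prev (batch,), s_prev (batch, hidden_dim), c_t (batch, hidden_dim)."""
    y_emb = embedding(y_prev)                           # (batch, embed_dim)
    s_t = f(torch.cat([y_emb, c_t], dim=-1), s_prev)    # s_t = f(s_{t-1}, y_{t-1}, c_t)
    logits = g(torch.cat([y_emb, s_t, c_t], dim=-1))    # y_t = g(y_{t-1}, s_t, c_t)
    y_t = logits.argmax(dim=-1)                         # greedy pick, for illustration only
    return y_t, s_t

# Example usage with random inputs:
y_prev = torch.randint(0, vocab_size, (2,))
s_prev = torch.zeros(2, hidden_dim)
c_t = torch.randn(2, hidden_dim)
y_t, s_t = decoder_step(y_prev, s_prev, c_t)
print(y_t.shape, s_t.shape)  # torch.Size([2]) torch.Size([2, 64])
```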

Bahdanau attention determines how important each word encoded by the seq2seq encoder is for the current output \(y_t\).

How to compute \(c_t\)?

We can use the equations below (from the source paper) to compute \(c_t\):

\(c_i = \sum_{j=1}^{T_x} \alpha_{ij} h_j\)

\(\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})}\)

\(e_{ij} = a(s_{i-1}, h_j)\)

Here \(T_x\) is the length of the source sentence and \(h_j\) is the encoder hidden state for the \(j\)-th source word.

We should notice that \(e_{ij}\) is computed from the seq2seq decoder's previous hidden state \(s_{i-1}\) and the encoder hidden state \(h_j\).

Here the \(a\) function can be:

\(e_{ij} = V^T \tanh(W \cdot s_{i-1} + U \cdot h_j + b)\)
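
As an illustration, here is a minimal sketch of this attention in PyTorch: it applies the additive scoring function above, normalizes the scores \(e_{ij}\) with a softmax to get \(\alpha_{ij}\), and returns the context vector \(c_t\) as the weighted sum of encoder states. The class name BahdanauAttention, the hidden size, and the random tensors in the usage example are assumptions of this sketch, not code from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BahdanauAttention(nn.Module):
    def __init__(self, hidden_dim):
        super().__init__()
        self.W = nn.Linear(hidden_dim, hidden_dim, bias=False)  # projects s_{i-1}
        self.U = nn.Linear(hidden_dim, hidden_dim, bias=True)   # projects h_j (its bias plays the role of b)
        self.V = nn.Linear(hidden_dim, 1, bias=False)           # V^T

    def forward(self, s_prev, encoder_outputs):
        # s_prev: (batch, hidden_dim); encoder_outputs: (batch, src_len, hidden_dim)
        # e_{ij} = V^T tanh(W s_{i-1} + U h_j + b), broadcast over all source positions j
        scores = self.V(torch.tanh(self.W(s_prev).unsqueeze(1) + self.U(encoder_outputs)))
        alpha = F.softmax(scores, dim=1)            # (batch, src_len, 1), attention weights
        c_t = (alpha * encoder_outputs).sum(dim=1)  # (batch, hidden_dim), context vector
        return c_t, alpha.squeeze(-1)

# Example usage with random tensors:
attn = BahdanauAttention(hidden_dim=64)
s_prev = torch.randn(2, 64)      # previous decoder hidden state
h = torch.randn(2, 10, 64)       # encoder hidden states for 10 source words
c_t, alpha = attn(s_prev, h)
print(c_t.shape, alpha.shape)    # torch.Size([2, 64]) torch.Size([2, 10])
```

Note that \(U \cdot h_j\) depends only on the encoder outputs, so in practice it can be precomputed once per source sentence instead of at every decoder step.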
