Understand End-To-End Memory Networks - Part 1 - A Simple Tutorial for NLP Beginners

End-to-End Memory Networks: https://arxiv.org/abs/1503.08895

End-to-End Memory Networks is often used to solve this kind of problem:

Two input: X and Q
One output: O

For example, in question and answer problem:
X represents sentences, such as “this milk is nice.“, “i do not like this food.”
Q represents query phrases, such as “How about this milk?“,”Do you like this food?”
O represetns the probility of one sentence to one query phrase.

The basic strucuture of one layer End-to-End Memory Networks is:

The Memory Network can map inputs X and Q to output O with a f function.

O = f(X,Q)

To understand Memory Network, we analysis this network step by step.

1. How to map inputs X into a vector?

Given x_i represents each word in sentence “this milk is nice“, we can use a A(V * d) matrix to map each x_i to a memory vector, such as

where V is the size of word vocabulary, d is the vector dimension, x_i is the position index of each word in vocabulary.

Note:

(1) I checked some codes, here matrix A is a variable, it should be traind in model, which means the vector of each word x_i is not pretrained( such as by word2vec or glove)

The vector of each word x_i is learned by training model and they are sured after completing the model training

self.A = tf.Variable(tf.random_normal([self.nwords, self.edim], stddev=self.init_std))

(2) If you use pretraind vector of each word x_i created by word2vec. you can define variable A matrix and compute:
m_i = Ax_i
where x_iis a pretrained vector of each word.

2. How to map Q to a vector?

Like input X, we also can use a B(V * d) matrix to get u vector

where B is the same with A, it is also variable and learned by training netwok.

Note:

(1) q is a phrase, it also contains some words, to get u vector, you can average all vectors of words in q

(2) words in q are pretrained. We also can use B matrix to map q to u like

u = Bq

where q are vectors of words in phrase Q. to convert u to a vector, we can average, concentrate result.

3.One sentence contains some words, each word contrubutes different weight to the same q, how to compute these weight?

We can use a softmax function to compute.

where p_i represents the attention weight of each word in sentence X to the same input phrase Q.

Understand End-To-End Memory Networks – Part 1 – A Simple Tutorial for NLP Beginners

Note:

Note:

Leave a Reply Cancel reply