Understand End-To-End Memory Networks – Part 2 – A Simple Tutorial for NLP Beginners

By | May 30, 2019

4. How to compute output O?

In memory network, output o is computed as:

where ci is the vector of each word in sentence X.

5.How to compute ci?

Like get mi, we can compute ci with a matrix C(V*d)


(1) matrix C is like matrix A, it is a variable and trained in model. It represents vector of each word in vocabrary.

self.C = tf.Variable(tf.random_normal([self.nwords, self.edim], stddev=self.init_std)) # Embedding C for sentences

(2) If you use pretrained word vector, we can map ci like
ci = Cxi

6.How to compute model output and make classification?

It is computed as:

(1) why use o+u?

Because when test this network, we inputs X and Q

o contains feature belong to X based on Q

u contains feature belong to Q

so use o+u can enhance the result, however, we also can only use o, which contains information X and Q.

7.Relathon between matrix A and C?

In network, matrix A represents vector of each word with attention weight on Q

C is only vector of each word, do not contain other information and featrue.

In paper: https://arxiv.org/abs/1503.08895

A ≠ C

The key problem is pi, in this paper, matrix A contains word vector and word attention weight information based on Q.

but if pi is computed as


M is the attention matrix

Here we can set A=C in this model.

8.How to use multiple layers?

You must use new matrix A,B,C. Compute u as: