End-to-End Memory Networks: https://arxiv.org/abs/1503.08895

An End-to-End Memory Network is often used to solve this kind of problem:

**Two inputs:** *X* and *Q*

**One output:** *O*

For example, in a question-and-answer problem:

*X* represents sentences, such as "*this milk is nice.*" and "*I do not like this food.*"

*Q* represents query phrases, such as "*How about this milk?*" and "*Do you like this food?*"

*O* represents the probability of one sentence matching one query phrase.

The basic structure of a one-layer End-to-End Memory Network is:

The Memory Network maps inputs *X* and *Q* to output *O* with a function *f*:

*O = f(X,Q)*

To understand the Memory Network, we analyze it step by step.

**1. How to map inputs X into a vector?**

Given that *x_i* represents each word in the sentence "*this milk is nice*", we can use an *A (V × d)* matrix to map each *x_i* to a memory vector *m_i*, where *V* is the size of the word vocabulary, *d* is the vector dimension, and *x_i* is the position index of each word in the vocabulary.
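The lookup described above can be sketched in a few lines of numpy. The vocabulary size, dimension, and word indices below are made-up toy values, not taken from the paper.

```python
import numpy as np

# Toy sketch of mapping word indices x_i to memory vectors m_i via A (V x d).
V, d = 10, 4                      # vocabulary size, embedding dimension
rng = np.random.default_rng(0)
A = rng.normal(size=(V, d))       # embedding matrix A (trained with the model)

x = [3, 7, 1, 5]                  # hypothetical indices for "this milk is nice"
m = A[x]                          # memory vectors m_i: one d-dim row per word

print(m.shape)                    # (4, 4): one d-dimensional vector per word
```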

## Note:

(1) I checked some implementations: here matrix *A* is a variable that is trained in the model, which means the vector of each word *x_i* is not pretrained (e.g., by word2vec or GloVe).

The vector of each word *x_i* is learned during model training and is fixed after training completes:

```python
self.A = tf.Variable(tf.random_normal([self.nwords, self.edim], stddev=self.init_std))
```

(2) If you use a pretrained vector for each word *x_i* created by word2vec, you can define a variable matrix *A* and compute:

*m_i = A x_i*

where *x_i* is the pretrained vector of each word.
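The pretrained-vector variant can be sketched as below. The dimensions are assumptions for illustration: *x_i* stands in for a fixed word2vec vector, and *A* is a trainable projection matrix.

```python
import numpy as np

# Sketch of m_i = A x_i with a pretrained word vector x_i (toy sizes).
d_in, d = 6, 4                        # pretrained dim, memory dim (assumed)
rng = np.random.default_rng(1)
A = rng.normal(size=(d, d_in))        # trainable projection matrix A

x_i = rng.normal(size=d_in)           # a pretrained word vector (e.g. word2vec)
m_i = A @ x_i                         # memory vector m_i = A x_i

print(m_i.shape)                      # (4,)
```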

**2. How to map Q to a vector?**

Like input *X*, we can also use a *B (V × d)* matrix to get the *u* vector,

where *B* is the same kind of matrix as *A*: it is also a variable learned by training the network.

## Note:

(1) *q* is a phrase, so it also contains several words; to get the *u* vector, you can average the vectors of all the words in *q*.

(2) If the words in *q* are pretrained, we can also use the *B* matrix to map *q* to *u*:

*u = Bq*

where *q* contains the vectors of the words in phrase *Q*. To convert the result to a single vector *u*, we can average or concatenate the word vectors.
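The averaging approach from note (1) can be sketched as follows, again with made-up sizes and a *B* matrix analogous to *A*:

```python
import numpy as np

# Toy sketch: map query word indices through B (V x d), then average to get u.
V, d = 10, 4
rng = np.random.default_rng(2)
B = rng.normal(size=(V, d))       # embedding matrix B, learned in training

q = [2, 8, 4]                     # hypothetical word indices for a query phrase
u = B[q].mean(axis=0)             # average the word vectors into one u vector

print(u.shape)                    # (4,): a single d-dimensional query vector
```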

**3. One sentence contains several words, and each word contributes a different weight for the same *q*. How do we compute these weights?**

We can use a *softmax* function:

*p_i = Softmax(uᵀ m_i)*

where *p_i* represents the attention weight of each word in sentence *X* with respect to the input phrase *Q*.
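The attention step can be sketched in numpy with toy memory vectors *m_i* and a toy query vector *u* (both made up for illustration):

```python
import numpy as np

# Sketch of p_i = softmax(u^T m_i): score each memory vector against the
# query, then normalize the scores into attention weights.
rng = np.random.default_rng(3)
n_words, d = 4, 4
m = rng.normal(size=(n_words, d))           # memory vectors, one per word
u = rng.normal(size=d)                      # query vector

scores = m @ u                              # inner product u^T m_i per word
p = np.exp(scores) / np.exp(scores).sum()   # softmax over the words

print(p)                                    # weights sum to ~1
```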