**4. How to compute output *o*?**

In a memory network, the output *o* is computed as:

*o = Σ_{i} p_{i}c_{i}*

where *c_{i}* is the vector of each word in sentence *X*.
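A minimal numpy sketch of this weighted sum, *o = Σ_{i} p_{i}c_{i}*; the sizes and random values here are made up, not from the original model:

```python
import numpy as np

# Minimal sketch of o = sum_i p_i * c_i; sizes are made up.
n_mem, d = 4, 6                  # number of memories, embedding dim
rng = np.random.default_rng(0)

p = rng.random(n_mem)
p = p / p.sum()                  # attention weights p_i (already softmax-ed)
c = rng.random((n_mem, d))       # output vectors c_i, one row per memory

o = p @ c                        # weighted sum: o = sum_i p_i * c_i
```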

**5. How to compute *c_{i}*?**

As we did for *m_{i}*, we can compute *c_{i}* with a matrix *C* of size *V × d*.
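As a sketch: with a one-hot *x_{i}*, the product *Cx_{i}* is just a row lookup in the embedding matrix (sizes and values below are made up):

```python
import numpy as np

# Sketch of c_i = C x_i with a one-hot x_i; V and d are made up.
V, d = 5, 3
rng = np.random.default_rng(0)
C = rng.random((V, d))    # embedding matrix, one row per vocabulary word

word_id = 2
x = np.zeros(V)
x[word_id] = 1.0          # one-hot indicator for the word
c_i = C.T @ x             # c_i = C^T x picks out row `word_id` of C
```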

## Notice:

(1) Matrix *C* is like matrix *A*: it is a trainable variable of the model, representing the vector of each word in the vocabulary.

```python
self.C = tf.Variable(tf.random_normal([self.nwords, self.edim], stddev=self.init_std))  # Embedding C for sentences
```

(2) If you use pretrained word vectors, we can map *c_{i}* as:

*c_{i} = Cx_{i}*

## 6. How to compute model output and make classification?

It is computed as:

*a = softmax(W(o + u))*
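A minimal numpy sketch of this classification step, *a = softmax(W(o + u))* as in the paper; all sizes and values here are made up:

```python
import numpy as np

# Sketch of a = softmax(W(o + u)); all sizes and values are made up.
d, n_answers = 6, 4
rng = np.random.default_rng(0)

o = rng.random(d)                 # memory output feature
u = rng.random(d)                 # question embedding
W = rng.random((n_answers, d))    # final projection, trained with the model

logits = W @ (o + u)
a = np.exp(logits - logits.max())
a = a / a.sum()                   # softmax over candidate answers
```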

## (1) Why use *o + u*?

When testing this network, we input both *X* and *Q*:

*o* contains features of *X* weighted by *Q*;

*u* contains features of *Q*.

So using *o + u* can improve the result. However, we can also use only *o*, which already contains information from both *X* and *Q*.

## 7. Relation between matrices *A* and *C*

In the network, matrix *A* represents the vector of each word together with its attention weight on *Q*, while *C* holds only the word vectors and contains no other information or features.

In the paper https://arxiv.org/abs/1503.08895:

*A ≠ C*

The key point is *p_{i}*. In this paper, matrix *A* contains both word-vector and word-attention-weight information based on *Q*. But if *p_{i}* is computed as

*p_{i} = softmax(u^{T}Mm_{i})*

where *M* is an attention matrix, then we can set *A = C* in this model.
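A minimal numpy sketch of this bilinear attention, *p_{i} = softmax(u^{T}Mm_{i})*; random values stand in for the learned parameters:

```python
import numpy as np

# Sketch of p_i = softmax(u^T M m_i); random values stand in for learned ones.
n_mem, d = 4, 6
rng = np.random.default_rng(0)

u = rng.random(d)             # question embedding
m = rng.random((n_mem, d))    # memory vectors m_i
M = rng.random((d, d))        # trainable attention matrix

scores = m @ (M.T @ u)        # entry i equals u^T M m_i
p = np.exp(scores - scores.max())
p = p / p.sum()               # softmax over memories
```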

## 8. How to use multiple layers?

You must use new matrices *A*, *B*, *C* for each layer. Compute *u* as:
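A minimal sketch of stacking hops. In the paper the next query is *u^{k+1} = u^{k} + o^{k}*; here random matrices stand in for the embedded memories of each layer:

```python
import numpy as np

def softmax(z):
    z = np.exp(z - z.max())
    return z / z.sum()

# Sketch of K hops; each hop has its own embedded memories (A^k, C^k).
n_mem, d, n_hops = 4, 6, 3
rng = np.random.default_rng(0)
A = [rng.random((n_mem, d)) for _ in range(n_hops)]  # memory vectors m_i per hop
C = [rng.random((n_mem, d)) for _ in range(n_hops)]  # output vectors c_i per hop

u = rng.random(d)                # u^1: embedded question (via B)
for k in range(n_hops):
    p = softmax(A[k] @ u)        # p_i = softmax(u^T m_i)
    o = p @ C[k]                 # o^k = sum_i p_i c_i
    u = u + o                    # u^{k+1} = u^k + o^k
```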