There are many models that improve on the LSTM; the GRU (Gated Recurrent Unit) is one of them. In this tutorial, we will introduce the GRU and compare it with the LSTM.

**What is GRU?**

A GRU cell uses two gates, an update gate z^{t} and a reset gate r^{t}, to control its hidden state h^{t}.

**Compare GRU with LSTM**

The formulas of the LSTM and the GRU are below:

LSTM:

i^{t} = σ(W_i·x^{t} + U_i·h^{t-1} + b_i)

f^{t} = σ(W_f·x^{t} + U_f·h^{t-1} + b_f)

g^{t} = tanh(W_g·x^{t} + U_g·h^{t-1} + b_g)

o^{t} = σ(W_o·x^{t} + U_o·h^{t-1} + b_o)

c^{t} = f^{t}•c^{t-1} + i^{t}•g^{t}

h^{t} = o^{t}•tanh(c^{t})

GRU:

r^{t} = σ(W_r·x^{t} + U_r·h^{t-1} + b_r)

z^{t} = σ(W_z·x^{t} + U_z·h^{t-1} + b_z)

n^{t} = tanh(W_n·x^{t} + r^{t}•(U_n·h^{t-1}) + b_n)

h^{t} = z^{t}•h^{t-1} + (1 - z^{t})•n^{t}
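The formulas above can be sketched as single-step cell functions in NumPy. This is a minimal illustration, not code from the original post; the parameter names (`W`, `U`, `b`) and the stacked gate layout are my own assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b stack the parameters of the
    i, f, g, o transforms (hypothetical layout, shape (4H, ...))."""
    H = h_prev.shape[0]
    a = W @ x + U @ h_prev + b           # stacked pre-activations, shape (4H,)
    i = sigmoid(a[0:H])                  # input (memory) gate
    f = sigmoid(a[H:2*H])                # forget gate
    g = np.tanh(a[2*H:3*H])              # candidate cell state
    o = sigmoid(a[3*H:4*H])              # output gate
    c = f * c_prev + i * g               # c^t = f^t•c^{t-1} + i^t•g^t
    h = o * np.tanh(c)                   # h^t = o^t•tanh(c^t)
    return h, c

def gru_step(x, h_prev, W, U, b_x, b_h):
    """One GRU step, reset gate applied inside tanh (stacked r, z, n)."""
    H = h_prev.shape[0]
    ax = W @ x + b_x                     # input pre-activations, shape (3H,)
    ah = U @ h_prev + b_h                # hidden pre-activations, shape (3H,)
    r = sigmoid(ax[0:H] + ah[0:H])       # reset gate
    z = sigmoid(ax[H:2*H] + ah[H:2*H])   # update gate
    n = np.tanh(ax[2*H:3*H] + r * ah[2*H:3*H])  # candidate state
    h = z * h_prev + (1 - z) * n         # h^t = z^t•h^{t-1} + (1-z^t)•n^t
    return h
```

Note that the GRU has one state (h) and three transforms, while the LSTM carries two states (h, c) and four transforms, which is why the GRU is cheaper.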

In our view, if we remove the output gate of the LSTM and modify its memory gate, the LSTM is converted into a GRU.

For example:

1. If we remove the output gate of LSTM, then

h^{t} = tanh(c^{t})

If we also remove the tanh(), then

h^{t} = c^{t} = f^{t}•c^{t-1} + i^{t}•g^{t} = f^{t}•h^{t-1} + i^{t}•g^{t}

Comparing with the GRU, if we set f^{t} = z^{t}, then

h^{t} = z^{t}•h^{t-1} + i^{t}•g^{t}

2. Modify the memory gate of LSTM

As to the LSTM, the memory gate i^{t} controls how much new information is written into the current LSTM cell.

In the current LSTM cell, i^{t}•g^{t} is the contribution computed from x^{t} and h^{t-1}.

However, we can set i^{t} = 1 - z^{t} as the memory gate, which simplifies the LSTM:

h^{t} = z^{t}•h^{t-1} + (1 - z^{t})•g^{t}
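The simplified update derived above (no output gate, no tanh on c^{t}, f^{t} = z^{t} and i^{t} = 1 - z^{t}) can be sketched in NumPy; the parameter names below are hypothetical, chosen only for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def simplified_lstm_step(x, h_prev, Wz, Uz, bz, Wg, Ug, bg):
    """Simplified LSTM step: one gate z^t replaces both f^t and i^t,
    and the cell state c^t is identified with the hidden state h^t."""
    z = sigmoid(Wz @ x + Uz @ h_prev + bz)   # z^t plays the forget-gate role
    g = np.tanh(Wg @ x + Ug @ h_prev + bg)   # candidate, as in the LSTM's g^t
    return z * h_prev + (1 - z) * g          # h^t = z^t•h^{t-1} + (1-z^t)•g^t
```

This is exactly a GRU update whose reset gate is fixed to 1: h^{t} is a convex combination of the previous state and the candidate.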

3. As to g^{t} in the LSTM and n^{t} in the GRU, we can find there is a reset gate r^{t} in the GRU, which is used to control how much of h^{t-1} is passed into the candidate state of the current GRU cell. I think this r^{t} could be removed, because we have already used z^{t} to control h^{t-1}.
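To see what r^{t} actually does, here is a small sketch of the GRU candidate state with the reset gate exposed as an argument (the parameter names are hypothetical). Setting r^{t} = 0 makes the candidate ignore h^{t-1} entirely, while r^{t} = 1 corresponds to removing the reset gate.

```python
import numpy as np

def gru_candidate(x, h_prev, Wn, Un, bn, r):
    """Candidate state n^t = tanh(Wn·x + r ⊙ (Un·h_prev) + bn),
    with the reset gate r passed in explicitly for illustration."""
    return np.tanh(Wn @ x + r * (Un @ h_prev) + bn)
```

With r = 0 the candidate depends only on x^{t}; with r = 1 it reduces to tanh(Wn·x^{t} + Un·h^{t-1} + bn), which has the same form as g^{t} in the LSTM.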

There is one problem: **if we remove r^{t} in the GRU, will the performance of the GRU decrease?**

The answer is no. You can read this tutorial:

Can We Remove Reset Gate in GRU? Can It Decrease the Performance of GRU? – Deep Learning Tutorial