Understand GRU (Gated Recurrent Unit): Difference Between GRU and LSTM – Deep Learning Tutorial

July 20, 2020

There are many models that improve on LSTM, and GRU (Gated Recurrent Unit) is one of them. In this tutorial, we will introduce GRU and compare it with LSTM.

What is GRU?

The structure of a GRU cell looks like this:

The structure of GRU

Compare GRU with LSTM

The formulas of LSTM and GRU are below:

LSTM:

it = sigmoid(Wi xt + Ui ht-1 + bi)
ft = sigmoid(Wf xt + Uf ht-1 + bf)
gt = tanh(Wg xt + Ug ht-1 + bg)
ot = sigmoid(Wo xt + Uo ht-1 + bo)
ct = ft•ct-1 + it•gt
ht = ot•tanh(ct)

GRU:

zt = sigmoid(Wz xt + Uz ht-1 + bz)
rt = sigmoid(Wr xt + Ur ht-1 + br)
nt = tanh(Wn xt + rt•(Un ht-1) + bn)
ht = zt•ht-1 + (1 - zt)•nt
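To make the two sets of formulas concrete, here is a minimal NumPy sketch of one LSTM step and one GRU step. This is only our own illustration: the weight names (W, U, b) and the sigmoid() helper are assumptions, not code from any specific library.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # W, U, b are dicts of weights and biases for the memory (i),
    # forget (f), candidate (g) and output (o) gates.
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])
    g_t = np.tanh(W["g"] @ x_t + U["g"] @ h_prev + b["g"])
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])
    c_t = f_t * c_prev + i_t * g_t   # ct = ft*ct-1 + it*gt
    h_t = o_t * np.tanh(c_t)         # ht = ot*tanh(ct)
    return h_t, c_t

def gru_step(x_t, h_prev, W, U, b):
    # W, U, b are dicts of weights and biases for the update (z),
    # reset (r) and candidate (n) gates.
    z_t = sigmoid(W["z"] @ x_t + U["z"] @ h_prev + b["z"])
    r_t = sigmoid(W["r"] @ x_t + U["r"] @ h_prev + b["r"])
    n_t = np.tanh(W["n"] @ x_t + r_t * (U["n"] @ h_prev) + b["n"])
    h_t = z_t * h_prev + (1.0 - z_t) * n_t   # ht = zt*ht-1 + (1 - zt)*nt
    return h_t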

In our view, if we remove the output gate of LSTM and modify its memory gate, the LSTM can be converted into a GRU.

For example:

1. If we remove the output gate of LSTM, then

ht = tanh(ct)

If we also remove the tanh(), then

ht = ct = ft•ct-1 + it•gt = ft•ht-1 + it•gt

(since ht = ct at every step, ct-1 is the same as ht-1)

Comparing with the GRU, if we rename ft as zt, then

ht = zt•ht-1 + it•gt

2. Modify the memory gate of LSTM

In LSTM, the memory gate it controls how much new information will be used in the current LSTM cell.

In the current LSTM cell, it•gt is the new contribution computed from xt and ht-1.

However, we can set it = 1 - zt as the memory gate, which simplifies the LSTM.

ht = zt•ht-1 + (1 - zt)•gt
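Putting steps 1 and 2 together, here is a minimal NumPy sketch of this simplified LSTM cell (again our own illustration; the weight names are assumptions). It keeps no separate cell state and no output gate, and ties the memory gate to 1 - zt:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def simplified_lstm_step(x_t, h_prev, W, U, b):
    # Step 1 removed the output gate and the final tanh, so ht plays the role of ct.
    z_t = sigmoid(W["z"] @ x_t + U["z"] @ h_prev + b["z"])   # forget gate ft, renamed zt
    g_t = np.tanh(W["g"] @ x_t + U["g"] @ h_prev + b["g"])   # candidate state gt
    # Step 2 ties the memory gate to the forget gate: it = 1 - zt.
    h_t = z_t * h_prev + (1.0 - z_t) * g_t                   # ht = zt*ht-1 + (1 - zt)*gt
    return h_t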

3. Comparing gt in LSTM with nt in GRU, we can see that GRU has an extra reset gate rt, which controls how much of ht-1 is passed into the candidate state of the current GRU cell. We think this rt could be removed, because we already use zt to control ht-1.
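To see what removing rt would change, here is a minimal NumPy sketch of the GRU candidate state with and without the reset gate (our own illustration; the weight names are assumptions). Only the line that computes nt differs:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_candidate(x_t, h_prev, W_n, U_n, b_n, W_r=None, U_r=None, b_r=None, use_reset=True):
    if use_reset:
        # With the reset gate: rt decides how much of ht-1 enters the candidate nt.
        r_t = sigmoid(W_r @ x_t + U_r @ h_prev + b_r)
        return np.tanh(W_n @ x_t + r_t * (U_n @ h_prev) + b_n)
    # Without the reset gate: ht-1 enters the candidate unchanged,
    # and only zt controls how much of ht-1 is kept in ht.
    return np.tanh(W_n @ x_t + U_n @ h_prev + b_n)

In both cases the hidden state update stays ht = zt•ht-1 + (1 - zt)•nt.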

There is one question: if we remove rt in GRU, will the performance of GRU decrease?

The answer is no; you can read this tutorial:

Can We Remove Reset Gate in GRU? Can It Decrease the Performance of GRU? – Deep Learning Tutorial
