Understand GRU (Gated Recurrent Unit): Difference Between GRU and LSTM – Deep Learning Tutorial

By | December 28, 2020

Many models have been proposed to improve LSTM, and GRU (Gated Recurrent Unit) is one of them. In this tutorial, we will introduce GRU and compare it with LSTM.

What is GRU?

The structure of GRU is shown below:

(Figure: the structure of GRU)

Compare GRU with LSTM

The formulas of LSTM and GRU are shown below:

(Figures: the formulas of LSTM and the formulas of GRU)
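Since the formula figures are not reproduced in this text, the commonly cited forms are written out below for reference. The weight and bias names (such as \(W_{fx}\), \(W_{rh}\), \(b_z\)) are assumptions chosen to match the notation used later in this tutorial, so they may differ from the original figures. The GRU is written in the form of Cho et al. (2014), where the reset gate \(r_t\) is applied to \(h_{t-1}\) before the matrix multiplication; some references swap the roles of \(z_t\) and \(1 - z_t\).

LSTM:

\[
\begin{aligned}
f_t &= \sigma(W_{fx}x_t + W_{fh}h_{t-1} + b_f)\\
i_t &= \sigma(W_{ix}x_t + W_{ih}h_{t-1} + b_i)\\
o_t &= \sigma(W_{ox}x_t + W_{oh}h_{t-1} + b_o)\\
g_t &= \tanh(W_{gx}x_t + W_{gh}h_{t-1} + b_g)\\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t\\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
\]

GRU:

\[
\begin{aligned}
r_t &= \sigma(W_{rx}x_t + W_{rh}h_{t-1} + b_r)\\
z_t &= \sigma(W_{zx}x_t + W_{zh}h_{t-1} + b_z)\\
\tilde{h}_t &= \tanh(W_{gx}x_t + W_{gh}(r_t \odot h_{t-1}) + b_g)\\
h_t &= z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t
\end{aligned}
\]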

We can find that GRU is essentially an LSTM; it only computes its output differently.

We will explain this conclusion step by step.

1. Look at the output equation of LSTM.

\(h_t = o_t \odot \tanh(c_t)\)

\(h_t\) is the output of the LSTM cell. What if we drop \(o_t\) here and use \(\tanh(c_t)\) directly as the output of the LSTM cell?
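A toy sketch of step 1, assuming a scalar input and a hidden size of 1 so that every \(\odot\) is ordinary multiplication. The function `lstm_step`, its `use_output_gate` flag, and the weight values are all hypothetical; the flag switches between the standard output \(h_t = o_t \odot \tanh(c_t)\) and the variant that outputs \(\tanh(c_t)\) directly.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def lstm_step(x, h_prev, c_prev, w, use_output_gate=True):
    """One LSTM step with scalar input and hidden size 1 (toy sketch).

    w maps each gate name to a hypothetical (w_x, w_h, b) triple.
    """
    def pre(name):
        # Pre-activation: w_x * x_t + w_h * h_{t-1} + b
        w_x, w_h, b = w[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("f"))    # forget gate f_t
    i = sigmoid(pre("i"))    # input gate i_t
    o = sigmoid(pre("o"))    # output gate o_t
    g = math.tanh(pre("g"))  # input g_t
    c = f * c_prev + i * g   # cell state c_t
    if use_output_gate:
        h = o * math.tanh(c)  # standard output: h_t = o_t * tanh(c_t)
    else:
        h = math.tanh(c)      # step 1: output tanh(c_t) directly
    return h, c


weights = {"f": (0.5, -0.3, 0.1), "i": (0.8, 0.2, 0.0),
           "o": (-0.4, 0.6, 0.2), "g": (1.0, -0.5, 0.0)}
h_std, c_std = lstm_step(0.7, 0.1, 0.2, weights)
h_mod, c_mod = lstm_step(0.7, 0.1, 0.2, weights, use_output_gate=False)
print(h_std, h_mod)
```

Within a single step the cell state \(c_t\) is the same in both cases and only the output changes. Over several steps the two versions diverge, because the output is fed back as \(h_{t-1}\).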

2. Move the gate \(o_t\) to the input of the next LSTM cell

In an LSTM cell, the input \(g_t\) is computed as:

\(g_t = \tanh(W_{gx}x_t + W_{gh}h_{t-1} + b_g)\)

Now we use the output gate \(o_t\) to control \(h_{t-1}\). The modified input becomes:

\(g_t = \tanh(W_{gx}x_t + W_{gh}(o_t \odot h_{t-1}) + b_g)\)

This is exactly how GRU computes its input, with \(o_t\) playing the role of the reset gate \(r_t\).
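Combining step 2 with one detail not spelled out above, namely that GRU ties the LSTM forget and input gates through its update gate \(z_t\), gives the following correspondence. This is a sketch assuming the GRU update \(h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t\) and a cell state equal to the output (\(c_t = h_t\)); the symbol \(\tilde{h}_t\) for the GRU candidate is an assumption, not the tutorial's own notation.

\[
\begin{aligned}
r_t &:= o_t && \text{(the output gate, moved onto } h_{t-1}\text{)}\\
\tilde{h}_t &:= g_t = \tanh(W_{gx}x_t + W_{gh}(r_t \odot h_{t-1}) + b_g)\\
f_t &:= z_t, \quad i_t := 1 - z_t && \text{(tied forget and input gates)}\\
h_t = c_t &= f_t \odot c_{t-1} + i_t \odot g_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t
\end{aligned}
\]

The last line is the GRU state update, so the remaining difference lies in the output: LSTM applies \(\tanh\) (and, originally, \(o_t\)) to \(c_t\), while GRU outputs \(c_t\) itself.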

However, GRU uses \(o_t\) only when computing the input. In my view, if we replace \(h_{t-1}\) with \(o_t \odot h_{t-1}\) and add a tanh to the output of GRU, then GRU is also an LSTM.
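The correspondence can also be checked numerically. The sketch below is plain Python with small random weights; all function and parameter names (`gru_step`, `modified_lstm_step`, `W_rx`, and so on) are hypothetical. It assumes the GRU form in which the reset gate is applied to \(h_{t-1}\) before the matrix multiplication and \(h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t\), and it ties the forget and input gates (\(f_t = z_t\), \(i_t = 1 - z_t\)) as GRU's update gate does. To get an exact match, the modified LSTM drops both the output gate and the output tanh (\(h_t = c_t\)), which is the mirror image of adding a tanh to the GRU output.

```python
import math
import random


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def tanh(v):
    return [math.tanh(a) for a in v]


def matvec(m, v):
    # Matrix (list of rows) times vector.
    return [sum(w * a for w, a in zip(row, v)) for row in m]


def affine(w_x, x, w_h, h, b):
    # W_x x + W_h h + b, one entry per hidden unit.
    return [p + q + r for p, q, r in zip(matvec(w_x, x), matvec(w_h, h), b)]


def mul(a, b):
    # Element-wise product (the odot operator).
    return [p * q for p, q in zip(a, b)]


def gru_step(x, h_prev, p):
    """GRU step with the reset gate applied to h_{t-1} before W_gh."""
    r = sigmoid(affine(p["W_rx"], x, p["W_rh"], h_prev, p["b_r"]))  # reset gate r_t
    z = sigmoid(affine(p["W_zx"], x, p["W_zh"], h_prev, p["b_z"]))  # update gate z_t
    h_tilde = tanh(affine(p["W_gx"], x, p["W_gh"], mul(r, h_prev), p["b_g"]))
    return [zi * hi + (1 - zi) * gi for zi, hi, gi in zip(z, h_prev, h_tilde)]


def modified_lstm_step(x, c_prev, p):
    """Hypothetical LSTM variant after the changes discussed above.

    - o_t is moved onto h_{t-1} inside the input g_t (step 2),
    - forget and input gates are tied: i_t = 1 - f_t,
    - no output gate and no output tanh, so h_t = c_t.
    """
    h_prev = c_prev  # h_{t-1} = c_{t-1} in this variant
    o = sigmoid(affine(p["W_rx"], x, p["W_rh"], h_prev, p["b_r"]))  # o_t, acts as r_t
    f = sigmoid(affine(p["W_zx"], x, p["W_zh"], h_prev, p["b_z"]))  # f_t, acts as z_t
    i = [1 - fi for fi in f]                                          # i_t = 1 - f_t
    g = tanh(affine(p["W_gx"], x, p["W_gh"], mul(o, h_prev), p["b_g"]))  # input g_t
    return [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c_prev, i, g)]  # c_t


def random_params(n_in, n_hid, seed=0):
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

    p = {}
    for gate in ("r", "z", "g"):
        p[f"W_{gate}x"] = mat(n_hid, n_in)
        p[f"W_{gate}h"] = mat(n_hid, n_hid)
        p[f"b_{gate}"] = [rng.uniform(-1, 1) for _ in range(n_hid)]
    return p


params = random_params(n_in=3, n_hid=4)
rng = random.Random(1)
xs = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(5)]

h_gru = [0.0] * 4
c_lstm = [0.0] * 4
for x in xs:
    h_gru = gru_step(x, h_gru, params)
    c_lstm = modified_lstm_step(x, c_lstm, params)

max_diff = max(abs(a - b) for a, b in zip(h_gru, c_lstm))
print(f"max |h_GRU - h_modified_LSTM| = {max_diff:.2e}")
```

If the two functions compute the same recurrence, `max_diff` is zero up to floating-point rounding; changing any one of the three modifications (for example restoring the output tanh) makes the two sequences diverge.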

One question remains: if we do not use \(o_t\) (that is, \(r_t\) in GRU), will the performance of GRU decrease?

The answer is no. You can read this tutorial for details:

Can We Remove Reset Gate in GRU? Can It Decrease the Performance of GRU? – Deep Learning Tutorial