Many models have been proposed to improve the LSTM; the GRU (Gated Recurrent Unit) is one of them. In this tutorial, we will introduce the GRU and compare it with the LSTM.
What is a GRU?
The structure of a GRU cell looks like this:
Compare GRU with LSTM
The formulas of the LSTM and the GRU are below (one standard formulation; conventions vary slightly between papers and libraries):

LSTM:

i_t = σ(W_i x_t + U_i h_{t-1} + b_i)
f_t = σ(W_f x_t + U_f h_{t-1} + b_f)
g_t = tanh(W_g x_t + U_g h_{t-1} + b_g)
o_t = σ(W_o x_t + U_o h_{t-1} + b_o)
c_t = f_t • c_{t-1} + i_t • g_t
h_t = o_t • tanh(c_t)

GRU:

r_t = σ(W_r x_t + U_r h_{t-1} + b_r)
z_t = σ(W_z x_t + U_z h_{t-1} + b_z)
n_t = tanh(W_n x_t + r_t • (U_n h_{t-1}) + b_n)
h_t = z_t • h_{t-1} + (1 − z_t) • n_t
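To make these formulas concrete, here is a minimal sketch of one LSTM step and one GRU step written with plain PyTorch tensor operations. The function names, the [i, f, g, o] and [r, z, n] weight packing, and the smoke test are our own illustration choices, not a library API:

```python
import torch

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4H, D), U: (4H, H), b: (4H,),
    with the four gate blocks packed in the order [i, f, g, o]."""
    i, f, g, o = (W @ x + U @ h_prev + b).chunk(4)
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
    g = torch.tanh(g)                     # candidate content g_t
    c = f * c_prev + i * g                # c_t = f_t • c_{t-1} + i_t • g_t
    h = o * torch.tanh(c)                 # h_t = o_t • tanh(c_t)
    return h, c

def gru_step(x, h_prev, W, U, b):
    """One GRU step. W: (3H, D), U: (3H, H), b: (3H,),
    with the three gate blocks packed in the order [r, z, n]."""
    wx = (W @ x + b).chunk(3)             # input-side terms for r, z, n
    uh = (U @ h_prev).chunk(3)            # recurrent terms for r, z, n
    r = torch.sigmoid(wx[0] + uh[0])      # reset gate r_t
    z = torch.sigmoid(wx[1] + uh[1])      # update gate z_t
    n = torch.tanh(wx[2] + r * uh[2])     # candidate n_t; r_t scales the h_{t-1} part
    return z * h_prev + (1 - z) * n       # h_t = z_t • h_{t-1} + (1 - z_t) • n_t

# Tiny smoke test with random weights (D = input size, H = hidden size).
D, H = 4, 3
x, h0, c0 = torch.randn(D), torch.zeros(H), torch.zeros(H)
h1, c1 = lstm_step(x, h0, c0, torch.randn(4 * H, D), torch.randn(4 * H, H), torch.zeros(4 * H))
g1 = gru_step(x, h0, torch.randn(3 * H, D), torch.randn(3 * H, H), torch.zeros(3 * H))
print(h1.shape, c1.shape, g1.shape)       # torch.Size([3]) for each
```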
In our view, if we remove the output gate of the LSTM and modify its memory gate, the LSTM can be converted into a GRU.
1. If we remove the output gate o_t of the LSTM, then

h_t = tanh(c_t)

If we also drop the tanh(), then

h_t = c_t = f_t • c_{t-1} + i_t • g_t = f_t • h_{t-1} + i_t • g_t

Comparing with the GRU, if we set f_t = z_t, then

h_t = z_t • h_{t-1} + i_t • g_t
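A sketch of this simplified cell, under the same conventions as the sketch above (the function name and the [i, f, g] packing are hypothetical):

```python
import torch

def lstm_step_no_output_gate(x, h_prev, W, U, b):
    """LSTM step with the output gate and the final tanh removed, so h_t = c_t
    and h_prev doubles as c_{t-1}. Gate blocks packed as [i, f, g]."""
    i, f, g = (W @ x + U @ h_prev + b).chunk(3)
    i, f = torch.sigmoid(i), torch.sigmoid(f)
    g = torch.tanh(g)                     # candidate content g_t
    return f * h_prev + i * g             # h_t = f_t • h_{t-1} + i_t • g_t
```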
2. Modify the memory gate of the LSTM

In the LSTM, the memory (input) gate i_t controls how much new information is written into the current LSTM cell.

In the current LSTM cell, i_t • g_t is the contribution of the current input x_t, conditioned on h_{t-1}.

However, we can set i_t = 1 − z_t as the memory gate, which simplifies the LSTM:

h_t = z_t • h_{t-1} + (1 − z_t) • g_t

This is exactly the GRU hidden-state update, with g_t playing the role of n_t.
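The corresponding sketch, with a single z_t gate doing the work of both f_t and i_t (again, names and packing are ours):

```python
import torch

def lstm_step_coupled_gates(x, h_prev, W, U, b):
    """Simplified LSTM step where the memory gate is tied to the forget gate:
    i_t = 1 - z_t, with z_t playing the role of f_t. Gate blocks packed as [z, g]."""
    z, g = (W @ x + U @ h_prev + b).chunk(2)
    z = torch.sigmoid(z)                  # z_t: keep-gate for h_{t-1}
    g = torch.tanh(g)                     # candidate content g_t
    return z * h_prev + (1 - z) * g       # h_t = z_t • h_{t-1} + (1 - z_t) • g_t
```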
3. Comparing g_t in the LSTM with n_t in the GRU, we can see that the GRU has an extra reset gate r_t, which controls how much of h_{t-1} is passed into the candidate state of the current GRU cell. We think this r_t could be removed, because we have already used z_t to control h_{t-1}.

This raises one question: if we remove r_t from the GRU, will its performance decrease?

The answer is no; you can read this tutorial for details.
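For reference, here is a sketch of a GRU step with r_t removed, i.e. with r_t fixed to 1 (hypothetical names, same conventions as the earlier sketches):

```python
import torch

def gru_step_no_reset(x, h_prev, W, U, b):
    """GRU step with the reset gate removed (r_t fixed to 1).
    Gate blocks packed as [z, n]."""
    z, n = (W @ x + U @ h_prev + b).chunk(2)
    z = torch.sigmoid(z)                  # update gate z_t
    n = torch.tanh(n)                     # candidate n_t; h_{t-1} no longer scaled by r_t
    return z * h_prev + (1 - z) * n       # h_t = z_t • h_{t-1} + (1 - z_t) • n_t
```

Note that this is exactly the same update as lstm_step_coupled_gates from step 2: once r_t is gone, the GRU and the simplified LSTM coincide, which is the point of this comparison.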