Understand Nested LSTM Network: A Beginner Guide – LSTM Network Tutorial

October 20, 2020

Nested LSTM is an improved LSTM model that is reported to perform better than the classic LSTM. In this tutorial, we will introduce it for LSTM beginners.

The classic LSTM network

The classic LSTM network is defined as:

The formula of LSTM Network
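
For reference, the LSTM equations can be written out as below. This is a sketch that assumes the notation of the Nested LSTM paper: the weight names \(W\) and bias names \(b\) are assumptions, and the numbering is chosen so that the cell state \(c_t\) is equation (3).

\[
\begin{aligned}
i_t &= \sigma_i(W_{xi} x_t + W_{hi} h_{t-1} + b_i) && (1)\\
f_t &= \sigma_f(W_{xf} x_t + W_{hf} h_{t-1} + b_f) && (2)\\
c_t &= f_t \odot c_{t-1} + i_t \odot \sigma_c(W_{xc} x_t + W_{hc} h_{t-1} + b_c) && (3)\\
o_t &= \sigma_o(W_{xo} x_t + W_{ho} h_{t-1} + b_o) && (4)\\
h_t &= o_t \odot \sigma_h(c_t) && (5)
\end{aligned}
\]

Here \(\sigma_i\), \(\sigma_f\) and \(\sigma_o\) are sigmoid functions, \(\sigma_c\) and \(\sigma_h\) are usually tanh, and \(\odot\) is element-wise multiplication.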

Note that the LSTM equations above do not contain peephole connections. For LSTM with peephole connections, see:

Understand LSTM Peephole Connections: A Beginner Guide – LSTM Networks Tutorial

To improve LSTM, we can modify its equations.

Nested LSTM modifies equation (3), which computes the cell state \(c_t\).

Equation (3) of LSTM can be regarded as:

\(c_t = f(c_{t-1}, x_t, h_{t-1})\)

In the classic LSTM, f is the addition function.

How does Nested LSTM improve the classic LSTM?

Looking at equation (3) of the classic LSTM, we can split it as follows:

improve lstm forget gate and input gate

It means:

the equation of lstm hidden state

If f is the add() function, \(c_t\) will be:

lstm hidden state formula

which is the same as the cell state of the classic LSTM network.
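
Written out, the split can look like this. It is a sketch: the names \(\tilde{h}_{t-1}\) and \(\tilde{x}_t\) are assumptions borrowed from the Nested LSTM paper, and \(W\), \(b\) are assumed weight and bias names.

\[
\begin{aligned}
\tilde{h}_{t-1} &= f_t \odot c_{t-1} \\
\tilde{x}_t &= i_t \odot \sigma_c(W_{xc} x_t + W_{hc} h_{t-1} + b_c) \\
c_t &= f(\tilde{h}_{t-1}, \tilde{x}_t)
\end{aligned}
\]

With f as addition, this gives

\[
c_t = \tilde{h}_{t-1} + \tilde{x}_t = f_t \odot c_{t-1} + i_t \odot \sigma_c(W_{xc} x_t + W_{hc} h_{t-1} + b_c)
\]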

However, what if f is another function, such as a GRU, a stacked LSTM or an LSTM?

If f is an LSTM cell, the classic LSTM becomes a Nested LSTM.
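
Treating f as a pluggable function can be sketched in Python. This is a minimal illustration, not a reference implementation: it uses scalar states (hidden size 1) to stay short, and the names `lstm_step`, `combine` and `make_params`, as well as the weight values, are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def make_params(value=0.1):
    """Hypothetical scalar weights and zero biases for the gates i, f, o and the candidate c."""
    p = {}
    for gate in "ifoc":
        p["w_x" + gate] = value
        p["w_h" + gate] = value
        p["b_" + gate] = 0.0
    return p

def add(a, b):
    return a + b

def lstm_step(x, h_prev, c_prev, p, combine=add):
    """One LSTM step with scalar states; `combine` plays the role of f."""
    i = sigmoid(p["w_xi"] * x + p["w_hi"] * h_prev + p["b_i"])
    f_gate = sigmoid(p["w_xf"] * x + p["w_hf"] * h_prev + p["b_f"])
    o = sigmoid(p["w_xo"] * x + p["w_ho"] * h_prev + p["b_o"])
    candidate = math.tanh(p["w_xc"] * x + p["w_hc"] * h_prev + p["b_c"])
    # The cell state is f applied to the two halves of the split.
    c = combine(f_gate * c_prev, i * candidate)
    h = o * math.tanh(c)
    return h, c

params = make_params()
h, c = lstm_step(x=0.5, h_prev=0.1, c_prev=-0.2, p=params)
print(h, c)
```

With the default `combine=add`, the step reproduces the classic cell state update. Passing a different two-argument function changes only how \(c_t\) is computed, which is the idea Nested LSTM builds on.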

Recall that an LSTM cell receives three inputs (\(h_{t-1}\), \(x_t\) and \(c_{t-1}\)), with the initial hidden state set to 0, and returns two outputs (\(h_t\), \(c_t\)).

For Nested LSTM, we can set:

the formula of nested lstm

We can set the initial hidden state of the Nested LSTM to zero.


the improved hidden state of lstm network
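
In equation form, the Nested LSTM update can be sketched as follows. This assumes the notation of the Nested LSTM paper, where a tilde marks the inner LSTM; \(\mathrm{LSTM}(\cdot)\) stands for one step of an ordinary LSTM cell, and \(W\), \(b\) are assumed weight and bias names.

\[
\begin{aligned}
\tilde{h}_{t-1} &= f_t \odot c_{t-1}, \qquad \tilde{x}_t = i_t \odot \sigma_c(W_{xc} x_t + W_{hc} h_{t-1} + b_c) \\
(\tilde{h}_t, \tilde{c}_t) &= \mathrm{LSTM}(\tilde{x}_t, \tilde{h}_{t-1}, \tilde{c}_{t-1}) \\
c_t &= \tilde{h}_t
\end{aligned}
\]

The outer cell state \(c_t\) is therefore the hidden state returned by the inner LSTM, and the inner LSTM keeps its own cell state \(\tilde{c}_t\).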

Then we can compute the final output \(h_t\) of the Nested LSTM.
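
A full Nested LSTM step can be sketched in Python as below. It is a minimal illustration under stated assumptions, not the paper's implementation: states are scalars (hidden size 1), all states start at zero, and the names `gates`, `lstm_cell`, `nested_lstm_step` and `make_params`, as well as the weight values, are hypothetical. The activation \(\sigma_c\) is left as an argument so that either tanh or the identity can be used.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def make_params(value):
    """Hypothetical scalar weights and zero biases for the gates i, f, o and the candidate c."""
    p = {}
    for gate in "ifoc":
        p["w_x" + gate] = value
        p["w_h" + gate] = value
        p["b_" + gate] = 0.0
    return p

def gates(x, h_prev, p):
    """Return the three gates and the candidate pre-activation."""
    i = sigmoid(p["w_xi"] * x + p["w_hi"] * h_prev + p["b_i"])
    f = sigmoid(p["w_xf"] * x + p["w_hf"] * h_prev + p["b_f"])
    o = sigmoid(p["w_xo"] * x + p["w_ho"] * h_prev + p["b_o"])
    pre = p["w_xc"] * x + p["w_hc"] * h_prev + p["b_c"]
    return i, f, o, pre

def lstm_cell(x, h_prev, c_prev, p):
    """Classic LSTM cell: the cell state is updated by addition."""
    i, f, o, pre = gates(x, h_prev, p)
    c = f * c_prev + i * math.tanh(pre)
    h = o * math.tanh(c)
    return h, c

def nested_lstm_step(x, h_prev, c_prev, inner_c_prev, outer_p, inner_p,
                     sigma_c=math.tanh):
    """One Nested LSTM step: an inner LSTM cell replaces the addition."""
    i, f, o, pre = gates(x, h_prev, outer_p)
    inner_h_prev = f * c_prev        # inner hidden state input
    inner_x = i * sigma_c(pre)       # inner input
    inner_h, inner_c = lstm_cell(inner_x, inner_h_prev, inner_c_prev, inner_p)
    c = inner_h                      # outer cell state = inner hidden state
    h = o * math.tanh(c)             # final output of the outer cell
    return h, c, inner_c

outer_p = make_params(0.1)
inner_p = make_params(0.2)
h, c, inner_c = 0.0, 0.0, 0.0        # all states start at zero
for x in [0.5, -1.0, 0.25]:
    h, c, inner_c = nested_lstm_step(x, h, c, inner_c, outer_p, inner_p)
print(h, c, inner_c)
```

To use the identity for \(\sigma_c\), pass `sigma_c=lambda z: z`. The outer step needs to carry three values between time steps: the output, the outer cell state and the inner cell state.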

Here is the structure of Nested LSTM.

the structure of Nested LSTM Network

We can also stack multiple layers of inner LSTMs to compute \(c_t\) in a Nested LSTM.

Note that \(\sigma_c\) is the identity function, not the tanh function, in the paper.

