Nested LSTM is an improved LSTM variant that is reported to perform better than the classic LSTM. In this tutorial, we will introduce it for LSTM beginners.

## The classic LSTM network

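The classic LSTM network is defined as follows (written in the notation of the paper linked at the end of this tutorial, where \(\odot\) is element-wise multiplication):

\[
\begin{aligned}
i_t &= \sigma_i(x_t W_{xi} + h_{t-1} W_{hi} + b_i) && (1)\\
f_t &= \sigma_f(x_t W_{xf} + h_{t-1} W_{hf} + b_f) && (2)\\
c_t &= f_t \odot c_{t-1} + i_t \odot \sigma_c(x_t W_{xc} + h_{t-1} W_{hc} + b_c) && (3)\\
o_t &= \sigma_o(x_t W_{xo} + h_{t-1} W_{ho} + b_o) && (4)\\
h_t &= o_t \odot \sigma_h(c_t) && (5)
\end{aligned}
\]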

Note that the LSTM equations above do not contain peephole connections. For LSTM with peephole connections, see:

Understand LSTM Peephole Connections: A Beginner Guide – LSTM Networks Tutorial

To improve LSTM, we can modify its equations.

Nested LSTM modifies equation (3), the update of c_{t}.

Equation (3) of LSTM can be regarded as:

c_{t} = f(c_{t-1}, x_{t}, h_{t-1})

In the classic LSTM, f is addition: the gated previous cell state and the gated candidate value are summed.
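To make the role of equation (3) concrete, here is a minimal NumPy sketch of one classic LSTM step without peephole connections. The parameter names (`W_xi`, `b_i`, and so on), the `init_params` helper and the sizes are assumptions made for illustration, not code from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(input_size, hidden_size, rng):
    """Small random weights for the gates i, f, o and the candidate c (hypothetical layout)."""
    params = {}
    for gate in "ifco":
        params["W_x" + gate] = rng.normal(scale=0.1, size=(input_size, hidden_size))
        params["W_h" + gate] = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
        params["b_" + gate] = np.zeros(hidden_size)
    return params

def lstm_step(x_t, h_prev, c_prev, p):
    """One classic LSTM step: returns (h_t, c_t)."""
    i_t = sigmoid(x_t @ p["W_xi"] + h_prev @ p["W_hi"] + p["b_i"])
    f_t = sigmoid(x_t @ p["W_xf"] + h_prev @ p["W_hf"] + p["b_f"])
    o_t = sigmoid(x_t @ p["W_xo"] + h_prev @ p["W_ho"] + p["b_o"])
    candidate = np.tanh(x_t @ p["W_xc"] + h_prev @ p["W_hc"] + p["b_c"])
    # Equation (3): the two gated terms are combined by plain addition.
    c_t = f_t * c_prev + i_t * candidate
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

rng = np.random.default_rng(0)
params = init_params(input_size=3, hidden_size=4, rng=rng)
x = rng.normal(size=(2, 3))   # a batch of 2 inputs
h0 = np.zeros((2, 4))         # initial hidden state
c0 = np.zeros((2, 4))         # initial cell state
h1, c1 = lstm_step(x, h0, c0, params)
print(h1.shape, c1.shape)
```

The only line that Nested LSTM changes is the one computing `c_t`.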

## How does Nested LSTM improve the classic LSTM?

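Looking at equation (3) in the classic LSTM, we can split its right-hand side into two terms (written in the notation of the paper linked at the end, where f_{t} and i_{t} are the forget and input gates, not the function f):

\[
\tilde{h}_{t-1} = f_t \odot c_{t-1}, \qquad \tilde{x}_t = i_t \odot \sigma_c(x_t W_{xc} + h_{t-1} W_{hc} + b_c)
\]

It means:

\[
c_t = f(\tilde{h}_{t-1}, \tilde{x}_t)
\]

If f is the add() function, c_{t} will be:

\[
c_t = \tilde{h}_{t-1} + \tilde{x}_t
\]

which is the same as the cell state of the classic LSTM network.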

But what if f is some other function, such as a GRU, a stacked LSTM or an LSTM?

**If f is an LSTM cell, the classic LSTM becomes a Nested LSTM.**

As we know, an LSTM cell receives three inputs (h_{t-1}, x_{t} and c_{t-1}), where the first hidden state is 0, and returns two outputs (h_{t}, c_{t}).

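For Nested LSTM, we can set the inputs of the inner LSTM cell to (in the notation of the paper linked at the end):

\[
\tilde{h}_{t-1} = f_t \odot c_{t-1}, \qquad \tilde{x}_t = i_t \odot \sigma_c(x_t W_{xc} + h_{t-1} W_{hc} + b_c)
\]

We can set the first hidden state of the nested LSTM to zero

and

\[
c_t = \tilde{h}_t
\]

where \(\tilde{h}_t\) is the hidden state returned by the inner LSTM cell.

Then we can get the final output h_{t} of Nested LSTM from c_{t}, as in the classic LSTM.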
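The construction above can be sketched in NumPy as follows. The outer cell passes the gated previous cell state to the inner LSTM as its hidden state and the gated candidate as its input, then uses the inner hidden state as c_{t}. All names here (`gates`, `inner_lstm_step`, `nested_lstm_step`, the parameter layout) are hypothetical. The sketch assumes the inner cell keeps its own cell state, uses tanh inside the inner cell, and uses the identity for the outer candidate activation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(input_size, hidden_size, rng):
    """Small random weights for the gates i, f, o and the candidate c (hypothetical layout)."""
    p = {}
    for g in "ifco":
        p["W_x" + g] = rng.normal(scale=0.1, size=(input_size, hidden_size))
        p["W_h" + g] = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
        p["b_" + g] = np.zeros(hidden_size)
    return p

def gates(x, h_prev, p, sigma_c):
    """Input, forget and output gates plus the candidate value."""
    i = sigmoid(x @ p["W_xi"] + h_prev @ p["W_hi"] + p["b_i"])
    f = sigmoid(x @ p["W_xf"] + h_prev @ p["W_hf"] + p["b_f"])
    o = sigmoid(x @ p["W_xo"] + h_prev @ p["W_ho"] + p["b_o"])
    g = sigma_c(x @ p["W_xc"] + h_prev @ p["W_hc"] + p["b_c"])
    return i, f, o, g

def inner_lstm_step(x_in, h_in, c_in, p):
    """A classic LSTM step used as the memory function f."""
    i, f, o, g = gates(x_in, h_in, p, np.tanh)
    c_out = f * c_in + i * g
    return o * np.tanh(c_out), c_out

def nested_lstm_step(x_t, h_prev, c_prev, c_inner_prev, outer, inner):
    # Assumption: identity activation for the outer candidate.
    i_t, f_t, o_t, g_t = gates(x_t, h_prev, outer, lambda z: z)
    h_tilde_prev = f_t * c_prev   # inner hidden state input
    x_tilde = i_t * g_t           # inner input
    h_tilde, c_inner = inner_lstm_step(x_tilde, h_tilde_prev, c_inner_prev, inner)
    c_t = h_tilde                 # the inner LSTM replaces the addition
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t, c_inner

rng = np.random.default_rng(0)
outer = init_params(3, 4, rng)    # outer cell: input size 3, hidden size 4
inner = init_params(4, 4, rng)    # inner cell works on hidden-sized vectors
h = np.zeros((2, 4))
c = np.zeros((2, 4))
c_inner = np.zeros((2, 4))
for x_t in rng.normal(size=(5, 2, 3)):   # 5 time steps, batch of 2
    h, c, c_inner = nested_lstm_step(x_t, h, c, c_inner, outer, inner)
print(h.shape, c.shape, c_inner.shape)
```

Replacing the call to `inner_lstm_step` with `h_tilde_prev + x_tilde` would turn this back into a classic LSTM step.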

Here is the structure of Nested LSTM.

We can also stack multiple inner LSTM layers to get c_{t} in Nested LSTM.
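One way to read this is deeper nesting: the memory function of the inner LSTM is itself an LSTM, and only the innermost level uses plain addition. The recursive sketch below illustrates that reading under stated assumptions: `nested_step` and `init_params` are hypothetical names, every level uses tanh for its candidate for simplicity, and each level keeps its own cell state.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(input_size, hidden_size, rng):
    """Small random weights for the gates i, f, o and the candidate c (hypothetical layout)."""
    p = {}
    for g in "ifco":
        p["W_x" + g] = rng.normal(scale=0.1, size=(input_size, hidden_size))
        p["W_h" + g] = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
        p["b_" + g] = np.zeros(hidden_size)
    return p

def nested_step(x, h_prev, cells, params, level=0):
    """One step at nesting level `level`; returns (hidden state, cell states from this level down)."""
    p = params[level]
    i = sigmoid(x @ p["W_xi"] + h_prev @ p["W_hi"] + p["b_i"])
    f = sigmoid(x @ p["W_xf"] + h_prev @ p["W_hf"] + p["b_f"])
    o = sigmoid(x @ p["W_xo"] + h_prev @ p["W_ho"] + p["b_o"])
    g = np.tanh(x @ p["W_xc"] + h_prev @ p["W_hc"] + p["b_c"])
    h_inner_prev = f * cells[level]   # gated previous cell state of this level
    x_inner = i * g                   # gated candidate of this level
    if level == len(params) - 1:
        # Innermost level: memory function is addition, as in the classic LSTM.
        c_new, deeper = h_inner_prev + x_inner, []
    else:
        # Otherwise the cell state is the hidden state of the next level down.
        c_new, deeper = nested_step(x_inner, h_inner_prev, cells, params, level + 1)
    h_new = o * np.tanh(c_new)
    return h_new, [c_new] + deeper

rng = np.random.default_rng(0)
hidden, depth = 4, 3
# Level 0 sees the real input (size 3); deeper levels see hidden-sized vectors.
params = [init_params(3 if level == 0 else hidden, hidden, rng) for level in range(depth)]
h = np.zeros((2, hidden))
cells = [np.zeros((2, hidden)) for _ in range(depth)]
for x_t in rng.normal(size=(5, 2, 3)):   # 5 time steps, batch of 2
    h, cells = nested_step(x_t, h, cells, params)
print(h.shape, len(cells))
```

With `depth = 1` this reduces to a classic LSTM step, and with `depth = 2` it is a Nested LSTM with a single inner LSTM.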

Note that in the paper, \(\sigma_c\) is the identity function, not the tanh function.

https://arxiv.org/pdf/1801.10308.pdf