# Ultimate Guide to IndRNN: Handling Longer Sequences Than LSTM – LSTM Tutorial

By | November 12, 2021

IndRNN was proposed in the paper: Independently Recurrent Neural Network (IndRNN): Building A Longer and Deeper RNN. It can handle longer sequences than LSTM.


In this tutorial, we will use some examples to introduce this model.

There is a TensorFlow implementation of IndRNN on GitHub:

https://github.com/batzner/indrnn

You can read this project's source to understand the model.

## The core of IndRNN

IndRNN modifies the computation of the hidden state in an RNN:

$$h_t=\sigma(Wx_t+u\odot h_{t-1}+b)$$

Here $$\odot$$ is the Hadamard (element-wise) product and $$\sigma$$ is the activation function, for example relu(). Note that the recurrent weight $$u$$ is a vector, not a matrix, so each hidden neuron only receives its own previous state — this is why the neurons are "independent".
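The update above can be sketched in plain NumPy (a minimal illustration of the formula, not the project's implementation; all names here are ours):

```python
import numpy as np

def ind_rnn_step(x_t, h_prev, W, u, b):
    """One IndRNN step: h_t = relu(W @ x_t + u * h_prev + b).

    Unlike a vanilla RNN, the recurrent weight u is a vector, so each
    hidden unit only sees its own previous state (Hadamard product).
    """
    return np.maximum(0.0, W @ x_t + u * h_prev + b)

rng = np.random.default_rng(0)
dim, hidden = 200, 100
W = rng.normal(scale=0.1, size=(hidden, dim))  # input kernel
u = rng.uniform(0.0, 1.0, size=hidden)         # per-unit recurrent weight
b = np.zeros(hidden)

h = np.zeros(hidden)
for t in range(40):                            # a length-40 sequence
    h = ind_rnn_step(rng.normal(size=dim), h, W, u, b)
print(h.shape)  # (100,)
```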

## How about the variables in IndRNN?

In order to see how many variables there are in IndRNN, we will discuss an example.

Look at the example below:

```python
import tensorflow as tf
import numpy as np
from ind_rnn_cell import IndRNNCell

TIME_STEPS = 784
LAST_LAYER_LOWER_BOUND = pow(0.5, 1 / TIME_STEPS)
RECURRENT_MAX = pow(2, 1 / TIME_STEPS)

# input shape: batch_size * sequence_length * dim = 64 * 40 * 200
batch = 64
inputs = tf.Variable(tf.truncated_normal([batch, 40, 200], stddev=0.1), name="S")

input_init = tf.random_uniform_initializer(-0.001, 0.001)
recurrent_init = tf.random_uniform_initializer(LAST_LAYER_LOWER_BOUND,
                                               RECURRENT_MAX)
cell = IndRNNCell(100,
                  recurrent_max_abs=RECURRENT_MAX,
                  input_kernel_initializer=input_init,
                  recurrent_kernel_initializer=recurrent_init)

# Unroll the layer
layer_output, _ = tf.nn.dynamic_rnn(cell, inputs,
                                    dtype=tf.float32,
                                    scope="rnn")

init = tf.global_variables_initializer()
init_local = tf.local_variables_initializer()
with tf.Session() as sess:
    sess.run([init, init_local])
    np.set_printoptions(precision=4, suppress=True)
    a = sess.run(layer_output)
    print(a.shape)
    for n in tf.trainable_variables():
        print(n)
```

In this example, the layer input is inputs, whose shape is batch_size * sequence_length * dim, for example 64 * 40 * 200.

Run this code and you will find:

the layer_output shape is (64, 40, 100), the same shape as an LSTM output.

The trainable variables in IndRNN are:

```
<tf.Variable 'rnn/ind_rnn_cell/input_kernel:0' shape=(200, 100) dtype=float32_ref>
<tf.Variable 'rnn/ind_rnn_cell/recurrent_kernel:0' shape=(100,) dtype=float32_ref>
<tf.Variable 'rnn/ind_rnn_cell/bias:0' shape=(100,) dtype=float32_ref>
```

We can find: there are only three variables in an IndRNN cell: the input kernel, the recurrent kernel, and the bias. Although this example processes a sequence of length 40, there are not 40 * 3 variables, because the same variables are shared across all time steps.
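The parameter count follows directly from those shapes (a quick sanity check using the 200-dim input and 100 hidden units from the example above):

```python
dim, hidden = 200, 100

input_kernel = dim * hidden   # shape (200, 100)
recurrent_kernel = hidden     # shape (100,) -- a vector, not a (100, 100) matrix
bias = hidden                 # shape (100,)

total = input_kernel + recurrent_kernel + bias
print(total)  # 20200
```

For comparison, a vanilla RNN cell with the same sizes would need hidden * hidden = 10000 recurrent weights instead of 100, since its recurrent kernel is a full matrix.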

recurrent_max_abs limits the maximum absolute value of the recurrent weight $$u$$.

For example, inside IndRNNCell:

```python
# Clip the absolute values of the recurrent weights to the specified maximum
if self._recurrent_max_abs:
    self._recurrent_kernel = clip_ops.clip_by_value(self._recurrent_kernel,
                                                    -self._recurrent_max_abs,
                                                    self._recurrent_max_abs)
```

We should notice: the longer the sequence, the smaller the bound on $$u$$, because:

```python
LAST_LAYER_LOWER_BOUND = pow(0.5, 1 / TIME_STEPS)
RECURRENT_MAX = pow(2, 1 / TIME_STEPS)
```

Here TIME_STEPS is the sequence length. As TIME_STEPS grows, RECURRENT_MAX approaches 1 from above, so $$u$$ is clipped more tightly and the repeated recurrent gain cannot explode over the whole sequence.
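You can see both bounds squeeze toward 1 as the sequence gets longer (a small check using the same formulas):

```python
for time_steps in (28, 784, 5000):
    lower = pow(0.5, 1 / time_steps)   # LAST_LAYER_LOWER_BOUND
    upper = pow(2, 1 / time_steps)     # RECURRENT_MAX
    print(time_steps, lower, upper)
# By construction, lower ** time_steps == 0.5 and upper ** time_steps == 2,
# so the accumulated recurrent gain over the full sequence stays in [0.5, 2].
```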

## How about kernel initialization in a multi-layer IndRNN?

Here is an example:

```python
RECURRENT_MAX = pow(2, 1 / TIME_STEPS)

for layer in range(1, NUM_LAYERS + 1):
    # Init only the last layer's recurrent weights around 1
    recurrent_init_lower = 0 if layer < NUM_LAYERS else LAST_LAYER_LOWER_BOUND
    recurrent_init = tf.random_uniform_initializer(recurrent_init_lower,
                                                   RECURRENT_MAX)
```

We can find: the recurrent kernel of the last IndRNN layer is initialized uniformly in [LAST_LAYER_LOWER_BOUND, RECURRENT_MAX], while the earlier layers are initialized in [0, RECURRENT_MAX].
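A quick trace of those ranges (a sketch assuming NUM_LAYERS = 3 and TIME_STEPS = 784, the MNIST setting used in the project):

```python
TIME_STEPS = 784
NUM_LAYERS = 3
LAST_LAYER_LOWER_BOUND = pow(0.5, 1 / TIME_STEPS)
RECURRENT_MAX = pow(2, 1 / TIME_STEPS)

for layer in range(1, NUM_LAYERS + 1):
    lower = 0 if layer < NUM_LAYERS else LAST_LAYER_LOWER_BOUND
    print(f"layer {layer}: recurrent init range [{lower:.4f}, {RECURRENT_MAX:.4f}]")
# Only the last layer starts near 1, so it can retain long-term memory
# from the beginning of training; earlier layers may start anywhere
# in [0, RECURRENT_MAX].
```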
