An Improvement to Densely Connected Bidirectional LSTM – LSTM Tutorial

By | November 3, 2021

Densely Connected Bidirectional LSTM (DC-BiLSTM) was proposed in the paper Densely Connected Bidirectional LSTM with Applications to Sentence Classification. It allows us to build deep neural networks from stacked Bi-LSTM layers. In this tutorial, we will introduce it and suggest an improvement to it.

Preliminary

In order to build deep neural networks, we can use highway networks or residual networks.

A Beginner Introduction to Highway Networks – Machine Learning Tutorial

However, DC-BiLSTM alleviates the vanishing-gradient and overfitting problems in a different way: it concatenates the input and output of each layer. It can also be used to build deep networks.

The structure of DC-BiLSTM

The structure of DC-BiLSTM is below:

The input and output of Bi-LSTM in DC-BiLSTM

The input of each Bi-LSTM layer is the concatenation of the outputs of all previous Bi-LSTM layers together with the word embedding. The output of each Bi-LSTM layer is the same as that of a conventional Bi-LSTM.

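For example, the input of the third Bi-LSTM layer is

$$\{[e(w_1), h^1_1, h^2_1], [e(w_2), h^1_2, h^2_2], \ldots, [e(w_s), h^1_s, h^2_s]\}$$

Here \(e(w_t)\) is the embedding of the \(t\)-th word, \(h^l_t\) is the output of the \(l\)-th Bi-LSTM layer at position \(t\), and \(s\) is the sentence length.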
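The concatenation above can be sketched in a few lines of plain Python. This is only an illustration of how the input is assembled, not the paper's implementation: the function name `dense_inputs` and the toy vectors are made up for this example, and the Bi-LSTM layers themselves are not computed here.

```python
def dense_inputs(embeddings, layer_outputs):
    """Build the per-position input of the next Bi-LSTM layer.

    embeddings:    list of s vectors, embeddings[t] plays the role of e(w_t)
    layer_outputs: list of previous layers, layer_outputs[l][t] plays the role of h^l_t
    (hypothetical helper, names chosen for illustration only)
    """
    inputs = []
    for t, e_t in enumerate(embeddings):
        x_t = list(e_t)            # start with the word embedding
        for h in layer_outputs:    # append every previous layer's output
            x_t.extend(h[t])
        inputs.append(x_t)
    return inputs


# Toy sentence with s = 2 words; every vector has 2 dimensions.
e = [[0.1, 0.2], [0.3, 0.4]]
h1 = [[1.0, 1.1], [1.2, 1.3]]   # stand-in for the first Bi-LSTM layer's outputs
h2 = [[2.0, 2.1], [2.2, 2.3]]   # stand-in for the second Bi-LSTM layer's outputs

x3 = dense_inputs(e, [h1, h2])  # input of the third Bi-LSTM layer
print(len(x3), len(x3[0]))      # 2 6
```

Note that the input width grows with depth (2, then 4, then 6 in this toy case), because every layer sees everything that came before it.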

How to improve DC-BiLSTM

We can improve the input of each Bi-LSTM layer in DC-BiLSTM by using an attention mechanism.

For example, in the input of the third Bi-LSTM layer, each input node contains three previous outputs. Instead of concatenating them, we use an attention method to merge them.

It looks like:
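A minimal sketch of such a merge in plain Python follows. The tutorial does not fix a particular attention method, so everything specific here is an assumption: the candidates are scored by a dot product against a hypothetical `query` vector (which would be learned in practice), the scores are normalized with softmax, and the candidates are summed with those weights. It also assumes the word embedding and the layer outputs share the same dimension; otherwise they would first need to be projected to a common size.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def attention_merge(candidates, query):
    """Merge same-sized vectors into one by attention (illustrative only).

    candidates: e.g. [e(w_t), h^1_t, h^2_t] for one position t
    query:      hypothetical scoring vector, assumed to be a learned parameter
    """
    # Dot-product score of each candidate against the query.
    scores = [sum(c_i * q_i for c_i, q_i in zip(c, query)) for c in candidates]
    weights = softmax(scores)
    dim = len(candidates[0])
    # Weighted sum: the output keeps the size of a single candidate.
    merged = [sum(w * c[i] for w, c in zip(weights, candidates)) for i in range(dim)]
    return merged, weights


# One position t with three previous outputs of dimension 2.
candidates = [[0.0, 3.0], [3.0, 0.0], [3.0, 3.0]]
merged, weights = attention_merge(candidates, query=[0.0, 0.0])
print(weights)  # a zero query scores all candidates equally, so each weight is 1/3
print(merged)   # the weighted sum is then the plain average of the candidates
```

One consequence of this design is that the input size of every layer stays constant, whereas with concatenation it grows with each added layer.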