Densely Connected Bidirectional LSTM (DC-BiLSTM) is proposed in the paper: Densely Connected Bidirectional LSTM with Applications to Sentence Classification. It allows us to build deep neural networks with Bi-LSTM layers. In this tutorial, we will introduce it and propose an improvement to it.
Preliminary
In order to build deep neural networks, we can use highway networks or residual networks, which ease optimization by letting information skip layers.
A Beginner Introduction to Highway Networks – Machine Learning Tutorial
However, DC-BiLSTM avoids the vanishing-gradient and overfitting problems by concatenating each layer's input with the outputs of all previous layers. It can also be used to build deep networks.
The structure of DC-BiLSTM
The structure of DC-BiLSTM is shown below:
The input and output of Bi-LSTM in DC-BiLSTM
For each Bi-LSTM layer in DC-BiLSTM, the input is the concatenation of the word embedding and the outputs of all previous Bi-LSTM layers. The output of each layer is the same as that of a conventional Bi-LSTM.
For example, the input of the third Bi-LSTM layer is:

\([e(w_1), h^1_1, h^2_1], [e(w_2), h^1_2, h^2_2], \dots, [e(w_s), h^1_s, h^2_s]\)
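The dense connection pattern above can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation: the `bilstm()` function is a stand-in (a random projection plus `tanh`) for a real Bi-LSTM layer, and the sizes `EMB`, `HID`, and `SEQ` are arbitrary choices for the example.

```python
import numpy as np

# Sketch of DC-BiLSTM dense connectivity (illustrative, not the paper's code):
# - e(w_t): word embedding of size EMB
# - h^l_t: output of Bi-LSTM layer l at step t, size 2*HID
#   (forward and backward hidden states concatenated)
EMB, HID, SEQ = 8, 4, 5  # embedding size, per-direction hidden size, sequence length

def bilstm(inputs, layer_seed):
    """Stand-in for a Bi-LSTM layer: maps each step to a 2*HID vector."""
    rng = np.random.default_rng(layer_seed)
    W = rng.standard_normal((inputs.shape[-1], 2 * HID))
    return np.tanh(inputs @ W)  # shape (SEQ, 2*HID)

embeddings = np.random.default_rng(0).standard_normal((SEQ, EMB))

outputs = []  # outputs of all previous Bi-LSTM layers
for layer in range(3):
    # Dense connection: each layer sees the embedding plus every previous output.
    layer_input = np.concatenate([embeddings] + outputs, axis=-1)
    outputs.append(bilstm(layer_input, layer))

# The third layer's input per step is [e(w_t), h^1_t, h^2_t],
# so its width is EMB + 2*HID + 2*HID.
print(layer_input.shape)  # (5, 24)
```

Note that the input width grows linearly with depth, which is exactly what the attention-based improvement below tries to avoid.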
How to improve DC-BiLSTM
We can use an attention mechanism to improve the input of each Bi-LSTM layer in DC-BiLSTM.
For example, each input node of the third Bi-LSTM layer contains three previous outputs (the word embedding and two hidden states). Instead of concatenating them, we use an attention method to merge them.
It looks like:
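The attention-based merge can be sketched as follows. This is our own illustration of the idea, not code from the paper: `attention_merge`, the single learned `query` vector, and the dot-product scoring are all assumptions, and we assume the merged vectors have been projected to a common size `D` first.

```python
import numpy as np

D, SEQ = 6, 5  # common vector size, sequence length (arbitrary example values)

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def attention_merge(vectors, query):
    """Merge K same-size vectors per step into one vector (illustrative).

    vectors: list of K arrays, each (SEQ, D) -- embedding + previous outputs
    query:   (D,) learned query used to score each source
    """
    stacked = np.stack(vectors, axis=1)           # (SEQ, K, D)
    scores = stacked @ query                      # (SEQ, K) dot-product scores
    weights = softmax(scores)                     # (SEQ, K) attention weights
    return (weights[..., None] * stacked).sum(1)  # (SEQ, D) weighted sum

rng = np.random.default_rng(1)
emb = rng.standard_normal((SEQ, D))   # e(w_t)
h1 = rng.standard_normal((SEQ, D))    # first-layer output (projected to D)
h2 = rng.standard_normal((SEQ, D))    # second-layer output (projected to D)
query = rng.standard_normal(D)

merged = attention_merge([emb, h1, h2], query)
print(merged.shape)  # (5, 6)
```

Unlike concatenation, the merged input stays size `D` no matter how many previous layers feed into it, so the layer input width no longer grows with depth.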