A Beginner Introduction to Highway Networks – Machine Learning Tutorial

By | December 23, 2020

Highway Networks were proposed in the paper "Highway Networks" (Srivastava, Greff and Schmidhuber, 2015). Their design is inspired by the gating mechanism of LSTM. In this tutorial, we will introduce them for machine learning beginners.

First, let us compare feedforward and recurrent networks.

[Figure: comparison of a feedforward network and a recurrent network]

In a feedforward network, the gradient may vanish as the network gets deeper. To fix this problem, we can use a residual network.
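To see why depth hurts, here is a small numeric sketch (a hypothetical illustration, not taken from the paper). By the chain rule, the gradient through a stack of layers is the product of each layer's local derivative. The derivative of the sigmoid is at most 0.25, so the product shrinks quickly with depth. A residual connection \(y = x + f(x)\) has local derivative \(1 + f'(x)\), so the identity path keeps the product from collapsing. The one-unit scalar layers, the weight `w = 1.0` and the function name `chain_gradient` are assumptions made to keep the example small.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z: float) -> float:
    # d/dz sigmoid(z) = sigmoid(z) * (1 - sigmoid(z)), which is at most 0.25
    s = sigmoid(z)
    return s * (1.0 - s)


def chain_gradient(depth: int, residual: bool, w: float = 1.0) -> float:
    """Gradient d(output)/d(input) through `depth` scalar layers (hypothetical setup).

    Plain layer:    h_next = sigmoid(w * h)
    Residual layer: h_next = h + sigmoid(w * h)
    """
    h, grad = 0.5, 1.0
    for _ in range(depth):
        z = w * h
        local = w * sigmoid_grad(z)   # derivative of sigmoid(w * h) with respect to h
        if residual:
            local += 1.0              # the identity (skip) path contributes exactly 1
            h = h + sigmoid(z)
        else:
            h = sigmoid(z)
        grad *= local                 # chain rule: multiply the local derivatives
    return grad


for depth in (5, 20):
    print(depth, chain_gradient(depth, residual=False), chain_gradient(depth, residual=True))
```

With plain sigmoid layers the gradient is bounded by \(0.25^{depth}\), while with residual layers every local factor is at least 1.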

For RNNs, we can use LSTM to alleviate the vanishing gradient problem. LSTM does this with gates. Based on this idea, we can also add gates to a deep feedforward network to address the vanishing gradient problem.

The structure of LSTM

The structure of LSTM is shown below:

[Figure: LSTM input gate, forget gate and output gate]

The most important gate of LSTM is the forget gate.


The formulas of the LSTM network are:

[Figure: the formula of the LSTM network]
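For reference, a common way to write the LSTM cell is given below (standard notation, which may differ from the original figure's). The forget gate \(f_t\) controls how much of the previous cell state \(c_{t-1}\) is kept: when \(f_t \approx 1\) and \(i_t \approx 0\), the cell state is copied forward almost unchanged, which is what lets gradients flow across many time steps.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```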

Can we add a forget gate to a feedforward network to preserve the previous hidden state or output?

The answer is yes.

Highway Networks

Highway Networks add a forget-style gate to a feedforward network so that the layer input (the previous layer's output) can be carried through unchanged. The structure looks like this:

[Figure: the structure of highway networks]

It is defined as:

\(g_T = \sigma(W_T x + b_T)\)
\(g_C = \sigma(W_C x + b_C)\)
\(y = x \odot g_C + \tanh(Wx + b) \odot g_T\)

In the paper, the carry gate is set to \(g_C = 1 - g_T\), which gives the simplified form:

\(g_T = \sigma(W_T x + b_T)\)
\(y = x \odot (1 - g_T) + \tanh(Wx + b) \odot g_T\)

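Here \(\tanh(Wx + b)\) is an ordinary feedforward layer, \(x\) is the input, \(g_T\) is the transform gate and \(g_C\) is the carry gate. \(\sigma\) is the sigmoid function and \(\odot\) denotes element-wise multiplication.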
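A minimal pure-Python sketch of one highway layer follows, assuming the coupled form \(g_C = 1 - g_T\) by default and accepting an optional independent carry gate for the general form. The names `highway_layer` and `matvec` and the list-of-lists matrix representation are hypothetical, chosen only for illustration; a real model would use a framework such as PyTorch or JAX and learn the weights by backpropagation. The two extremes of the gate explain the name: with \(g_T = 0\) the layer outputs \(y = x\), and with \(g_T = 1\) it outputs \(y = \tanh(Wx + b)\). Because \(g_T\) is a sigmoid output, each unit in practice mixes the two.

```python
import math
from typing import List, Optional

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    # Plain matrix-vector product: (W x)_i = sum_j W_ij * x_j
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def highway_layer(
    x: Vector,
    W: Matrix, b: Vector,            # feedforward part: tanh(Wx + b)
    W_T: Matrix, b_T: Vector,        # transform gate: g_T = sigmoid(W_T x + b_T)
    W_C: Optional[Matrix] = None,    # carry gate: g_C = sigmoid(W_C x + b_C);
    b_C: Optional[Vector] = None,    # if both are omitted, g_C = 1 - g_T
) -> Vector:
    # The carry term multiplies x element-wise, so the output size must equal the input size
    if len(W) != len(x):
        raise ValueError("a highway layer must keep the input dimension")
    h = [math.tanh(z + bi) for z, bi in zip(matvec(W, x), b)]
    g_T = [sigmoid(z + bi) for z, bi in zip(matvec(W_T, x), b_T)]
    if W_C is None and b_C is None:
        g_C = [1.0 - g for g in g_T]  # coupled (simplified) form
    elif W_C is not None and b_C is not None:
        g_C = [sigmoid(z + bi) for z, bi in zip(matvec(W_C, x), b_C)]
    else:
        raise ValueError("pass both W_C and b_C, or neither")
    # y = x * g_C + tanh(Wx + b) * g_T, element by element
    return [xi * c + hi * t for xi, c, hi, t in zip(x, g_C, h, g_T)]


# Usage: 2-dimensional input, identity W, gate weights set to zero
x = [0.5, -1.0]
eye = [[1.0, 0.0], [0.0, 1.0]]
zero = [[0.0, 0.0], [0.0, 0.0]]

carried = highway_layer(x, eye, [0.0, 0.0], zero, [-10.0, -10.0])    # g_T near 0, so y is close to x
transformed = highway_layer(x, eye, [0.0, 0.0], zero, [10.0, 10.0])  # g_T near 1, so y is close to tanh(x)
print(carried, transformed)
```

The very negative transform-gate bias in the first call makes \(g_T \approx 0\), so the layer mostly carries \(x\) through. The paper suggests initializing \(b_T\) to a negative value (for example -1 or -3) for the same reason, so that a deep stack starts out biased towards carrying its input; -10 here is only chosen to make the effect easy to see.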


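Highway Networks are useful for deep networks built from many layers with the same structure. Because the input \(x\) is multiplied element-wise by the carry gate and added to the output, each highway layer must keep the same input and output dimension. For shallow networks, or networks whose layers differ in structure, a highway network may perform worse than a plain feedforward network.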
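To illustrate why gating helps in deep networks, here is a rough pure-Python experiment under hypothetical assumptions: scalar layers, weights drawn uniformly from [-1, 1], a fixed transform-gate bias \(b_T = -5\), and no training. It accumulates the input gradient through 50 plain \(\tanh\) layers and through 50 highway layers using the chain rule. Each plain layer multiplies the gradient by \(w(1 - \tanh^2)\), whose magnitude is at most \(|w|\), so the product collapses; each highway layer has a local derivative close to 1 when \(g_T\) is small, so most of the gradient survives. The function name `input_gradient` is illustrative only.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def input_gradient(depth: int, highway: bool, b_T: float = -5.0, seed: int = 0) -> float:
    """d(output)/d(input) through `depth` scalar layers with random weights.

    Plain layer:   y = tanh(w x + b)
    Highway layer: y = x (1 - t) + tanh(w x + b) t,  with t = sigmoid(w_T x + b_T)
    """
    rng = random.Random(seed)  # same seed, so both variants draw identical weights
    x, grad = 0.5, 1.0
    for _ in range(depth):
        w, b, w_T = rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1)
        h = math.tanh(w * x + b)
        dh = w * (1.0 - h * h)                  # dh/dx
        if highway:
            t = sigmoid(w_T * x + b_T)
            dt = w_T * t * (1.0 - t)            # dt/dx
            # d/dx [x (1 - t) + h t] = (1 - t) + dh * t + (h - x) * dt
            local = (1.0 - t) + dh * t + (h - x) * dt
            x = x * (1.0 - t) + h * t
        else:
            local = dh
            x = h
        grad *= local                           # chain rule
    return grad


plain = input_gradient(50, highway=False)
gated = input_gradient(50, highway=True)
print(f"plain: {plain:.3e}  highway: {gated:.3e}")
```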
