# An Explanation of Layer Normalization in Neural Networks – Machine Learning Tutorial

By | October 10, 2020

Layer Normalization was proposed by Jimmy Lei Ba et al. in 2016. Here is the paper:

https://arxiv.org/abs/1607.06450

In this tutorial, we will introduce it for machine learning beginners.

## Layer Normalization in Neural Networks

Layer Normalization is commonly used today, for example in multi-head attention networks.

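Layer Normalization is applied within each layer: the statistics are computed over the hidden units of that layer, separately for every input example.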

## What is Layer Normalization?

Layer Normalization can be viewed as a function $\mathrm{LN}$ that maps an input $x_i$ to a normalized output $y_i$:

$$y_i = \mathrm{LN}(x_i)$$

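In a feed-forward neural network, the $l$-th layer can be computed as:

$$a_i^l = {w_i^l}^\top h^l, \qquad h_i^{l+1} = f\left(a_i^l + b_i^l\right)$$

where $h^l$ is the input to the $l$-th layer, $a_i^l$ is the summed input to the $i$-th hidden unit, $w_i^l$ holds the incoming weights of that unit (the $i$-th row of the weight matrix of the $l$-th layer), $b_i^l$ is the bias, and $f$ is the activation function.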
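As a minimal sketch of this layer computation, assuming the formulation in the Ba et al. paper (the summed input of each hidden unit is the dot product of its incoming weights with the layer input, followed by the bias and the activation), the plain-Python code below may help. The function name `dense_layer`, the example values, and the choice of `tanh` as activation are illustrative assumptions, not part of the original tutorial.

```python
import math

def dense_layer(h, W, b, f=math.tanh):
    """Compute one layer: a_i = w_i . h, then f(a_i + b_i) for each hidden unit."""
    # Summed input to each hidden unit (one row of W per unit).
    a = [sum(w * x for w, x in zip(w_i, h)) for w_i in W]
    # Add the bias and apply the activation function.
    return [f(a_i + b_i) for a_i, b_i in zip(a, b)]

# Hypothetical example: 2 inputs, 3 hidden units.
h = [1.0, 2.0]
W = [[0.5, -0.5], [1.0, 0.0], [0.0, 0.0]]
b = [0.0, 0.0, 0.0]
out = dense_layer(h, W, b)
print(out)
```

The list `a` inside the function holds the summed inputs, which are the quantities that layer normalization operates on.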

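In order to normalize the $l$-th layer, we can normalize $a_i^l$ as follows:

$$\mu^l = \frac{1}{H}\sum_{i=1}^{H} a_i^l, \qquad \sigma^l = \sqrt{\frac{1}{H}\sum_{i=1}^{H}\left(a_i^l - \mu^l\right)^2}$$

$$h^{l+1} = f\left(g^l \odot \frac{a^l - \mu^l}{\sqrt{(\sigma^l)^2 + \epsilon}} + b^l\right)$$

where $H$ denotes the number of hidden units in a layer. $\epsilon$ is a small constant that guards against division by zero; it can be 0 or a small value such as 1e-12. $g^l$ is a gain parameter. $\odot$ is the element-wise multiplication between two vectors.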

Note that $g^l$ may be omitted if you do not want to rescale the normalized values.
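The normalization described above can be sketched in plain Python as follows. This is an illustrative sketch, not a reference implementation: the function name `layer_norm`, its parameters, and the example values are assumptions. It computes the mean and variance over the H hidden units of one layer, normalizes the summed inputs, and applies the gain only when one is given.

```python
import math

def layer_norm(a, gain=None, eps=1e-12):
    """Normalize the summed inputs `a` (a list of H floats) of one layer."""
    H = len(a)
    mu = sum(a) / H                              # mean over the hidden units
    var = sum((x - mu) ** 2 for x in a) / H      # variance over the hidden units
    denom = math.sqrt(var + eps)                 # eps guards against division by zero
    normalized = [(x - mu) / denom for x in a]
    if gain is None:                             # no gain: skip the rescaling
        return normalized
    return [g * n for g, n in zip(gain, normalized)]  # element-wise gain

a = [1.0, 2.0, 3.0, 4.0]
y = layer_norm(a)                                # mean close to 0, variance close to 1
y_scaled = layer_norm(a, gain=[2.0, 2.0, 2.0, 2.0])
print(y)
print(y_scaled)
```

Without a gain, the output has approximately zero mean and unit variance across the hidden units. With a gain, each normalized value is multiplied by the matching gain entry.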

## Layer Normalization in RNN

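In an RNN, the summed inputs at the $t$-th time step can be normalized as:

$$a^t = W_{hh} h^{t-1} + W_{xh} x^t$$

$$\mu^t = \frac{1}{H}\sum_{i=1}^{H} a_i^t, \qquad \sigma^t = \sqrt{\frac{1}{H}\sum_{i=1}^{H}\left(a_i^t - \mu^t\right)^2}$$

$$h^t = f\left(g \odot \frac{a^t - \mu^t}{\sqrt{(\sigma^t)^2 + \epsilon}} + b\right)$$

where $W_{hh}$ is the recurrent hidden-to-hidden weight matrix, $W_{xh}$ is the input-to-hidden weight matrix, $x^t$ is the input at time step $t$, and $h^{t-1}$ is the previous hidden state.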
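A single layer-normalized RNN step could be sketched as below, following the recurrent formulation in the Ba et al. paper: the summed inputs are computed from the previous hidden state and the current input, normalized over the hidden units, then scaled by the gain, shifted by the bias, and passed through the activation. The names `matvec` and `ln_rnn_step`, the weight values, and the use of `tanh` are illustrative assumptions.

```python
import math

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def ln_rnn_step(x_t, h_prev, W_xh, W_hh, gain, bias, eps=1e-12):
    """One RNN time step with layer normalization on the summed inputs."""
    # Summed inputs: a^t = W_hh h^{t-1} + W_xh x^t
    a = [p + q for p, q in zip(matvec(W_hh, h_prev), matvec(W_xh, x_t))]
    H = len(a)
    mu = sum(a) / H
    sigma = math.sqrt(sum((v - mu) ** 2 for v in a) / H + eps)
    # h^t = tanh(g * (a^t - mu^t) / sigma^t + b), element-wise
    return [math.tanh(g * (v - mu) / sigma + b) for g, v, b in zip(gain, a, bias)]

# Hypothetical sizes: 2 input features, 3 hidden units.
W_xh = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4]]
W_hh = [[0.0, 0.1, 0.0], [0.2, 0.0, -0.1], [0.0, 0.3, 0.1]]
gain = [1.0, 1.0, 1.0]
bias = [0.0, 0.0, 0.0]

h = [0.0, 0.0, 0.0]                       # initial hidden state
for x_t in [[1.0, 2.0], [0.5, -1.0]]:     # a two-step input sequence
    h = ln_rnn_step(x_t, h, W_xh, W_hh, gain, bias)
print(h)
```

Because the statistics are recomputed at every time step from the current summed inputs, the same code works for sequences of any length.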

## How to implement layer normalization in TensorFlow?

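In TensorFlow 1.x, we can use `tf.contrib.layers.layer_norm()` to implement it. The `tf.contrib` module was removed in TensorFlow 2.x, where `tf.keras.layers.LayerNormalization` serves the same purpose.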
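The tutorial names the TensorFlow 1.x function `tf.contrib.layers.layer_norm()`. As a hedged sketch for TensorFlow 2.x, where `tf.contrib` is no longer available, the Keras layer `tf.keras.layers.LayerNormalization` can be applied to a batch of inputs as shown below. The input values and the `epsilon` setting are illustrative assumptions, and the example assumes TensorFlow 2.x is installed with eager execution enabled.

```python
import tensorflow as tf

# Normalize over the last axis (the hidden units of each example).
layer = tf.keras.layers.LayerNormalization(axis=-1, epsilon=1e-5)

# Hypothetical batch: 2 examples with 4 hidden units each.
x = tf.constant([[1.0, 2.0, 3.0, 4.0],
                 [10.0, 20.0, 30.0, 40.0]])
y = layer(x)

# Each row is normalized independently, so every row mean is close to 0.
row_means = tf.reduce_mean(y, axis=-1).numpy()
print(y.numpy())
print(row_means)
```

The layer creates a trainable gain and a trainable bias by default, initialized to ones and zeros, so a freshly created layer only normalizes its input.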