Batch normalization was proposed in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In this tutorial, we will explain it for machine learning beginners.
What is Batch Normalization?
Batch normalization normalizes a batch of samples so that each feature has zero mean and unit variance, then applies a learned scale and shift.
For example: suppose there are 64 samples in a training step. Each sample is 1 * 200, which means we have a 64 * 200 matrix.
We can normalize this batch of samples with batch normalization:

\[\hat{x} = \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}}, \qquad y = \lambda \hat{x} + \beta\]

where \(\mu\) is the mean of the batch samples, \(\sigma^2\) is their variance, \(\epsilon\) is a small constant for numerical stability, \(\lambda\) is the scale and \(\beta\) is the shift.
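These steps can be sketched in NumPy (a minimal sketch for illustration only; real frameworks also track running statistics for use at inference time):

```python
import numpy as np

def batch_norm(x, lam, beta, eps=1e-5):
    """Batch-normalize x of shape (batch, features).

    lam (scale) and beta (shift) have shape (features,).
    eps keeps the division numerically stable.
    """
    mu = x.mean(axis=0)                     # per-feature mean over the batch
    sigma2 = x.var(axis=0)                  # per-feature variance over the batch
    x_hat = (x - mu) / np.sqrt(sigma2 + eps)  # standardize
    return lam * x_hat + beta               # scale and shift

# 64 samples, each 1 * 200, i.e. a 64 * 200 matrix as in the example above
x = np.random.randn(64, 200) * 3.0 + 5.0
y = batch_norm(x, lam=np.ones(200), beta=np.zeros(200))
# each feature of y now has (approximately) zero mean and unit variance
```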
To learn how to compute \(\mu\) and \(\sigma^2\), you can read:
Batch Normalization implemented in pytorch and tensorflow
Batch normalization is implemented differently in PyTorch and TensorFlow; we compare them in the table below:
How to use batch normalization?
To use batch normalization, we need the values of four variables. They are:
|Variable|Description|How to get in TensorFlow|
|---|---|---|
|\(\mu\)|The mean of the batch samples|tf.nn.moments()|
|\(\sigma^2\)|The variance of the batch samples|tf.nn.moments()|
|\(\lambda\)|The scale|Learned by training|
|\(\beta\)|The shift|Learned by training|
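As a concrete illustration, the four variables can be obtained like this (a NumPy sketch; in TensorFlow, tf.nn.moments() returns the mean and variance, and \(\lambda\) and \(\beta\) would be trainable variables):

```python
import numpy as np

x = np.random.randn(64, 200)  # a batch of 64 samples, each 1 * 200

# mu and sigma^2: computed from the current batch
# (tf.nn.moments(x, axes=[0]) plays this role in TensorFlow)
mu = x.mean(axis=0)        # shape (200,)
sigma2 = x.var(axis=0)     # shape (200,)

# lambda and beta: learnable parameters, typically initialized to 1 and 0
lam = np.ones(200)         # scale
beta = np.zeros(200)       # shift
```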
We should notice:

If \(\lambda = 1\) and \(\beta = 0\), batch normalization reduces to plain standardization.

If \(\lambda = \sigma\) and \(\beta = \mu\), the output equals the input (up to \(\epsilon\)), which means batch normalization has no effect.
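We can verify the two special cases above with a short NumPy check (a sketch using a tiny \(\epsilon\) so the identity case is nearly exact):

```python
import numpy as np

x = np.random.randn(64, 200) * 2.0 + 3.0
eps = 1e-12  # tiny eps so sigma / sqrt(sigma^2 + eps) ~ 1

mu = x.mean(axis=0)
sigma2 = x.var(axis=0)
x_hat = (x - mu) / np.sqrt(sigma2 + eps)

# Case 1: lambda = 1, beta = 0 -> plain standardization
y1 = 1.0 * x_hat + 0.0
print(np.allclose(y1, x_hat))  # True

# Case 2: lambda = sigma, beta = mu -> output ~ input, BN has no effect
y2 = np.sqrt(sigma2) * x_hat + mu
print(np.allclose(y2, x, atol=1e-6))  # True
```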
To use batch normalization in your own model, you can view this tutorial: