Batch normalization was proposed in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In this tutorial, we will explain it for machine learning beginners.
What is Batch Normalization?
Batch Normalization aims to normalize a batch of samples so that, before scaling and shifting, each feature follows a standard normal distribution.
For example, suppose there are 64 samples in a training step and each sample is 1 × 200, which means we have a 64 × 200 matrix.
We can normalize this batch of samples using the batch normalization method.
\(y_i=\lambda(\frac{x_i-\mu}{\sqrt{\sigma^2+\epsilon}})+\beta\)
where \(\mu\) is the mean of the batch samples, \(\sigma^2\) is the variance of the batch samples, \(\lambda\) is the scale and \(\beta\) is the shift.
To learn how \(\mu\) and \(\sigma^2\) are computed, you can read:
Understand the Mean and Variance Computed in Batch Normalization – Machine Learning Tutorial
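As a minimal sketch of the equation above (an illustration, not a framework layer), assuming the 64 × 200 batch from our example and using tf.nn.moments() to compute \(\mu\) and \(\sigma^2\):

```python
import tensorflow as tf

# A toy batch: 64 samples, each 1 x 200, i.e. a 64 x 200 matrix.
x = tf.random.normal([64, 200])

# mu and sigma^2 are computed per feature, over the batch axis (axis 0).
mu, sigma2 = tf.nn.moments(x, axes=[0])

# lambda (scale) and beta (shift) are trainable parameters in a real layer;
# here they are fixed to 1 and 0 for illustration.
scale = tf.ones([200])   # lambda
shift = tf.zeros([200])  # beta

epsilon = 1e-3  # TensorFlow's default
y = scale * (x - mu) / tf.sqrt(sigma2 + epsilon) + shift
print(y.shape)  # (64, 200)
```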
Batch Normalization in PyTorch and TensorFlow
Batch Normalization is implemented in both PyTorch and TensorFlow. Both frameworks apply the equation above, with \(\epsilon\) inside the square root, but their default hyperparameters differ. We compare them in the table below:

| | PyTorch | TensorFlow |
| --- | --- | --- |
| \(\epsilon\) | 1e-5 | 1e-3 |
| momentum | 0.1 | 0.99 |

We should also notice that the two libraries define momentum with opposite conventions: PyTorch updates the running statistics as \((1-\text{momentum})\cdot\hat{x}+\text{momentum}\cdot x_t\), while TensorFlow uses \(\text{momentum}\cdot\hat{x}+(1-\text{momentum})\cdot x_t\), so PyTorch's default 0.1 behaves like a TensorFlow momentum of 0.9.
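If both libraries are installed, a quick sketch like this can confirm the default values (the attribute names are the frameworks' own):

```python
import torch
import tensorflow as tf

# PyTorch: eps and momentum are attributes of the layer.
bn_pt = torch.nn.BatchNorm1d(num_features=200)
print(bn_pt.eps, bn_pt.momentum)      # 1e-05 0.1

# TensorFlow / Keras: the same hyperparameters are called epsilon and momentum.
bn_tf = tf.keras.layers.BatchNormalization()
print(bn_tf.epsilon, bn_tf.momentum)  # 0.001 0.99
```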
How to use batch normalization?
To apply batch normalization, we need to get the values of four variables. They are:
| Variable | Description | How to get it in TensorFlow |
| --- | --- | --- |
| \(\mu\) | The mean of the batch samples | tf.nn.moments() |
| \(\sigma^2\) | The variance of the batch samples | tf.nn.moments() |
| \(\lambda\) | The scale parameter | Learned during training |
| \(\beta\) | The shift parameter | Learned during training |
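As a sketch of how these four variables fit together, we can compute \(\mu\) and \(\sigma^2\) with tf.nn.moments(), create \(\lambda\) and \(\beta\) as trainable variables, and apply them with tf.nn.batch_normalization():

```python
import tensorflow as tf

x = tf.random.normal([64, 200])

# mu and sigma^2: computed from the current batch.
mu, sigma2 = tf.nn.moments(x, axes=[0])

# lambda (scale) and beta (shift): trainable parameters,
# typically initialized to 1 and 0.
scale = tf.Variable(tf.ones([200]))   # lambda
shift = tf.Variable(tf.zeros([200]))  # beta

y = tf.nn.batch_normalization(x, mean=mu, variance=sigma2,
                              offset=shift, scale=scale,
                              variance_epsilon=1e-3)
```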
We should notice (both cases are verified in the sketch below):
if \(\lambda = 1\) and \(\beta = 0\), batch normalization reduces to standardization.
if \(\lambda = \sigma\) and \(\beta = \mu\), the normalization is undone (ignoring \(\epsilon\)), which means we effectively do not use batch normalization.
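Here is a quick numeric check of these two special cases (a NumPy sketch):

```python
import numpy as np

x = np.random.randn(64, 200)
mu, sigma2 = x.mean(axis=0), x.var(axis=0)
x_hat = (x - mu) / np.sqrt(sigma2 + 1e-5)

# lambda = 1, beta = 0: the output is just the standardized input.
y1 = 1.0 * x_hat + 0.0
print(np.allclose(y1, x_hat))  # True

# lambda = sigma, beta = mu: the normalization is undone,
# so the output is (almost) the original input, up to epsilon.
y2 = np.sqrt(sigma2) * x_hat + mu
print(np.allclose(y2, x, atol=1e-4))  # True
```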
To use batch normalization in our model, we can view this tutorial: