Layer Normalization was proposed by Jimmy Ba et al. in 2016. Here is the paper:
https://arxiv.org/abs/1607.06450
In this tutorial, we will introduce it for machine learning beginners.
Layer Normalization in Neural Networks
Layer Normalization is commonly used right now, for example, in multi-head attention networks.
Layer Normalization is applied in each layer of the network.
What is Layer Normalization?
Layer Normalization can be viewed as a function LN(·) applied to each input vector x_i:

y_i = LN(x_i)
In neural networks, the output of the l-th layer can be computed as:

a^l = W^l h^l
h^{l+1} = f(a^l + b^l)

where W^l is the weight matrix of the l-th layer, b^l is the bias vector, h^l is the input to the l-th layer, a^l is the vector of summed inputs (pre-activations), and f is the activation function.
In order to normalize the l-th layer, we can normalize a^l as follows:

μ^l = (1 / H) * Σ_{i=1..H} a_i^l
σ^l = sqrt( (1 / H) * Σ_{i=1..H} (a_i^l - μ^l)^2 )
h^{l+1} = f( g^l ⊙ (a^l - μ^l) / sqrt((σ^l)^2 + ε) + b^l )

where H denotes the number of hidden units in the layer, a_i^l is the i-th component of a^l, ε is a small constant for numerical stability (for example 0 or 1e-12), g^l is a gain parameter, and ⊙ denotes element-wise multiplication between two vectors.
You should notice: g^l can be omitted if you do not want to rescale the normalized output.
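To make the formula concrete, here is a minimal NumPy sketch of this computation; the function name layer_norm and its default arguments are illustrative choices, not part of the original paper.

```python
import numpy as np

# A minimal sketch of layer normalization for one layer, assuming the
# pre-activations a^l are stored in a 1-D array of length H.
def layer_norm(a, g=None, b=None, eps=1e-12):
    mu = a.mean()                                   # mean over the H hidden units
    sigma = np.sqrt(((a - mu) ** 2).mean() + eps)   # std over the H hidden units
    a_hat = (a - mu) / sigma                        # normalized pre-activations
    if g is not None:
        a_hat = g * a_hat                           # optional gain g^l (element-wise)
    if b is not None:
        a_hat = a_hat + b                           # optional bias b^l
    return a_hat

# Example: a layer with H = 4 hidden units
a = np.array([1.0, 2.0, 3.0, 4.0])
print(layer_norm(a))  # roughly zero mean and unit variance across the 4 units
```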
Layer Normalization in RNN
In an RNN, the summed inputs at the t-th time step are a^t = W_hh h^{t-1} + W_xh x^t, and they can be normalized as:

μ^t = (1 / H) * Σ_{i=1..H} a_i^t
σ^t = sqrt( (1 / H) * Σ_{i=1..H} (a_i^t - μ^t)^2 )
h^t = f( g ⊙ (a^t - μ^t) / σ^t + b )

where the mean μ^t and standard deviation σ^t are computed over the H hidden units at that time step, just as in the feed-forward case.
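As a sketch, the same normalization can be dropped into one step of a vanilla tanh RNN; the helper names below (rnn_step_ln, W_xh, W_hh, g, b) are assumptions made for illustration.

```python
import numpy as np

def layer_norm(a, g, b, eps=1e-12):
    mu = a.mean()
    sigma = np.sqrt(((a - mu) ** 2).mean() + eps)
    return g * (a - mu) / sigma + b

# One RNN time step with layer normalization applied to the pre-activations a^t.
def rnn_step_ln(x_t, h_prev, W_xh, W_hh, g, b):
    a_t = W_hh @ h_prev + W_xh @ x_t        # summed inputs at time step t
    return np.tanh(layer_norm(a_t, g, b))   # normalize over the H units, then activate

# Example: H = 3 hidden units, D = 2 input features
rng = np.random.default_rng(0)
H, D = 3, 2
h_t = rnn_step_ln(rng.normal(size=D), np.zeros(H),
                  rng.normal(size=(H, D)), rng.normal(size=(H, H)),
                  np.ones(H), np.zeros(H))
print(h_t)
```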
How to implement layer normalization in TensorFlow?
In TensorFlow 1.x, we can use tf.contrib.layers.layer_norm() to implement it. In TensorFlow 2.x, tf.contrib has been removed, and the built-in equivalent is tf.keras.layers.LayerNormalization.
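Here is a minimal usage sketch with TensorFlow 2.x's built-in tf.keras.layers.LayerNormalization; the example input values are arbitrary.

```python
import tensorflow as tf

# A batch of 2 examples, each with 4 features. The last axis is normalized
# independently for each example, giving roughly zero mean and unit variance.
x = tf.constant([[1.0, 2.0, 3.0, 4.0],
                 [2.0, 4.0, 6.0, 8.0]])

ln = tf.keras.layers.LayerNormalization(axis=-1, epsilon=1e-12)
y = ln(x)
print(y.numpy())
```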