Layer Normalization was proposed in the 2016 paper "Layer Normalization". It aims to fix two problems with batch normalization: its effect depends on the mini-batch size, and it is not obvious how to apply it to recurrent neural networks. In this tutorial, we will introduce what layer normalization is and how to use it.
Layer Normalization
Layer Normalization is defined as:
\(y_i=\gamma\left(\frac{x_i-\mu}{\sqrt{\sigma^2+\epsilon}}\right)+\beta\)
Here \(\mu\) and \(\sigma^2\) are the mean and variance computed over the features of each sample, and \(\gamma\) and \(\beta\) are learnable gain and bias parameters. It is similar to batch normalization; however, the axis along which the input \(x\) is normalized is different: batch normalization normalizes each feature across the samples in a mini-batch, while layer normalization normalizes each sample across its own features.
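To make the formula concrete, here is a minimal NumPy sketch (NumPy is used here only for illustration; it is not part of the TensorFlow example below) that normalizes one sample over its feature axis with \(\gamma=1\) and \(\beta=0\). It reproduces the first row of the example output further down.

import numpy as np

# One sample with three features, taken from the example below.
x = np.array([18.369314, 2.6570225, 20.402943], dtype=np.float32)

mu = x.mean()   # mean over the feature axis of this sample
var = x.var()   # variance over the same axis
eps = 1e-12     # small constant for numerical stability

# Layer normalization with gamma = 1 and beta = 0.
y = (x - mu) / np.sqrt(var + eps)
print(y)  # -> [ 0.574993 -1.4064413  0.8314482]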
Below, we will use the example of normalizing the output of a BiLSTM with layer normalization.
Normalize the Output of BiLSTM Using Layer Normalization
How to implement layer normalization in TensorFlow?
There are two ways to implement it:
- Use tf.contrib.layers.layer_norm() function
- Use tf.nn.batch_normalization() function
We will use an example to show you how to do it.
import tensorflow as tf

x1 = tf.convert_to_tensor(
    [[[18.369314, 2.6570225, 20.402943],
      [10.403599, 2.7813416, 20.794857]],
     [[19.0327, 2.6398268, 6.3894367],
      [3.921237, 10.761424, 2.7887821]],
     [[11.466338, 20.210938, 8.242946],
      [22.77081, 11.555874, 11.183836]],
     [[8.976935, 10.204252, 11.20231],
      [-7.356888, 6.2725096, 1.1952505]]])

# tf.nn.moments returns the mean and variance along the given axes.
mean_x, var_x = tf.nn.moments(x1, axes=2, keep_dims=True)

# Normalize manually with tf.nn.batch_normalization (no gain/bias, epsilon 1e-12).
v1 = tf.nn.batch_normalization(x1, mean_x, var_x, None, None, 1e-12)

# Let tf.contrib.layers.layer_norm normalize the last axis.
v2 = tf.contrib.layers.layer_norm(
    inputs=x1, begin_norm_axis=-1, begin_params_axis=-1)

with tf.Session() as sess1:
    sess1.run(tf.global_variables_initializer())
    print(sess1.run(v1))
    print(sess1.run(v2))
In this code, v1 is computed by tf.nn.batch_normalization() and v2 is computed by tf.contrib.layers.layer_norm(). Running it, we can find that the results are the same:
[[[ 0.574993   -1.4064413   0.8314482 ]
  [-0.12501884 -1.1574404   1.2824591 ]]

 [[ 1.3801125  -0.95738953 -0.422723  ]
  [-0.5402142   1.4019756  -0.86176133]]

 [[-0.36398554  1.3654773  -1.0014919 ]
  [ 1.4136491  -0.67222667 -0.7414224 ]]

 [[-1.2645674   0.08396816  1.1806011 ]
  [-1.3146634   1.108713    0.20595042]]]
[[[ 0.574993   -1.4064413   0.8314482 ]
  [-0.12501884 -1.1574404   1.2824591 ]]

 [[ 1.3801125  -0.95738953 -0.422723  ]
  [-0.5402142   1.4019756  -0.86176133]]

 [[-0.36398554  1.3654773  -1.0014919 ]
  [ 1.4136491  -0.67222667 -0.7414224 ]]

 [[-1.2645674   0.08396816  1.1806011 ]
  [-1.3146634   1.108713    0.20595042]]]
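Either method can be used to normalize the output of a BiLSTM: concatenate the forward and backward outputs, then apply layer normalization along the feature axis of each time step. Here is a minimal TF 1.x sketch; the sequence length, feature size and cell size are hypothetical values chosen for illustration.

import tensorflow as tf

# Hypothetical input: a batch of sequences with 10 steps and 8 features.
inputs = tf.placeholder(tf.float32, [None, 10, 8])

cell_fw = tf.nn.rnn_cell.LSTMCell(num_units=16)
cell_bw = tf.nn.rnn_cell.LSTMCell(num_units=16)

# outputs is a (forward, backward) tuple; concatenate on the feature
# axis to get the BiLSTM output of shape [batch, time, 32].
outputs, _ = tf.nn.bidirectional_dynamic_rnn(
    cell_fw, cell_bw, inputs, dtype=tf.float32)
bilstm_out = tf.concat(outputs, axis=2)

# Normalize each time step over its feature axis.
bilstm_out = tf.contrib.layers.layer_norm(
    inputs=bilstm_out, begin_norm_axis=-1, begin_params_axis=-1)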
If we look at the tf.contrib.layers.layer_norm() source code, we can find:
tf.contrib.layers.layer_norm() calls tf.nn.batch_normalization() to normalize a layer.
https://github.com/tensorflow/tensorflow/blob/r1.8/tensorflow/contrib/layers/python/layers/layers.py
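In essence, it creates trainable \(\gamma\) and \(\beta\) variables, computes the per-sample moments, and delegates the normalization arithmetic to tf.nn.batch_normalization(). The following is a simplified sketch of that idea, not the exact contrib source:

import tensorflow as tf

def simple_layer_norm(x, epsilon=1e-12):
    # Simplified sketch of what tf.contrib.layers.layer_norm does:
    # trainable gain/bias over the last axis, per-sample moments over
    # that axis, then tf.nn.batch_normalization for the arithmetic.
    params_shape = x.shape[-1:]
    beta = tf.get_variable('beta', params_shape,
                           initializer=tf.zeros_initializer())
    gamma = tf.get_variable('gamma', params_shape,
                            initializer=tf.ones_initializer())
    norm_axis = x.shape.ndims - 1
    mean, variance = tf.nn.moments(x, axes=[norm_axis], keep_dims=True)
    return tf.nn.batch_normalization(x, mean, variance,
                                     offset=beta, scale=gamma,
                                     variance_epsilon=epsilon)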