An Introduction to Layer Normalization in Neural Networks – Machine Learning Tutorial

October 10, 2020

Layer Normalization was proposed by Jimmy Ba et al. in 2016. Here is the paper:

https://arxiv.org/abs/1607.06450

In this tutorial, we will introduce it for machine learning beginners.

Layer Normalization in Neural Networks

Layer Normalization is commonly used right now, for example, in multi-head attention networks.

Layer Normalization in multi-head attention networks

Layer Normalization is applied in each layer.
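For example, in a Transformer-style encoder block, each sub-layer (multi-head attention or feed-forward) is followed by a residual connection and Layer Normalization (the "Add & Norm" step). Below is a small sketch using tf.keras from TensorFlow 2.x; the shapes and hyper-parameters are illustrative assumptions, not values from the original post.

import tensorflow as tf

# "Add & Norm" sketch around self-attention (post-norm style), illustrative only.
mha = tf.keras.layers.MultiHeadAttention(num_heads=8, key_dim=64)
layer_norm = tf.keras.layers.LayerNormalization(epsilon=1e-6)

x = tf.random.normal([2, 10, 512])       # (batch, sequence, model_dim)
attn_out = mha(query=x, value=x, key=x)  # multi-head self-attention
y = layer_norm(x + attn_out)             # residual connection + Layer Normalization
print(y.shape)                           # (2, 10, 512)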

What is Layer Normalization?

Layer Normalization can be viewed as:

Layer Normalization Structure

It means $y_i = \mathrm{LN}(x_i)$, where $x_i$ is the input vector and $y_i$ is its normalized output.

In neural networks, the l-th layer can be computed as:

$$a_i^l = {w_i^l}^\top h^l, \qquad h_i^{l+1} = f\left(a_i^l + b_i^l\right)$$

where $w_i^l$ is the incoming weight vector of the $i$-th hidden unit (a row of the $l$-th layer's weight matrix), $b_i^l$ is its bias, and $f$ is the activation function.

In order to normalize the l-th layer, we can normalize $a^l$ as follows:

$$\mu^l = \frac{1}{H}\sum_{i=1}^{H} a_i^l, \qquad \sigma^l = \sqrt{\frac{1}{H}\sum_{i=1}^{H}\left(a_i^l - \mu^l\right)^2}$$

$$\bar{a}^l = \frac{g^l}{\sqrt{(\sigma^l)^2 + \epsilon}} \odot \left(a^l - \mu^l\right)$$

where $H$ denotes the number of hidden units in a layer, $\epsilon$ is a small constant added for numerical stability (for example 1e-12; it can also be 0), $g^l$ is the gain parameter, and $\odot$ is the element-wise multiplication between two vectors.

You should notice: $g^l$ may be omitted if you do not want to scale the normalized output.
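To make the formula concrete, here is a small NumPy sketch of the computation above. The function name layer_norm and the example values are only for illustration.

import numpy as np

def layer_norm(a, g=None, b=None, eps=1e-12):
    # Normalize the summed inputs a over its H hidden units.
    mu = a.mean()                            # mean over the H hidden units
    sigma = a.std()                          # standard deviation over the H hidden units
    a_hat = (a - mu) / np.sqrt(sigma ** 2 + eps)
    if g is not None:                        # optional gain g^l (element-wise scale)
        a_hat = g * a_hat
    if b is not None:                        # optional bias (element-wise shift)
        a_hat = a_hat + b
    return a_hat

a = np.array([1.0, 2.0, 3.0, 4.0])           # a^l with H = 4 hidden units
print(layer_norm(a))                         # zero mean, unit variance
print(layer_norm(a, g=np.full(4, 2.0)))      # scaled by the gain g^l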

Layer Normalization in RNN

In an RNN, the t-th time step can be normalized as:

$$a^t = W_{hh}\,h^{t-1} + W_{xh}\,x^t$$

$$\mu^t = \frac{1}{H}\sum_{i=1}^{H} a_i^t, \qquad \sigma^t = \sqrt{\frac{1}{H}\sum_{i=1}^{H}\left(a_i^t - \mu^t\right)^2}$$

$$h^t = f\left(\frac{g}{\sigma^t} \odot \left(a^t - \mu^t\right) + b\right)$$

where $W_{hh}$ are the recurrent weights, $W_{xh}$ are the input-to-hidden weights, $h^{t-1}$ is the previous hidden state, and $x^t$ is the input at time step $t$.
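Here is a NumPy sketch of one layer-normalized RNN step following the equations above; the weight matrices and sizes below are made-up values for illustration.

import numpy as np

rng = np.random.default_rng(0)
H, D = 4, 3                               # hidden size and input size (illustrative)
W_hh = rng.normal(size=(H, H))            # recurrent weights
W_xh = rng.normal(size=(H, D))            # input-to-hidden weights
g, b = np.ones(H), np.zeros(H)            # gain and bias
h_prev, x_t = np.zeros(H), rng.normal(size=D)

a_t = W_hh @ h_prev + W_xh @ x_t          # summed inputs at time step t
mu, sigma = a_t.mean(), a_t.std()         # statistics over the H hidden units
h_t = np.tanh(g / np.sqrt(sigma ** 2 + 1e-12) * (a_t - mu) + b)
print(h_t)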

How to implement layer normalization in TensorFlow?

In TensorFlow 1.x, we can use tf.contrib.layers.layer_norm() to implement it.
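Note that tf.contrib was removed in TensorFlow 2.x; there you can use tf.keras.layers.LayerNormalization() instead. A minimal usage sketch (the tensor shape below is only an example):

import tensorflow as tf

# TensorFlow 2.x replacement for tf.contrib.layers.layer_norm().
ln = tf.keras.layers.LayerNormalization(epsilon=1e-12)

x = tf.random.normal([2, 5, 8])   # (batch, time, features), illustrative shape
y = ln(x)                         # normalizes over the last (feature) axis by default
print(y.shape)                    # (2, 5, 8)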
