Layer normalization is usually used in RNNs, while batch normalization is usually used in CNNs. Here is a comparison between them.
Batch Normalization Vs Layer Normalization: The Difference Explained
In this tutorial, we will discuss the different effects of layer normalization in RNNs, CNNs and feed-forward networks.
These comparative results can be found in the paper: Layer Normalization
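Before going through the experiments, here is a minimal PyTorch sketch (the tensor shape is just an illustrative assumption) of the core difference: batch normalization computes statistics per feature across the batch, while layer normalization computes statistics per sample across the features.

```python
import torch

# Toy activations: (batch_size=8, num_features=16); the shape is illustrative only.
x = torch.randn(8, 16)

# Batch normalization: mean/variance per feature, computed across the batch dimension.
bn_mean = x.mean(dim=0, keepdim=True)                 # shape (1, 16)
bn_var = x.var(dim=0, unbiased=False, keepdim=True)
x_bn = (x - bn_mean) / torch.sqrt(bn_var + 1e-5)

# Layer normalization: mean/variance per sample, computed across the feature dimension.
ln_mean = x.mean(dim=1, keepdim=True)                 # shape (8, 1)
ln_var = x.var(dim=1, unbiased=False, keepdim=True)
x_ln = (x - ln_mean) / torch.sqrt(ln_var + 1e-5)

print(x_bn.shape, x_ln.shape)  # both (8, 16), but normalized over different axes
```

Because layer normalization never looks across the batch, its behavior does not depend on the batch size, which explains most of the results below.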
Layer Normalization in RNN
From NLP experiments with both short and long sequences, we can find that layer normalization not only trains faster but also converges to a better validation result than both the baseline and batch normalization. This means we should use layer normalization in RNN networks; batch normalization is not a good choice for RNNs.
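Here is a minimal sketch of where layer normalization sits in an RNN. The hyper-parameters are assumptions, and this is not the exact formulation from the paper (which normalizes the pre-activations inside the recurrent cell); it simply shows that the hidden state can be normalized at every time step without depending on the batch or the sequence length.

```python
import torch
import torch.nn as nn

class LayerNormGRU(nn.Module):
    """A GRU cell whose hidden state is layer-normalized at each time step."""
    def __init__(self, input_size=32, hidden_size=64):
        super().__init__()
        self.cell = nn.GRUCell(input_size, hidden_size)
        self.ln = nn.LayerNorm(hidden_size)

    def forward(self, x):  # x: (seq_len, batch, input_size)
        h = torch.zeros(x.size(1), self.cell.hidden_size, device=x.device)
        outputs = []
        for t in range(x.size(0)):
            h = self.ln(self.cell(x[t], h))  # normalize over the hidden units only
            outputs.append(h)
        return torch.stack(outputs)

x = torch.randn(10, 4, 32)       # (seq_len=10, batch=4, features=32), shapes assumed
print(LayerNormGRU()(x).shape)   # torch.Size([10, 4, 64])
```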
Layer Normalization in CNN
Layer normalization offers a speedup over the baseline model without normalization, but batch normalization outperforms the other methods.
This means batch normalization is better than layer normalization in CNN networks.
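For reference, here is a minimal sketch (input and layer sizes are assumptions) of the two options in a convolutional block: batch normalization computes per-channel statistics over the batch and spatial positions, while layer normalization computes per-sample statistics over channels and spatial positions.

```python
import torch
import torch.nn as nn

x = torch.randn(8, 3, 32, 32)  # (batch, channels, height, width); sizes are illustrative

# Batch normalization in a CNN: statistics per channel, over (batch, H, W).
conv_bn = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU())

# Layer normalization in a CNN: statistics per sample, over (channels, H, W).
conv_ln = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.LayerNorm([16, 32, 32]), nn.ReLU())

print(conv_bn(x).shape, conv_ln(x).shape)  # both torch.Size([8, 16, 32, 32])
```

In practice, batch normalization remains the usual choice for convolutional networks, which matches the result above.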
Layer Normalization in FFN
From permutation-invariant MNIST classification experiments, we can find:
Layer normalization is robust to the batch size and exhibits faster training convergence compared with batch normalization that is applied to all layers.
The ordering of test performance (worse < better) is:
When the batch size is large: baseline < batch norm < layer norm
When the batch size is small: batch norm < baseline < layer norm
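The following sketch (layer sizes are assumptions) shows layer normalization in a feed-forward MNIST-style classifier, and why it is insensitive to the batch size: because its statistics are computed per sample, each example gets exactly the same output whether it is processed alone or in a large batch, which is not true for batch normalization at training time.

```python
import torch
import torch.nn as nn

# A feed-forward classifier with layer normalization after each hidden layer.
model = nn.Sequential(
    nn.Linear(784, 256), nn.LayerNorm(256), nn.ReLU(),
    nn.Linear(256, 256), nn.LayerNorm(256), nn.ReLU(),
    nn.Linear(256, 10),
)

x = torch.randn(128, 784)      # a batch of 128 flattened 28x28 images (random here)
full = model(x)                # run the whole batch
single = model(x[:1])          # run the first example alone (batch size 1)

# Layer normalization uses only per-sample statistics, so the results match.
print(torch.allclose(full[:1], single, atol=1e-5))  # True
```

Replacing nn.LayerNorm with nn.BatchNorm1d would make the output of each example depend on the other examples in the batch during training, which is why batch normalization degrades when the batch size is small.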