Layer normalization is usually used in RNNs, while batch normalization is usually used in CNNs. Here is a comparison between them.
Batch Normalization Vs Layer Normalization: The Difference Explained
In this tutorial, we will discuss the different effects of layer normalization in RNNs, CNNs and feed-forward networks.
These comparative results can be found in the paper: Layer Normalization
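Before going through the experiments, here is a minimal PyTorch sketch (the tensor shape is just an illustrative assumption) of the core difference: batch normalization computes statistics per feature across the batch, while layer normalization computes statistics per sample across the features.

```python
import torch

# Toy activations: (batch_size=8, num_features=16); the shape is illustrative only.
x = torch.randn(8, 16)

# Batch normalization: mean/variance per feature, computed across the batch dimension.
bn_mean = x.mean(dim=0, keepdim=True)                 # shape (1, 16)
bn_var = x.var(dim=0, unbiased=False, keepdim=True)
x_bn = (x - bn_mean) / torch.sqrt(bn_var + 1e-5)

# Layer normalization: mean/variance per sample, computed across the feature dimension.
ln_mean = x.mean(dim=1, keepdim=True)                 # shape (8, 1)
ln_var = x.var(dim=1, unbiased=False, keepdim=True)
x_ln = (x - ln_mean) / torch.sqrt(ln_var + 1e-5)

print(x_bn.shape, x_ln.shape)  # both (8, 16), but normalized over different axes
```

Because layer normalization never looks across the batch, its behavior does not depend on the batch size, which explains most of the results below.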
Layer Normalization in RNN
From NLP experiments with both short and long sequences, we can find that layer normalization not only trains faster but also converges to a better validation result than both the baseline and batch normalization. This means we should use layer normalization in RNN networks; batch normalization is not a good choice for RNNs.
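Here is a minimal sketch of where layer normalization sits in an RNN. The hyper-parameters are assumptions, and this is not the exact formulation from the paper (which normalizes the pre-activations inside the recurrent cell); it simply shows that the hidden state can be normalized at every time step without depending on the batch or the sequence length.

```python
import torch
import torch.nn as nn

class LayerNormGRU(nn.Module):
    """A GRU cell whose hidden state is layer-normalized at each time step."""
    def __init__(self, input_size=32, hidden_size=64):
        super().__init__()
        self.cell = nn.GRUCell(input_size, hidden_size)
        self.ln = nn.LayerNorm(hidden_size)

    def forward(self, x):  # x: (seq_len, batch, input_size)
        h = torch.zeros(x.size(1), self.cell.hidden_size, device=x.device)
        outputs = []
        for t in range(x.size(0)):
            h = self.ln(self.cell(x[t], h))  # normalize over the hidden units only
            outputs.append(h)
        return torch.stack(outputs)

x = torch.randn(10, 4, 32)       # (seq_len=10, batch=4, features=32), shapes assumed
print(LayerNormGRU()(x).shape)   # torch.Size([10, 4, 64])
```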
Layer Normalization in CNN
Layer normalization offers a speedup over the baseline model without normalization, but batch normalization outperforms the other methods.
This means batch normalization is better than layer normalization in CNN networks.
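For reference, here is a minimal sketch (input and layer sizes are assumptions) of the two options in a convolutional block: batch normalization computes per-channel statistics over the batch and spatial positions, while layer normalization computes per-sample statistics over channels and spatial positions.

```python
import torch
import torch.nn as nn

x = torch.randn(8, 3, 32, 32)  # (batch, channels, height, width); sizes are illustrative

# Batch normalization in a CNN: statistics per channel, over (batch, H, W).
conv_bn = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU())

# Layer normalization in a CNN: statistics per sample, over (channels, H, W).
conv_ln = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.LayerNorm([16, 32, 32]), nn.ReLU())

print(conv_bn(x).shape, conv_ln(x).shape)  # both torch.Size([8, 16, 32, 32])
```

In practice, batch normalization remains the usual choice for convolutional networks, which matches the result above.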
Layer Normalization in FFN
From permutation-invariant MNIST classification experiments, we can find:
Layer normalization is robust to the batch size and exhibits faster training convergence compared with batch normalization that is applied to all layers.
The ordering of test performance (worse < better) is:
When the batch size is large: baseline < batch norm < layer norm
When the batch size is small: batch norm < baseline < layer norm
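The following sketch (layer sizes are assumptions) shows layer normalization in a feed-forward MNIST-style classifier, and why it is insensitive to the batch size: because its statistics are computed per sample, each example gets exactly the same output whether it is processed alone or in a large batch, which is not true for batch normalization at training time.

```python
import torch
import torch.nn as nn

# A feed-forward classifier with layer normalization after each hidden layer.
model = nn.Sequential(
    nn.Linear(784, 256), nn.LayerNorm(256), nn.ReLU(),
    nn.Linear(256, 256), nn.LayerNorm(256), nn.ReLU(),
    nn.Linear(256, 10),
)

x = torch.randn(128, 784)      # a batch of 128 flattened 28x28 images (random here)
full = model(x)                # run the whole batch
single = model(x[:1])          # run the first example alone (batch size 1)

# Layer normalization uses only per-sample statistics, so the results match.
print(torch.allclose(full[:1], single, atol=1e-5))  # True
```

Replacing nn.LayerNorm with nn.BatchNorm1d would make the output of each example depend on the other examples in the batch during training, which is why batch normalization degrades when the batch size is small.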