Dropout vs Batch Normalization – Which is Better for Multilayered Neural Network

March 22, 2022

Dropout and batch normalization are two well-recognized approaches to tackle overfitting in multilayered neural networks. Which one is better? In this tutorial, we will discuss this topic.

The paper Dropout vs. batch normalization: an empirical study of their impact to deep learning gives us some answers.

This paper uses MLPs and CNNs to evaluate how dropout and batch normalization affect performance.

As to MLPs

Performance of dropout and batch normalization in MLP

We can find:

  • Batch normalization performs the worst, which means it is not a good idea to use batch normalization in an MLP.
  • Training with dropout and with batch normalization is slower, as expected. However, batch normalization turned out to be significantly slower, increasing training time by over 80%.

Is it a good choice to use dropout in an MLP?

Here is also a comparative result.

MLP-NDNB: standard MLP (without dropout and without batch normalization)

Performance of MLP

MLP-WDNB: MLP with dropout and without batch normalization

Performance of dropout in MLP

We can find that dropout can improve the performance of an MLP, which means it is a good idea to use dropout in an MLP.
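For illustration, here is a minimal PyTorch-style sketch of an MLP with dropout layers. The layer sizes, the 0.5 dropout rate, and the dummy input are assumptions chosen for this example, not the exact configuration used in the paper.

```python
import torch
import torch.nn as nn

# A minimal MLP with dropout after each hidden layer (sizes and the 0.5
# dropout rate are illustrative assumptions, not the paper's exact setup).
mlp_with_dropout = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 512),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # randomly zeroes activations during training
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(256, 10),
)

# Dropout is only active in training mode; call .eval() for inference.
mlp_with_dropout.train()
x = torch.randn(32, 1, 28, 28)   # a dummy batch of 28x28 inputs
logits = mlp_with_dropout(x)
print(logits.shape)              # torch.Size([32, 10])
```

The key point is that the dropout layers are dropped in between the fully connected layers; everything else about the MLP stays the same.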

As to CNNs

Here is the result.

Performance of dropout and batch normalization in CNN

We can find:

Only using batch normalization can improve the performance of CNNs, which means it is not a good idea to use dropout in CNNs.

From this paper, we can learn:

For CNNs, the empirical study showed that:

  • Adding batch normalization improved accuracy without other observable side effects. Since it can be added without major structural changes to the network architecture, adding batch normalization should be one of the first steps taken to optimize a CNN (see the sketch after this list).
  • Increasing the learning rate, as recommended in the batch normalization paper, improves accuracy by 2% to 3%. Because this is a simple step to take, it should be done in the initial optimization steps, before investing time in more complex optimizations.
  • Adding dropout reduced accuracy significantly. This could be a deficiency of the experiments conducted here because other sources reported improvements when dropout was used. At a minimum, it is a cautionary sign that using dropout in CNNs requires careful consideration. As a practical suggestion, one should consider removing all dropout layers from the network, then re-validate to confirm that removing dropout does not deteriorate the performance.
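As a rough illustration of these suggestions, below is a small PyTorch-style sketch of a CNN that uses batch normalization after the convolutional layers, no dropout, and a somewhat larger learning rate. The channel counts, input size, and learning-rate value are assumptions for this example, not the paper's exact setup.

```python
import torch
import torch.nn as nn

# A small CNN using batch normalization and no dropout (channel counts and
# the learning rate below are illustrative assumptions).
cnn_with_bn = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    nn.BatchNorm2d(32),          # normalizes each channel over the batch
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),
    nn.BatchNorm2d(64),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(64 * 8 * 8, 10),   # assumes 32x32 inputs such as CIFAR-10
)

# Batch normalization tends to tolerate a larger learning rate; 0.01 here is
# just an example value, not a tuned setting.
optimizer = torch.optim.SGD(cnn_with_bn.parameters(), lr=0.01, momentum=0.9)

x = torch.randn(16, 3, 32, 32)   # dummy CIFAR-10-sized batch
logits = cnn_with_bn(x)
print(logits.shape)              # torch.Size([16, 10])
```

If dropout layers are already present in such a network, the practical suggestion above amounts to deleting them, retraining, and checking that accuracy does not drop.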
