In deep learning, we usually place a dropout layer after a dense layer. However, this raises a question: should the dropout layer be placed before or after the activation function?
Dropout vs non-linear activation function
Following this page: https://stats.stackexchange.com/questions/240305/where-should-i-place-dropout-layers-in-a-neural-network
we can find:
If the activation function is a non-linear function such as sigmoid or tanh, we should place the dropout layer after the activation function.
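For example, here is a minimal Keras sketch of this ordering (assuming TensorFlow 2.x; the layer sizes and the sigmoid activation are only illustrative). The layer order is dense -> activation -> dropout, so a dropped unit really outputs 0; if dropout were applied before the sigmoid, a zeroed pre-activation would still produce sigmoid(0) = 0.5 and the unit would not be silenced.

```python
import tensorflow as tf

# A minimal sketch (assuming TensorFlow 2.x / Keras; layer sizes are illustrative).
# Dropout is placed after the sigmoid activation, so a dropped unit outputs exactly 0.
inputs = tf.keras.Input(shape=(128,))
x = tf.keras.layers.Dense(64)(inputs)          # fully connected, linear output
x = tf.keras.layers.Activation("sigmoid")(x)   # non-linear activation
x = tf.keras.layers.Dropout(rate=0.5)(x)       # dropout after the activation
outputs = tf.keras.layers.Dense(10)(x)
model = tf.keras.Model(inputs, outputs)
model.summary()
```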
Dropout vs ReLU activation function
However, what if the activation function is ReLU? ReLU is not strictly linear (it is piecewise linear), but it turns out that dropout and ReLU are interchangeable.
From this page: https://sebastianraschka.com/faq/docs/dropout-activation.html, we can find two orderings being compared:
(a): Fully connected, linear activation -> ReLU -> Dropout -> …
(b): Fully connected, linear activation -> Dropout -> ReLU -> …
The results are the same, which means the dropout layer can be placed either before or after the ReLU activation function. The reason is that relu(a * x) = a * relu(x) for any non-negative scale a, and dropout only zeroes units or scales them by a positive constant.
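Here is a quick numerical check of this claim (a sketch, not code from the linked pages): it implements inverted dropout by hand with a fixed mask, so both orderings drop the same units, and then compares the two outputs.

```python
import numpy as np

# Sketch: verify that ReLU and (inverted) dropout commute when they share the
# same mask. relu(a * x) = a * relu(x) holds for any non-negative scale a,
# and every mask entry is either 0 or 1 / keep_prob, both non-negative.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                                  # dense layer pre-activations
keep_prob = 0.5
mask = rng.binomial(1, keep_prob, size=x.shape) / keep_prob  # inverted dropout mask

relu = lambda z: np.maximum(z, 0.0)

relu_then_dropout = mask * relu(x)   # ReLU -> Dropout
dropout_then_relu = relu(mask * x)   # Dropout -> ReLU

print(np.allclose(relu_then_dropout, dropout_then_relu))     # True
```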
To implement a dropout layer, you can read:
Understand TensorFlow tf.nn.dropout(): A Beginner Guide – TensorFlow Tutorial
tf.layers.dropout() vs tf.nn.dropout(): A Difference Introduction – TensorFlow Tutorial
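For quick reference, here is a minimal usage sketch of tf.nn.dropout (assuming TensorFlow 2.x, where the argument is rate; in older TensorFlow 1.x versions the argument was keep_prob):

```python
import tensorflow as tf

# A minimal usage sketch of tf.nn.dropout (assuming TensorFlow 2.x).
# Kept values are scaled by 1 / (1 - rate), so the expected sum is unchanged.
x = tf.ones([2, 5])
y = tf.nn.dropout(x, rate=0.4, seed=1)
print(y.numpy())  # roughly 40% of entries are 0.0, the rest are 1 / 0.6 ≈ 1.667
```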