Can We Apply a Dropout Layer to the Softmax Layer in Neural Networks? – Deep Learning Tutorial

December 21, 2022

The softmax layer is often used as the output layer of a neural network model. A standard structure of a neural network model with a softmax output layer is shown below:
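For reference, here is a minimal sketch of such a structure in TensorFlow Keras (the layer sizes, input dimension, and class count below are illustrative assumptions, not values from a specific model):

import tensorflow as tf

# hidden layers -> logits layer -> softmax output layer
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(20,)),  # hidden layer
    tf.keras.layers.Dense(64, activation='relu'),                      # hidden layer
    tf.keras.layers.Dense(5),                                          # logits layer (5 classes)
    tf.keras.layers.Softmax(),                                         # softmax output layer
])
model.summary()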

But to avoid overfitting in neural networks, can we apply a dropout layer to the softmax layer?

The answer is no if softmax is the output layer.

Look at the image below:

If you apply dropout to the softmax layer, you may get only two non-zero outputs instead of five. As for the loss function, fewer outputs will artificially shrink the loss of the model, because the loss is then computed on a distorted probability distribution rather than on the full prediction.
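A quick sketch of the problem (the probability values below are made up for illustration): applying dropout directly to the softmax output zeroes some class probabilities and rescales the rest, so the prediction and the loss no longer cover all five classes.

import tensorflow as tf

probs = tf.constant([[0.1, 0.2, 0.4, 0.2, 0.1]])  # softmax output over 5 classes
dropped = tf.nn.dropout(probs, rate=0.5)          # dropout applied directly to the softmax output
print(dropped.numpy())
# e.g. [[0.2 0.  0.8 0.4 0. ]]: some classes are zeroed and the rest are scaled by 1/(1-rate),
# so the values no longer sum to 1 and the loss is computed on a distorted distribution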

In order to use a dropout layer to prevent overfitting in neural networks, you should apply it before the output layer.

As the image above shows, we can apply dropout before the logits layer.

In some papers, we may find this answer:

Dropout regularization (Srivastava et al., 2014) is employed on the final MLP layer, with dropout rate 0.5.

Example code:

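# In this snippet, dropout (keep_prob=dropout) is applied to the hidden representation H_dis
# before the final linear (logits) layer, not after the softmax output.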
logits = layers.linear(tf.nn.dropout(H_dis, keep_prob=dropout), num_outputs=num_outputs,
                           biases_initializer=biasInit, scope=prefix + 'dis_2', reuse=is_reuse)
return logits
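For comparison, here is a rough TensorFlow 2 / Keras sketch of the same placement (the layer sizes and names are illustrative assumptions): the dropout layer sits between the last hidden layer and the final output layer.

import tensorflow as tf

num_outputs = 5  # number of classes (illustrative)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation='relu'),             # last hidden layer (like H_dis above)
    tf.keras.layers.Dropout(0.5),                              # dropout rate 0.5, before the output layer
    tf.keras.layers.Dense(num_outputs, activation='softmax'),  # logits + softmax output layer
])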