Understand the Gradient of Cross Entropy Loss Function – Machine Learning Tutorial

By | October 29, 2020

Cross entropy loss function is widely used in classification problem in machine learning. In this tutorial, we will discuss the gradient of it.

Cross entropy loss function

We often use softmax function for classification problem, cross entropy loss function can be defined as:

the equation of cross entropy loss

where \(L\) is the cross entropy loss function, \(y_i\) is the label.

For example, if we have 3 classes:

\(o = [2, 3, 4]\)

As to \(y = [0, 1, 0]\)

The softmax score is:

p= [0.090, 0.245, 0.665]

The cross entropy loss is:

the example of cross entropy loss

How to compute the gradient of cross entropy loss function?

In this part, we will introduce how to compute the gradient of cross entropy loss function. As to loss function L, the gradient of \(o_k\) is computed as:

the gradient of cross entropy loss

Here k is the correct class, the gradient of it is:

the gradient of cross entropy loss part 1

However, as to \(o_j\), it is not a correct class. The gradient of it is:

the gradient of cross entropy loss part 2

It means

the gradient of cross entropy loss part 3

As to example above, the gradient of cross entropy loss function is:

the gradient of cross entropy loss part 4

If you want to know how to compute the gradient of softmax function, you can read:

Understand Softmax Function Gradient: A Beginner Guide – Deep Learning Tutorial

Leave a Reply