In TensorFlow, we can use tf.losses.sparse_softmax_cross_entropy() and tf.losses.softmax_cross_entropy() to compute the cross entropy loss. What is the difference between them? In this tutorial, we will explain it.
tf.losses.sparse_softmax_cross_entropy()
The syntax of tf.losses.sparse_softmax_cross_entropy() is defined as:
tf.losses.sparse_softmax_cross_entropy(
    labels,
    logits,
    weights=1.0,
    scope=None,
    loss_collection=tf.GraphKeys.LOSSES,
    reduction=Reduction.SUM_BY_NONZERO_WEIGHTS
)
Parameters explained:
labels: a tensor of shape [d_0, d_1, …, d_{r-1}], where r is the rank of labels (and of the result). Each entry in labels must be an index in [0, num_classes).
logits: unscaled log probabilities of shape [d_0, d_1, …, d_{r-1}, num_classes].
For example, logits may have shape 32 * 10, where 32 is the batch size and 10 is the number of classes, as in the sketch below.
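Here is a minimal sketch of how these shapes fit together (the batch size, class count and label values below are made up for illustration):

import tensorflow as tf

batch_size, num_classes = 4, 10

# labels: one class index per example, shape [batch_size]
labels = tf.constant([3, 0, 9, 1], dtype=tf.int32)

# logits: unscaled scores from the network, shape [batch_size, num_classes]
logits = tf.random_normal([batch_size, num_classes])

# scalar loss averaged over the batch
loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)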
tf.losses.softmax_cross_entropy()
The syntax of tf.losses.softmax_cross_entropy() is defined as:
tf.losses.softmax_cross_entropy(
    onehot_labels,
    logits,
    weights=1.0,
    label_smoothing=0,
    scope=None,
    loss_collection=tf.GraphKeys.LOSSES,
    reduction=Reduction.SUM_BY_NONZERO_WEIGHTS
)
Parameters explained:
onehot_labels: one-hot-encoded labels of shape [batch_size, num_classes].
logits: the [batch_size, num_classes] logits output of the network, the same as in tf.losses.sparse_softmax_cross_entropy().
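Again, a minimal sketch with made-up values, this time building the one-hot labels with tf.one_hot():

import tensorflow as tf

batch_size, num_classes = 4, 10

# onehot_labels: one-hot vectors, shape [batch_size, num_classes]
onehot_labels = tf.one_hot([3, 0, 9, 1], depth=num_classes)

# logits: unscaled scores from the network, shape [batch_size, num_classes]
logits = tf.random_normal([batch_size, num_classes])

loss = tf.losses.softmax_cross_entropy(onehot_labels=onehot_labels, logits=logits)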
Difference between tf.losses.sparse_softmax_cross_entropy() and tf.losses.softmax_cross_entropy()
From the definitions above, we can see that the difference between tf.losses.sparse_softmax_cross_entropy() and tf.losses.softmax_cross_entropy() is the labels parameter (a conversion sketch follows this list):
labels has shape [batch_size] (class indices) in tf.losses.sparse_softmax_cross_entropy()
onehot_labels has shape [batch_size, num_classes] (one-hot vectors) in tf.losses.softmax_cross_entropy()
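If your labels are stored as class indices but you want to call tf.losses.softmax_cross_entropy(), you can convert between the two formats; a quick sketch (values made up for illustration):

import tensorflow as tf

num_classes = 3

# class indices, shape [batch_size]
sparse_labels = tf.constant([1, 0, 2], dtype=tf.int32)

# one-hot vectors, shape [batch_size, num_classes]
onehot_labels = tf.one_hot(sparse_labels, depth=num_classes)

# and back again: one-hot -> class indices
recovered = tf.argmax(onehot_labels, axis=1)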
We will use an example to show this difference.
import tensorflow as tf
import numpy as np

num_classes = 3
batch_size = 3

# sparse labels: one class index per example, shape [batch_size]
l = np.array([1, 0, 2])
label_1 = tf.convert_to_tensor(l, dtype=tf.int32)

# one-hot labels, shape [batch_size, num_classes]
l_2 = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 1]])
label_2 = tf.convert_to_tensor(l_2, dtype=tf.int32)  # 3 * 3

# logits, shape [batch_size, num_classes]
logits = tf.convert_to_tensor(np.array([[0.1, 2, 3], [1, 2, 0], [0.4, 0, 1]]), dtype=tf.float32)

loss_1 = tf.losses.sparse_softmax_cross_entropy(labels=label_1, logits=logits)
loss_2 = tf.losses.softmax_cross_entropy(onehot_labels=label_2, logits=logits)

init = tf.global_variables_initializer()
init_local = tf.local_variables_initializer()

with tf.Session() as sess:
    sess.run([init, init_local])
    np.set_printoptions(precision=4, suppress=True)

    l1 = sess.run(loss_1)
    print("tf.losses.sparse_softmax_cross_entropy() loss")
    print(l1)

    l2 = sess.run(loss_2)
    print("tf.losses.softmax_cross_entropy() loss")
    print(l2)
In this example we use batch_size = 3 and num_classes = 3.
Running this code, we get:
tf.losses.sparse_softmax_cross_entropy() loss
1.1369685
tf.losses.softmax_cross_entropy() loss
1.1369685
They are the same, because label_2 is just the one-hot encoding of label_1.
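To see where 1.1369685 comes from, here is a small NumPy sketch that recomputes the same value by hand: softmax over each row of logits, then the mean negative log probability of the true class.

import numpy as np

logits = np.array([[0.1, 2, 3], [1, 2, 0], [0.4, 0, 1]])
labels = np.array([1, 0, 2])

# softmax along the class axis (subtract the row max for numerical stability)
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = exp / exp.sum(axis=1, keepdims=True)

# cross entropy = mean of -log(probability of the true class)
loss = -np.log(probs[np.arange(len(labels)), labels]).mean()
print(loss)  # about 1.1369686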