Difference Between tf.losses.sparse_softmax_cross_entropy() and tf.losses.softmax_cross_entropy() – TensorFlow Tutorial

By | January 6, 2022

In TensorFlow 1.x, we can use tf.losses.sparse_softmax_cross_entropy() and tf.losses.softmax_cross_entropy() to compute the softmax cross-entropy loss. What is the difference between them? This tutorial explains it.

tf.losses.sparse_softmax_cross_entropy()

The syntax of tf.losses.sparse_softmax_cross_entropy() is defined as:

tf.losses.sparse_softmax_cross_entropy(
    labels,
    logits,
    weights=1.0,
    scope=None,
    loss_collection=tf.GraphKeys.LOSSES,
    reduction=Reduction.SUM_BY_NONZERO_WEIGHTS
)

Parameters explained:

labels: a tensor of shape [d_0, d_1, …, d_{r-1}], where r is the rank of labels and of the result. Each entry in labels must be a class index in [0, num_classes).

logits: Unscaled log probabilities of shape [d_0, d_1, …, d_{r-1}, num_classes]

For example, logits may have shape [32, 10], where 32 is the batch size and 10 is the number of classes. In that case labels has shape [32].
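To make the roles of labels and logits concrete, here is a minimal sketch in plain Python (not TensorFlow itself) of the quantity the sparse version computes: for each example, the negative log of the softmax probability at the labelled class. The function name and the sample values are illustrative assumptions. The final averaging assumes the default weights=1.0, in which case the reduction is the mean over the batch.

```python
import math

def sparse_softmax_cross_entropy(labels, logits):
    """Mean softmax cross-entropy for integer class labels (illustrative sketch)."""
    per_example = []
    for label, row in zip(labels, logits):
        if not 0 <= label < len(row):
            raise ValueError("each label must be an index in [0, num_classes)")
        # log-sum-exp, with the row maximum subtracted for numerical stability
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        # -log(softmax(row)[label]) == log_sum_exp - row[label]
        per_example.append(log_sum_exp - row[label])
    # assumption: default weights=1.0, so the reduction is a plain mean
    return sum(per_example) / len(per_example)

labels = [1, 0]                              # shape [batch_size]
logits = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]  # shape [batch_size, num_classes]
loss = sparse_softmax_cross_entropy(labels, logits)
print(loss)
```

Note that labels holds one integer per example, while logits holds one row of num_classes scores per example.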

tf.losses.softmax_cross_entropy()

The syntax of tf.losses.softmax_cross_entropy() is defined as:

tf.losses.softmax_cross_entropy(
    onehot_labels,
    logits,
    weights=1.0,
    label_smoothing=0,
    scope=None,
    loss_collection=tf.GraphKeys.LOSSES,
    reduction=Reduction.SUM_BY_NONZERO_WEIGHTS
)

Parameters explained:

onehot_labels: one-hot-encoded labels of shape [batch_size, num_classes].

logits: the logits output by the network, of shape [batch_size, num_classes]. This is the same as in tf.losses.sparse_softmax_cross_entropy().
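As a rough illustration of how onehot_labels and label_smoothing enter the loss, the sketch below computes the cross-entropy between the target rows and the log-softmax of the logits in plain Python. It is not the TensorFlow implementation. The function name is hypothetical, and the smoothing step assumes targets are pulled towards 1/num_classes as onehot_labels * (1 - label_smoothing) + label_smoothing / num_classes.

```python
import math

def softmax_cross_entropy(onehot_labels, logits, label_smoothing=0.0):
    """Mean softmax cross-entropy for one-hot labels (illustrative sketch)."""
    num_classes = len(logits[0])
    per_example = []
    for target, row in zip(onehot_labels, logits):
        # assumption: smoothing moves each target towards the uniform value 1/num_classes
        target = [t * (1.0 - label_smoothing) + label_smoothing / num_classes
                  for t in target]
        # log-softmax of the row, computed stably
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        log_probs = [x - log_sum_exp for x in row]
        # cross-entropy: -sum_k target_k * log(softmax_k)
        per_example.append(-sum(t * lp for t, lp in zip(target, log_probs)))
    # assumption: default weights=1.0, so the reduction is a plain mean
    return sum(per_example) / len(per_example)

onehot_labels = [[1, 0, 0]]   # shape [batch_size, num_classes]
logits = [[5.0, 0.0, 0.0]]    # shape [batch_size, num_classes]
plain = softmax_cross_entropy(onehot_labels, logits)
smoothed = softmax_cross_entropy(onehot_labels, logits, label_smoothing=0.1)
print(plain, smoothed)
```

In this sketch, smoothing raises the loss for a confident, correct prediction, because some target mass is moved to the other classes.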

Difference between tf.losses.sparse_softmax_cross_entropy() and tf.losses.softmax_cross_entropy()

As shown above, the difference between tf.losses.sparse_softmax_cross_entropy() and tf.losses.softmax_cross_entropy() lies in the label parameter.

labels has shape [batch_size] and holds class indices in tf.losses.sparse_softmax_cross_entropy()

onehot_labels has shape [batch_size, num_classes] and holds one-hot vectors in tf.losses.softmax_cross_entropy()
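The two label formats carry the same information when every example belongs to exactly one class. A small plain-Python sketch of converting between them is given below. The helper names to_one_hot and to_indices are hypothetical. In TensorFlow, tf.one_hot and tf.argmax play the corresponding roles.

```python
def to_one_hot(labels, num_classes):
    """Turn class indices of shape [batch_size] into rows of shape [batch_size, num_classes]."""
    return [[1 if k == label else 0 for k in range(num_classes)] for label in labels]

def to_indices(onehot_labels):
    """Recover the class index of each one-hot row (position of the largest entry)."""
    return [row.index(max(row)) for row in onehot_labels]

labels = [1, 0, 2]
onehot = to_one_hot(labels, num_classes=3)
print(onehot)              # [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
print(to_indices(onehot))  # [1, 0, 2]
```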

We will use an example to show this difference.

import tensorflow as tf
import numpy as np

num_classes = 3
batch_size = 3

l = np.array([1, 0, 2])
label_1 = tf.convert_to_tensor(l, dtype = tf.int32)

l_2 = np.array([[0, 1, 0],[1, 0, 0], [0, 0, 1]])
label_2 = tf.convert_to_tensor(l_2, dtype = tf.int32)

# 3 * 3
logits = tf.convert_to_tensor(np.array([[0.1, 2, 3],[1, 2, 0], [0.4, 0, 1]]), dtype = tf.float32)

loss_1 = tf.losses.sparse_softmax_cross_entropy(labels = label_1, logits = logits)
loss_2 = tf.losses.softmax_cross_entropy(onehot_labels = label_2, logits = logits)

init = tf.global_variables_initializer()
init_local = tf.local_variables_initializer()
with tf.Session() as sess:
    sess.run([init, init_local])
    np.set_printoptions(precision=4, suppress=True)

    # loss computed from integer class labels
    l1 = sess.run(loss_1)
    print("tf.losses.sparse_softmax_cross_entropy() loss")
    print(l1)

    # loss computed from one-hot labels
    l2 = sess.run(loss_2)
    print("tf.losses.softmax_cross_entropy() loss")
    print(l2)

This example uses batch_size = 3 and num_classes = 3.

Running this code, we get:

tf.losses.sparse_softmax_cross_entropy() loss
1.1369685
tf.losses.softmax_cross_entropy() loss
1.1369685

The two losses are the same, because label_2 is simply the one-hot encoding of label_1.
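This result can be checked by hand. The sketch below repeats the calculation in plain Python (no TensorFlow), using the logits and labels from the example. It computes the log-softmax of each row once, then reads the loss either by index or by a dot product with the one-hot row. It assumes the default weights=1.0, so the reduction is the mean over the three examples.

```python
import math

logits = [[0.1, 2.0, 3.0], [1.0, 2.0, 0.0], [0.4, 0.0, 1.0]]
labels = [1, 0, 2]                                   # same values as label_1
onehot_labels = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]    # same values as label_2

def log_softmax(row):
    # subtract the row maximum before exponentiating, for numerical stability
    m = max(row)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_sum_exp for x in row]

log_probs = [log_softmax(row) for row in logits]

# sparse form: pick the log-probability at the labelled index
sparse_losses = [-lp[label] for lp, label in zip(log_probs, labels)]
# one-hot form: dot product of the one-hot row with the log-probabilities
dense_losses = [-sum(t * p for t, p in zip(target, lp))
                for lp, target in zip(log_probs, onehot_labels)]

sparse_loss = sum(sparse_losses) / len(sparse_losses)
dense_loss = sum(dense_losses) / len(dense_losses)
print(round(sparse_loss, 5), round(dense_loss, 5))   # 1.13697 1.13697
```

Multiplying by a one-hot row keeps only the entry at the labelled class, which is exactly what indexing does, so the two forms agree.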