The softmax function is differentiable. However, if you compute its gradient with tf.gradients(), you will get 0. In this tutorial, we explain why, for TensorFlow beginners.
Look at the example code below:
```python
import tensorflow as tf  # uses the TensorFlow 1.x API (tf.Session, tf.gradients)
import numpy as np

# 2 x 2 input; softmax is applied to each row (axis = 1)
z = tf.Variable(np.array([[1, 2], [3, 2]]), dtype = tf.float32)
y = tf.nn.softmax(z, axis = 1)

# gradient of y with respect to z
r = tf.gradients(y, z)

init = tf.global_variables_initializer()
init_local = tf.local_variables_initializer()

with tf.Session() as sess:
    sess.run([init, init_local])
    print(sess.run([y]))
    print(sess.run([r]))
```
Run this Python code and you will get a result like:
```
[array([[0.26894143, 0.7310586 ],
       [0.7310586 , 0.26894143]], dtype=float32)]
[[array([[0., 0.],
       [0., 0.]], dtype=float32)]]
```
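The softmax values above can be checked by hand. The sketch below recomputes the same row-wise softmax in plain Python, without TensorFlow, and prints each row's sum. The helper name `softmax_row` is only illustrative. Note that every row sums to 1, which is the property the explanation below relies on.

```python
import math

def softmax_row(row):
    """Softmax of one row, shifted by the row max for numerical stability."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

z = [[1.0, 2.0], [3.0, 2.0]]
# Softmax along axis 1, mirroring tf.nn.softmax(z, axis = 1)
y = [softmax_row(row) for row in z]

for row in y:
    # Each row sum is 1 up to floating-point rounding
    print([round(v, 7) for v in row], "row sum:", sum(row))
```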
Every element returned by tf.gradients() is 0.
Why is the value of tf.gradients() 0?
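To understand the reason, you should understand these two topics:
1. How tf.gradients() computes gradients: when ys has more than one element, tf.gradients(ys, xs) returns the sum of the derivatives of all elements of ys, that is, the derivative of sum(ys) with respect to xs.
2. How to compute the derivative of the softmax function.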
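One fact behind this result is that, for a non-scalar ys, tf.gradients(ys, xs) adds up the derivatives of every element of ys instead of returning a full Jacobian. The sketch below models that behaviour for a single scalar input using central finite differences; it is an illustration under that assumption, not TensorFlow's implementation, and `toy_ys` and `sum_of_gradients` are hypothetical names.

```python
def toy_ys(x):
    """A hypothetical vector-valued function of one scalar: [x**2, 3*x]."""
    return [x ** 2, 3.0 * x]

def sum_of_gradients(f, x, h=1e-5):
    """Approximate sum_i d f(x)[i] / dx, i.e. d(sum(f(x)))/dx, by central differences."""
    plus = f(x + h)
    minus = f(x - h)
    return sum((p - m) / (2 * h) for p, m in zip(plus, minus))

# d(x**2)/dx + d(3x)/dx = 2x + 3, so at x = 2 the result is close to 7
g = sum_of_gradients(toy_ys, 2.0)
print(g)
```

The per-element derivatives 4 and 3 are not reported separately; only their sum is. This is the same reduction tf.gradients() applies to the four elements of the softmax output.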
For the code above, the softmax along axis 1 can be expressed as:
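A reconstruction of that expression, naming the elements of z as x00, x01, x10 and x11. Softmax with axis = 1 normalizes each row of z separately:

```latex
z = \begin{bmatrix} x_{00} & x_{01} \\ x_{10} & x_{11} \end{bmatrix},
\qquad
y = \begin{bmatrix}
  \dfrac{e^{x_{00}}}{e^{x_{00}} + e^{x_{01}}} & \dfrac{e^{x_{01}}}{e^{x_{00}} + e^{x_{01}}} \\[2ex]
  \dfrac{e^{x_{10}}}{e^{x_{10}} + e^{x_{11}}} & \dfrac{e^{x_{11}}}{e^{x_{10}} + e^{x_{11}}}
\end{bmatrix}
= \begin{bmatrix} y_{00} & y_{01} \\ y_{10} & y_{11} \end{bmatrix}
```

So y00 + y01 = 1 and y10 + y11 = 1 for any input values.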
The gradient of y with respect to x00 (the element z[0][0]) is computed by tf.gradients() as:
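A worked version of this step, writing the first output row as y00 = e^{x00}/(e^{x00}+e^{x01}) and y01 = e^{x01}/(e^{x00}+e^{x01}). Because tf.gradients() differentiates the sum of all elements of y, and only y00 and y01 depend on x00:

```latex
\begin{aligned}
\frac{\partial (y_{00} + y_{01} + y_{10} + y_{11})}{\partial x_{00}}
  &= \frac{\partial y_{00}}{\partial x_{00}} + \frac{\partial y_{01}}{\partial x_{00}} \\
  &= y_{00}(1 - y_{00}) - y_{00}\, y_{01} \\
  &= y_{00}(1 - y_{00} - y_{01}) \\
  &= 0
\end{aligned}
```

The last step uses y00 + y01 = 1. The same argument gives 0 for x01, x10 and x11.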
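So every element of tf.gradients() on tf.nn.softmax() is 0: each row of the softmax output sums to 1, so the sum of all outputs is a constant, and the gradient of a constant is 0.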
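To check this numerically for the example input, the sketch below uses plain Python instead of TensorFlow (an assumption made so it runs without a session). It builds the analytic softmax Jacobian J[i][j] = y_i * (delta_ij - y_j) for each row of z and adds up each column, which gives the gradient of the sum of all outputs. For contrast, it also computes the gradient of y00 alone, which is not zero. The helper names are hypothetical.

```python
import math

def softmax_row(row):
    """Softmax of one row (max-shifted for numerical stability)."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_jacobian(y):
    """J[i][j] = d y_i / d x_j = y_i * (delta_ij - y_j) for one softmax row."""
    n = len(y)
    return [[y[i] * ((1.0 if i == j else 0.0) - y[j]) for j in range(n)]
            for i in range(n)]

z = [[1.0, 2.0], [3.0, 2.0]]

# d(sum of all y) / dz: add up each column of every row's Jacobian
grad_of_sum = []
for row in z:
    jac = softmax_jacobian(softmax_row(row))
    n = len(row)
    grad_of_sum.append([sum(jac[i][j] for i in range(n)) for j in range(n)])

# d(y00) / dz: only the first output of the first row, so row 2 contributes 0
y0 = softmax_row(z[0])
grad_of_y00 = [softmax_jacobian(y0)[0], [0.0, 0.0]]

print(grad_of_sum)  # all entries are 0 up to floating-point rounding
print(grad_of_y00)
```

The column sums vanish because sum_i y_i * (delta_ij - y_j) = y_j - y_j * (y_0 + y_1) = 0, while a single output such as y00 still has a non-zero gradient (about 0.1966 and -0.1966 for the first row here).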