It is very easy to compute the SVD gradient if we use tf.svd() to compute the singular value decomposition of a tensor. However, we often have to replace tf.svd() with numpy.linalg.svd(). There are two main reasons:
- TensorFlow tf.svd() runs very slowly
Read More: Fix TensorFlow tf.svd() Run Slowly: A Beginner Guide
- TensorFlow tf.svd() may return NaN value
Read More: Solve tf.svd NaN bug with np.linalg.svd
We can use tf.py_func() to replace tf.svd() with numpy.linalg.svd(). However, we will then find that the gradient of the SVD is None.
Here is an example:
u, s, v = tf.py_func(np.linalg.svd, [tensor, full_matrices, compute_uv], [dtype, dtype, dtype])
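For example, this minimal sketch (the tensor w below is illustrative) shows the missing gradient:

import numpy as np
import tensorflow as tf

w = tf.constant([[1.0, 2.0], [3.0, 4.0]])
# wrap numpy.linalg.svd in tf.py_func, computing singular values only
s = tf.py_func(lambda x: np.linalg.svd(x, compute_uv=False).astype(np.float32),
               [w], tf.float32)
# tf.py_func has no registered gradient, so this prints [None]
print(tf.gradients(s, w))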
The problem to solve: how do we compute the gradient of the SVD after we have replaced tf.svd() with numpy.linalg.svd()?
There are three difficulties:
- We calculate the singular value decomposition (u, s, v) with numpy.linalg.svd() through tf.py_func(). The input tensor is converted to a numpy.ndarray, so we have to compute the gradient of the SVD in NumPy as well (see the sketch after this list).
To understand tf.py_func(), you should read:
TensorFlow tf.py_func(): Run Python Function in TensorFlow Graph
- The formula for the gradient of the SVD is very complex.
- Even if you have computed the gradient of the SVD in NumPy, how do you convert it back to a tensor at run time?
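As a quick check of the first difficulty, here is a minimal sketch showing that tf.py_func() hands our Python function a numpy.ndarray (the inspect_type helper is just for illustration):

import numpy as np
import tensorflow as tf

def inspect_type(x):
    # inside tf.py_func the tensor arrives as a numpy.ndarray
    print(type(x))  # <class 'numpy.ndarray'>
    return x

w = tf.constant([[1.0, 2.0], [3.0, 4.0]])
y = tf.py_func(inspect_type, [w], tf.float32)

with tf.Session() as sess:
    sess.run(y)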
To fix all of these problems, we can proceed as follows:
Install the Python autograd package
pip install autograd
The Python autograd package allows us to compute the gradient of a NumPy function automatically.
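For example, here is a minimal sketch (the function f below is just an illustration):

import autograd.numpy as anp
from autograd import elementwise_grad as egrad

def f(x):
    return anp.sum(anp.sin(x))

# egrad differentiates the numpy function automatically
df = egrad(f)
print(df(anp.array([0.0, 1.0])))  # cos(x): [1.  0.5403...]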
Use np_svd_in_tf() to compute the SVD
import autograd.numpy as np  # autograd's numpy wrapper, so egrad can differentiate np.linalg.svd
import tensorflow as tf
from autograd import elementwise_grad as egrad
from tensorflow.python.framework import function

def np_svd_in_tf(w, name='np_svd_in_tf'):
    with tf.name_scope(name):
        def computeSVD(w):
            # inside tf.py_func, w is a numpy.ndarray
            S = np.linalg.svd(w, compute_uv=False)
            return S

        # autograd computes d(sum(S))/dw for us
        grad_svd = egrad(computeSVD)

        @function.Defun()
        def op_grad(s, grad):
            # s is the original input; the incoming gradient grad is not used here
            return [tf.py_func(grad_svd, [s], tf.float32)]

        @function.Defun(grad_func=op_grad)
        def np_replaced_tf_svd(w):
            return tf.py_func(computeSVD, [w], tf.float32)

        return np_replaced_tf_svd(w)
np_svd_in_tf() can compute the singular values and can also propagate a gradient.
We will use some examples to test it.
Create some tensors
np_w = np.array([[[2, 2, 3, 4, 5],
                  [6, 7, 2, 9, 0],
                  [1, 2, 2, 4, 5],
                  [6, 2, 8, 9, 0],
                  [1, 2, 3, 4, 5]]], dtype=np.float32)
w1 = tf.convert_to_tensor(np_w)
w1 = tf.Variable(np.array([[2, 3, 5, 1, 3], [2, 3, 5, 1, 3]]), dtype=tf.float32)
w2 = tf.Variable(np.array([[2, 2, 5], [2, 3, 5], [2, 3, 5], [2, 3, 5], [2, 3, 5]]), dtype=tf.float32)
w3 = tf.matmul(w1, w2)
w4 = tf.nn.softmax(w3, axis=1)
w4 = tf.reshape(w4, [-1, 2, 3])
Compute gradient of w2
Using tf.svd()
s = tf.svd(w4, compute_uv=False)
tf_svd_grad = tf.gradients(s, w2)
Use np_svd_in_tf()
s_in_np = np_svd_in_tf(w4)
svd_grad = tf.gradients(s_in_np, w2)
Test result
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print("tf.svd() s:\n")
    print(sess.run(s))
    print("np_replaced_tf_svd s:\n")
    print(sess.run(s_in_np))
    print("tf.svd() gradient:\n")
    print(sess.run(tf_svd_grad)[0])
    print("np_replaced_tf_svd gradient:\n")
    print(sess.run(svd_grad)[0])
The result is:
tf.svd() s:
[[1.4142135 0.       ]]
np_replaced_tf_svd s:
[[1.4142135 0.       ]]
tf.svd() gradient:
[[-1.6262104e-18 -2.6467354e-13  0.0000000e+00]
 [-2.4393155e-18 -3.9701031e-13  0.0000000e+00]
 [-4.0655261e-18 -6.6168382e-13  0.0000000e+00]
 [-8.1310520e-19 -1.3233677e-13  0.0000000e+00]
 [-2.4393155e-18 -3.9701031e-13  0.0000000e+00]]
np_replaced_tf_svd gradient:
[[-1.6262104e-18 -2.6467354e-13  0.0000000e+00]
 [-2.4393155e-18 -3.9701031e-13  0.0000000e+00]
 [-4.0655261e-18 -6.6168382e-13  0.0000000e+00]
 [-8.1310520e-19 -1.3233677e-13  0.0000000e+00]
 [-2.4393155e-18 -3.9701031e-13  0.0000000e+00]]
From the result, we can see that np_svd_in_tf() behaves the same as tf.svd() in this example. However, in some cases the gradient of np_svd_in_tf() may differ from that of tf.svd().
More detail: SVD Gradient May Be Different in NumPy and TensorFlow
After switching to np_svd_in_tf(), we can also compare training speed. In our model, it takes about 10 minutes per batch; with tf.svd(), the same batch may take about 120 minutes.
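If you want to check this in your own training loop, here is a minimal timing sketch (sess and train_op are placeholders for your own session and training op, not part of the code above):

import time

start = time.time()
sess.run(train_op)  # hypothetical: your own training op
print("one batch took %.1f seconds" % (time.time() - start))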