Compute SVD Gradient in TensorFlow After Replacing tf.svd() with numpy.linalg.svd() – TensorFlow Tutorial

It is very easy to compute the SVD gradient if we use tf.svd() to compute the singular value decomposition of a tensor. However, we often have to replace tf.svd() with numpy.linalg.svd(), for two main reasons: tf.svd() can run slowly, and it can produce NaN values.

Read More: Fix TensorFlow tf.svd() Run Slowly: A Beginner Guide

Read More: Solve tf.svd NaN bug with np.linalg.svd

We can use tf.py_func() to replace tf.svd() with numpy.linalg.svd(). However, we will then find that the gradient of the SVD is None.

Here is an example:

u, s, v = tf.py_func(np.linalg.svd, [tensor, full_matrices, compute_uv], [dtype, dtype, dtype])
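
If we then ask TensorFlow for the gradient, it comes back as None. Here is a minimal sketch (the matrix w and the helper np_svd are hypothetical names used for illustration):

import numpy as np
import tensorflow as tf

w = tf.constant([[1.0, 2.0], [3.0, 4.0]], dtype=tf.float32)

def np_svd(x):
    # compute_uv=False makes numpy return only the singular values
    return np.linalg.svd(x, compute_uv=False)

s = tf.py_func(np_svd, [w], tf.float32)

# tf.py_func() does not register a gradient, so this prints [None]
print(tf.gradients(s, w))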

So the problem to fix is: how do we compute the gradient of the SVD after we have replaced tf.svd() with numpy.linalg.svd()?

There are three difficulties: tf.py_func() does not define a gradient for the wrapped function; we have to compute the gradient of numpy.linalg.svd() outside of TensorFlow; and we have to register that gradient so that tf.gradients() can use it.

To understand tf.py_func(), you should read:

TensorFlow tf.py_func(): Run Python Function in TensorFlow Graph
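
In short, tf.py_func() wraps an ordinary Python function as an op in the TensorFlow graph. A minimal sketch (np.sqrt is just a stand-in function):

import numpy as np
import tensorflow as tf

x = tf.constant([1.0, 4.0, 9.0])
# Wrap np.sqrt as a graph op; the output dtype must be given explicitly
y = tf.py_func(np.sqrt, [x], tf.float32)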

To fix these problems, we can proceed as follows:

Install the Python autograd package

pip install autograd

The Python autograd package allows us to compute the gradient of a NumPy function automatically.
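
For example, here is a small sketch of autograd's elementwise_grad(), which we will use below (f is a toy function for illustration):

import autograd.numpy as anp
from autograd import elementwise_grad as egrad

def f(x):
    return anp.sin(x) ** 2

# df(x) = 2 * sin(x) * cos(x), evaluated elementwise
df = egrad(f)
print(df(anp.array([0.0, 1.0, 2.0])))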

Use np_svd_in_tf() to compute SVD

import numpy as np
import tensorflow as tf
import autograd.numpy as anp
from autograd import elementwise_grad as egrad
from tensorflow.python.framework import function

def np_svd_in_tf(w, name = 'np_svd_in_tf'):
    with tf.name_scope(name):
        # Use autograd.numpy here so that egrad() can differentiate it
        def computeSVD(w):
            S = anp.linalg.svd(w, compute_uv = False)
            return S

        # grad_svd(w): gradient of the sum of singular values w.r.t. w
        grad_svd = egrad(computeSVD)

        # Gradient function: receives the forward input w and the incoming
        # gradient grad, and returns the gradient with respect to w.
        # Note: grad is not applied here; grad_svd already uses an all-ones
        # vector, which matches tf.gradients() with its default grad_ys.
        @function.Defun(tf.float32, tf.float32)
        def op_grad(w, grad):
            return [tf.py_func(grad_svd, [w], tf.float32)]

        # Forward function: wraps the numpy SVD in tf.py_func() and
        # registers op_grad as its gradient
        @function.Defun(tf.float32, grad_func=op_grad)
        def np_replaced_tf_svd(w):
            return tf.py_func(computeSVD, [w], tf.float32)

        return np_replaced_tf_svd(w)

np_svd_in_tf() can compute singular values and also supports gradient computation.
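
As a quick check (w here is a hypothetical 2 x 2 matrix), the gradient is no longer None:

w = tf.constant([[1.0, 0.0], [0.0, 2.0]], dtype=tf.float32)
s = np_svd_in_tf(w)

# Unlike the plain tf.py_func() version, a gradient is now defined
print(tf.gradients(s, w))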

Next, we will test it against tf.svd() with a complete example.

Create some tensors

w1 = tf.Variable(np.array([[2, 3, 5, 1, 3], [2, 3, 5, 1, 3]]), dtype = tf.float32)
w2 = tf.Variable(np.array([[2, 2, 5], [2, 3, 5], [2, 3, 5], [2, 3, 5], [2, 3, 5]]), dtype = tf.float32)
w3 = tf.matmul(w1, w2)
w4 = tf.nn.softmax(w3, axis = 1)
# reshape to a batch of one 2 x 3 matrix for the batched SVD
w4 = tf.reshape(w4, [-1, 2, 3])

Compute the gradient of w2

Using tf.svd()

s = tf.svd(w4, compute_uv = False)
tf_svd_grad = tf.gradients(s, w2)

Using np_svd_in_tf()

s_in_np = np_svd_in_tf(w4)
svd_grad = tf.gradients(s_in_np, w2)

Test result

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print("tf.svd() s:\n")
    print(sess.run(s))
    print("np_replaced_tf_svd s:\n")
    print(sess.run(s_in_np))
    
    print("tf.svd() gradient:\n")
    print(sess.run(tf_svd_grad)[0])
    print("np_replaced_tf_svd gradient:\n")
    print(sess.run(svd_grad)[0])

The result is:

tf.svd() s:

[[1.4142135 0.       ]]
np_replaced_tf_svd s:

[[1.4142135 0.       ]]
tf.svd() gradient:

[[-1.6262104e-18 -2.6467354e-13  0.0000000e+00]
 [-2.4393155e-18 -3.9701031e-13  0.0000000e+00]
 [-4.0655261e-18 -6.6168382e-13  0.0000000e+00]
 [-8.1310520e-19 -1.3233677e-13  0.0000000e+00]
 [-2.4393155e-18 -3.9701031e-13  0.0000000e+00]]
np_replaced_tf_svd gradient:

[[-1.6262104e-18 -2.6467354e-13  0.0000000e+00]
 [-2.4393155e-18 -3.9701031e-13  0.0000000e+00]
 [-4.0655261e-18 -6.6168382e-13  0.0000000e+00]
 [-8.1310520e-19 -1.3233677e-13  0.0000000e+00]
 [-2.4393155e-18 -3.9701031e-13  0.0000000e+00]]

From the result, we can see that np_svd_in_tf() produces the same singular values and, in this example, the same gradient as tf.svd(). However, the gradient of np_svd_in_tf() may differ from tf.svd() in other cases.

Read More: SVD Gradient May Be Different in NumPy and TensorFlow

After switching to np_svd_in_tf(), we can compare the training speed of our model.

In our experiment, training takes about 10 minutes per batch with np_svd_in_tf(); with tf.svd(), the same batch may take about 120 minutes.