Compute SVD Gradient in TensorFlow After Replacing tf.svd() with numpy.linalg.svd() – TensorFlow Tutorial

It is very easy to compute the SVD gradient if we use tf.svd() to compute the singular value decomposition of a tensor. However, we often have to replace tf.svd() with numpy.linalg.svd(), for two main reasons: tf.svd() can run slowly, and it can produce NaN values.

Read More: Fix TensorFlow tf.svd() Run Slowly: A Beginner Guide

Read More: Solve tf.svd NaN bug with np.linalg.svd

We can use tf.py_func() to replace tf.svd() with numpy.linalg.svd(). However, we will then find that the gradient of the SVD is None.

Here is an example:

u, s, v = tf.py_func(np.linalg.svd, [tensor, full_matrices, compute_uv], [dtype, dtype, dtype])
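
If we then ask TensorFlow for the gradient, it comes back as None. Here is a minimal sketch (the matrix w and the helper np_svd are hypothetical names used for illustration):

import numpy as np
import tensorflow as tf

w = tf.constant([[1.0, 2.0], [3.0, 4.0]], dtype=tf.float32)

def np_svd(x):
    # compute_uv=False makes numpy return only the singular values
    return np.linalg.svd(x, compute_uv=False)

s = tf.py_func(np_svd, [w], tf.float32)

# tf.py_func() does not register a gradient, so this prints [None]
print(tf.gradients(s, w))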

So the problem to fix is: how do we compute the gradient of the SVD after we have replaced tf.svd() with numpy.linalg.svd()?

There are three difficulties: tf.py_func() does not define a gradient for the wrapped function; we have to compute the gradient of numpy.linalg.svd() outside of TensorFlow; and we have to register that gradient so that tf.gradients() can use it.

To understand tf.py_func(), you should read:

TensorFlow tf.py_func(): Run Python Function in TensorFlow Graph
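
In short, tf.py_func() wraps an ordinary Python function as an op in the TensorFlow graph. A minimal sketch (np.sqrt is just a stand-in function):

import numpy as np
import tensorflow as tf

x = tf.constant([1.0, 4.0, 9.0])
# Wrap np.sqrt as a graph op; the output dtype must be given explicitly
y = tf.py_func(np.sqrt, [x], tf.float32)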

To fix these problems, we can proceed as follows:

Install the Python autograd package

pip install autograd

The Python autograd package allows us to compute the gradient of a NumPy function automatically.
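
For example, here is a small sketch of autograd's elementwise_grad(), which we will use below (f is a toy function for illustration):

import autograd.numpy as anp
from autograd import elementwise_grad as egrad

def f(x):
    return anp.sin(x) ** 2

# df(x) = 2 * sin(x) * cos(x), evaluated elementwise
df = egrad(f)
print(df(anp.array([0.0, 1.0, 2.0])))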

Use np_svd_in_tf() to compute SVD

import numpy as np
import tensorflow as tf
import autograd.numpy as anp
from autograd import elementwise_grad as egrad
from tensorflow.python.framework import function

def np_svd_in_tf(w, name = 'np_svd_in_tf'):
    with tf.name_scope(name):
        # Use autograd.numpy here so that egrad() can differentiate it
        def computeSVD(w):
            S = anp.linalg.svd(w, compute_uv = False)
            return S

        # grad_svd(w): gradient of the sum of singular values w.r.t. w
        grad_svd = egrad(computeSVD)

        # Gradient function: receives the forward input w and the incoming
        # gradient grad, and returns the gradient with respect to w.
        # Note: grad is not applied here; grad_svd already uses an all-ones
        # vector, which matches tf.gradients() with its default grad_ys.
        @function.Defun(tf.float32, tf.float32)
        def op_grad(w, grad):
            return [tf.py_func(grad_svd, [w], tf.float32)]

        # Forward function: wraps the numpy SVD in tf.py_func() and
        # registers op_grad as its gradient
        @function.Defun(tf.float32, grad_func=op_grad)
        def np_replaced_tf_svd(w):
            return tf.py_func(computeSVD, [w], tf.float32)

        return np_replaced_tf_svd(w)

np_svd_in_tf() can compute singular values and also supports gradient computation.
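
As a quick check (w here is a hypothetical 2 x 2 matrix), the gradient is no longer None:

w = tf.constant([[1.0, 0.0], [0.0, 2.0]], dtype=tf.float32)
s = np_svd_in_tf(w)

# Unlike the plain tf.py_func() version, a gradient is now defined
print(tf.gradients(s, w))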

Next, we will test it against tf.svd() with a complete example.

Create some tensors

w1 = tf.Variable(np.array([[2, 3, 5, 1, 3], [2, 3, 5, 1, 3]]), dtype = tf.float32)
w2 = tf.Variable(np.array([[2, 2, 5], [2, 3, 5], [2, 3, 5], [2, 3, 5], [2, 3, 5]]), dtype = tf.float32)
w3 = tf.matmul(w1, w2)
w4 = tf.nn.softmax(w3, axis = 1)
# reshape to a batch of one 2 x 3 matrix for the batched SVD
w4 = tf.reshape(w4, [-1, 2, 3])

Compute the gradient of w2

Using tf.svd()

s = tf.svd(w4, compute_uv = False)
tf_svd_grad = tf.gradients(s, w2)

Using np_svd_in_tf()

s_in_np = np_svd_in_tf(w4)
svd_grad = tf.gradients(s_in_np, w2)

Test result

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print("tf.svd() s:\n")
    print(sess.run(s))
    print("np_replaced_tf_svd s:\n")
    print(sess.run(s_in_np))
    
    print("tf.svd() gradient:\n")
    print(sess.run(tf_svd_grad)[0])
    print("np_replaced_tf_svd gradient:\n")
    print(sess.run(svd_grad)[0])

The result is:

tf.svd() s:

[[1.4142135 0.       ]]
np_replaced_tf_svd s:

[[1.4142135 0.       ]]
tf.svd() gradient:

[[-1.6262104e-18 -2.6467354e-13  0.0000000e+00]
 [-2.4393155e-18 -3.9701031e-13  0.0000000e+00]
 [-4.0655261e-18 -6.6168382e-13  0.0000000e+00]
 [-8.1310520e-19 -1.3233677e-13  0.0000000e+00]
 [-2.4393155e-18 -3.9701031e-13  0.0000000e+00]]
np_replaced_tf_svd gradient:

[[-1.6262104e-18 -2.6467354e-13  0.0000000e+00]
 [-2.4393155e-18 -3.9701031e-13  0.0000000e+00]
 [-4.0655261e-18 -6.6168382e-13  0.0000000e+00]
 [-8.1310520e-19 -1.3233677e-13  0.0000000e+00]
 [-2.4393155e-18 -3.9701031e-13  0.0000000e+00]]

From the result, we can see that np_svd_in_tf() produces the same singular values and, in this example, the same gradient as tf.svd(). However, the gradient of np_svd_in_tf() may differ from tf.svd() in other cases.

Read More: SVD Gradient May Be Different in NumPy and TensorFlow

After switching to np_svd_in_tf(), we can compare the training speed of our model.

In our experiment, training takes about 10 minutes per batch with np_svd_in_tf(); with tf.svd(), the same batch may take about 120 minutes.