Implement KL Divergence Loss in TensorFlow – TensorFlow Tutorial

February 23, 2021

KL Divergence can be defined as:

\[D_{KL}(P\|Q) = \sum_{x} p(x)\log\frac{p(x)}{q(x)}\]

In TensorFlow, we can use tf.distributions.kl_divergence() to compute it. However, it may report a NaN or INF error. Here is an explanation:

Kullback-Leibler Divergence to NaN or INF in TensorFlow
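As a quick illustration of the problem, here is a minimal sketch (TensorFlow 1.x style, with made-up probability values) showing how a zero probability in \(Q\) drives the result to INF:

import tensorflow as tf

# A zero probability in q makes log(q) hit -inf, so the KL value blows up.
p = tf.constant([[0.3, 0.7]])
q = tf.constant([[0.0, 1.0]])  # zero probability -> log(0)

P = tf.distributions.Categorical(probs=p)
Q = tf.distributions.Categorical(probs=q)
kl = tf.distributions.kl_divergence(P, Q)

with tf.Session() as sess:
    print(sess.run(kl))  # expect inf (or nan) because q contains a zero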

To avoid the NaN or INF error, we can fix the tensor values before computing. Here is an example:

Compute Kullback-Leibler Divergence in TensorFlow
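The linked tutorial gives the details. As a minimal sketch (the epsilon value below is an assumption, not a required constant), one common fix is to clip the probabilities away from zero before taking the log:

import tensorflow as tf

# TensorFlow 1.x style sketch: clip probabilities so log() never sees an exact zero.
def safe_kl(p, q, epsilon=1e-10):
    p = tf.clip_by_value(p, epsilon, 1.0)
    q = tf.clip_by_value(q, epsilon, 1.0)
    # D_KL(P||Q) = sum(p * log(p / q)) over the class axis
    return tf.reduce_sum(p * tf.log(p / q), axis=1)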

However, in this tutorial, we will introduce another method to compute KL divergence; you can use it as a loss to train your model.

Preliminary

KL divergence between \(P\) and \(Q\) can be computed as:

\[D_{KL}(P\|Q) = \sum_{x} p(x)\log\frac{p(x)}{q(x)} = \sum_{x} p(x)\log p(x) - \sum_{x} p(x)\log q(x)\]

Here \(P\) is the “true” distribution.

Understand Kullback-Leibler Divergence – A Simple Tutorial for Beginners

\(-\sum_{x} p(x)\log q(x)\) is the cross entropy between \(P\) and \(Q\), which means we can compute the KL divergence loss using a cross entropy loss.
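To check this relation numerically, here is a small NumPy-only sketch (the probability values are made-up examples): the KL divergence equals the cross entropy minus the entropy of \(P\).

import numpy as np

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.1, 0.6, 0.3])

kl = np.sum(p * np.log(p / q))          # D_KL(P || Q)
cross_entropy = -np.sum(p * np.log(q))  # H(P, Q)
entropy = -np.sum(p * np.log(p))        # H(P)

print(kl, cross_entropy - entropy)  # the two values should match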

How to compute KL divergence loss in TensorFlow?

Here is the example code:

def klx(true_p, q):
    # D_KL(P||Q) = sum(p*log p) - sum(p*log q); true_p and q are logits
    true_prob = tf.nn.softmax(true_p, axis=1)
    # loss_1 = sum(p*log p): the negative entropy of P
    loss_1 = -tf.nn.softmax_cross_entropy_with_logits(logits=true_p, labels=true_prob)
    # loss_2 = -sum(p*log softmax(q)): the cross entropy between P and Q
    loss_2 = tf.nn.softmax_cross_entropy_with_logits(logits=q, labels=true_prob)
    loss = loss_1 + loss_2
    return loss

In this example, we use tf.nn.softmax_cross_entropy_with_logits() to compute the KL divergence loss. We will evaluate it:

import numpy as np
import tensorflow as tf

def kl(x, y):
    # Reference result: KL divergence between two categorical distributions
    X = tf.distributions.Categorical(probs=x)
    Y = tf.distributions.Categorical(probs=y)
    return tf.distributions.kl_divergence(X, Y)


def klx(true_p, q):
    # KL as plogp - plogq, built from two cross entropy terms
    true_prob = tf.nn.softmax(true_p, axis=1)
    loss_1 = -tf.nn.softmax_cross_entropy_with_logits(logits=true_p, labels=true_prob)
    loss_2 = tf.nn.softmax_cross_entropy_with_logits(logits=q, labels=true_prob)
    loss = loss_1 + loss_2
    return loss

# Two batches of example logits
a = np.array([[0.05, 0.16, 0.9], [0.8, 4.15, 0.05]])
b = np.array([[0.7, 0.2, 0.1], [0.15, 0.9, 0.05]])


aa = tf.convert_to_tensor(a, tf.float32)
bb = tf.convert_to_tensor(b, tf.float32)

# Convert the logits to probabilities for the reference kl() computation
a_s = tf.nn.softmax(aa, axis=1)
b_s = tf.nn.softmax(bb, axis=1)

kl_v = kl(a_s, b_s)    # reference: tf.distributions.kl_divergence()
kl_v_2 = klx(aa, bb)   # our cross entropy based implementation

init = tf.global_variables_initializer() 
init_local = tf.local_variables_initializer()
with tf.Session() as sess:
    sess.run([init, init_local])
    np.set_printoptions(precision=4, suppress=True)
   
    k1,k2= (sess.run([kl_v, kl_v_2]))
   
    print('k1=')
    print(k1)
    print('k2=')
    print(k2)

Run this code and we will get the following output:

k1=
[0.1879 0.4534]
k2=
[0.1879 0.4534]

k1 is the same as k2, which means our method is correct.
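As a side note, if you are using TensorFlow 2.x, a similar result can be obtained with tf.keras.losses.KLDivergence. The sketch below is an alternative rather than the method used in this tutorial; note that this loss expects probabilities (not logits) and averages over the batch by default.

import tensorflow as tf

# TensorFlow 2.x, eager mode: apply softmax first, since the loss takes probabilities.
kld = tf.keras.losses.KLDivergence()

a = tf.constant([[0.05, 0.16, 0.9], [0.8, 4.15, 0.05]])
b = tf.constant([[0.7, 0.2, 0.1], [0.15, 0.9, 0.05]])

p = tf.nn.softmax(a, axis=1)
q = tf.nn.softmax(b, axis=1)

print(kld(p, q).numpy())  # averaged over the batch by default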
