In TensorFlow, we can use tf.distributions.kl_divergence() to compute the KL divergence between two distributions. However, it may report a NaN or INF error. Here is an explanation:
Kullback-Leibler Divergence to NaN or INF in TensorFlow
In order to avoid the NaN or INF error, we can fix the tensor values before starting the computation. Here is an example:
Compute Kullback-Leibler Divergence in TensorFlow
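For instance, here is a minimal sketch of that idea (the safe_kl() helper and the eps value are our own illustration, not code from the linked tutorial): clip the probabilities away from zero before building the distributions, so that log(0) cannot produce NaN or INF.

import tensorflow as tf

def safe_kl(p_probs, q_probs, eps=1e-8):
    # Hypothetical helper: keep every probability inside [eps, 1.0]
    # so that the log terms inside the KL divergence stay finite.
    p_fixed = tf.clip_by_value(p_probs, eps, 1.0)
    q_fixed = tf.clip_by_value(q_probs, eps, 1.0)
    P = tf.distributions.Categorical(probs=p_fixed)
    Q = tf.distributions.Categorical(probs=q_fixed)
    return tf.distributions.kl_divergence(P, Q)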
However, in this tutorial, we will introduce another method to compute KL divergence, which you can use as a loss to train your model.
Preliminary
KL divergence between \(P\) and \(Q\) can be computed as:
\[D_{KL}(P||Q) = \sum_x p(x)\log\frac{p(x)}{q(x)} = \sum_x p(x)\log p(x) - \sum_x p(x)\log q(x)\]
Here \(P\) is the “true” distribution.
Understand Kullback-Leibler Divergence – A Simple Tutorial for Beginners
\(-\sum_x p(x)\log q(x)\) is the cross entropy between \(P\) and \(Q\), which means we can compute the KL divergence loss using a cross entropy loss.
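In other words, if we write \(H(P) = -\sum_x p(x)\log p(x)\) for the entropy of \(P\) and \(H(P, Q) = -\sum_x p(x)\log q(x)\) for the cross entropy, the formula above can be rearranged as:
\[D_{KL}(P||Q) = -H(P) + H(P, Q)\]
This is exactly the combination of two cross entropy terms that the code below builds.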
How to compute KL divergence loss in TensorFlow?
Here is the example code:
def klx(true_p, q):
    # KL(P||Q) = sum p*log(p) - sum p*log(q)
    # true_p and q are logits; the "true" distribution P is softmax(true_p).
    true_prob = tf.nn.softmax(true_p, axis=1)
    # softmax_cross_entropy_with_logits returns -sum p*log(softmax(logits)),
    # so negating it with logits=true_p gives sum p*log(p).
    loss_1 = -tf.nn.softmax_cross_entropy_with_logits(logits=true_p, labels=true_prob)
    # This term is the cross entropy -sum p*log(q).
    loss_2 = tf.nn.softmax_cross_entropy_with_logits(logits=q, labels=true_prob)
    loss = loss_1 + loss_2
    return loss
In this example code, we use tf.nn.softmax_cross_entropy_with_logits() to compute the KL divergence loss. We will evaluate it below.
import numpy as np
import tensorflow as tf

def kl(x, y):
    # Reference KL divergence computed with tf.distributions.
    X = tf.distributions.Categorical(probs=x)
    Y = tf.distributions.Categorical(probs=y)
    return tf.distributions.kl_divergence(X, Y)

def klx(true_p, q):
    # KL(P||Q) = sum p*log(p) - sum p*log(q), built from two cross entropy terms.
    true_prob = tf.nn.softmax(true_p, axis=1)
    loss_1 = -tf.nn.softmax_cross_entropy_with_logits(logits=true_p, labels=true_prob)
    loss_2 = tf.nn.softmax_cross_entropy_with_logits(logits=q, labels=true_prob)
    loss = loss_1 + loss_2
    return loss

a = np.array([[0.05, 0.16, 0.9], [0.8, 4.15, 0.05]])
b = np.array([[0.7, 0.2, 0.1], [0.15, 0.9, 0.05]])
aa = tf.convert_to_tensor(a, tf.float32)
bb = tf.convert_to_tensor(b, tf.float32)

# The reference kl() expects probabilities, so convert the logits with softmax.
a_s = tf.nn.softmax(aa, axis=1)
b_s = tf.nn.softmax(bb, axis=1)

kl_v = kl(a_s, b_s)
kl_v_2 = klx(aa, bb)

init = tf.global_variables_initializer()
init_local = tf.local_variables_initializer()
with tf.Session() as sess:
    sess.run([init, init_local])
    np.set_printoptions(precision=4, suppress=True)
    k1, k2 = sess.run([kl_v, kl_v_2])
    print('k1=')
    print(k1)
    print('k2=')
    print(k2)
Run this code and we will get the following output:
k1=
[0.1879 0.4534]
k2=
[0.1879 0.4534]
k1 is the same as k2, which means our method is correct.
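As a quick sanity check (this small NumPy snippet is our own addition and is not part of the tutorial code above), we can also compute the same KL divergence by hand from the softmax probabilities:

import numpy as np

def softmax(x, axis=1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

a = np.array([[0.05, 0.16, 0.9], [0.8, 4.15, 0.05]])
b = np.array([[0.7, 0.2, 0.1], [0.15, 0.9, 0.05]])

p = softmax(a)
q = softmax(b)

# D_KL(P||Q) = sum_x p(x) * (log p(x) - log q(x)), computed row by row.
kl_manual = np.sum(p * (np.log(p) - np.log(q)), axis=1)
print(kl_manual)  # should be close to [0.1879 0.4534]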