KL (Kullback-Leibler) divergence is defined as:

\[
D_{KL}(p \| q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)}
\]

Here \(p(x)\) is the true distribution and \(q(x)\) is the approximate distribution.
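To make the definition concrete, here is a small NumPy sketch that evaluates \(D_{KL}\) for two hand-picked discrete distributions (the distributions \(p\) and \(q\) below are only illustrative):

import numpy as np

p = np.array([0.36, 0.48, 0.16])  # true distribution p(x)
q = np.array([1/3, 1/3, 1/3])     # approximate distribution q(x)

# D_KL(p || q) = sum_x p(x) * log(p(x) / q(x))
kl_pq = np.sum(p * np.log(p / q))
kl_qp = np.sum(q * np.log(q / p))
print(kl_pq)  # about 0.0853
print(kl_qp)  # about 0.0975

Note that KL divergence is not symmetric: \(D_{KL}(p \| q) \neq D_{KL}(q \| p)\) in general.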
If \(p(x)\) and \(q(x)\) are multivariate Gaussian distributions, what does \(D_{KL}\) look like? For \(p = N(\mu_1, \Sigma_1)\) and \(q = N(\mu_2, \Sigma_2)\) in \(d\) dimensions, it has a closed form:

\[
D_{KL}(p \| q) = \frac{1}{2}\left[\log\frac{|\Sigma_2|}{|\Sigma_1|} - d + \operatorname{tr}\left(\Sigma_2^{-1}\Sigma_1\right) + (\mu_2 - \mu_1)^{\top}\Sigma_2^{-1}(\mu_2 - \mu_1)\right]
\]
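As a quick sketch of this closed form in NumPy (the function name gaussian_kl and the example means and covariances below are made up for illustration):

import numpy as np

def gaussian_kl(mu1, sigma1, mu2, sigma2):
    # Closed-form KL divergence between N(mu1, sigma1) and N(mu2, sigma2)
    d = mu1.shape[0]
    sigma2_inv = np.linalg.inv(sigma2)
    diff = mu2 - mu1
    return 0.5 * (
        np.log(np.linalg.det(sigma2) / np.linalg.det(sigma1))  # log(|Sigma2| / |Sigma1|)
        - d
        + np.trace(sigma2_inv @ sigma1)                        # tr(Sigma2^-1 Sigma1)
        + diff @ sigma2_inv @ diff                             # (mu2-mu1)^T Sigma2^-1 (mu2-mu1)
    )

mu1, sigma1 = np.array([0.5, -0.2]), np.array([[1.0, 0.3], [0.3, 2.0]])
mu2, sigma2 = np.zeros(2), np.eye(2)
print(gaussian_kl(mu1, sigma1, mu2, sigma2))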
If \(p(x)\) is \(N(\mu, \sigma^2)\) and \(q(x)\) is \(N(0, 1)\), the result simplifies to:

\[
D_{KL}(p \| q) = \frac{1}{2}\left(\mu^2 + \sigma^2 - \log\sigma^2 - 1\right)
\]

This follows from the general formula above with \(d = 1\), \(\mu_1 = \mu\), \(\Sigma_1 = \sigma^2\), \(\mu_2 = 0\) and \(\Sigma_2 = 1\).
This term is often used as the latent loss in VAE (Variational Autoencoder) models. Here is a TensorFlow code example:
# KL(N(z_mean, z_stddev^2) || N(0, 1)), summed over the latent dimensions
self.latent_loss = 0.5 * tf.reduce_sum(tf.square(z_mean) + tf.square(z_stddev) - tf.log(tf.square(z_stddev)) - 1, 1)
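The snippet above uses the TensorFlow 1.x API (tf.log). For reference, a minimal self-contained sketch of the same latent loss in TensorFlow 2 could look like this; the names z_mean and z_stddev are assumed to be the encoder outputs, as in the snippet above:

import tensorflow as tf

def latent_loss(z_mean, z_stddev):
    # 0.5 * sum(mu^2 + sigma^2 - log(sigma^2) - 1) over the latent dimensions
    return 0.5 * tf.reduce_sum(
        tf.square(z_mean) + tf.square(z_stddev)
        - tf.math.log(tf.square(z_stddev)) - 1.0,
        axis=1,
    )

# Example: a batch of 2 samples with a 3-dimensional latent space
z_mean = tf.constant([[0.0, 0.0, 0.0], [0.5, -0.2, 1.0]])
z_stddev = tf.constant([[1.0, 1.0, 1.0], [0.8, 1.2, 0.5]])
print(latent_loss(z_mean, z_stddev))  # first sample gives 0, since it compares N(0,1) with N(0,1)

In a VAE, this loss is minimized together with the reconstruction loss, which pushes the approximate posterior \(q(z|x)\) toward the prior \(N(0, 1)\).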