The Pearson correlation coefficient measures the strength of the linear relationship between two variables. Here is a tutorial:
A Beginner Guide to Pearson Correlation Coefficient – Machine Learning Tutorial
In a deep learning model, we can use it as a loss to measure the correlation between two distributions. In this tutorial, we will create this loss function using TensorFlow.
Preliminary
We first create two distributions in TensorFlow.
```python
import numpy as np
import tensorflow as tf

a = np.array([[0.15, 0.16, 0.9], [0.8, 4.15, 0.15]])
b = np.array([[0.7, 0.12, 0.1], [0.15, 0.19, 0.05]])

aa = tf.convert_to_tensor(a, tf.float32)
bb = tf.convert_to_tensor(b, tf.float32)
```
\(aa\) and \(bb\) are two distributions, each a 2×3 tensor in which every row is one sample. We will compute the Pearson correlation coefficient loss between them row by row.
Pearson Correlation Coefficient Loss
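Similar to cosine distance loss, the Pearson correlation coefficient loss is defined as:
\(loss = 1 - p\)
where \(p\) is the Pearson correlation coefficient. For a batch of \(N\) rows, the function in this tutorial computes \(p_i\) for each row and averages them: \(loss = 1 - \frac{1}{N}\sum_{i=1}^{N} p_i\).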
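To make the definition concrete, here is a minimal pure-Python sketch. It assumes each row is one sample and that the batch loss averages \(p\) over rows, as the TensorFlow function in this tutorial does. The helper names `pearson` and `pearson_loss` are hypothetical, introduced only for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (textbook definition)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)

def pearson_loss(rows_x, rows_y):
    """Hypothetical helper: loss = 1 - mean of the row-wise Pearson correlations."""
    ps = [pearson(x, y) for x, y in zip(rows_x, rows_y)]
    return 1.0 - sum(ps) / len(ps)

rows_a = [[0.15, 0.16, 0.9], [0.8, 4.15, 0.15]]
rows_b = [[0.7, 0.12, 0.1], [0.15, 0.19, 0.05]]
print(pearson_loss(rows_a, rows_b))  # about 0.8589
```

Because \(p\) lies in \([-1, 1]\), this loss lies in \([0, 2]\): 0 for perfectly correlated rows, 1 for uncorrelated rows and 2 for perfectly anti-correlated rows.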
How to compute Pearson correlation coefficient loss in TensorFlow?
We will create a function to calculate it. Here is an example (the code in this tutorial uses the TensorFlow 1.x API):
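```python
def pearson_r(y_true, y_pred):
    # Returns the Pearson correlation coefficient loss 1 - p, averaged over rows.
    x = y_true
    y = y_pred
    # Subtract the row-wise mean from each distribution.
    mx = tf.reduce_mean(x, axis=1, keepdims=True)
    my = tf.reduce_mean(y, axis=1, keepdims=True)
    xm, ym = x - mx, y - my
    # L2-normalize each centered row.
    t1_norm = tf.nn.l2_normalize(xm, axis=1)
    t2_norm = tf.nn.l2_normalize(ym, axis=1)
    # The cosine distance of the normalized, centered rows equals 1 - p.
    cosine = tf.losses.cosine_distance(t1_norm, t2_norm, axis=1)
    return cosine
```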
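In this example, we reuse cosine distance loss to compute the Pearson correlation coefficient loss. The reason is that \(p\) equals the cosine similarity of the two mean-centered vectors:
\[p = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}} = \cos(x - \bar{x},\ y - \bar{y})\]
So after subtracting the mean and L2-normalizing each row, the cosine distance \(1 - \cos\) is exactly \(1 - p\).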
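A quick numerical check of this identity, written as a pure-Python sketch: `centered_cosine` mirrors the steps of the TensorFlow function (mean-centering, L2 normalization, dot product), while `pearson_cov_std` uses the definition "population covariance divided by the product of population standard deviations". Both helper names are hypothetical. The two values printed on each line should agree up to floating-point rounding.

```python
import math
import statistics

def centered_cosine(x, y):
    """Subtract the mean, L2-normalize, then take the dot product (cosine similarity)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    xm = [v - mx for v in x]
    ym = [v - my for v in y]
    nx = math.sqrt(sum(v * v for v in xm))
    ny = math.sqrt(sum(v * v for v in ym))
    return sum((u / nx) * (v / ny) for u, v in zip(xm, ym))

def pearson_cov_std(x, y):
    """Pearson r as population covariance over the product of population std devs."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((u - mx) * (v - my) for u, v in zip(x, y)) / len(x)
    return cov / (statistics.pstdev(x) * statistics.pstdev(y))

pairs = [([0.15, 0.16, 0.9], [0.7, 0.12, 0.1]),
         ([0.8, 4.15, 0.15], [0.15, 0.19, 0.05])]
for x, y in pairs:
    print(centered_cosine(x, y), pearson_cov_std(x, y))
```

The \(1/n\) factors in the covariance and the standard deviations cancel, which is why the plain cosine of the centered vectors needs no extra scaling.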
Then we can compute the Pearson correlation coefficient loss between \(aa\) and \(bb\).
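```python
a_s = pearson_r(aa, bb)

init = tf.global_variables_initializer()
init_local = tf.local_variables_initializer()
with tf.Session() as sess:
    sess.run([init, init_local])
    np.set_printoptions(precision=4, suppress=True)
    # Use a new name so the NumPy array a is not overwritten;
    # the scipy check later in this tutorial still needs a and b.
    loss = sess.run(a_s)
    print('loss=')
    print(loss)
```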
Running this code prints the loss:
0.85890067
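If TensorFlow 1.x is not at hand, the same graph can be mirrored in NumPy to see what each op contributes. This is a sketch under two assumptions: that `tf.nn.l2_normalize` divides by `sqrt(max(sum(x**2), epsilon))` with its default `epsilon=1e-12`, and that `tf.losses.cosine_distance` with default weights averages the per-row distances. `pearson_loss_np` is a hypothetical name. It works in float32 like the tensors above, so its result should land near the 0.85890067 printed by the session, up to float32 rounding.

```python
import numpy as np

def pearson_loss_np(y_true, y_pred, epsilon=1e-12):
    """NumPy mirror of pearson_r (a sketch, not TensorFlow's actual code)."""
    x = np.asarray(y_true, dtype=np.float32)
    y = np.asarray(y_pred, dtype=np.float32)
    # Row-wise mean-centering, like tf.reduce_mean(..., axis=1, keepdims=True).
    xm = x - x.mean(axis=1, keepdims=True)
    ym = y - y.mean(axis=1, keepdims=True)
    # L2 normalization per row: divide by sqrt(max(sum(v**2), epsilon)).
    t1 = xm / np.sqrt(np.maximum(np.sum(xm ** 2, axis=1, keepdims=True), epsilon))
    t2 = ym / np.sqrt(np.maximum(np.sum(ym ** 2, axis=1, keepdims=True), epsilon))
    # Cosine distance per row (1 - dot product), then the mean over rows.
    per_row = 1.0 - np.sum(t1 * t2, axis=1)
    return float(per_row.mean())

x_rows = np.array([[0.15, 0.16, 0.9], [0.8, 4.15, 0.15]])
y_rows = np.array([[0.7, 0.12, 0.1], [0.15, 0.19, 0.05]])
print(pearson_loss_np(x_rows, y_rows))
```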
Evaluate our Pearson correlation coefficient loss function
To make sure our function is correct, we compare its result with one computed from scipy.stats.pearsonr().
Here is the example code:
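```python
from scipy.stats import pearsonr

# Pearson r for each row of the original NumPy arrays.
p1, _ = pearsonr(a[0, :], b[0, :])
p2, _ = pearsonr(a[1, :], b[1, :])
print(p1)
print(p2)
print(p1 + p2)

# Loss = 1 - mean of the row-wise correlations.
d = 1 - (p1 + p2) / 2
print(d)
```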
Running this code, \(d\) is:
0.8589005906554071
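It is almost the same as the value of \(a_s\) computed in TensorFlow (0.85890067), which means our function is correct. The tiny difference is consistent with TensorFlow computing in float32 while scipy uses float64.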