The Pearson correlation coefficient measures the strength of the linear relationship between two variables. Here is a tutorial:
A Beginner Guide to Pearson Correlation Coefficient – Machine Learning Tutorial
We can use it as a loss to measure the correlation between two distributions in a deep learning model. In this tutorial, we will create this loss function using TensorFlow.
First, we create two distributions in TensorFlow.
import numpy as np
import tensorflow as tf

a = np.array([[0.15, 0.16, 0.9], [0.8, 4.15, 0.15]])
b = np.array([[0.7, 0.12, 0.1], [0.15, 0.19, 0.05]])

aa = tf.convert_to_tensor(a, tf.float32)
bb = tf.convert_to_tensor(b, tf.float32)
\(aa\) and \(bb\) are two distributions, each holding two rows of three values. We will compute their Pearson correlation coefficient loss.
Pearson Correlation Coefficient Loss
Similar to the cosine distance loss, the Pearson correlation coefficient loss is defined as:
\(loss = 1 - p\)
where \(p\) is the Pearson correlation coefficient.
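For reference, here is the standard sample definition of \(p\) for two vectors \(x\) and \(y\) of length \(n\). The symbols \(x_i\), \(y_i\), \(\bar{x}\) and \(\bar{y}\) (the means) are notation introduced here for illustration and do not appear in the code.

```latex
p = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad loss = 1 - p
```

Since \(-1 \le p \le 1\), the loss lies between 0 (perfect positive correlation) and 2 (perfect negative correlation).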
How to compute the Pearson correlation coefficient loss in TensorFlow?
We will create a function to calculate it. Here is an example:
def pearson_r(y_true, y_pred):
    x = y_true
    y = y_pred
    # Center each row by subtracting its mean
    mx = tf.reduce_mean(x, axis=1, keepdims=True)
    my = tf.reduce_mean(y, axis=1, keepdims=True)
    xm, ym = x - mx, y - my
    # Scale each centered row to unit length
    t1_norm = tf.nn.l2_normalize(xm, axis = 1)
    t2_norm = tf.nn.l2_normalize(ym, axis = 1)
    # Cosine distance of the centered rows, i.e. 1 - p, averaged over rows
    cosine = tf.losses.cosine_distance(t1_norm, t2_norm, axis = 1)
    return cosine
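In this example, we use the cosine distance loss to compute the Pearson correlation coefficient loss. This works because the Pearson correlation of two vectors equals the cosine similarity of the same vectors after their means are subtracted. The reason is explained here: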
Understand the Relationship Between Pearson Correlation Coefficient and Cosine Similarity – Machine Learning Tutorial
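The relationship can also be checked numerically. The following is a minimal NumPy sketch, not part of the TensorFlow code in this tutorial. The helper names cosine_similarity and pearson_via_cosine are made up for illustration, and the sample vectors are simply the first rows of a and b.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def pearson_via_cosine(x, y):
    """Pearson r computed as the cosine similarity of the mean-centered vectors."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    return cosine_similarity(x - x.mean(), y - y.mean())

# Illustrative data: the first rows of a and b from this tutorial
x = np.array([0.15, 0.16, 0.9])
y = np.array([0.7, 0.12, 0.1])

r_cosine = pearson_via_cosine(x, y)
r_numpy = np.corrcoef(x, y)[0, 1]  # NumPy's built-in Pearson correlation

# The two values should agree up to floating-point rounding
print(r_cosine, r_numpy)
```

Centering is the important step: without subtracting the means, the cosine similarity of the raw vectors is generally a different number.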
Then we can compute the pearson loss between \(aa\) and \(bb\).
a_s = pearson_r(aa, bb)

init = tf.global_variables_initializer()
init_local = tf.local_variables_initializer()
with tf.Session() as sess:
    sess.run([init, init_local])
    np.set_printoptions(precision=4, suppress=True)
    # Use a new name so the NumPy array a is not overwritten (it is needed again later)
    loss_value = sess.run(a_s)
    print('a_s=')
    print(loss_value)
Running this code prints the value of the loss.
Evaluate our pearson correlation coefficient loss function
To make sure our function is correct, we will check its result against scipy.stats.pearsonr().
Here is the example code:
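from scipy.stats import pearsonr

# Pearson correlation of each pair of rows in the NumPy arrays a and b
p1, _ = pearsonr(a[0,:], b[0,:])
p2, _ = pearsonr(a[1,:], b[1,:])
print(p1)
print(p2)
print(p1+p2)
# Loss = 1 - mean correlation over the two rows
d = 1-(p1+p2)/2
print(d)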
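Running this code prints \(d\), which is 1 minus the average of the two row correlations.
It is almost the same as \(a_s\) computed in TensorFlow, up to floating-point rounding, which means our function is correct.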
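If SciPy or a TensorFlow session is not at hand, the same check can be sketched in plain NumPy. This is an illustrative re-implementation under the assumption that the loss is 1 minus the row-wise correlation, averaged over rows. The name pearson_loss_np is hypothetical and not part of any library.

```python
import numpy as np

def pearson_loss_np(y_true, y_pred):
    """Mean over rows of (1 - Pearson r), for 2-D inputs of shape (batch, features)."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    # Center each row by subtracting its mean
    xm = y_true - y_true.mean(axis=1, keepdims=True)
    ym = y_pred - y_pred.mean(axis=1, keepdims=True)
    # Scale each centered row to unit length (a constant row would divide by zero)
    xn = xm / np.linalg.norm(xm, axis=1, keepdims=True)
    yn = ym / np.linalg.norm(ym, axis=1, keepdims=True)
    # Row-wise dot product of unit vectors is the Pearson correlation
    r = np.sum(xn * yn, axis=1)
    return float(np.mean(1.0 - r))

a = np.array([[0.15, 0.16, 0.9], [0.8, 4.15, 0.15]])
b = np.array([[0.7, 0.12, 0.1], [0.15, 0.19, 0.05]])

loss = pearson_loss_np(a, b)
# Independent reference using NumPy's built-in correlation, one row at a time
expected = 1.0 - np.mean([np.corrcoef(a[i], b[i])[0, 1] for i in range(len(a))])
print(loss, expected)
```

The sketch works in float64, so small differences from a float32 TensorFlow result are to be expected.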