A Beginner's Guide to the Pearson Correlation Coefficient – Machine Learning Tutorial

By | December 6, 2020

The Pearson correlation coefficient measures the strength and direction of the linear relationship between two variables. In this tutorial, we introduce it for machine learning beginners.

Pearson Correlation Coefficient

There are two forms of the Pearson correlation coefficient: the population correlation coefficient and the sample correlation coefficient.

For a population, the correlation coefficient is defined as:

\[\rho_{X,Y} = \frac{cov(X, Y)}{\sigma_X \sigma_Y}\]

Here \(cov(X, Y)\) is the covariance of X and Y, and \(\sigma_X\) and \(\sigma_Y\) are the standard deviations of X and Y.
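As a minimal sketch of the population definition, assuming the two arrays hold the entire population (so the covariance and the standard deviations both divide by N), the coefficient could be computed with NumPy as follows. The function name `population_corr` and the data are hypothetical and serve only as an illustration.

```python
import numpy as np

def population_corr(x, y):
    """Population Pearson correlation: cov(X, Y) / (sigma_X * sigma_Y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Population covariance: mean of the products of deviations (divide by N).
    cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
    # np.std uses ddof=0 by default, i.e. the population standard deviation.
    return cov_xy / (x.std() * y.std())

# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
rho = population_corr(x, y)
print(rho)  # approximately 0.7746
```

Here the sum of deviation products is 6 and the sums of squared deviations are 10 and 6, so the result is 6 / sqrt(60).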

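For a sample, the correlation coefficient is defined as:

\[r_{XY} = \frac{\sum_{i=1}^{n}(X_i - \overline{X})(Y_i - \overline{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \overline{X})^2}\sqrt{\sum_{i=1}^{n}(Y_i - \overline{Y})^2}}\]

Here \(n\) is the sample size (the number of paired observations), \(X_i\) and \(Y_i\) are the i-th observations, and \(\overline{X}\) and \(\overline{Y}\) are the sample means of X and Y.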
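The sample formula can be sketched directly in NumPy and checked against `np.corrcoef`, which computes the same quantity. The function name `sample_corr` and the data below are hypothetical, chosen only for illustration.

```python
import numpy as np

def sample_corr(x, y):
    """Sample Pearson correlation computed from deviations about the means."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape or x.size < 2:
        raise ValueError("x and y must be 1-D arrays of equal length >= 2")
    dx = x - x.mean()  # deviations of X from its mean
    dy = y - y.mean()  # deviations of Y from its mean
    # Numerator: sum of products of deviations.
    # Denominator: product of the square roots of the sums of squared deviations.
    return np.sum(dx * dy) / (np.sqrt(np.sum(dx ** 2)) * np.sqrt(np.sum(dy ** 2)))

# Made-up paired observations, for illustration only.
x = [10, 20, 30, 40, 50]
y = [12, 25, 29, 46, 48]

r = sample_corr(x, y)
r_numpy = np.corrcoef(x, y)[0, 1]  # off-diagonal entry of the 2x2 correlation matrix
print(r, r_numpy)
```

If one of the variables is constant, the denominator is zero and the coefficient is undefined; this sketch does not handle that case specially.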

The value of the Pearson correlation coefficient

The value of the Pearson correlation coefficient lies in [-1, 1]:

  • -1: perfect negative linear correlation
  • 0: no linear correlation
  • 1: perfect positive linear correlation
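The three reference values can be reproduced with a small sketch. The data are made up: an increasing line, a decreasing line, and a parabola on inputs symmetric about zero. The last case shows that a coefficient of 0 rules out a linear relationship only, since the second variable is still fully determined by the first.

```python
import numpy as np

x = np.arange(-5, 6, dtype=float)  # -5, -4, ..., 5 (symmetric about 0)

# Exact increasing linear relationship: coefficient close to 1.
r_pos = np.corrcoef(x, 2 * x + 1)[0, 1]

# Exact decreasing linear relationship: coefficient close to -1.
r_neg = np.corrcoef(x, -3 * x + 4)[0, 1]

# y = x^2 depends on x, but not linearly: coefficient close to 0.
r_zero = np.corrcoef(x, x ** 2)[0, 1]

print(r_pos, r_neg, r_zero)
```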

Moreover, the absolute value of the coefficient is often read as a rough guide to the strength of the relationship:

  • .00-.19: very weak
  • .20-.39: weak
  • .40-.59: moderate
  • .60-.79: strong
  • .80-1.0: very strong
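The bands above can be turned into a small helper. This is only a sketch: the function name `correlation_strength` is hypothetical, and it assumes that the bands apply to the absolute value of the coefficient and that each band starts at its lower bound (so 0.195, for example, is labelled "very weak").

```python
def correlation_strength(r):
    """Map a Pearson coefficient to the verbal strength labels listed above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a Pearson coefficient must lie in [-1, 1]")
    magnitude = abs(r)  # strength ignores the sign; the sign gives the direction
    if magnitude < 0.20:
        return "very weak"
    if magnitude < 0.40:
        return "weak"
    if magnitude < 0.60:
        return "moderate"
    if magnitude < 0.80:
        return "strong"
    return "very strong"

for value in (0.05, -0.35, 0.45, -0.65, 0.92):
    print(value, correlation_strength(value))
```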

Here is a picture showing how data look for different values of the Pearson correlation coefficient.

[Figure: the correlation for different Pearson correlation coefficient values]