# Correlation in Statistics

## Correlation

Correlation measures how strongly two variables are related. The sample correlation coefficient is denoted by r and the population coefficient by ρ. It quantifies the strength and direction of the relationship on a scale from -1 to +1. If r is -1, there is a perfect negative relationship; if r is +1, there is a perfect positive relationship between the variables.

Important Points

• r = -1: Perfect negative correlation
• r = 0: No linear correlation
• r = +1: Perfect positive correlation
• r always lies between -1 and +1

## Pearson Correlation

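Named after Karl Pearson, it is the most widely used formula for the correlation coefficient. If two variables X and Y each have N observations, the correlation coefficient r is given by

$$r = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2 \, \sum_{i=1}^{N} (y_i - \bar{y})^2}}$$

where $\bar{x}$ and $\bar{y}$ are the means of X and Y.

Calculation of Pearson’s Correlation Coefficient in Python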

Let there be two variables X and Y with the following values:

    X = [40, 46, 55, 60, 70, 75, 78, 80, 85, 95]
    Y = [40, 46, 55, 60, 70, 75, 78, 80, 85, 95]

The Python code for computing the correlation is

    # Computation of Pearson's correlation coefficient
    from scipy.stats import pearsonr
    from matplotlib import pyplot

    X = [40, 46, 55, 60, 70, 75, 78, 80, 85, 95]
    Y = [40, 46, 55, 60, 70, 75, 78, 80, 85, 95]

    # pearsonr(X, Y) returns the correlation coefficient and its p-value
    r = pearsonr(X, Y)
    print("Pearson's Correlation Coefficient", r)

    # Save a scatter plot of the data
    pyplot.scatter(X, Y)
    pyplot.savefig("pearsonr.png")

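The output of the program would be

Output: Pearson’s Correlation Coefficient (1.0, 0.0)

The first number is r and the second is the p-value. Recent SciPy versions print the same pair of values wrapped in a PearsonRResult object.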

The scatter plot for the data is saved as pearsonr.png.

From the scatter diagram you can observe a perfect positive correlation between X and Y. This is because X and Y have the same values.
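When only the coefficient is needed, the result of `pearsonr` can be unpacked into its two parts. The snippet below is a small sketch of that pattern, assuming SciPy is installed; the variable names `r` and `p_value` are my own choice.

```python
from scipy.stats import pearsonr

X = [40, 46, 55, 60, 70, 75, 78, 80, 85, 95]
Y = [40, 46, 55, 60, 70, 75, 78, 80, 85, 95]

# Tuple unpacking: the first value is the coefficient, the second the p-value.
r, p_value = pearsonr(X, Y)

print("r =", round(float(r), 4))
print("p-value =", float(p_value))
```

Unpacking keeps the rest of a script independent of how a particular SciPy version prints the result object.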

Now consider the data set

    X = [40, 46, 55, 60, 70, 75, 78, 80, 85, 95]
    Y = [95, 85, 80, 78, 75, 70, 60, 55, 46, 40]

and the corresponding Python code

    # Computation of Pearson's correlation coefficient
    from scipy.stats import pearsonr
    from matplotlib import pyplot

    X = [40, 46, 55, 60, 70, 75, 78, 80, 85, 95]
    Y = [95, 85, 80, 78, 75, 70, 60, 55, 46, 40]

    # pearsonr(X, Y) returns the correlation coefficient and its p-value
    r = pearsonr(X, Y)
    print("Pearson's Correlation Coefficient", r)

    # Save a scatter plot of the data
    pyplot.scatter(X, Y)
    pyplot.savefig("negativepearsonr.png")

The output of the program would be

Pearson’s Correlation Coefficient (-0.9613416714042071, 9.325227687014438e-06)

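The scatter plot of the data is saved as negativepearsonr.png.

You can observe that simply reversing the order of the data makes the relationship strongly negative. The coefficient is about -0.96 rather than exactly -1 because the reversed values are not an exact linear function of X.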
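To see where these numbers come from, the coefficient can also be computed by hand from the deviations of each value from its mean. The following is a minimal sketch of that calculation in plain Python; the helper name `pearson_r` is hypothetical and not part of SciPy.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r computed directly from deviations about the means."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of products of the deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the summed squared deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


X = [40, 46, 55, 60, 70, 75, 78, 80, 85, 95]

r_same = pearson_r(X, X)            # Y identical to X
r_reversed = pearson_r(X, X[::-1])  # Y is X in reverse order

print(round(r_same, 4), round(r_reversed, 4))  # prints: 1.0 -0.9613
```

Both values agree with the `pearsonr` outputs shown above.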

## Spearman Rank Correlation

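Spearman’s rank correlation is used for ordinal (ranked) data, including qualitative data that can be placed in order. The first step is to convert the comparative data into ranks. Then apply the following formula, which holds when there are no tied ranks:

$$\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$

where $d_i$ is the difference between the two ranks of the i-th observation and n is the number of observations.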
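The ranking step and the usual rank-difference formula for untied data can be sketched in a few lines of plain Python. The judges' scores below are hypothetical, the helper names `to_ranks` and `spearman_rho` are my own, and the sketch assumes there are no tied values.

```python
def to_ranks(values):
    """Rank values from 1 (smallest) to n; this simple version rejects ties."""
    if len(set(values)) != len(values):
        raise ValueError("this simple ranking does not handle ties")
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def spearman_rho(a, b):
    """Spearman's rho via 1 - 6 * sum(d^2) / (n * (n^2 - 1)), no ties."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(a)
    ranks_a, ranks_b = to_ranks(a), to_ranks(b)
    # d is the difference between the two ranks of each observation.
    d_squared = sum((ra - rb) ** 2 for ra, rb in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical scores given by two judges to the same five contestants.
judge_a = [86, 92, 71, 65, 78]
judge_b = [80, 95, 60, 70, 75]

print(to_ranks(judge_a))  # [4, 5, 2, 1, 3]
print(to_ranks(judge_b))  # [4, 5, 1, 2, 3]

rho = spearman_rho(judge_a, judge_b)
print(round(rho, 4))  # 0.9
```

The two judges disagree only on the two lowest-ranked contestants, so the coefficient stays close to +1.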

The values of R1 and R2 are

    R1 = [3, 5, 8, 10, 15, 26, 30, 36, 40, 42]
    R2 = [3, 5, 8, 10, 15, 26, 30, 36, 40, 42]

Python code for calculating Spearman’s rank correlation coefficient

    # Spearman's rank correlation coefficient
    from scipy.stats import spearmanr
    from matplotlib import pyplot

    R1 = [3, 5, 8, 10, 15, 26, 30, 36, 40, 42]
    R2 = [3, 5, 8, 10, 15, 26, 30, 36, 40, 42]

    # spearmanr(R1, R2) returns the rank correlation coefficient and its p-value
    r = spearmanr(R1, R2)
    print("Spearman's Correlation Coefficient", r)

    # Save a scatter plot of the data
    pyplot.scatter(R1, R2)
    pyplot.savefig("spearmanr.png")

Output of the program would be

Output : Spearman’s Correlation Coefficient SpearmanrResult(correlation=0.9999999999999999, pvalue=6.646897422032013e-64)

The scatter plot is saved as spearmanr.png.

Furthermore, if I reverse R2, the relationship becomes negative.

The Python code is

    # Spearman's rank correlation coefficient
    from scipy.stats import spearmanr
    from matplotlib import pyplot

    R1 = [3, 5, 8, 10, 15, 26, 30, 36, 40, 42]
    R2 = [42, 40, 36, 30, 26, 15, 10, 8, 5, 3]

    # spearmanr(R1, R2) returns the rank correlation coefficient and its p-value
    r = spearmanr(R1, R2)
    print("Spearman's Correlation Coefficient", r)

    # Save a scatter plot of the data
    pyplot.scatter(R1, R2)
    pyplot.savefig("negcorspearmanr.png")

The output of the program would be

Output : Spearman’s Correlation Coefficient SpearmanrResult(correlation=-0.9999999999999999, pvalue=6.646897422032013e-64)

The corresponding scatter plot is saved as negcorspearmanr.png.
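One way to connect the two methods: Spearman's coefficient is Pearson's coefficient applied to the ranks instead of the raw values. The sketch below checks this on the reversed X and Y data from the Pearson section. It uses plain Python, the helper names `pearson_r` and `to_ranks` are hypothetical, and it assumes no tied values.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def to_ranks(values):
    """Rank values from 1 (smallest) to n; assumes no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


X = [40, 46, 55, 60, 70, 75, 78, 80, 85, 95]
Y = [95, 85, 80, 78, 75, 70, 60, 55, 46, 40]

r_values = pearson_r(X, Y)                     # Pearson on the raw values
r_ranks = pearson_r(to_ranks(X), to_ranks(Y))  # Pearson on ranks = Spearman

print(round(r_values, 4), round(r_ranks, 4))  # prints: -0.9613 -1.0
```

On the raw values the relationship is strong but not perfectly linear, while on the ranks it is perfectly decreasing, so the rank-based coefficient reaches -1.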

Correlation is a very important topic in machine learning, statistics, and data science. It helps reveal relationships in a data set. In this post, I have explained two popular methods for computing correlation. I hope you will understand and apply them.
