For those starting out in analytics, the difference between the terms ‘Correlation’ and ‘Dependency’ can be confusing. In statistics, dependency refers to any statistical relationship between two random variables or two sets of data. Correlation, on the other hand, refers to any of a broad class of statistical relationships involving dependence, and in practice it usually means the degree to which two variables are linearly related. Let us define these two terms further:

Dependency: Two variables are dependent when the value of one (the dependent variable) depends on the value assigned to the other (the independent variable).

Correlation: A measure of the relationship between two or more variables. The correlation coefficient (Pearson's r, which R's cor() computes by default) measures only the linear part of that relationship, regardless of whether the relationship is actually linear.

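Example: Let’s consider the unit circle $x^2 + y^2 = 1$, a non-linear relation.

We can write the upper half of the unit circle as $y = \sqrt{1 - x^2}$.

Here y is the dependent variable: its value is completely determined by x.

Consider the following values for the x variable, since the unit circle takes points on x from -1 to 1.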

> x=c(-1,-0.8,-0.6,-0.4,-0.2,0,0.2,0.4,0.6,0.8,1)

Define y as a function of x (the upper half of the circle):

> y=function(x){sqrt(1-x^2)}

The value of the dependent variable Y at each point of x is:

> Y=y(x)

> Y

[1] 0.0000000 0.6000000 0.8000000 0.9165151 0.9797959 1.0000000 0.9797959 0.9165151 0.8000000 0.6000000 0.0000000

> cor(x,Y)

[1] 0

Even though Y is completely determined by x, we arrive at NIL correlation. In other words, “a pair of variables that are perfectly dependent on each other can still have zero correlation.”
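To see where the zero comes from, Pearson's r can be computed by hand from its definition. The following is a minimal sketch in R that reuses the x grid and the upper half-circle from the example above. The names dx, dY, cross and r_manual are introduced here for illustration only and are not part of the original example.

```r
# Same grid of x values as in the example, and the upper half of the unit circle.
x <- c(-1, -0.8, -0.6, -0.4, -0.2, 0, 0.2, 0.4, 0.6, 0.8, 1)
Y <- sqrt(1 - x^2)

# Deviations of each variable from its mean.
dx <- x - mean(x)
dY <- Y - mean(Y)

# Cross-products of the deviations; their sum is the numerator of r.
cross <- dx * dY
print(round(cross, 4))

# Pearson's r from its definition: sum of cross-products divided by the
# square root of the product of the two sums of squared deviations.
r_manual <- sum(cross) / sqrt(sum(dx^2) * sum(dY^2))

# Compare with the built-in cor(); both should agree up to rounding.
print(isTRUE(all.equal(r_manual, cor(x, Y))))
```

Because the x grid is symmetric about 0 and Y takes the same value at x and -x, every cross-product on the left half is offset by one of the same size and opposite sign on the right half, so the numerator of r sums to zero (up to floating-point rounding).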

When we select only the negative points for the x variable:

> x1=c(-1,-0.8,-0.6,-0.4,-0.2)

> y1=y(x1)

> y1

[1] 0.0000000 0.6000000 0.8000000 0.9165151 0.9797959

> cor(x1,y1)

[1] 0.9090862

It gives a positive correlation between x1 and y1.

When we select only the non-negative points for the x variable:

> x2=c(0,0.2,0.4,0.6,0.8,1)

> y2=y(x2)

> y2

[1] 1.0000000 0.9797959 0.9165151 0.8000000 0.6000000 0.0000000

> cor(x2,y2)

[1] -0.8789944

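It gives a negative correlation between x2 and y2. Over the full range, the positive association on the left half and the negative association on the right half cancel each other, which is why cor(x,Y) is 0.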

Thus we can conclude that:

Correlation can be used to quantify the linear dependency of two variables. It can miss a non-linear relationship between variables entirely.

Independent variables have NIL correlation, r=0.

The converse does not hold: r=0 indicates NIL correlation but not independence. The variables can still be dependent.

In other words, variables that are perfectly dependent on each other can still give you zero correlation.
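One way to check that the dependence has not gone anywhere is to transform the variables so that the relationship becomes linear. On the unit circle, Y^2 = 1 - x^2, so the squared variables lie on a straight line with negative slope. The sketch below assumes the same x grid and half-circle as in the example; the names u and v are hypothetical and introduced only for this illustration.

```r
# Same grid of x values as in the example, and the upper half of the unit circle.
x <- c(-1, -0.8, -0.6, -0.4, -0.2, 0, 0.2, 0.4, 0.6, 0.8, 1)
Y <- sqrt(1 - x^2)

# Square both variables: on the circle, v = 1 - u (up to rounding),
# which is an exactly linear relationship.
u <- x^2
v <- Y^2

# Correlation of the original variables versus the squared ones.
r_raw <- cor(x, Y)
r_squared_vars <- cor(u, v)

print(r_raw)
print(r_squared_vars)
```

The first value is the zero correlation from the example, while the second should come out at -1 up to floating-point rounding. The dependence was present all along; the correlation coefficient only reports it once the relationship is expressed in linear form.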

If you found this article interesting and want to further understand correlation, take a look at the article Explaining Correlation to a Newbie to Data Analytics.

