For those starting out in analytics, the difference between the terms ‘Correlation’ and ‘Dependency’ can be quite confusing. In statistics, dependency refers to any statistical relationship between two random variables or two sets of data. Correlation, on the other hand, refers to any of a broad class of statistical relationships involving dependence. Let us define these two terms further:
Dependency: A variable is dependent when its value depends on the value assigned to another variable (the independent variable).
Correlation: A measure of the relationship between two or more variables. The correlation coefficient always assumes a linear relationship, regardless of whether that assumption is correct.
Example: Let’s consider the unit circle, $X^2 + Y^2 = 1$, which is a nonlinear relation.
We can write the upper half of the unit circle as $Y = \sqrt{1 - X^2}$.
Now we can say that Y is a dependent variable.
Consider the following values for the variable X, since points on the unit circle take X values from -1 to 1.
> x=c(-1,-0.8,-0.6,-0.4,-0.2,0,0.2,0.4,0.6,0.8,1)
Define y as a function of x:
> y=function(x){sqrt(1-x^2)}
The value of the dependent variable Y at each point of x is:
> Y=y(x)
> Y
[1] 0.0000000 0.6000000 0.8000000 0.9165151 0.9797959 1.0000000 0.9797959 0.9165151 0.8000000 0.6000000 0.0000000
> cor(x,Y)
[1] 0
Even though Y depends entirely on X, we arrive at zero correlation. This means, “A pair of variables that are perfectly dependent on each other can still give you a zero correlation.”
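One way to see why the result is zero is the short derivation below. It is a sketch that assumes cor() is computing the Pearson sample correlation coefficient (R's default method), with $x_i$ and $y_i$ standing for the 11 values of x and Y used above.

```latex
r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}}

% The chosen x values are symmetric about 0, so \bar{x} = 0 and \sum_i x_i = 0:
\sum_i (x_i - \bar{x})(y_i - \bar{y})
  = \sum_i x_i y_i - \bar{y} \sum_i x_i
  = \sum_i x_i y_i

% Since y = \sqrt{1 - x^2} takes the same value at x and -x,
% each pair of points \pm x cancels:
x \sqrt{1 - x^2} + (-x) \sqrt{1 - x^2} = 0
\quad \Longrightarrow \quad \sum_i x_i y_i = 0
\quad \Longrightarrow \quad r = 0
```

The zero comes from the symmetry of the chosen points around 0, not from any lack of dependence between the two variables.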
When we select only the negative points for the variable X:
> x1=c(-1,-0.8,-0.6,-0.4,-0.2)
> y1=y(x1)
> y1
[1] 0.0000000 0.6000000 0.8000000 0.9165151 0.9797959
> cor(x1,y1)
[1] 0.9090862
This gives a positive correlation between x1 and y1.
When we select only the non-negative points for the variable X:
> x2=c(0,0.2,0.4,0.6,0.8,1)
> y2=y(x2)
> y2
[1] 1.0000000 0.9797959 0.9165151 0.8000000 0.6000000 0.0000000
> cor(x2,y2)
[1] -0.8789944
This gives a negative correlation between x2 and y2.
Thus we can conclude by saying that:
Correlation can be used to quantify the linear dependency of two variables. It cannot capture nonlinear relationships between variables.
Independent variables have zero correlation, r = 0.
However, r = 0 indicates only zero correlation, not independence; the variables can still be dependent.
In other words, variables that are perfectly dependent on each other can still give you a zero correlation.
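As a further illustration of these points, here is a minimal R sketch that compares a linear and a nonlinear relationship on the same symmetric set of x values. The two relations (y_linear and y_quadratic) are hypothetical examples chosen for this sketch and are not part of the unit circle example above.

```r
# Illustrative sketch: both y variables below are fully determined by x,
# but only the linear one is picked up by the correlation coefficient.
x <- (-5:5) / 5            # -1, -0.8, ..., 0.8, 1 (symmetric about 0)

y_linear    <- 2 * x + 1   # hypothetical linear relation
y_quadratic <- x^2         # hypothetical nonlinear relation, symmetric in x

r_linear    <- cor(x, y_linear)      # Pearson correlation (R's default method)
r_quadratic <- cor(x, y_quadratic)

print(r_linear)      # should be 1, up to floating-point rounding
print(r_quadratic)   # should be 0, up to floating-point rounding
```

In both cases y is perfectly dependent on x, yet only the linear relation yields a correlation of 1; the quadratic relation yields zero for the same symmetry reason as the unit circle.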
If you found this article interesting and want to further understand correlation, take a look at the article Explaining Correlation to a Newbie to Data Analytics.