For those starting out in analytics, the difference between the terms ‘Correlation’ and ‘Dependency’ can be confusing. In statistics, dependency refers to any statistical relationship between two random variables or two sets of data. Correlation, on the other hand, refers to a broad class of statistical relationships involving dependence, and in practice it usually means the degree to which two variables are linearly related. Let us define these two terms further:
Dependency: Two variables are dependent when the value of one depends on the value of the other. The variable whose value is determined is called the dependent variable, and the variable it depends on is the independent variable.
Correlation: A measure of the relationship between two or more variables. The (Pearson) correlation coefficient measures only the linear part of that relationship, regardless of whether the relationship is actually linear.
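To make the definition concrete, here is a minimal sketch of the Pearson correlation coefficient computed from deviations about the mean. The helper name pearson_r and the sample vectors a and b are assumptions made for illustration; they are not part of the original example.

```r
# Hypothetical helper: Pearson correlation from its definition.
pearson_r <- function(a, b) {
  da <- a - mean(a)   # deviations of a from its mean
  db <- b - mean(b)   # deviations of b from its mean
  sum(da * db) / sqrt(sum(da^2) * sum(db^2))
}

# Illustrative data (assumed): an exactly linear relation.
a <- c(1, 2, 3, 4, 5)
b <- 2 * a + 1

pearson_r(a, b)   # 1, up to floating-point rounding
cor(a, b)         # R's built-in cor() uses the Pearson method by default
```

Because the formula only compares deviations from the mean, it rewards points that fall along a straight line and has no way to reward any other kind of pattern.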
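Example: Let’s consider the unit circle, $x^2 + y^2 = 1$, which is a non-linear relation.
We can write the upper half of the unit circle as $y = \sqrt{1 - x^2}$.
Now we can say that Y is a dependent variable: its value is completely determined by X.
Consider values for the X variable from $-1$ to $1$, since the unit circle only takes points in that range.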
> x=c(-1,-0.8,-0.6,-0.4,-0.2,0,0.2,0.4,0.6,0.8,1)
Define y as a function of x:
> y=function(x){sqrt(1-x^2)}
The value of the dependent variable y at each point of x is:
> Y=y(x)
> Y
[1] 0.0000000 0.6000000 0.8000000 0.9165151 0.9797959 1.0000000 0.9797959 0.9165151 0.8000000 0.6000000 0.0000000
> cor(x,Y)
[1] 0
Even though Y depends entirely on x, we arrive at zero correlation. This means that a pair of variables which are perfectly dependent on each other can still give you zero correlation.
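One way to see where the zero comes from is to look at the individual terms that make up the covariance. The sketch below rebuilds the same x and Y values and is only an illustration; the variable name terms is an assumption introduced here.

```r
# Same points as in the example: upper half of the unit circle.
x <- c(-1, -0.8, -0.6, -0.4, -0.2, 0, 0.2, 0.4, 0.6, 0.8, 1)
Y <- sqrt(1 - x^2)

# Each point's contribution to the covariance sum.
terms <- (x - mean(x)) * (Y - mean(Y))
print(round(terms, 4))

# Y is the same at x and -x, so the contribution at -x is the
# negative of the contribution at x and the total cancels out.
sum(terms)   # numerically zero
```

The cancellation is a consequence of the symmetry of the circle about x = 0, not of any lack of dependence between the two variables.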
When we select only the negative values of x:
> x1=c(-1,-0.8,-0.6,-0.4,-0.2)
> y1=y(x1)
> y1
[1] 0.0000000 0.6000000 0.8000000 0.9165151 0.9797959
> cor(x1,y1)
[1] 0.9090862
This gives a positive correlation between x1 and y1, because on the left half of the circle y increases as x increases.
When we select only the non-negative values of x:
> x2=c(0,0.2,0.4,0.6,0.8,1)
> y2=y(x2)
> y2
[1] 1.0000000 0.9797959 0.9165151 0.8000000 0.6000000 0.0000000
> cor(x2,y2)
[1] -0.8789944
This gives a negative correlation between x2 and y2, because on the right half of the circle y decreases as x increases.
Thus we can conclude by saying that:
Correlation can be used to quantify the linear dependency of two variables. It cannot capture a non-linear relationship between variables.
Independent variables have zero correlation (r = 0).
The converse does not hold: r = 0 indicates zero correlation, not independence. The variables can still be dependent.
In other words, variables which are perfectly dependent on each other can still give you zero correlation.
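The conclusions above can be checked with a small sketch. It reuses the unit circle points and, as an added illustration that is not part of the original example, correlates the squared values: since Y^2 = 1 - x^2 is a linear relation between x^2 and Y^2, the dependence that cor(x, Y) misses becomes visible.

```r
# Same points as in the example: upper half of the unit circle.
x <- c(-1, -0.8, -0.6, -0.4, -0.2, 0, 0.2, 0.4, 0.6, 0.8, 1)
Y <- sqrt(1 - x^2)

# On the raw values the linear measure sees nothing.
cor(x, Y)       # numerically zero

# After squaring, the relation Y^2 = 1 - x^2 is linear,
# so the correlation is -1 up to floating-point rounding.
cor(x^2, Y^2)
```

This works here only because we already know the form of the relationship. In general, a zero correlation should prompt a look at a scatter plot rather than a conclusion that the variables are unrelated.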
If you found this article interesting and want to further understand correlation, take a look at the article Explaining Correlation to a Newbie to Data Analytics.
