# Correlation and Dependency: Aren’t They One and the Same?

For those starting out in analytics, it can be quite confusing to understand the difference between the terms ‘Correlation’ and ‘Dependency’. In statistics, dependency refers to any statistical relationship between two random variables or two sets of data. Correlation, on the other hand, refers to any of a broad class of statistical relationships involving dependence, although in common usage it means how close two variables are to having a linear relationship. Let us define these two terms more precisely:

**Dependency:** A variable is dependent on another variable (the independent variable) when its value is determined by, or changes with, the value assigned to that other variable.

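**Correlation:** Correlation describes the degree to which two variables vary together. The most common measure, the Pearson correlation coefficient r (the default method of R's `cor()`), ranges from -1 to 1 and measures only the strength of a *linear relationship*, whether or not the actual relationship between the variables is linear.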
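To make the "linear only" point concrete, here is a minimal R sketch of the Pearson coefficient written out from its textbook definition. The helper name `pearson_r` and the sample vectors are our own, chosen for illustration; for complete numeric vectors the result should agree with R's built-in `cor()`.

```r
# Hypothetical helper: Pearson correlation from its definition,
# r = sum((x - mean(x)) * (y - mean(y))) /
#     sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))
pearson_r <- function(x, y) {
  dx <- x - mean(x)   # deviations of x from its mean
  dy <- y - mean(y)   # deviations of y from its mean
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

x_lin <- c(1, 2, 3, 4, 5)
y_lin <- 2 * x_lin + 1   # an exact straight line

pearson_r(x_lin, y_lin)                                # 1
all.equal(pearson_r(x_lin, y_lin), cor(x_lin, y_lin))  # TRUE
```

Because only the cross-products of deviations enter the numerator, r rewards points that fall along a straight line and says nothing directly about curved patterns.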

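Example: Let’s consider the unit circle, $x^2 + y^2 = 1$, which is a non-linear relation.

Solving for y and keeping the upper half of the circle, we can write

$$y = \sqrt{1 - x^2}$$

Here y is completely determined by x, so y is a dependent variable.

Take values of x from -1 to 1 in steps of 0.2, since the unit circle is defined only for $-1 \le x \le 1$: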

```r
x <- c(-1, -0.8, -0.6, -0.4, -0.2, 0, 0.2, 0.4, 0.6, 0.8, 1)
```

Define y as a function of x:

```r
# Upper half of the unit circle: y = sqrt(1 - x^2)
y <- function(x) { sqrt(1 - x^2) }
```

Evaluating y at each value of x gives the dependent variable Y, whose correlation with x we then compute:

```r
Y <- y(x)
Y
## [1] 0.0000000 0.6000000 0.8000000 0.9165151 0.9797959 1.0000000 0.9797959 0.9165151 0.8000000 0.6000000 0.0000000
cor(x, Y)
## [1] 0
```

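Even though Y is completely determined by x, we arrive at zero correlation. This happens because the semicircle rises to the left of x = 0 and falls to the right of it by exactly the same amount, so the positive and negative linear trends cancel out. In other words, *“a pair of variables that are perfectly dependent on each other can still give you zero correlation.”*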
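To check this numerically, the sketch below splits the covariance (the numerator of r) into one contribution per point and pairs each point with its mirror image at -x. It repeats the definitions of `x` and `y()` so it runs on its own; the name `contrib` is ours.

```r
x <- c(-1, -0.8, -0.6, -0.4, -0.2, 0, 0.2, 0.4, 0.6, 0.8, 1)
y <- function(x) sqrt(1 - x^2)
Y <- y(x)

# Each point's contribution to the covariance (numerator of r)
contrib <- (x - mean(x)) * (Y - mean(Y))

# rev() lines up the point at x with its mirror image at -x;
# since y(-x) == y(x), each pair of contributions cancels
round(contrib + rev(contrib), 12)  # every pair sums to (numerically) zero
sum(contrib)                       # zero up to floating-point rounding
```

Since the numerator is zero, r is zero no matter how strong the (non-linear) dependence between x and Y is.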

If we select only the negative values of x:

```r
x1 <- c(-1, -0.8, -0.6, -0.4, -0.2)
y1 <- y(x1)
y1
## [1] 0.0000000 0.6000000 0.8000000 0.9165151 0.9797959
cor(x1, y1)
## [1] 0.9090862
```

*This gives a strong positive correlation between x1 and y1.*

If we select only the non-negative values of x:

```r
x2 <- c(0, 0.2, 0.4, 0.6, 0.8, 1)
y2 <- y(x2)
y2
## [1] 1.0000000 0.9797959 0.9165151 0.8000000 0.6000000 0.0000000
cor(x2, y2)
## [1] -0.8789944
```

*This gives a strong negative correlation between x2 and y2.*
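As a quick check that these signs are not an artefact of using only a handful of points, here is a sketch on a denser grid. The grid of 201 points and the variable names are our own choices: the whole semicircle should still give a correlation of essentially zero, the rising left half a positive one and the falling right half a negative one.

```r
y <- function(x) sqrt(1 - x^2)

x_all   <- seq(-1, 1, length.out = 201)  # denser grid over the whole semicircle
x_left  <- x_all[x_all < 0]              # rising part of the curve
x_right <- x_all[x_all >= 0]             # falling part of the curve

r_all   <- cor(x_all,   y(x_all))
r_left  <- cor(x_left,  y(x_left))
r_right <- cor(x_right, y(x_right))

round(c(all = r_all, left = r_left, right = r_right), 3)
```

The sign of r depends entirely on which part of the same curve is sampled, even though the dependency between x and y never changes.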

To conclude:

*Correlation quantifies the linear dependency between two variables. It does not fully capture non-linear relationships, and can miss them entirely.*

*Independent variables have zero correlation, r = 0.*

*However, r = 0 only indicates the absence of a linear relationship, not independence: the variables can still be dependent.*

*In other words, variables that are perfectly dependent on each other can still have zero correlation.*
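To illustrate the statement about independent variables, here is a minimal simulation sketch; the seed and sample size are arbitrary choices. Two independently generated samples have a population correlation of 0, but a finite sample only gives a value close to 0.

```r
set.seed(42)          # arbitrary seed, for reproducibility
n <- 10000            # arbitrary sample size

a <- rnorm(n)         # independent standard normal draws
b <- rnorm(n)         # generated separately, so independent of a

r_indep <- cor(a, b)
r_indep               # close to 0, but not exactly 0
```

With larger samples the sample correlation of independent variables concentrates more tightly around 0, which is why a small |r| on its own is compatible with independence but, as the unit circle shows, does not prove it.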

If you found this article interesting and want to further understand correlation, take a look at the article **Explaining Correlation to a Newbie to Data Analytics**.

*Image Courtesy: http://www.freedigitalphotos.net/*
