Introduction

The Central Limit Theorem (CLT) is an important and widely used concept in statistics. It is fairly simple to understand and is a landmark result in the field. It underpins much of probability and statistical inference, and it has significant implications for applied machine learning. The CLT describes the sampling distribution of the mean, which lets us use samples to approximate the mean, standard deviation, and other parameters of a population.

In this article, we look at:

  1. Central Limit Theorem
  2. Worked Example with Dice
  3. Impact on Machine Learning

1. Central Limit Theorem

The Central Limit Theorem is a statistical concept which states that the distribution of the sample mean of a random variable approaches a normal (Gaussian) distribution as the sample size grows, irrespective of the shape of the original population distribution, provided the population has a finite variance and the samples are drawn independently.

To understand it better, let’s define the terms:

  • Normal Distribution means that a set of numbers, when plotted on a graph, forms a symmetric bell curve centred on its mean.
  • Sample Mean (x̄) is the average of a sample, that is, a subset chosen randomly from a large dataset or population.

The rule of thumb, or what is generally considered safe, is that the sample size should be at least 30 (n ≥ 30).

In simpler terms, the CLT states that if each sample contains 30 or more data points, the sample means will form a bell-shaped curve centred close to the population mean, with few averages lying at either extreme. The mean of these sample means approximates the mean of the entire population. Their spread is also predictable: the standard deviation of the sample means is approximately σ/√n, where σ is the population standard deviation and n is the sample size, so it shrinks as the samples get larger.
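A quick simulation makes this concrete. The sketch below assumes a fair six-sided die as the population (mean 3.5, standard deviation about 1.708), draws 2,000 samples of size n = 30, and compares the mean and spread of the sample means with the population mean and σ/√n. The seed and the number of samples are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is reproducible

POPULATION = [1, 2, 3, 4, 5, 6]         # assumed population: faces of a fair die
mu = statistics.mean(POPULATION)        # population mean, 3.5
sigma = statistics.pstdev(POPULATION)   # population standard deviation, about 1.708

n = 30              # size of each sample
num_samples = 2000  # how many samples to draw

# Draw many samples (with replacement) and record the mean of each one
sample_means = [
    statistics.fmean(random.choices(POPULATION, k=n)) for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

print(f"population mean      = {mu:.3f}")
print(f"mean of sample means = {mean_of_means:.3f}")
print(f"sigma / sqrt(n)      = {sigma / n ** 0.5:.3f}")
print(f"sd of sample means   = {sd_of_means:.3f}")
```

The first two printed values should nearly agree, as should the last two.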

For this theorem to support correct statistical inference, the samples must be sufficiently large and drawn randomly (without bias) from the population. The larger the sample size, the more accurately a Gaussian distribution approximates the distribution of the sample means.
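The claim that the Gaussian approximation improves with sample size can be checked numerically. The sketch below assumes a strongly skewed population (an exponential distribution with rate 1, chosen only for illustration) and measures the skewness of the sample means, which is 0 for a symmetric, Gaussian-like shape. For this particular population the theoretical skewness of the mean is 2/√n, so the printed values should shrink roughly as 2, 0.89, 0.37 and 0.2.

```python
import random
import statistics

random.seed(0)  # fixed seed for a reproducible run

def skewness(values):
    """Sample skewness: the average cubed z-score (0 for a symmetric distribution)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def sample_means(n, num_samples=4000):
    """Means of `num_samples` samples of size n from an exponential population (rate 1)."""
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(num_samples)
    ]

# Skewness of the sample means for increasing sample sizes
results = {n: skewness(sample_means(n)) for n in (1, 5, 30, 100)}
for n, skew in results.items():
    print(f"n = {n:>3}: skewness of sample means = {skew:.2f}")
```

Skewness is used here instead of a plot so the trend can be read from plain text output; a histogram of `sample_means(n)` would show the same narrowing towards a bell shape.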

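The Central Limit Theorem formula can be represented as follows. If a random variable X has a finite mean μ and standard deviation σ, and samples of size n are drawn from its population, then:

For the mean of the sample means:

$$\mu_{\bar{X}} = \mu$$

For the standard deviation of the sample means:

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$

Where,

$\mu$ = Population mean

$\sigma$ = Population standard deviation

$\mu_{\bar{X}}$ = Mean of the sample means

$\sigma_{\bar{X}}$ = Standard deviation of the sample means (also called the standard error)

$n$ = Sample size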
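As a worked example, take X to be a single roll of a fair six-sided die (the population used in the dice example later in this article) and samples of n = 30 rolls. The population mean and standard deviation are:

$$\mu = \frac{1+2+3+4+5+6}{6} = 3.5, \qquad \sigma = \sqrt{\frac{1}{6}\sum_{k=1}^{6}(k-3.5)^2} = \sqrt{\frac{35}{12}} \approx 1.708$$

so the mean and standard deviation of the 30-roll sample means are:

$$\mu_{\bar{X}} = 3.5, \qquad \sigma_{\bar{X}} = \frac{1.708}{\sqrt{30}} \approx 0.312$$

Under the normal approximation, roughly 95% of 30-roll averages should therefore land within about 2 × 0.312 ≈ 0.62 of 3.5.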

To sum up, the Central Limit Theorem states that for a large sample size n, the sample mean X̄ is approximately normally distributed with mean μ and standard deviation σ/√n.
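To see how this approximation is used in practice, the sketch below asks how likely it is that the average of 30 die rolls is 4 or more. It assumes a fair six-sided die as the population, uses `statistics.NormalDist` from the Python standard library for the normal approximation, and then checks the answer with a simulation (the seed and trial count are arbitrary).

```python
import math
import random
import statistics

# Assumed setup: the population is one roll of a fair die, n = 30 rolls per sample
faces = [1, 2, 3, 4, 5, 6]
mu = statistics.mean(faces)        # 3.5
sigma = statistics.pstdev(faces)   # sqrt(35/12), about 1.708
n = 30

# CLT approximation: X-bar is roughly Normal(mu, sigma / sqrt(n))
xbar_dist = statistics.NormalDist(mu, sigma / math.sqrt(n))
p_clt = 1 - xbar_dist.cdf(4.0)  # approximate P(X-bar >= 4)

# Monte Carlo check: fraction of simulated 30-roll averages that reach 4 or more
random.seed(1)
trials = 20000
hits = sum(
    statistics.fmean(random.choices(faces, k=n)) >= 4.0 for _ in range(trials)
)
p_sim = hits / trials

print(f"CLT approximation: {p_clt:.4f}")
print(f"simulation:        {p_sim:.4f}")
```

The two printed probabilities should be close (both roughly 5 to 6 percent). The normal value tends to come out slightly lower, partly because it treats the average of discrete rolls as a continuous quantity; applying a continuity correction would narrow the gap.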

2. Worked Example with Dice

We can understand the Central Limit Theorem with a worked example of rolling a die. A die has the numbers 1 to 6 on its faces, and each number has an equal probability of 1 in 6 of turning up on any roll. If we roll the die many times, say 500, we expect roughly equal proportions of each number, so the distribution of outcomes is approximately uniform.
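The uniform shape of single rolls can be checked with a few lines of Python. This is a minimal sketch assuming a fair die simulated with `random.randint`; the seed is arbitrary.

```python
import random
from collections import Counter

random.seed(7)  # fixed seed for a reproducible run

rolls = [random.randint(1, 6) for _ in range(500)]  # 500 rolls of a fair die
counts = Counter(rolls)

for face in range(1, 7):
    share = counts[face] / len(rolls)
    # Each share should hover around 1/6 (about 0.167)
    print(f"face {face}: {counts[face]:>3} rolls, share {share:.3f}")
```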

To demonstrate the CLT, we increase the sample size and plot the averages of the samples. First, we roll the die twice, compute the average of the pair, and repeat this process 500 times, plotting the averages on a histogram. We then repeat the experiment with 5 rolls and 10 rolls per sample. The histograms show that as the number of rolls per sample (the sample size) increases, the distribution of the averages comes closer to a Gaussian distribution. The variation of the sample means also decreases as the sample size increases.
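The experiment above can be sketched in Python, with a crude text histogram standing in for the plots. This assumes a fair die simulated with `random.randint`; the helper names `roll_averages` and `text_histogram` are hypothetical, and each `#` represents 5 averages.

```python
import random
import statistics
from collections import Counter

random.seed(3)  # fixed seed for a reproducible run

REPEATS = 500  # how many averages to compute for each sample size

def roll_averages(rolls_per_sample, repeats=REPEATS):
    """Roll a fair die `rolls_per_sample` times, average the rolls, repeat `repeats` times."""
    return [
        statistics.fmean([random.randint(1, 6) for _ in range(rolls_per_sample)])
        for _ in range(repeats)
    ]

def text_histogram(values, width=0.5):
    """Print a text histogram with bins of size `width`; each '#' stands for 5 values."""
    bins = Counter(int(v // width) for v in values)
    for b in sorted(bins):
        print(f"{b * width:4.1f} | {'#' * (bins[b] // 5)}")

results = {}
for n in (2, 5, 10):
    averages = roll_averages(n)
    results[n] = averages
    print(f"\n{n} rolls per sample: sd of averages = {statistics.stdev(averages):.3f}")
    text_histogram(averages)
```

As the number of rolls per sample grows, the bars should bunch more tightly around 3.5 and the printed standard deviation should fall, roughly in line with σ/√n (about 1.21, 0.76 and 0.54 for a fair die).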

3. Impact on Machine Learning

The implications of the Central Limit Theorem for applied machine learning are significant. Making inferences about data is at the core of what machine learning does, and the CLT supports those inferences. It lets us quantify how far a sample mean is likely to deviate from the population mean without drawing a new sample. It also means that a sufficiently large, independent random sample is a good representation of the population being observed, so data on the whole population is not required.
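Quantifying that deviation from a single sample comes down to the standard error s/√n, where s is the sample standard deviation. The sketch below is illustrative only: it simulates one sample of 50 die rolls as a stand-in for any single dataset, and the 1.96 factor is the usual two-sided 95% point of the standard normal distribution.

```python
import math
import random
import statistics

random.seed(11)  # fixed seed for a reproducible run

# One observed sample: 50 rolls of a fair die (a stand-in for your own data)
sample = [random.randint(1, 6) for _ in range(50)]

n = len(sample)
xbar = statistics.fmean(sample)
s = statistics.stdev(sample)        # sample standard deviation
standard_error = s / math.sqrt(n)   # estimated standard deviation of the sample mean

# By the CLT, X-bar is roughly normal, so about 95% of samples of this size
# would have a mean within about 1.96 standard errors of the population mean
margin = 1.96 * standard_error
print(f"sample mean = {xbar:.3f}, standard error = {standard_error:.3f}")
print(f"typical deviation from the population mean: about +/- {margin:.3f}")
```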

Significance testing and confidence intervals are also based on the CLT. Because the sample mean follows an approximately normal or Gaussian distribution, we can use the properties of the Gaussian distribution to estimate how likely a given sample mean is for a given sample size, and to calculate an interval of the desired confidence around the skill (for example, the accuracy) of a machine learning model.
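One common way to put an interval around a model's accuracy is the normal-approximation (Wald) interval, which treats each test prediction as an independent correct/incorrect trial. The sketch below uses only the Python standard library; the function name `accuracy_confidence_interval` and the 85% accuracy on 1,000 test examples are hypothetical values chosen for illustration.

```python
import math
from statistics import NormalDist

def accuracy_confidence_interval(accuracy, n_test, confidence=0.95):
    """Normal-approximation (Wald) interval for a classification accuracy.

    By the CLT, the observed accuracy (a mean of 0/1 outcomes) is roughly normal
    with standard error sqrt(p * (1 - p) / n_test).
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    se = math.sqrt(accuracy * (1 - accuracy) / n_test)
    return accuracy - z * se, accuracy + z * se

# Hypothetical numbers: a model scoring 85% accuracy on 1,000 held-out examples
low, high = accuracy_confidence_interval(0.85, 1000)
print(f"95% interval: {low:.3f} to {high:.3f}")
```

The interval narrows in proportion to 1/√n_test, so quadrupling the size of the test set roughly halves its width.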

Conclusion

To summarize, this article covered the following:

  • The Central Limit Theorem is an integral concept in statistics and probability, with a noteworthy impact on data science and machine learning.
  • The theorem states that, regardless of the distribution of the population under study, the sampling distribution of the mean becomes normal or Gaussian as the sample size increases, provided the samples are large enough and selected randomly.
  • Knowledge of the Gaussian distribution and the CLT is widely used in machine learning to make inferences about model performance.

There are no right or wrong ways of learning AI and ML technologies: the more, the better! These resources can be the starting point for your journey into Artificial Intelligence and Machine Learning. Does pursuing AI and ML interest you? If you want to step into the world of emerging tech, you can accelerate your career with the Machine Learning and AI Courses by Jigsaw Academy.
