Introduction

An outlier is an observation that lies at an abnormal distance from the other values in a random sample from a population. It is, however, up to the analyst to decide which points count as abnormal. This article explains what outliers are and describes the main types.

  1. What are outliers?
  2. What are the types of outliers?

1) What are outliers?

An outlier is a data point that differs markedly from the other observations. It may be caused by variability in the measurement, or it may result from an experimental error. In the latter case, the point is sometimes excluded from the data set. An outlier can seriously distort a statistical analysis.

Outliers can occur by chance in any distribution, but they often point to either a measurement error or a population with a heavy-tailed distribution. In the former case, one will want to discard them or use statistics that are robust to outliers. In the latter case, they show that the distribution is highly skewed and that one should be cautious when using tools that assume a normal distribution.
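To see why a robust statistic helps, here is a minimal sketch comparing how far the mean and the median move when one bad value is added. The measurements are invented for illustration, and the last value stands in for a recording error.

```python
from statistics import mean, median

# Hypothetical measurements; 100.0 plays the role of a measurement error.
clean = [9.8, 10.1, 10.0, 9.9, 10.2]
with_outlier = clean + [100.0]

# How far does each summary statistic move when the bad value is included?
mean_shift = mean(with_outlier) - mean(clean)
median_shift = median(with_outlier) - median(clean)

print(f"mean shift:   {mean_shift:.2f}")
print(f"median shift: {median_shift:.2f}")
```

In this example the mean moves by about 15 while the median moves by about 0.05, which is why the median is usually described as robust to outliers.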

Another cause of outliers is a mixture of two distributions. These may be two distinct subpopulations, or they may correspond to correct trials versus measurement errors. This situation is modelled with a mixture model.
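As a rough illustration of the mixture idea, the sketch below assumes two normal components with known parameters and uses Bayes' rule to estimate how likely it is that a value came from the second one. Every number here (the means, the spreads, the 5% weight) and the function name prob_from_second are made up for the example. In practice the parameters would have to be estimated from the data.

```python
from statistics import NormalDist

def prob_from_second(x, main, second, weight_second):
    """Posterior probability that x was drawn from the second component
    of a two-component mixture (Bayes' rule on the weighted densities)."""
    a = (1 - weight_second) * main.pdf(x)
    b = weight_second * second.pdf(x)
    return b / (a + b)

# Hypothetical components: the main population and an "error" population.
main = NormalDist(mu=10, sigma=1)
second = NormalDist(mu=20, sigma=3)
weight_second = 0.05  # assumed share of points from the second component

for x in (10, 13, 18):
    p = prob_from_second(x, main, second, weight_second)
    print(f"x = {x}: P(second component) = {p:.4f}")
```

A value near 10 is almost certainly from the main component, while a value near 18 is almost certainly from the second one, so under this assumed model it would be treated as belonging to a different population rather than as an extreme draw from the main one.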

Outlier points can therefore indicate faulty data or an erroneous procedure. They can also mark an area where a particular theory does not hold. In a large sample, however, a small number of outliers is to be expected.
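To put a number on "a small number of outliers is to be expected", the sketch below assumes normally distributed data and a cut-off of three standard deviations. Both are assumptions made for illustration, not rules from this article.

```python
from statistics import NormalDist

def expected_beyond(n, k=3.0):
    """Expected count of points lying more than k standard deviations
    from the mean in a sample of n values from a normal distribution."""
    tail = 2 * (1 - NormalDist().cdf(k))  # probability in both tails
    return n * tail

for n in (100, 1_000, 100_000):
    print(n, round(expected_beyond(n), 1))
```

Under these assumptions roughly 2.7 points per 1,000 fall beyond three standard deviations purely by chance, so a few extreme values in a large sample are not in themselves evidence of an error.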

Outliers are the most extreme observations, so they may include the sample minimum, the sample maximum, or both. However, the sample minimum and maximum are not always outliers, because they may not be unusually far from the other observations.
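The point about the minimum and maximum can be checked with a common rule of thumb, Tukey's fences, which flag values more than 1.5 interquartile ranges outside the quartiles. The article does not prescribe this rule. It is used here only as one possible check, and both data sets are invented.

```python
from statistics import quantiles

def tukey_outliers(values, k=1.5):
    """Return the values lying outside Tukey's fences:
    below Q1 - k*IQR or above Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

tight = [4, 5, 5, 6, 6, 7, 7, 8]       # the maximum, 8, is close to the rest
stretched = [4, 5, 5, 6, 6, 7, 7, 30]  # the maximum, 30, is far from the rest

print(tukey_outliers(tight))
print(tukey_outliers(stretched))
```

Every sample has a minimum and a maximum, but only in the second data set is the maximum far enough from the other values to be flagged.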

2) What are the types of outliers?

Now that you know what an outlier is, here are the types of outliers.

  • Global outliers (also called point outliers)

A data point is considered a global outlier if its value lies far outside the rest of the data set in which it is found. In other words, a global outlier is a measured sample point whose value is very high or very low relative to all the other values in the data set.
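One way to flag a global outlier is the modified z-score, which measures distance from the median in units of the median absolute deviation (MAD). The sketch below uses the commonly quoted constant 0.6745 and cut-off 3.5. The function name global_outliers and the readings are assumptions for illustration, and other rules would serve equally well.

```python
from statistics import median

def global_outliers(values, threshold=3.5):
    """Flag values whose modified z-score exceeds the threshold.
    The score is 0.6745 * (x - median) / MAD."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread around the median, so the score is undefined
    return [v for v in values if abs(0.6745 * (v - med) / mad) > threshold]

readings = [21, 22, 20, 23, 21, 22, 95, 20]  # hypothetical sensor readings
print(global_outliers(readings))
```

Because the median and MAD are barely affected by the extreme value itself, the reading of 95 stands out clearly against the rest of the data set.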

  • Contextual outliers (also called conditional outliers)

If a data point is unusual only in a specific context or under a specific condition, and not otherwise, it is called a contextual outlier. To detect one, the attributes of the data objects are divided into two groups. Contextual attributes define the context, for example the time or the location. Behavioural attributes are the characteristics of the object that are used to evaluate whether it is an outlier within that context. A contextual outlier is difficult to spot without background information.
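The split into contextual and behavioural attributes can be sketched as follows. Here the season is the contextual attribute and the temperature is the behavioural attribute, and each value is scored only against other values from the same season. The data, the function name contextual_outliers, and the choice of a modified z-score with a cut-off of 3.5 are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import median

def contextual_outliers(records, threshold=3.5):
    """records: iterable of (context, value) pairs.
    Flags values that are extreme relative to their own context group."""
    groups = defaultdict(list)
    for context, value in records:
        groups[context].append(value)

    flagged = []
    for context, values in groups.items():
        med = median(values)
        mad = median(abs(v - med) for v in values)
        if mad == 0:
            continue  # no spread within this context; skip it
        for v in values:
            if abs(0.6745 * (v - med) / mad) > threshold:
                flagged.append((context, v))
    return flagged

# Hypothetical temperatures in degrees Celsius.
records = [("summer", t) for t in (28, 30, 31, 29, 30, 32)]
records += [("winter", t) for t in (2, 4, 3, 5, 30, 1)]

print(contextual_outliers(records))
```

A temperature of 30 appears in both seasons, but it is flagged only in winter, where it is far from the other winter values.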

  • Collective Outliers

If a collection of data points, taken together, differs markedly from the rest of the data set, the collection is a collective outlier. That is, a subset of the data points is anomalous when its values deviate from the data set as a group. However, the individual data points in the group need not be outliers in either a global or a contextual sense.
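Collective outliers take many forms. As one hypothetical example, the sketch below looks for a flat run in a signal that normally fluctuates. Each value in the run is well within the usual range, and only the run as a whole is unusual. The function name flat_runs, the signal, and the minimum run length of 4 are assumptions for illustration.

```python
def flat_runs(series, min_length=4, tolerance=0.0):
    """Return (start, end) index pairs of runs in which consecutive values
    differ by at most `tolerance` and which span at least `min_length` points."""
    runs, start = [], 0
    for i in range(1, len(series) + 1):
        # A run ends at the end of the series or when the value changes.
        if i == len(series) or abs(series[i] - series[i - 1]) > tolerance:
            if i - start >= min_length:
                runs.append((start, i - 1))
            start = i
    return runs

# Hypothetical signal: values normally vary between 9 and 15.
signal = [10, 14, 9, 15, 11, 12, 12, 12, 12, 12, 14, 10, 15, 9]
print(flat_runs(signal))
```

The value 12 is unremarkable on its own, yet five identical readings in a row stand out from the behaviour of the rest of the signal.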

Conclusion


It is important to investigate outliers carefully, because they may carry information about the process under investigation. Before you consider eliminating an outlier, first try to understand why it appeared in the data. In many cases an outlier is simply a bad data point, but not always. Unfortunately, there is no strict statistical rule for identifying outliers. Identification is therefore highly subjective and depends on the analyst's knowledge and on how the data was collected.

