Introduction

Recall and precision are core topics in data science, especially in machine learning. It is important to understand how precision, recall, and accuracy differ. This article covers the basics of the recall and precision metrics and how to apply them when evaluating models.

In this article let us look at:

  1. Problem Statements
  2. What is Precision?
  3. What is Recall?
  4. What is Accuracy?
  5. Importance of F1 score
  6. ROC curve (Receiver Operating Characteristic Curve) and AUC (Area Under the Curve)
  7. PRC (Precision-Recall Curve)

1. Problem Statements

Recall and precision are terms you will come across when evaluating models on datasets for a given problem statement. These models may come from industrial or research settings.

Before we begin, let us learn some important terms:

True Positive (TP): The predicted value is yes, and the actual value of the class is also yes.

False Positive (FP) (Type I error): The predicted value is yes, but the actual value of the class is no. A false positive incorrectly points towards the existence of something that doesn't exist.

True Negative (TN): The predicted value is no, and the actual value of the class is also no.

False Negative (FN) (Type II error): This is the exact opposite of a false positive: the actual class value is yes, but the predicted class value is no.
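
The four terms above can be counted directly from a list of actual labels and a list of predicted labels. The following is a minimal sketch in plain Python; the function name confusion_counts and the sample labels are made up for illustration, with 1 standing for "yes" and 0 for "no".

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, FP, TN, FN for a binary problem (hypothetical helper)."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted yes, actually yes
        elif p == positive and a != positive:
            fp += 1  # predicted yes, actually no (Type I error)
        elif p != positive and a != positive:
            tn += 1  # predicted no, actually no
        else:
            fn += 1  # predicted no, actually yes (Type II error)
    return tp, fp, tn, fn


# Illustrative labels only, not real data.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 1, 0, 1, 0, 0, 1, 0]
print(confusion_counts(actual, predicted))  # (3, 1, 3, 1)
```

The returned tuple is ordered as (TP, FP, TN, FN), and the four counts always add up to the number of examples.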

2. What is Precision?

Precision value is the ratio between the True Positives (TP) and total positive predictions. Total positive predictions are a sum of the number of True Positives (TP) and False Positives (FP). Precision Formula is expressed as: 

Precision = TP / (TP + FP)

Precision tells us what fraction of the data points that the model flagged as positive are actually relevant.
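
As a small illustration, precision can be computed from the TP and FP counts. This is a sketch only: the function name and the counts are hypothetical, and returning 0.0 when the model makes no positive predictions is an assumed convention, since the ratio is undefined in that case.

```python
def precision(tp, fp):
    """Precision = TP / (TP + FP)."""
    predicted_positive = tp + fp
    if predicted_positive == 0:
        # Assumed convention: no positive predictions gives precision 0.0.
        return 0.0
    return tp / predicted_positive


# Hypothetical spam filter: 10 emails flagged, 8 of them really are spam.
print(precision(tp=8, fp=2))  # 0.8
```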

3. What is Recall?

Recall is the proportion of actual positive (relevant) cases that a model correctly identifies. It measures how completely the model finds the positives, not its overall accuracy. Recall is also known as the sensitivity, or true positive rate, of a model. It is the ratio of True Positives (TP) to the sum of True Positives (TP) and False Negatives (FN):

Recall = TP / (TP + FN)
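
Recall can be sketched the same way from the TP and FN counts. The function name and the numbers are hypothetical, and returning 0.0 when the dataset has no actual positives is an assumed convention.

```python
def recall(tp, fn):
    """Recall = TP / (TP + FN)."""
    actual_positive = tp + fn
    if actual_positive == 0:
        # Assumed convention: no actual positives gives recall 0.0.
        return 0.0
    return tp / actual_positive


# Hypothetical spam filter: 20 spam emails exist, only 8 were caught.
print(recall(tp=8, fn=12))  # 0.4
```

A filter can therefore be precise about what it flags and still miss most of the spam, which is why the two metrics are reported together.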

4. What is Accuracy?

Accuracy is the ratio between the sum of correct predictions and the total number of predictions. The formula for accuracy is written as:

Accuracy = (TP + TN) / (TP + FP + TN + FN)

A high accuracy value generally indicates a well-performing model, and accuracy works best on symmetric (balanced) datasets, where the classes are about equally frequent. Accuracy is easy to compute and to compare across models, but it has shortcomings. It can be misleading when some classes are rare, because a model can neglect them and still score well. If a dataset is non-symmetric, or imbalanced, accuracy does not represent model quality well. Recall and precision come to the rescue in this case. Using precision, recall, the F1 score, and a confusion matrix, we can evaluate models more reliably.
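
The weakness of accuracy on imbalanced data can be seen with a small worked example. The sketch below assumes a made-up dataset of 1,000 examples with only 10 positives and a hypothetical model that always predicts "no".

```python
def accuracy(tp, fp, tn, fn):
    """Accuracy = (TP + TN) / (TP + FP + TN + FN)."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def recall(tp, fn):
    """Recall = TP / (TP + FN); 0.0 if there are no actual positives."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Always predicting "no" on 990 negatives and 10 positives:
# every negative is a TN, every positive is an FN.
tp, fp, tn, fn = 0, 0, 990, 10
print(accuracy(tp, fp, tn, fn))  # 0.99
print(recall(tp, fn))            # 0.0
```

The model looks excellent by accuracy while finding none of the positive cases, which is exactly the situation where recall and precision are more informative.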

5. Importance of F1 score

The F1 score combines recall and precision into a single number. It is useful when you need a balance between recall and precision. The formula for the F1 score is as follows:

F1 = 2 * (Precision * Recall) / (Precision + Recall)

On close observation, it is evident that the F1 score is the harmonic mean of recall and precision. F1 score also plays an important role in handling non-symmetric datasets.
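
A short sketch shows how the harmonic mean behaves. The function name f1_score and the sample values are hypothetical, and returning 0.0 when both inputs are zero is an assumed convention.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Assumed convention: both metrics zero gives F1 of 0.0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(f1_score(0.5, 0.5))            # 0.5
print(round(f1_score(1.0, 0.1), 3))  # 0.182
```

With a precision of 1.0 and a recall of 0.1, a simple average would be 0.55, but the F1 score is about 0.18. The harmonic mean is pulled towards the lower of the two values, so a model cannot score well by being strong on only one of them.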

6. ROC curve (Receiver Operating Characteristic Curve) and AUC (Area Under the Curve)

The ROC curve is built from two components: the False Positive Rate (FPR) and the True Positive Rate (TPR). The False Positive Rate is the ratio of False Positives to the actual number of negatives, FP / (FP + TN). The True Positive Rate, which is the same as recall or sensitivity, is the ratio of True Positives to the actual number of positives, TP / (TP + FN). The True Negative Rate, known as specificity, is the ratio of True Negatives to the actual number of negatives, so FPR = 1 - specificity. In a ROC curve, the FPR is on the x-axis, and the TPR is on the y-axis. Here are some of the characteristics of the ROC curve:

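(1) The Area Under the Curve (AUC) of the ROC curve is a widely used single-number summary of how well a model separates the two classes.

(2) Models with a high AUC value are said to have good skill. An AUC of 0.5 corresponds to random guessing, and an AUC of 1 corresponds to a perfect ranking.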

(3) At the lowest point (0,0), the threshold is set at 1, and at the highest point (1,1), the threshold is set at 0.
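
To make the threshold sweep concrete, here is a minimal sketch that computes ROC points and the AUC in plain Python. The helper names roc_points and auc, the labels, and the scores are all made up for illustration; the sketch assumes binary labels (1 and 0), at least one example of each class, and uses the trapezoidal rule for the area.

```python
def roc_points(actual, scores):
    """Return (FPR, TPR) points as the threshold sweeps from high to low."""
    positives = sum(actual)
    negatives = len(actual) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted yes
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for a, s in zip(actual, scores) if s >= t and a == 1)
        fp = sum(1 for a, s in zip(actual, scores) if s >= t and a == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under a piecewise-linear curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Illustrative labels and model scores only.
actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(actual, scores)
print(points[0], points[-1])      # (0.0, 0.0) (1.0, 1.0)
print(round(auc(points), 3))      # 0.889
```

The curve starts at (0,0) for the strictest threshold and ends at (1,1) once every example is predicted positive. In practice a library routine would normally be used instead of a hand-written loop.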

7. PRC (Precision-Recall Curve)

A PRC plots precision on the y-axis and recall on the x-axis. The Area Under the Curve (AUC) for this curve ranges from 0 to 1, and a high AUC value is preferred. As the threshold is lowered from 1 towards 0, recall rises towards 1 while precision typically falls. At a threshold of 0, every example is predicted positive, so recall is 1 and precision equals the fraction of positives in the dataset. For any study, the aim is to keep the curve as close to (1,1) as possible, as that point signifies perfect recall and precision.
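
The points of a precision-recall curve can be sketched in the same way as a threshold sweep. The helper name pr_points, the labels, and the scores below are hypothetical; the sketch assumes binary labels (1 and 0) and at least one positive example.

```python
def pr_points(actual, scores):
    """Return (recall, precision) points as the threshold sweeps high to low."""
    positives = sum(actual)
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for a, s in zip(actual, scores) if s >= t and a == 1)
        fp = sum(1 for a, s in zip(actual, scores) if s >= t and a == 0)
        # t is one of the scores, so at least one example is predicted yes
        # and tp + fp is never zero here.
        points.append((tp / positives, tp / (tp + fp)))
    return points


# Illustrative labels and model scores only.
actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = pr_points(actual, scores)
print(points[-1])  # (1.0, 0.5)
```

At the lowest threshold every example is predicted positive, so recall reaches 1.0 and precision drops to 0.5, the share of positives in this sample. A curve that stays near a precision of 1 while recall grows is the behaviour to aim for.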

Conclusion

Every study has different requirements. In a lot of studies, recall and precision are equally important, but in some cases getting a high recall is more important than getting a high precision, or vice versa. It is important to remember that there is usually a trade-off between the two: raising precision tends to lower recall, and the reverse, so it is often hard to maximize both at the same time. The specific requirements of the study dictate which metrics you will need to concentrate on. A multifaceted study might examine each metric (precision, recall, accuracy, and F1 score) and its effect on the results individually.

There are no right or wrong ways of learning AI and ML technologies; the more, the better! These valuable resources can be the starting point for your journey on how to learn Artificial Intelligence and Machine Learning. Does pursuing AI and ML interest you? If you want to step into the world of emerging tech, you can accelerate your career with the Machine Learning and AI courses by Jigsaw Academy.
