Introduction

In modern technology, machine learning has enabled major innovations by producing accurate predictions that drive better decisions. Ensemble learning is used to offset the instability and error-proneness of individual algorithms. AdaBoost is one such ensemble method, also called "meta-learning," initially applied to binary classifiers to improve their accuracy. It trains weak classifiers over successive iterations, each learning from the mistakes of the last, until they combine into a strong classifier.

  1. Ensemble Learning
  2. AdaBoost Pseudocode
  3. Advantages and Disadvantages of AdaBoost

1) Ensemble Learning

Ensemble learning combines several base algorithms into a single, stronger predictive model. Ex: Consider a classification decision tree in which several factors are turned into decision rules. Each rule either asks about another factor or makes a decision. The tree's multiple decision rules quickly become ambiguous, especially when new sub-factors are added or when an unclear threshold drives a decision. Ensemble methods do not depend on a single decision tree making the right decision; instead they aggregate the predictions of many different trees to produce one strong final predictor.

The ensemble methods are of three types, chosen according to use:

  • Bagging, which reduces variance.
  • Boosting, which reduces bias.
  • Stacking, which improves predictive accuracy.

There are two groups of such Ensemble methods:

  • Sequential learners generate models one after another, each learning from the previous model's mistakes and passing that knowledge on to its successor. Ex: AdaBoost exploits this model dependency by giving mislabeled examples higher weights.
  • Parallel learners build the base models in parallel, exploiting the independence of the models and averaging out their mistakes. Ex: random forests.

How does boosting occur in the ensemble methods? The adaptive boosting algorithm builds a strong predictive learner from the errors of the preceding weaker models. Beginning with a first model on the training data, models are added sequentially, each correcting its predecessor through successive iterations, until either the maximum number of models is reached or the training data is predicted perfectly. Boosting reduces bias error, which arises when models fail to identify relevant trends in the data, by evaluating the differences between actual and predicted values.

There are three popular boosting algorithms, namely:

  • Adaptive Boosting or AdaBoost
  • XGBoost
  • Gradient Tree Boosting

Unravelling AdaBoost:

The original AdaBoost paper was authored by Yoav Freund and Robert Schapire. Adaptive Boosting, or AdaBoost, combines several weak classifiers through progressive learning to build the final strong predictive classifier, since a single classifier alone often cannot predict an object's class accurately. Ex: logistic regression and default decision trees are often weak classifiers; their predictions become inaccurate when objects that are hard to classify are presented to them, resulting in weak decisions. A weak classifier performs only slightly better than random guessing and on its own classifies objects poorly.

AdaBoost is not a model but a method that can be applied to any classifier, giving it the ability to learn from its errors and thus yield a more accurate model for future use.

AdaBoost typically works with decision stumps. These are like the not "fully grown" trees of a random forest: each has just one node and two leaves, and AdaBoost uses many such stumps instead of full decision trees. On its own, a stump is a poor decision-maker compared with a full-grown tree, which combines all variables when predicting the target; a stump bases its decision on a single variable. But combining many stumps, with some stumps carrying more weight in the classification than others, lets AdaBoost make accurate classification decisions.
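A decision stump can be seen in code as a decision tree whose depth is capped at one. The following sketch (assuming scikit-learn and its bundled Iris dataset) shows that such a stump splits on a single feature and is therefore only a weak classifier on its own:

```python
# A decision stump is a decision tree limited to a single split:
# one decision node and two leaves.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

stump = DecisionTreeClassifier(max_depth=1)  # one node, two leaves
stump.fit(X, y)

# The stump decides using one feature and one threshold, so its
# training accuracy is well below that of a fully grown tree.
print(stump.get_depth())     # 1
print(stump.score(X, y))
```

On Iris, a single stump can separate one flower type from the other two but cannot distinguish all three classes, which is exactly the weakness that combining many weighted stumps overcomes.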

2) AdaBoost Pseudocode

The pseudocode is as follows:

  1. Set uniform example weights.
  2. For each base learner:
     a. Train the base learner on the weighted sample.
     b. Test the base learner on all the data.
     c. Set the learner's weight from its weighted error.
     d. Update the example weights based on the ensemble's predictions.
  3. End for.
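As an illustration only (not the scikit-learn implementation), the pseudocode can be sketched as a minimal binary AdaBoost that boosts decision stumps; the dataset, number of rounds, and label encoding in {-1, +1} are assumptions for the toy example:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Toy binary problem; the classic weight updates use labels in {-1, +1}.
X, y = make_classification(n_samples=200, random_state=0)
y = np.where(y == 0, -1, 1)

n_rounds = 10
w = np.full(len(X), 1 / len(X))          # 1. uniform example weights
learners, alphas = [], []

for _ in range(n_rounds):                # 2. for each base learner:
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X, y, sample_weight=w)     #    a. train on the weighted sample
    pred = stump.predict(X)              #    b. test on all the data
    err = w[pred != y].sum()             #    weighted error of this learner
    alpha = 0.5 * np.log((1 - err) / (err + 1e-10))  # c. learner weight
    w *= np.exp(-alpha * y * pred)       # d. raise weights of misclassified examples
    w /= w.sum()                         #    keep weights a distribution
    learners.append(stump)
    alphas.append(alpha)

# Final strong classifier: sign of the weighted vote over all stumps.
ensemble = np.sign(sum(a * s.predict(X) for a, s in zip(alphas, learners)))
print((ensemble == y).mean())            # training accuracy of the ensemble
```

Mislabeled examples receive exponentially larger weights after each round, which is exactly the "learning from the predecessor's mistakes" described above.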


Implementation of AdaBoost:

The following implementation in Python shows how the AdaBoost algorithm works, step by step.

1: Importing modules: First import the required modules and packages. Python's scikit-learn library provides AdaBoostRegressor and AdaBoostClassifier. Since the task here is classification, one imports AdaBoostClassifier, and the train_test_split method is used to split the dataset into training and test sets. The Iris dataset is imported from scikit-learn's bundled datasets.
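The imports described above might look like this (a sketch assuming scikit-learn is installed):

```python
# AdaBoost estimators, dataset loader, splitter, and metrics from scikit-learn.
from sklearn.ensemble import AdaBoostClassifier, AdaBoostRegressor
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn import metrics
```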

2: Exploring the data: Any classification dataset can be used, but the Iris dataset is chosen here for its multi-class nature. It has four features (sepal length and width, petal length and width), and the target to predict is one of three flower types (Setosa, Versicolour, Virginica). The scikit-learn library ships the dataset, or it can be downloaded from the UCI Machine Learning Repository.

The data is loaded using the load_iris() method from the bundled datasets and assigned to the variable iris. The dataset is then split: the variable X holds the features (sepal length and width, petal length and width), and y is the target, i.e. the classification into the three flower types (Setosa, Versicolour and Virginica).
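A minimal sketch of this loading step:

```python
from sklearn.datasets import load_iris

iris = load_iris()        # dataset bundled with scikit-learn
X = iris.data             # features: sepal length/width, petal length/width
y = iris.target           # target: 0=Setosa, 1=Versicolour, 2=Virginica
print(X.shape)            # (150, 4): 150 flowers, 4 features each
print(iris.target_names)
```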

3: Splitting the data: The dataset is split into training and test sets so that the model's classification ability can be checked on unseen data points, here using 70% of the samples for training and 30% for testing.
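The 70/30 split can be sketched as follows; the random_state value is an assumption, used only to make the shuffle reproducible:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
# test_size=0.3 reserves 30% of the 150 samples for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)
print(X_train.shape, X_test.shape)  # (105, 4) (45, 4)
```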

4: Fitting the model: Building the AdaBoost model consists of letting AdaBoost take its default base learner, a decision tree, and naming the AdaBoostClassifier object abc.

The important parameters are:

  • n_estimators: the number of weak learners to train iteratively.
  • base_estimator: the weak learner used as the base model (a decision stump by default).
  • learning_rate: contributes to the weights of the weak learners; it defaults to 1.

Once these values are set, one fits the object abc to the training dataset to build the model.
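The fitting step can be sketched like this (the split and random_state are assumptions carried over from the previous step; note that newer scikit-learn releases rename base_estimator to estimator, so the default is simply left in place here):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

# Default base learner: a depth-1 decision tree (a stump).
abc = AdaBoostClassifier(n_estimators=50, learning_rate=1)
model = abc.fit(X_train, y_train)
print(len(model.estimators_))  # stumps actually trained (at most 50)
```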

5: Making predictions: Here one checks how good or bad the model is at predicting target values. The (unseen) test observations are passed to the predict() method to obtain the class the model assigns to each.

6: Evaluating the model: The accuracy of the model indicates how often it predicts the right classes. This example yields 86.66% accuracy, and one can try other base learners, such as logistic regression or a support vector machine, for higher accuracy.
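Steps 5 and 6 can be sketched together as below. The exact accuracy depends on the scikit-learn version and on the split (random_state here is an assumed value), so it may not reproduce the 86.66% quoted above:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)
model = AdaBoostClassifier(n_estimators=50, learning_rate=1).fit(X_train, y_train)

y_pred = model.predict(X_test)        # step 5: classes predicted for unseen data
acc = accuracy_score(y_test, y_pred)  # step 6: fraction of correct predictions
print(acc)
```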

3) Advantages and Disadvantages of AdaBoost

Advantages: AdaBoost is easy to use and requires less parameter tweaking than algorithms such as SVMs; it can also be combined with an SVM. Theoretically, AdaBoost is not prone to overfitting, perhaps because its parameters are not optimized jointly and the stage-wise estimation slows the learning process. Flexible AdaBoost can also be used to improve the accuracy of weak classifiers, for example in image and text classification.

Disadvantages: AdaBoost builds its ensemble progressively, so it needs high-quality data. It is also very sensitive to outliers and noise, which should be removed from the data before training. Finally, it is considerably slower than the XGBoost algorithm.

Conclusion

AdaBoost and ensemble learning have been discussed above, along with their various methods and types. AdaBoost is used to improve the accuracy of classification algorithms and was the first algorithm to successfully boost binary classification. It now finds use in systems for facial recognition and detection.

If you are interested in making a career in the Data Science domain, our 11-month in-person Postgraduate Certificate Diploma in Data Science course can help you immensely in becoming a successful Data Science professional. 
