LightGBM is a gradient boosting framework based on decision tree algorithms, designed to improve training speed and reduce memory usage. LightGBM works on Windows, Linux, and macOS and supports Python, C, R, and C#. LightGBM introduces two new techniques:

  • Gradient-based One-Side Sampling (GOSS)
  • Exclusive Feature Bundling (EFB)

The primary difference between LightGBM and XGBoost is that LightGBM uses the GOSS technique to filter data instances when searching for a split value, while XGBoost uses a pre-sorted algorithm and a histogram-based algorithm to compute the best split. LightGBM also uses a highly optimized histogram-based decision tree learning algorithm, which brings considerable advantages in both efficiency and memory consumption.

  1. Algorithm for GOSS
  2. Mathematical Analysis for GOSS Technique
  3. Algorithm for EFB Technique
  4. Parameter Tuning

1. Algorithm for GOSS

Data instances with different gradients play different roles in the calculation of the information gain. Instances with large gradients contribute more to the information gain. GOSS therefore keeps the instances with large gradients (for example, those above a predefined threshold or among the top percentiles) and randomly drops only instances with small gradients, so that the accuracy of the information-gain estimate is preserved. This should lead to a more accurate gain estimate than uniform random sampling at the same sampling rate, particularly when the information gain values span a wide range.

Input: I: training data, d: iterations

Input: a: sampling ratio of large gradient data

Input: b: sampling ratio of small gradient data

Input: loss: loss function, L: weak learner

models ← {}, fact ← (1-a)/b

topN ← a × len(I), randN ← b × len(I)

for i = 1 to d do

    preds ← models.predict(I)

    g ← loss(I, preds), w ← {1, 1, …}

    sorted ← GetSortedIndices(abs(g))

    topSet ← sorted[1:topN]

    randSet ← RandomPick(sorted[topN:len(I)], randN)

    usedSet ← topSet + randSet

    w[randSet] ×= fact   (assign weight fact to the small gradient data)

    newModel ← L(I[usedSet], −g[usedSet], w[usedSet])

    models.append(newModel)
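The sampling step of GOSS can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not LightGBM's internal implementation: the function name `goss_sample` and the default ratios `a=0.2` and `b=0.1` are hypothetical choices made for this example.

```python
import numpy as np

def goss_sample(gradients, a=0.2, b=0.1, rng=None):
    """Return (used_indices, weights) for one GOSS sampling step.

    a: fraction of instances kept because of their large gradients
    b: fraction of instances sampled at random from the rest
    """
    rng = np.random.default_rng() if rng is None else rng
    g = np.asarray(gradients, dtype=float)
    n = len(g)
    top_n = int(a * n)
    rand_n = int(b * n)

    # Sort instance indices by absolute gradient, largest first.
    order = np.argsort(-np.abs(g))
    top_set = order[:top_n]

    # Randomly pick rand_n instances among the remaining small-gradient ones.
    rand_set = rng.choice(order[top_n:], size=rand_n, replace=False)

    # Up-weight the sampled small-gradient instances by (1 - a) / b.
    weights = np.ones(n)
    weights[rand_set] *= (1.0 - a) / b

    used = np.concatenate([top_set, rand_set])
    return used, weights[used]
```

With 100 instances, `a=0.2`, and `b=0.1`, the function keeps the 20 instances with the largest absolute gradients at weight 1 and adds 10 randomly chosen instances at weight 8. A weak learner would then be fitted on `used` with these weights.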


2. Mathematical Analysis for GOSS Technique

Following the GBM formulation, let us look at the mathematical analysis of the GOSS technique.

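Consider a training set of n instances {x1, ···, xn}, where each xi is a vector of dimension s in the space Xs. In every iteration of gradient boosting, the negative gradients of the loss function with respect to the output of the model are denoted {g1, ···, gn}. In GOSS, the training instances are sorted in descending order of the absolute values of their gradients. The top a × 100% of instances with the largest gradients are retained and form the instance subset A. From the remaining instances Ac, a subset B of size b × |Ac| is sampled at random. Finally, the instances are split according to the estimated variance gain Ṽj(d) over the subset A ∪ B:

$$\tilde{V}_j(d) = \frac{1}{n}\left(\frac{\left(\sum_{x_i \in A_l} g_i + \frac{1-a}{b}\sum_{x_i \in B_l} g_i\right)^2}{n_l^j(d)} + \frac{\left(\sum_{x_i \in A_r} g_i + \frac{1-a}{b}\sum_{x_i \in B_r} g_i\right)^2}{n_r^j(d)}\right)$$

where Al = {xi ∈ A : xij ≤ d}, Bl = {xi ∈ B : xij ≤ d}, Ar = {xi ∈ A : xij > d}, Br = {xi ∈ B : xij > d}, n_l^j(d) and n_r^j(d) count the instances on the left and right of the split point d, and the coefficient (1-a)/b is used to normalize the sum of the gradients over B back to the size of Ac.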
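To make the role of the (1-a)/b coefficient concrete, the estimated variance gain for one feature j and one candidate split point d can be computed as in the sketch below. The function name and its arguments are hypothetical, and the sketch assumes that the left and right instance counts are taken over all n instances.

```python
import numpy as np

def estimated_variance_gain(x_j, g, in_A, in_B, d, a, b):
    """Estimated variance gain of splitting feature j at point d.

    Computes (1/n) * (S_l**2 / n_l + S_r**2 / n_r), where
    S_l = sum of g over A_l + (1 - a) / b * sum of g over B_l
    and S_r is defined the same way on the right side of the split.

    x_j:  values of feature j for all n instances
    g:    negative gradients for all n instances
    in_A: boolean mask of the large-gradient subset A
    in_B: boolean mask of the randomly sampled subset B
    """
    x_j = np.asarray(x_j, dtype=float)
    g = np.asarray(g, dtype=float)
    in_A = np.asarray(in_A, dtype=bool)
    in_B = np.asarray(in_B, dtype=bool)

    n = len(g)
    fact = (1.0 - a) / b
    left = x_j <= d
    right = ~left

    # Assumption of this sketch: counts are taken over all instances.
    n_l = np.count_nonzero(left)
    n_r = np.count_nonzero(right)
    if n_l == 0 or n_r == 0:
        return 0.0  # degenerate split: one side is empty

    # Gradients from B are amplified by (1 - a) / b to stand in for A's complement.
    sum_l = g[in_A & left].sum() + fact * g[in_B & left].sum()
    sum_r = g[in_A & right].sum() + fact * g[in_B & right].sum()
    return float((sum_l ** 2 / n_l + sum_r ** 2 / n_r) / n)
```

When A and B together cover every instance and (1-a)/b equals 1, the estimate coincides with the variance gain computed on the full data, which is a convenient sanity check.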

3. Algorithm for EFB Technique

High-dimensional data is usually very sparse, which allows us to design a nearly lossless approach to reduce the number of features. In a sparse feature space, many features are mutually exclusive, which means that they never take non-zero values simultaneously. Such exclusive features can be safely bundled into a single feature. As a result, training speed is increased without harming accuracy.

Input: numData: number of data

Input: F: One bundle of exclusive features

binRanges ← {0}, totalBin ← 0

for f in F do

    totalBin += f.numBin

    binRanges.append(totalBin)

newBin ← new Bin(numData)

for i = 1 to numData do

    newBin[i] ← 0

    for j = 1 to len(F) do

        if F[j].bin[i] ≠ 0 then

            newBin[i] ← F[j].bin[i] + binRanges[j]

Output: newBin, binRanges
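The merging step can be illustrated with a short NumPy sketch. It assumes that each feature has already been discretized into integer bin indices, with bin 0 standing for the zero (default) value; the function name `merge_exclusive_features` and its arguments are hypothetical and are not part of the LightGBM API.

```python
import numpy as np

def merge_exclusive_features(bundle, num_bins):
    """Merge one bundle of mutually exclusive binned features.

    bundle:   list of 1-D integer arrays of bin indices (0 = zero value)
    num_bins: number of bins of each feature in the bundle
    Returns the merged bin array and the bin offsets of each feature.
    """
    num_data = len(bundle[0])

    # Each feature gets its own offset so that bin ranges do not overlap.
    bin_ranges = [0]
    total_bin = 0
    for nb in num_bins:
        total_bin += nb
        bin_ranges.append(total_bin)

    new_bin = np.zeros(num_data, dtype=int)
    for j, feature in enumerate(bundle):
        feature = np.asarray(feature)
        nonzero = feature != 0
        # Shift non-zero bins of feature j into its reserved range.
        new_bin[nonzero] = feature[nonzero] + bin_ranges[j]
    return new_bin, bin_ranges
```

For example, bundling a feature with bins `[1, 0, 2, 0]` (3 bins) and a feature with bins `[0, 3, 0, 0]` (4 bins) shifts the second feature by an offset of 3, so both features remain distinguishable inside the single merged feature.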

  • Architecture

Unlike other boosting algorithms, which grow trees level-wise, LightGBM grows trees leaf-wise. It selects the leaf with the highest delta loss to grow. With the number of leaves held fixed, the leaf-wise algorithm tends to reach a lower loss than the level-wise algorithm. However, leaf-wise growth can increase the model's complexity and may cause overfitting on small data sets.

4. Parameter Tuning

A few critical parameters and their usage are described below:

  1. Max depth (max_depth): Sets a limit on the tree depth. It defaults to -1, which means no limit. It is useful for controlling overfitting.
  2. Categorical feature (categorical_feature): Specifies the categorical features used for model training.
  3. Bagging fraction (bagging_fraction): Specifies the fraction of the data to be used in each iteration.
  4. Num iterations (num_iterations): Specifies the number of boosting iterations to be performed. It defaults to 100.
  5. Num leaves (num_leaves): Specifies the number of leaves in a tree. It should be smaller than 2^(max_depth).
  6. Max bin (max_bin): Sets the maximum number of bins into which feature values are bucketed.
  7. Min data in bin (min_data_in_bin): Specifies the minimum amount of data in one bin.
  8. Task (task): Specifies the task we want to carry out, either train or predict. The default value is train. Prediction is the other possible value for this parameter.
  9. Feature fraction (feature_fraction): Specifies the fraction of features to be considered in each iteration. Its default value is 1.
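As a rough illustration of how these parameters fit together in the Python package, consider the sketch below. The values are hypothetical starting points rather than recommendations, and `objective` and `bagging_freq` are additions not covered in the list above. The `task` parameter is left out because it is assumed here to matter mainly for the command-line interface.

```python
# Hypothetical starting values for illustration only; tune them for your data.
params = {
    "objective": "binary",     # not in the list above; assumed for this example
    "max_depth": 6,
    "num_leaves": 31,          # kept below 2 ** max_depth = 64
    "num_iterations": 100,
    "max_bin": 255,
    "min_data_in_bin": 3,
    "bagging_fraction": 0.8,
    "bagging_freq": 1,         # bagging is only performed when bagging_freq > 0
    "feature_fraction": 0.8,
}

def leaves_within_depth(p):
    """Check the rule of thumb num_leaves < 2 ** max_depth."""
    return p["max_depth"] <= 0 or p["num_leaves"] < 2 ** p["max_depth"]

def fit(X, y, categorical_feature="auto"):
    """Train a booster with the parameters above (requires lightgbm)."""
    import lightgbm as lgb  # imported lazily so the dict can be inspected without it

    train_set = lgb.Dataset(X, label=y, categorical_feature=categorical_feature)
    return lgb.train(params, train_set)
```

Calling `leaves_within_depth(params)` before training is a cheap way to catch a `num_leaves` value that the chosen `max_depth` could never reach.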


I hope this article has given you a detailed overview of LightGBM and how it works.
