XGBoost: Extreme Gradient Boosting — All you need to know

Sanchari Gautam
4 min read · Sep 12, 2021


Before we dig deep into the XGBoost algorithm, we need a little context to understand why and where it is used. If you’re trying to learn more about XGBoost, I will assume you are already familiar with decision tree algorithms, which are non-linear supervised machine learning methods.

Now, we sometimes combine several decision trees to produce a strong learner with better predictive performance than any single, weak decision tree. We refer to these combinations of weak learners as Ensemble Decision Trees.

Definition:

Bagging and boosting are two basic techniques used for making ensemble decision trees. XGBoost is an algorithm to make such ensembles using Gradient Boosting on shallow decision trees.

If we recollect Gradient Boosting correctly, we would remember that its main idea is to build an ensemble by adding an incremental model to the previous model at each iteration, where the incremental model is built by fitting the residuals.

In XGBoost, the incremental model added to the previous model at each iteration is a shallow decision tree fitted to those residuals, rather than the raw residuals themselves.
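
To make the idea concrete, here is a minimal sketch of that boosting loop for squared-error loss, assuming scikit-learn’s DecisionTreeRegressor as the weak learner; names such as `fit_gradient_boosting` and `n_rounds` are illustrative and not part of XGBoost itself.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gradient_boosting(X, y, n_rounds=100, learning_rate=0.1, max_depth=3):
    """Build an ensemble by repeatedly fitting a shallow tree to the residuals."""
    base = y.mean()                               # F_0: start from the mean prediction
    prediction = np.full(len(y), base)
    trees = []
    for _ in range(n_rounds):
        residuals = y - prediction                # negative gradient of squared-error loss
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)                    # shallow tree models the residuals
        prediction = prediction + learning_rate * tree.predict(X)  # add the incremental model
        trees.append(tree)
    return base, trees

def predict_gradient_boosting(base, trees, X, learning_rate=0.1):
    """Sum the base prediction and every tree's (scaled) contribution."""
    return base + learning_rate * sum(tree.predict(X) for tree in trees)
```

Each round fits a shallow tree to the current residuals and adds a scaled copy of its predictions to the running model, which is exactly the “incremental model” described above.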

Gradient Boosting vs XGBoost Algorithms

A pseudo-algorithm is attached above for easy reference. As you can see, in XGBoost we fit a decision tree to the computed target values, i.e. to the pairs (xᵢ, yᵢ′) of inputs and pseudo-residuals, obtaining, say, J terminal nodes.

Let us denote each terminal node (leaf region) by Rⱼ, for j = 1, 2, …, J. For each terminal node, we look for the value α that minimizes the loss between the actual y-values and the predicted values, given by F(x) + α. The stopping criterion for the XGBoost model is that the gradients are close to zero.
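
In symbols (keeping the article’s α for the per-leaf constant and F for the current model), the per-leaf optimisation can be written as:

```latex
\alpha_j \;=\; \arg\min_{\alpha} \sum_{x_i \in R_j} L\bigl(y_i,\; F(x_i) + \alpha\bigr),
\qquad j = 1, 2, \dots, J
```

The fitted tree, with these leaf values, is then added to F, scaled by the learning rate.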

Here we have two hyperparameters which, when tuned, strongly affect the performance of the XGBoost model —

  1. η : Learning Rate

It is also known as shrinkage, and it is used to regularize the model. The learning rate lies between 0 and 1; small values such as 0.01–0.1 are widely used in practice, while XGBoost’s default is 0.3.

A larger η means fewer trees are needed to reach the minimum of the loss. However, an exceptionally high learning rate risks overshooting the minimum and oscillating around it instead of converging.

A very small learning rate, on the other hand, requires a large number of decision trees to reach the minimum, leading to a very long training period; with that many trees, the ensemble can also start to overfit.

2. T : Number of Trees

This is another key hyperparameter for regularizing the model. Keep in mind that η and T are coupled, so they should not be tuned as if they were independent: a larger η implies a smaller T, and vice versa (see the sketch below).
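
As an illustration of how these two hyperparameters are set in practice, here is a minimal sketch using the `xgboost` package’s native training API; `X_train`, `y_train`, `X_valid` and `y_valid` are assumed to exist already, and the parameter values are placeholders rather than recommendations.

```python
import xgboost as xgb

dtrain = xgb.DMatrix(X_train, label=y_train)
dvalid = xgb.DMatrix(X_valid, label=y_valid)

params = {
    "objective": "reg:squarederror",
    "eta": 0.1,        # learning rate (shrinkage)
    "max_depth": 3,    # shallow trees
    "lambda": 1.0,     # L2 regularization on leaf weights
}

# Fix the learning rate and let early stopping choose the number of trees:
booster = xgb.train(
    params,
    dtrain,
    num_boost_round=1000,            # upper bound on T
    evals=[(dvalid, "validation")],
    early_stopping_rounds=50,        # stop once validation loss stops improving
)
```

Fixing η and letting early stopping pick T is a common way to respect the trade-off between the two without tuning them jointly by hand.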

Since tuning both the learning rate and the number of decision trees can regularize the model, XGBoost additionally uses a more regularized model formulation to control overfitting. The objective function of the XGBoost algorithm is given by:

Objective Function = Training Loss + Regularization

XGBoost Objective Function Formula
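
Written out in the notation used by the XGBoost documentation, where l is the training loss, fₖ is the k-th tree, T is the number of leaves of a tree and w its leaf weights, the regularized objective is:

```latex
\text{obj}(\theta) \;=\; \underbrace{\sum_{i} l\bigl(y_i, \hat{y}_i\bigr)}_{\text{training loss}}
\;+\; \underbrace{\sum_{k} \Omega(f_k)}_{\text{regularization}},
\qquad
\Omega(f) \;=\; \gamma T \;+\; \tfrac{1}{2}\,\lambda \sum_{j=1}^{T} w_j^2
```

The first term measures how well the model fits the training data, while the second penalizes complex trees (many leaves, large leaf weights).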

Advantages of XGBoost:

  1. The algorithm applies regularization by default, which helps keep model complexity in check and control overfitting.
  2. XGBoost can use all available CPU cores to parallelize tree construction, which significantly speeds up training.
  3. The algorithm handles missing values natively by learning a default direction at each split, so there is usually no need to impute missing values before training XGBoost (see the sketch after this list).
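
The second and third points can be seen directly in code. Here is a minimal sketch, assuming the `xgboost` scikit-learn wrapper is installed; the toy data is made up purely for demonstration.

```python
import numpy as np
from xgboost import XGBRegressor

X = np.array([[1.0, 2.0],
              [np.nan, 3.0],   # missing value left as-is: no imputation needed
              [4.0, np.nan],
              [5.0, 6.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])

model = XGBRegressor(
    n_estimators=50,
    max_depth=2,
    learning_rate=0.1,
    n_jobs=-1,        # use all available CPU cores in parallel
)
model.fit(X, y)       # NaNs are routed to a learned default direction at each split
print(model.predict(X))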

Conclusion:

Although XGBoost is one of the best-performing algorithms available today for regression on tabular data, there are still some scenarios where it is better not to use it. One such situation is when the number of training examples is much smaller than the number of features.

Nevertheless, XGBoost makes up for many of its limitations with its highly efficient computation and strong model performance. It can be used for regression, classification, ranking, and user-defined prediction tasks.
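
As a quick illustration of those task types, the same library exposes a separate estimator for each; the objectives shown below are standard XGBoost objectives, while any surrounding data and training code is assumed.

```python
from xgboost import XGBClassifier, XGBRanker, XGBRegressor

reg = XGBRegressor(objective="reg:squarederror")   # regression
clf = XGBClassifier(objective="binary:logistic")   # binary classification
rnk = XGBRanker(objective="rank:pairwise")         # learning to rank (needs group/query info)
# A user-defined objective can also be supplied as a custom Python callable.
```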

I am currently working on a Python implementation of the XGBoost algorithm, which will shortly be pushed to my GitHub profile. Kindly follow and stay tuned if you liked this article.
