## 5.11.7. XGBoost (action)

Icon:

Function: XGBoost

Property window:

Short description:

Use the XGBoost Library

Long Description:

Gradient boosting is probably the most popular algorithm of the second decade of the 21st century. The main reason is that it has performed extraordinarily well in most data mining competitions. It usually delivers one of the highest accuracies in situations where the learning and test datasets come from the same time frame.

In practice, we’ve seen these models degrade very quickly over time (in a banking setting, for example, the accuracy dropped 10 points below LASSO in just two months), so we tend not to use them.

The general idea of gradient boosting is ensemble modeling on steroids. By combining hundreds or thousands of weak models, we can obtain a fairly good classifier. This comes at the cost of interpretability.
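To make the "many weak models" idea concrete, here is a minimal sketch of gradient boosting for squared-error regression on a single numeric feature. It is an illustration only, not the XGBoost library's implementation (which adds regularization, second-order gradients and full trees). The names `Stump`, `BoostedModel`, `fit_stump`, `fit_boosting` and `mse` are hypothetical helpers introduced for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Stump:
    """A one-split regression tree: the 'weak model' of this sketch."""
    threshold: float
    left_value: float
    right_value: float

    def predict(self, x: float) -> float:
        return self.left_value if x <= self.threshold else self.right_value


@dataclass
class BoostedModel:
    """Constant baseline plus a shrunken sum of stumps."""
    base: float
    learning_rate: float
    stumps: List[Stump] = field(default_factory=list)

    def predict(self, x: float) -> float:
        return self.base + self.learning_rate * sum(s.predict(x) for s in self.stumps)


def fit_stump(xs, residuals) -> Stump:
    """Pick the split that minimises squared error on the residuals.

    Assumes xs holds at least two distinct values.
    """
    best, best_err = None, float("inf")
    for t in sorted(set(xs))[:-1]:  # every threshold leaving both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if err < best_err:
            best, best_err = Stump(t, lm, rm), err
    return best


def fit_boosting(xs, ys, n_rounds=200, learning_rate=0.1) -> BoostedModel:
    """Each round fits a stump to the current residuals and adds a damped copy."""
    model = BoostedModel(base=sum(ys) / len(ys), learning_rate=learning_rate)
    preds = [model.base] * len(ys)
    for _ in range(n_rounds):
        # For squared error, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        model.stumps.append(stump)
        preds = [p + learning_rate * stump.predict(x) for x, p in zip(xs, preds)]
    return model


def mse(model: BoostedModel, xs, ys) -> float:
    return sum((y - model.predict(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Toy data (made up for illustration): a quadratic curve that no single stump can fit.
xs = [float(i) for i in range(20)]
ys = [x * x for x in xs]
for rounds in (0, 1, 200):
    print(rounds, mse(fit_boosting(xs, ys, n_rounds=rounds), xs, ys))
```

The training error shrinks as rounds are added, but the final model is a sum of 200 small rules rather than one readable rule, which is the interpretability cost mentioned above.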

Fit a gradient boosting model. The different operating modes are:

•linear regression

•logistic regression

•logistic regression for binary classification, output probability

•logistic regression for binary classification, output score before logistic transformation

•Poisson regression for count data, output mean of Poisson distribution

•multiclass classification using the softmax objective

•multiclass classification using the softmax objective, with output as a vector containing the predicted probability of each data point belonging to each class

•ranking task by minimizing the pairwise loss

•gamma regression for severity data, output mean of gamma distribution
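
The operating modes listed above appear to correspond to the `objective` values documented by the open-source XGBoost library. The sketch below records that correspondence as a plain Python dictionary. The one-to-one mapping to this action's modes is an assumption, and the short mode labels, `make_params` and `sigmoid` are hypothetical names introduced for illustration. The objective strings and the `num_class` parameter are the XGBoost library's own.

```python
import math

# Objective identifiers from the XGBoost library documentation.
# Assumption: the action's modes map one-to-one onto these strings.
OBJECTIVES = {
    "linear regression": "reg:squarederror",  # named "reg:linear" in older releases
    "logistic regression": "reg:logistic",
    "binary classification, probability": "binary:logistic",
    "binary classification, raw score": "binary:logitraw",
    "poisson regression": "count:poisson",
    "multiclass, class label": "multi:softmax",
    "multiclass, class probabilities": "multi:softprob",
    "pairwise ranking": "rank:pairwise",
    "gamma regression": "reg:gamma",
}


def make_params(mode, num_class=None):
    """Build a parameter dictionary for the chosen mode (hypothetical helper)."""
    params = {"objective": OBJECTIVES[mode]}
    if params["objective"].startswith("multi:"):
        # The multiclass objectives need the number of classes.
        if num_class is None:
            raise ValueError("multiclass modes require num_class")
        params["num_class"] = num_class
    return params


def sigmoid(score):
    """Logistic transformation: turns a raw score into a probability."""
    return 1.0 / (1.0 + math.exp(-score))


print(make_params("multiclass, class probabilities", num_class=3))
print(sigmoid(0.0))
```

The two binary modes differ only in whether this logistic transformation is applied to the output: the "raw score" mode returns the score before it, and the "probability" mode returns the value after it.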