5.11.7. XGBoost (clip0243 action)


 

Icon: ANATEL~4_img24  

 
Function: XGBoost
 

Property window:

 

ANATEL~4_img23

 

Short description:

Uses the XGBoost library.

 

Long Description:

Gradient Boosting is probably the most popular algorithm of this second decade of the 21st century. The main reason is that it has performed extraordinarily well in most data mining competitions. It usually delivers one of the highest accuracies in situations where the learning and test datasets come from the same time frame.

 

In practice, we’ve seen these models degrade very quickly over time (in a banking setting, for example, the accuracy dropped 10 points below LASSO in just two months), so we tend not to use them.

 

The general idea of gradient boosting is ensemble modeling on steroids: by putting together hundreds or thousands of weak models, we can obtain a fairly good classifier. This comes at the cost of interpretability.
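 

To make this concrete, here is a minimal sketch (assuming the standard xgboost R package and a small synthetic dataset; all variable names are hypothetical, and this is not the exact script generated by the action) that stacks 500 deliberately weak trees into one ensemble:

library(xgboost)

# Hypothetical synthetic data: a numeric feature matrix and a 0/1 target.
set.seed(42)
x <- matrix(rnorm(1000 * 10), ncol = 10)
y <- as.numeric(x[, 1] + rnorm(1000) > 0)

# Each tree is deliberately weak (max_depth = 2); the predictive power
# comes from stacking many of them (nrounds = 500) with a small
# learning rate (eta = 0.05).
dtrain <- xgb.DMatrix(data = x, label = y)
model  <- xgb.train(
  params  = list(objective = "binary:logistic", max_depth = 2, eta = 0.05),
  data    = dtrain,
  nrounds = 500
)
head(predict(model, dtrain))   # predicted probabilities on the training data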

 

Fits a Gradient Boosting model. The different operating modes are listed below (a short R sketch mapping each mode to its XGBoost objective string follows the list):
 

linear regression

logistic regression

logistic regression for binary classification, output probability

logistic regression for binary classification, output score before logistic transformation

Poisson regression for count data, output mean of the Poisson distribution

multiclass classification using the softmax objective

multiclass classification using the softmax objective but with output as a vector that contains predicted probability of each data point belonging to each class

ranking task by minimizing the pairwise loss

Gamma regression for severity data, output mean of the Gamma distribution
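 

These modes map directly onto the objective parameter of the underlying XGBoost library. The sketch below is an illustration, not the exact script generated by the Anatella action (the data and variable names are made up); it lists the standard objective strings and fits the multiclass-probability variant:

# Standard XGBoost objective strings for the modes listed above:
#   "reg:linear"       linear regression ("reg:squarederror" in recent versions)
#   "reg:logistic"     logistic regression
#   "binary:logistic"  binary classification, outputs a probability
#   "binary:logitraw"  binary classification, outputs the score before the logistic transformation
#   "count:poisson"    Poisson regression for count data, outputs the mean of the Poisson distribution
#   "multi:softmax"    multiclass classification, outputs the predicted class
#   "multi:softprob"   multiclass classification, outputs one probability per class
#   "rank:pairwise"    ranking by minimizing the pairwise loss
#   "reg:gamma"        Gamma regression for severity data, outputs the mean of the Gamma distribution

library(xgboost)

# Hypothetical 3-class problem; labels must be 0-based integers.
set.seed(1)
x <- matrix(rnorm(300 * 5), ncol = 5)
y <- sample(0:2, 300, replace = TRUE)

dtrain <- xgb.DMatrix(data = x, label = y)
model  <- xgb.train(
  params  = list(objective = "multi:softprob", num_class = 3),
  data    = dtrain,
  nrounds = 100
)

# "multi:softprob" returns one long vector of probabilities;
# reshape it to one row per data point, one column per class.
prob <- matrix(predict(model, dtrain), ncol = 3, byrow = TRUE)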