incremental learning with xgb_model - wrong predictions #5192

Closed · jfrery opened this issue Jan 9, 2020 · 4 comments · Fixed by #6506

jfrery (Contributor) commented Jan 9, 2020

Hello,

I am experimenting with the xgb_model parameter in order to update my model with new data in an incremental fashion.

# Train an initial model on the first batch of data and save it.
reg0 = XGBRegressor(**params)
reg0.fit(X_train0, y_train0, eval_set=[(X_test, y_test)], eval_metric=my_metric)
reg0.save_model('model0')

Then I use this model to train a new one on new data.

# Continue training from the saved model on the new batch.
reg1 = XGBRegressor(**params)
reg1.fit(X_train1, y_train1, eval_set=[(X_test, y_test)], xgb_model='model0', eval_metric=my_metric)

I can see that the training of reg1 starts with the same test-set performance that reg0 had at the end of its own training, which is great.

Once training is over, I call reg1.predict(X_test) and compute my_metric(). The result is very different from the metric reported at the end of reg1's training.

In fact, the predictions from reg1.predict() are very different from the predictions that were going into my_metric() during the last iterations of training. I assume that the predict function does not take reg0 into account in the prediction.

I have experimented with the learning API, and the problem does not seem to occur there.
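
For context, a minimal self-contained sketch of the pattern described above; the data, params, and metric here are hypothetical stand-ins for the original setup:

import numpy as np
from xgboost import XGBRegressor

rng = np.random.RandomState(1994)
X_train0, y_train0 = rng.randn(500, 10), rng.randn(500)
X_train1, y_train1 = rng.randn(500, 10), rng.randn(500)
X_test, y_test = rng.randn(200, 10), rng.randn(200)
params = {'n_estimators': 50, 'objective': 'reg:squarederror'}

# First batch: train and save.
reg0 = XGBRegressor(**params)
reg0.fit(X_train0, y_train0, eval_set=[(X_test, y_test)])
reg0.save_model('model0')

# Second batch: continue training from the saved model.
reg1 = XGBRegressor(**params)
reg1.fit(X_train1, y_train1, eval_set=[(X_test, y_test)], xgb_model='model0')

# The reported discrepancy: this can differ from the last eval-set entry logged during fit.
print(float(np.mean((y_test - reg1.predict(X_test)) ** 2)))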

trivialfis (Member) commented

@jfrery Could you please provide a self-contained script I can run? Maybe start with small random data generated by numpy:

import numpy as np
import xgboost

np.random.seed(1994)

kRows = 1000
kCols = 100

X = np.random.randn(kRows, kCols)
y = np.random.randn(kRows)

dtrain = xgboost.DMatrix(X, y)

...

jfrery (Contributor, Author) commented Jan 13, 2020

@trivialfis Sorry for the delay.

The problem is simpler than I thought. I didn't mention it, but I used the early_stopping_rounds parameter in the fit function of the second model. The problem is that when training the second model, the iteration count starts at 0 without taking into account the previous model passed via the xgb_model parameter. When calling the predict method, the model simply stops at the wrong iteration, since model.best_iteration does not account for the first model's trees.
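
A sketch of one possible workaround under the behavior described, assuming an XGBoost version from around that time (get_dump() is used here to count the first model's trees, and the names reg0/reg1 follow the earlier snippets):

# Count the trees contributed by the first model.
n_trees_reg0 = len(reg0.get_booster().get_dump())

# best_iteration from the second fit counts from 0 and ignores reg0's trees,
# so offset the prediction limit by the first model's tree count.
corrected_limit = n_trees_reg0 + reg1.best_iteration + 1
preds = reg1.predict(X_test, ntree_limit=corrected_limit)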

trivialfis (Member) commented Jan 16, 2020

Okay, I need to spend some time on that. Thanks for the explanation.

trivialfis (Member) commented

Will implement a getter for the number of boosted rounds in the XGBoost core later.
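
In later XGBoost releases a getter along these lines exists on the Booster; a brief sketch, assuming a version that provides num_boosted_rounds():

booster = reg1.get_booster()
# Total number of boosted rounds, including rounds inherited from the loaded model.
print(booster.num_boosted_rounds())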
