
incremental GPU-training #4932

Closed
pikaliov opened this issue Oct 11, 2019 · 5 comments

Comments

@pikaliov

Version of XGBoost: 0.90
I have a task to train a classification system. The features are embeddings extracted from a neural network (the final dataset is about 100 GB, so the system runs out of memory during GPU training).

import numpy as np
import xgboost

params = {
"learning_rate":0.001,
"objective":'multi:softmax',
"tree_method":'gpu_hist',
"predictor":'gpu_predictor',
"seed":27,
'num_class': len(np.unique(data_train_Y)),
#'update':'refresh',
#'process_type': 'update',
#'refresh_leaf': True,
}

batch_size = 128
iterations = 100
model = None
print('[Info] Training...')
for i in range(iterations):
  print('Iteration #{}'.format(i))
  for start in range(0, len(data_train_X), batch_size):
    print('{} of {} batches'.format(start, len(data_train_X)))
    model = xgboost.XGBClassifier(**params, xgb_model=model)
    X_train_batch = data_train_X[start:start+batch_size]
    y_train_batch = data_train_Y[start:start+batch_size]
    #eval_set = [(X_train_batch, y_train_batch)]
    model.fit(X_train_batch, y_train_batch, verbose=True)
    y_pr = model.predict(X_train_batch)

I tried the above code, and after the first training step (the first 128-sample batch) the process got stuck.
Can incremental GPU training be done with XGBoost, and how? (I can't use distributed training.)
Can you link some examples of CPU/GPU incremental training?
I tried this one https://gist.github.com/ylogx/53fef94cc61d6a3e9b3eb900482f41e0 but it's somewhat outdated.

@trivialfis
Member

Let me take a look over the weekend.

@pikaliov
Author

@trivialfis did you find any solution?

@trivialfis
Member

trivialfis commented Oct 16, 2019

@pikaliov I tried with the master branch. A few issues may be related to your use case:

  • Memory limit. Python does not release the old model object even after you replace it with a newer one, and the cleanup is hard to control even if you call the gc module. So all previously used GPU memory stays allocated, which you can verify with nvidia-smi.
  • I'm not sure why you added the outer iterations loop. In XGBoost, an "iteration" is just one more stacked (boosted) tree; from the scikit-learn interface you control this with the n_estimators parameter.
  • Verify your number of classes. One set of trees is built per class, so memory can run out fairly fast.

We have a hack for dealing with the Python memory model mentioned above:

import numpy as np
import xgboost

kRows = 128 * 10
kCols = 128

data_train_X = np.random.randn(kRows, kCols)
data_train_Y = np.random.randint(0, 4, kRows)
n_classes = len(np.unique(data_train_Y))
assert n_classes == 4

params = {
    "learning_rate": 0.001,
    "objective": 'multi:softmax',
    "tree_method": 'gpu_hist',
    "predictor": 'gpu_predictor',
    "seed": 27,
    'num_class': n_classes,
    'verbosity': 2,
    'nthread': -1
}

batch_size = 128

print('[Info] Training...')
for start in range(0, data_train_X.shape[0], batch_size):
    print('batch {} of {}'.format(start // batch_size + 1,
                                  data_train_X.shape[0] // batch_size))
    model = xgboost.XGBClassifier(n_estimators=128, **params)
    X_train_batch = data_train_X[start:start + batch_size, :]
    y_train_batch = data_train_Y[start:start + batch_size]
    print('y_train_batch:', y_train_batch)
    # Resume from the checkpoint saved by the previous batch; no checkpoint
    # exists yet on the first batch, so pass None there.
    model.fit(X_train_batch, y_train_batch, verbose=True,
              xgb_model='model.booster' if start > 0 else None)
    y_pr = model.predict(X_train_batch)
    model.save_model('model.booster')

@trivialfis
Member

trivialfis commented Oct 16, 2019

Also, #4357 is an ongoing effort to bring external memory support to the XGBoost GPU hist method; it iterates over the dataset internally and has a sound theoretical background. I'm not sure your stochastic training scheme is actually robust.

@pikaliov
Author

@trivialfis Thanks for your answer. I used the above code, but the process still got stuck. I changed the batch size to batch_size=512 and the training procedure completed.
I know about the memory limit, but it's strange: the process got stuck without CUDA running out of memory.

lock bot locked as resolved and limited conversation to collaborators Jan 14, 2020