
incremental GPU-training #4932

Closed
pikaliov opened this issue Oct 11, 2019 · 5 comments

Comments

@pikaliov

Version of XGBoost: 0.90
I have a task to train a classification system. The features are embeddings extracted from a neural network (the final dataset is about 100 GB, so the system runs out of memory during GPU training).

import numpy as np
import xgboost

params = {
"learning_rate":0.001,
"objective":'multi:softmax',
"tree_method":'gpu_hist',
"predictor":'gpu_predictor',
"seed":27,
'num_class': len(np.unique(data_train_Y)),
#'update':'refresh',
#'process_type': 'update',
#'refresh_leaf': True,
}

batch_size = 128
iterations = 100
model = None
print('[Info] Training...')
for i in range(iterations):
  print('Iteration #{}'.format(i))
  for start in range(0, len(data_train_X), batch_size):
    print('{} of {} batches'.format(start, len(data_train_X)))
    model = xgboost.XGBClassifier(**params, xgb_model=model)
    X_train_batch = data_train_X[start:start+batch_size]
    y_train_batch = data_train_Y[start:start+batch_size]
    #eval_set = [(X_train_batch, y_train_batch)]
    model.fit(X_train_batch, y_train_batch, verbose=True)
    y_pr = model.predict(X_train_batch)

I tried the above code, and after the first training step (the first 128-sample batch) the process got stuck.
Can incremental GPU training be done with XGBoost, and how? (I can't use distributed training.)
Can you link some examples of CPU/GPU incremental training?
I tried this one https://gist.github.com/ylogx/53fef94cc61d6a3e9b3eb900482f41e0 but it's somewhat outdated.

@trivialfis
Member

Let me take a look over the weekend.

@pikaliov
Author

@trivialfis did you find any solution?

@trivialfis
Member

trivialfis commented Oct 16, 2019

@pikaliov I tried with the master branch. A few issues may be related to your use case:

  • Memory limit. Python does not release the old model object even after you replace it with a newer one, and the cleanup is hard to control even if you call the gc module. So all previously used GPU memory stays allocated, which you can verify with nvidia-smi.
  • I'm not sure why you added the outer iterations loop. In XGBoost, an "iteration" is just one more stacked (boosted) tree; from the scikit-learn interface you control this with the n_estimators parameter.
  • Verify your number of classes. One set of trees is built per class, so memory can run out fairly fast.

We have a hack for dealing with the Python memory model mentioned above:

import numpy as np
import xgboost

kRows = 128 * 10
kCols = 128

data_train_X = np.random.randn(kRows, kCols)
data_train_Y = np.random.randint(0, 4, kRows)
n_classes = len(np.unique(data_train_Y))
assert n_classes == 4

params = {
    "learning_rate": 0.001,
    "objective": 'multi:softmax',
    "tree_method": 'gpu_hist',
    "predictor": 'gpu_predictor',
    "seed": 27,
    'num_class': n_classes,
    'verbosity': 2,
    'nthread': -1
}

batch_size = 128

print('[Info] Training...')
for start in range(0, data_train_X.shape[0], batch_size):
    print('batch {} of {}'.format(start // batch_size + 1,
                                  data_train_X.shape[0] // batch_size))
    model = xgboost.XGBClassifier(n_estimators=128, **params)
    X_train_batch = data_train_X[start:start + batch_size, :]
    y_train_batch = data_train_Y[start:start + batch_size]
    print('y_train_batch:', y_train_batch)
    # Resume from the checkpoint saved by the previous batch; no checkpoint
    # exists yet on the first batch, so pass None there.
    model.fit(X_train_batch, y_train_batch, verbose=True,
              xgb_model='model.booster' if start > 0 else None)
    y_pr = model.predict(X_train_batch)
    model.save_model('model.booster')

@trivialfis
Member

trivialfis commented Oct 16, 2019

Also, #4357 is an ongoing effort to bring external memory support to the XGBoost GPU hist method; it iterates over the dataset internally and has a sound theoretical background. I'm not sure your stochastic training scheme is actually robust.

@pikaliov
Author

@trivialfis Thanks for your answer. I used the above code, but the process still got stuck. I changed the batch size to batch_size=512 and the training procedure completed.
I know about the memory limit, but it's strange: the process got stuck without CUDA running out of memory.

lock bot locked as resolved and limited conversation to collaborators Jan 14, 2020