[CUDA] illegal memory access when using CUDA and small max_bin #4315

Closed
RobinDong opened this issue May 24, 2021 · 1 comment

RobinDong commented May 24, 2021

Description

With the CUDA histogram implementation on the master branch, the simple Python script below fails with an illegal memory access error when max_bin is set to 7.

Reproducible example

Check out the master branch of LightGBM

Build it with CUDA support

cmake -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc -DUSE_CUDA=ON -DUSE_DEBUG=ON -DCUDA_DEBUG=3 ..

Run the Python snippet below

import numpy as np
import pandas as pd
import lightgbm as lgb

# Generate fake data
NUM_DATA = 100000

data_map = {}
for index in range(200):
    col = f"col{index}"
    data_map[col] = np.random.rand(NUM_DATA)
    if index == 0:
        col0 = data_map[col]

data_map["target"] = col0

df = pd.DataFrame(data_map)
train = df[NUM_DATA//10:]
test = df[0:NUM_DATA//10]

train_target = train['target']
test_target = test['target']
train = train.drop(['target'], axis=1)
test = test.drop(['target'], axis=1)
valid_set = lgb.Dataset(test.values, label=test_target.values)
train_set = lgb.Dataset(train.values, label=train_target.values)

# Set parameters and start to train
params = {
    'boosting_type': 'gbdt',
    'objective': 'regression',
    'metric': {'l2', 'l1'},
    'device_type': 'cuda',
    'verbose': 2,
    'max_bin': 7,
}

print('LightGBM training started')
print('Parameters: %s' % params)

trained_model = lgb.train(
    params,
    train_set,
    valid_sets=valid_set)

print(trained_model)

Training then aborts with the following error:

[LightGBM] [Fatal] [CUDA] an illegal memory access was encountered LightGBM/src/treelearner/cuda_tree_learner.cpp 239

terminate called after throwing an instance of 'std::runtime_error'
  what():  [CUDA] an illegal memory access was encountered LightGBM/src/treelearner/cuda_tree_learner.cpp 239

Aborted (core dumped)
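
Because the failure aborts the interpreter instead of raising a Python exception, a try/except around lgb.train cannot catch it. One way to check which max_bin values and device types are affected is to run each attempt in its own child process. The following is only a minimal sketch and not part of the original reproduction: run_isolated, TRAIN_TEMPLATE and sweep are hypothetical helper names, the list of max_bin values is an arbitrary assumption, and the shortened training script is assumed (not verified) to hit the same code path as the snippet above.

```python
import subprocess
import sys


def run_isolated(source, timeout=600):
    """Run Python source in a child process.

    Returns (returncode, last line of stderr). On POSIX, a child killed by
    a signal such as SIGABRT reports a negative return code.
    """
    proc = subprocess.run(
        [sys.executable, "-c", source],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # Python 3.6-compatible spelling of text=True
        timeout=timeout,
    )
    lines = proc.stderr.strip().splitlines()
    return proc.returncode, (lines[-1] if lines else "")


# Hypothetical shortened version of the reproduction script (assumption:
# same data shape as the report, 100000 rows x 200 columns, target = col0).
TRAIN_TEMPLATE = """
import numpy as np
import lightgbm as lgb
X = np.random.rand(100000, 200)
train_set = lgb.Dataset(X, label=X[:, 0])
params = {{'objective': 'regression', 'device_type': '{device}',
          'max_bin': {max_bin}, 'verbose': -1}}
lgb.train(params, train_set, num_boost_round=10)
"""


def sweep(devices=("cpu", "cuda"), max_bins=(7, 15, 63, 255)):
    """Train once per (device, max_bin) pair, each in a fresh process."""
    results = {}
    for device in devices:
        for max_bin in max_bins:
            source = TRAIN_TEMPLATE.format(device=device, max_bin=max_bin)
            results[(device, max_bin)] = run_isolated(source)
    return results
```

Calling sweep() returns a dict keyed by (device, max_bin). A non-zero return code for a cuda entry alongside a zero return code for the matching cpu entry would suggest the problem is specific to the CUDA tree learner for that max_bin.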

Environment info

Operating System: Linux

CPU Model: Intel(R) Xeon(R) CPU @ 2.30GHz
GPU model: Nvidia T4
CUDA: 11.0.182
Python 3.6.9
LightGBM version or commit hash: master branch

Command(s) you used to install LightGBM

mkdir build
cd build
cmake -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc -DUSE_CUDA=ON -DUSE_DEBUG=ON -DCUDA_DEBUG=3 ..
make -j
cd ../python-package/
python3 setup.py install --precompile
@github-actions

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023