Possible bug between colsample_bynode and gpu_hist #8824

Closed
ngupta20 opened this issue Feb 18, 2023 · 11 comments · Fixed by #8850

ngupta20 commented Feb 18, 2023

Introduction

Setting colsample_bynode appears to cause GPU training to fail. I've tested this on both Linux and Windows machines. The same script worked on tag 1.7.3, so I believe the problem was introduced in the 1.7.4 patch release.

Reproduction

import pandas as pd
import numpy as np
import xgboost as xgb

assert xgb.__version__ == "1.7.4"

data = pd.DataFrame(np.random.rand(1024, 8))
data.columns = "x" + data.columns.astype(str)
features = data.columns
data["y"] = data.sum(axis=1) < 4
dtrain = xgb.DMatrix(data[features], label=data["y"])
model = xgb.train(
    dtrain=dtrain,
    params={
        "max_depth": 5,
        "learning_rate": 0.05,
        "objective": "binary:logistic",
        "tree_method": "gpu_hist",
        "colsample_bytree": 0.5,
        "colsample_bylevel": 0.5,
        "colsample_bynode": 0.5,  # Causes issues
        "reg_alpha": 0.05,
        "reg_lambda": 0.005,
        "seed": 66,
        "subsample": 0.5,
        "gamma": 0.2,
        "predictor": "auto",
        "eval_metric": "auc",
    },
    num_boost_round=150,
)

Linux System Information

nvidia-smi

Sat Feb 18 22:38:08 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.119.03   Driver Version: 450.119.03   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   27C    P8    13W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

nvidia-smi -L

GPU 0: Tesla T4 (UUID: GPU-596bf232-c791-5884-88a3-515c7f7c00c3)

lscpu

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
Stepping:            7
CPU MHz:             3102.204
BogoMIPS:            4999.99
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            36608K
NUMA node0 CPU(s):   0-7
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni

Stack Trace

../src/tree/gpu_hist/evaluate_splits.cu:92: xgboost::tree::EvaluateSplitAgent<kBlockSize>::EvaluateSplitAgent(xgboost::tree::EvaluateSplitAgent<kBlockSize>::TempStorage *, int, const xgboost::tree::EvaluateSplitInputs &, const xgboost::tree::EvaluateSplitSharedInputs &, const xgboost::tree::TreeEvaluator::SplitEvaluator<xgboost::tree::GPUTrainingParam> &) [with kBlockSize = 32]: block: [0,0,0], thread: [0,0,0] Assertion `(!shared_inputs.is_dense || missing.GetQuantisedHess() == 0)` failed.
../src/tree/gpu_hist/evaluate_splits.cu:92: xgboost::tree::EvaluateSplitAgent<kBlockSize>::EvaluateSplitAgent(xgboost::tree::EvaluateSplitAgent<kBlockSize>::TempStorage *, int, const xgboost::tree::EvaluateSplitInputs &, const xgboost::tree::EvaluateSplitSharedInputs &, const xgboost::tree::TreeEvaluator::SplitEvaluator<xgboost::tree::GPUTrainingParam> &) [with kBlockSize = 32]: block: [0,0,0], thread: [1,0,0] Assertion `(!shared_inputs.is_dense || missing.GetQuantisedHess() == 0)` failed.
../src/tree/gpu_hist/evaluate_splits.cu:92: xgboost::tree::EvaluateSplitAgent<kBlockSize>::EvaluateSplitAgent(xgboost::tree::EvaluateSplitAgent<kBlockSize>::TempStorage *, int, const xgboost::tree::EvaluateSplitInputs &, const xgboost::tree::EvaluateSplitSharedInputs &, const xgboost::tree::TreeEvaluator::SplitEvaluator<xgboost::tree::GPUTrainingParam> &) [with kBlockSize = 32]: block: [0,0,0], thread: [2,0,0] Assertion `(!shared_inputs.is_dense || missing.GetQuantisedHess() == 0)` failed.
../src/tree/gpu_hist/evaluate_splits.cu:92: xgboost::tree::EvaluateSplitAgent<kBlockSize>::EvaluateSplitAgent(xgboost::tree::EvaluateSplitAgent<kBlockSize>::TempStorage *, int, const xgboost::tree::EvaluateSplitInputs &, const xgboost::tree::EvaluateSplitSharedInputs &, const xgboost::tree::TreeEvaluator::SplitEvaluator<xgboost::tree::GPUTrainingParam> &) [with kBlockSize = 32]: block: [0,0,0], thread: [3,0,0] Assertion `(!shared_inputs.is_dense || missing.GetQuantisedHess() == 0)` failed.
../src/tree/gpu_hist/evaluate_splits.cu:92: xgboost::tree::EvaluateSplitAgent<kBlockSize>::EvaluateSplitAgent(xgboost::tree::EvaluateSplitAgent<kBlockSize>::TempStorage *, int, const xgboost::tree::EvaluateSplitInputs &, const xgboost::tree::EvaluateSplitSharedInputs &, const xgboost::tree::TreeEvaluator::SplitEvaluator<xgboost::tree::GPUTrainingParam> &) [with kBlockSize = 32]: block: [0,0,0], thread: [4,0,0] Assertion `(!shared_inputs.is_dense || missing.GetQuantisedHess() == 0)` failed.
../src/tree/gpu_hist/evaluate_splits.cu:92: xgboost::tree::EvaluateSplitAgent<kBlockSize>::EvaluateSplitAgent(xgboost::tree::EvaluateSplitAgent<kBlockSize>::TempStorage *, int, const xgboost::tree::EvaluateSplitInputs &, const xgboost::tree::EvaluateSplitSharedInputs &, const xgboost::tree::TreeEvaluator::SplitEvaluator<xgboost::tree::GPUTrainingParam> &) [with kBlockSize = 32]: block: [0,0,0], thread: [5,0,0] Assertion `(!shared_inputs.is_dense || missing.GetQuantisedHess() == 0)` failed.
../src/tree/gpu_hist/evaluate_splits.cu:92: xgboost::tree::EvaluateSplitAgent<kBlockSize>::EvaluateSplitAgent(xgboost::tree::EvaluateSplitAgent<kBlockSize>::TempStorage *, int, const xgboost::tree::EvaluateSplitInputs &, const xgboost::tree::EvaluateSplitSharedInputs &, const xgboost::tree::TreeEvaluator::SplitEvaluator<xgboost::tree::GPUTrainingParam> &) [with kBlockSize = 32]: block: [0,0,0], thread: [6,0,0] Assertion `(!shared_inputs.is_dense || missing.GetQuantisedHess() == 0)` failed.
../src/tree/gpu_hist/evaluate_splits.cu:92: xgboost::tree::EvaluateSplitAgent<kBlockSize>::EvaluateSplitAgent(xgboost::tree::EvaluateSplitAgent<kBlockSize>::TempStorage *, int, const xgboost::tree::EvaluateSplitInputs &, const xgboost::tree::EvaluateSplitSharedInputs &, const xgboost::tree::TreeEvaluator::SplitEvaluator<xgboost::tree::GPUTrainingParam> &) [with kBlockSize = 32]: block: [0,0,0], thread: [7,0,0] Assertion `(!shared_inputs.is_dense || missing.GetQuantisedHess() == 0)` failed.
../src/tree/gpu_hist/evaluate_splits.cu:92: xgboost::tree::EvaluateSplitAgent<kBlockSize>::EvaluateSplitAgent(xgboost::tree::EvaluateSplitAgent<kBlockSize>::TempStorage *, int, const xgboost::tree::EvaluateSplitInputs &, const xgboost::tree::EvaluateSplitSharedInputs &, const xgboost::tree::TreeEvaluator::SplitEvaluator<xgboost::tree::GPUTrainingParam> &) [with kBlockSize = 32]: block: [0,0,0], thread: [8,0,0] Assertion `(!shared_inputs.is_dense || missing.GetQuantisedHess() == 0)` failed.
../src/tree/gpu_hist/evaluate_splits.cu:92: xgboost::tree::EvaluateSplitAgent<kBlockSize>::EvaluateSplitAgent(xgboost::tree::EvaluateSplitAgent<kBlockSize>::TempStorage *, int, const xgboost::tree::EvaluateSplitInputs &, const xgboost::tree::EvaluateSplitSharedInputs &, const xgboost::tree::TreeEvaluator::SplitEvaluator<xgboost::tree::GPUTrainingParam> &) [with kBlockSize = 32]: block: [0,0,0], thread: [9,0,0] Assertion `(!shared_inputs.is_dense || missing.GetQuantisedHess() == 0)` failed.
../src/tree/gpu_hist/evaluate_splits.cu:92: xgboost::tree::EvaluateSplitAgent<kBlockSize>::EvaluateSplitAgent(xgboost::tree::EvaluateSplitAgent<kBlockSize>::TempStorage *, int, const xgboost::tree::EvaluateSplitInputs &, const xgboost::tree::EvaluateSplitSharedInputs &, const xgboost::tree::TreeEvaluator::SplitEvaluator<xgboost::tree::GPUTrainingParam> &) [with kBlockSize = 32]: block: [0,0,0], thread: [10,0,0] Assertion `(!shared_inputs.is_dense || missing.GetQuantisedHess() == 0)` failed.
../src/tree/gpu_hist/evaluate_splits.cu:92: xgboost::tree::EvaluateSplitAgent<kBlockSize>::EvaluateSplitAgent(xgboost::tree::EvaluateSplitAgent<kBlockSize>::TempStorage *, int, const xgboost::tree::EvaluateSplitInputs &, const xgboost::tree::EvaluateSplitSharedInputs &, const xgboost::tree::TreeEvaluator::SplitEvaluator<xgboost::tree::GPUTrainingParam> &) [with kBlockSize = 32]: block: [0,0,0], thread: [11,0,0] Assertion `(!shared_inputs.is_dense || missing.GetQuantisedHess() == 0)` failed.
../src/tree/gpu_hist/evaluate_splits.cu:92: xgboost::tree::EvaluateSplitAgent<kBlockSize>::EvaluateSplitAgent(xgboost::tree::EvaluateSplitAgent<kBlockSize>::TempStorage *, int, const xgboost::tree::EvaluateSplitInputs &, const xgboost::tree::EvaluateSplitSharedInputs &, const xgboost::tree::TreeEvaluator::SplitEvaluator<xgboost::tree::GPUTrainingParam> &) [with kBlockSize = 32]: block: [0,0,0], thread: [12,0,0] Assertion `(!shared_inputs.is_dense || missing.GetQuantisedHess() == 0)` failed.
../src/tree/gpu_hist/evaluate_splits.cu:92: xgboost::tree::EvaluateSplitAgent<kBlockSize>::EvaluateSplitAgent(xgboost::tree::EvaluateSplitAgent<kBlockSize>::TempStorage *, int, const xgboost::tree::EvaluateSplitInputs &, const xgboost::tree::EvaluateSplitSharedInputs &, const xgboost::tree::TreeEvaluator::SplitEvaluator<xgboost::tree::GPUTrainingParam> &) [with kBlockSize = 32]: block: [0,0,0], thread: [13,0,0] Assertion `(!shared_inputs.is_dense || missing.GetQuantisedHess() == 0)` failed.
../src/tree/gpu_hist/evaluate_splits.cu:92: xgboost::tree::EvaluateSplitAgent<kBlockSize>::EvaluateSplitAgent(xgboost::tree::EvaluateSplitAgent<kBlockSize>::TempStorage *, int, const xgboost::tree::EvaluateSplitInputs &, const xgboost::tree::EvaluateSplitSharedInputs &, const xgboost::tree::TreeEvaluator::SplitEvaluator<xgboost::tree::GPUTrainingParam> &) [with kBlockSize = 32]: block: [0,0,0], thread: [14,0,0] Assertion `(!shared_inputs.is_dense || missing.GetQuantisedHess() == 0)` failed.
../src/tree/gpu_hist/evaluate_splits.cu:92: xgboost::tree::EvaluateSplitAgent<kBlockSize>::EvaluateSplitAgent(xgboost::tree::EvaluateSplitAgent<kBlockSize>::TempStorage *, int, const xgboost::tree::EvaluateSplitInputs &, const xgboost::tree::EvaluateSplitSharedInputs &, const xgboost::tree::TreeEvaluator::SplitEvaluator<xgboost::tree::GPUTrainingParam> &) [with kBlockSize = 32]: block: [0,0,0], thread: [15,0,0] Assertion `(!shared_inputs.is_dense || missing.GetQuantisedHess() == 0)` failed.
../src/tree/gpu_hist/evaluate_splits.cu:92: xgboost::tree::EvaluateSplitAgent<kBlockSize>::EvaluateSplitAgent(xgboost::tree::EvaluateSplitAgent<kBlockSize>::TempStorage *, int, const xgboost::tree::EvaluateSplitInputs &, const xgboost::tree::EvaluateSplitSharedInputs &, const xgboost::tree::TreeEvaluator::SplitEvaluator<xgboost::tree::GPUTrainingParam> &) [with kBlockSize = 32]: block: [0,0,0], thread: [16,0,0] Assertion `(!shared_inputs.is_dense || missing.GetQuantisedHess() == 0)` failed.
../src/tree/gpu_hist/evaluate_splits.cu:92: xgboost::tree::EvaluateSplitAgent<kBlockSize>::EvaluateSplitAgent(xgboost::tree::EvaluateSplitAgent<kBlockSize>::TempStorage *, int, const xgboost::tree::EvaluateSplitInputs &, const xgboost::tree::EvaluateSplitSharedInputs &, const xgboost::tree::TreeEvaluator::SplitEvaluator<xgboost::tree::GPUTrainingParam> &) [with kBlockSize = 32]: block: [0,0,0], thread: [17,0,0] Assertion `(!shared_inputs.is_dense || missing.GetQuantisedHess() == 0)` failed.
../src/tree/gpu_hist/evaluate_splits.cu:92: xgboost::tree::EvaluateSplitAgent<kBlockSize>::EvaluateSplitAgent(xgboost::tree::EvaluateSplitAgent<kBlockSize>::TempStorage *, int, const xgboost::tree::EvaluateSplitInputs &, const xgboost::tree::EvaluateSplitSharedInputs &, const xgboost::tree::TreeEvaluator::SplitEvaluator<xgboost::tree::GPUTrainingParam> &) [with kBlockSize = 32]: block: [0,0,0], thread: [18,0,0] Assertion `(!shared_inputs.is_dense || missing.GetQuantisedHess() == 0)` failed.
../src/tree/gpu_hist/evaluate_splits.cu:92: xgboost::tree::EvaluateSplitAgent<kBlockSize>::EvaluateSplitAgent(xgboost::tree::EvaluateSplitAgent<kBlockSize>::TempStorage *, int, const xgboost::tree::EvaluateSplitInputs &, const xgboost::tree::EvaluateSplitSharedInputs &, const xgboost::tree::TreeEvaluator::SplitEvaluator<xgboost::tree::GPUTrainingParam> &) [with kBlockSize = 32]: block: [0,0,0], thread: [19,0,0] Assertion `(!shared_inputs.is_dense || missing.GetQuantisedHess() == 0)` failed.
../src/tree/gpu_hist/evaluate_splits.cu:92: xgboost::tree::EvaluateSplitAgent<kBlockSize>::EvaluateSplitAgent(xgboost::tree::EvaluateSplitAgent<kBlockSize>::TempStorage *, int, const xgboost::tree::EvaluateSplitInputs &, const xgboost::tree::EvaluateSplitSharedInputs &, const xgboost::tree::TreeEvaluator::SplitEvaluator<xgboost::tree::GPUTrainingParam> &) [with kBlockSize = 32]: block: [0,0,0], thread: [20,0,0] Assertion `(!shared_inputs.is_dense || missing.GetQuantisedHess() == 0)` failed.
../src/tree/gpu_hist/evaluate_splits.cu:92: xgboost::tree::EvaluateSplitAgent<kBlockSize>::EvaluateSplitAgent(xgboost::tree::EvaluateSplitAgent<kBlockSize>::TempStorage *, int, const xgboost::tree::EvaluateSplitInputs &, const xgboost::tree::EvaluateSplitSharedInputs &, const xgboost::tree::TreeEvaluator::SplitEvaluator<xgboost::tree::GPUTrainingParam> &) [with kBlockSize = 32]: block: [0,0,0], thread: [21,0,0] Assertion `(!shared_inputs.is_dense || missing.GetQuantisedHess() == 0)` failed.
../src/tree/gpu_hist/evaluate_splits.cu:92: xgboost::tree::EvaluateSplitAgent<kBlockSize>::EvaluateSplitAgent(xgboost::tree::EvaluateSplitAgent<kBlockSize>::TempStorage *, int, const xgboost::tree::EvaluateSplitInputs &, const xgboost::tree::EvaluateSplitSharedInputs &, const xgboost::tree::TreeEvaluator::SplitEvaluator<xgboost::tree::GPUTrainingParam> &) [with kBlockSize = 32]: block: [0,0,0], thread: [22,0,0] Assertion `(!shared_inputs.is_dense || missing.GetQuantisedHess() == 0)` failed.
../src/tree/gpu_hist/evaluate_splits.cu:92: xgboost::tree::EvaluateSplitAgent<kBlockSize>::EvaluateSplitAgent(xgboost::tree::EvaluateSplitAgent<kBlockSize>::TempStorage *, int, const xgboost::tree::EvaluateSplitInputs &, const xgboost::tree::EvaluateSplitSharedInputs &, const xgboost::tree::TreeEvaluator::SplitEvaluator<xgboost::tree::GPUTrainingParam> &) [with kBlockSize = 32]: block: [0,0,0], thread: [23,0,0] Assertion `(!shared_inputs.is_dense || missing.GetQuantisedHess() == 0)` failed.
../src/tree/gpu_hist/evaluate_splits.cu:92: xgboost::tree::EvaluateSplitAgent<kBlockSize>::EvaluateSplitAgent(xgboost::tree::EvaluateSplitAgent<kBlockSize>::TempStorage *, int, const xgboost::tree::EvaluateSplitInputs &, const xgboost::tree::EvaluateSplitSharedInputs &, const xgboost::tree::TreeEvaluator::SplitEvaluator<xgboost::tree::GPUTrainingParam> &) [with kBlockSize = 32]: block: [0,0,0], thread: [24,0,0] Assertion `(!shared_inputs.is_dense || missing.GetQuantisedHess() == 0)` failed.
../src/tree/gpu_hist/evaluate_splits.cu:92: xgboost::tree::EvaluateSplitAgent<kBlockSize>::EvaluateSplitAgent(xgboost::tree::EvaluateSplitAgent<kBlockSize>::TempStorage *, int, const xgboost::tree::EvaluateSplitInputs &, const xgboost::tree::EvaluateSplitSharedInputs &, const xgboost::tree::TreeEvaluator::SplitEvaluator<xgboost::tree::GPUTrainingParam> &) [with kBlockSize = 32]: block: [0,0,0], thread: [25,0,0] Assertion `(!shared_inputs.is_dense || missing.GetQuantisedHess() == 0)` failed.
../src/tree/gpu_hist/evaluate_splits.cu:92: xgboost::tree::EvaluateSplitAgent<kBlockSize>::EvaluateSplitAgent(xgboost::tree::EvaluateSplitAgent<kBlockSize>::TempStorage *, int, const xgboost::tree::EvaluateSplitInputs &, const xgboost::tree::EvaluateSplitSharedInputs &, const xgboost::tree::TreeEvaluator::SplitEvaluator<xgboost::tree::GPUTrainingParam> &) [with kBlockSize = 32]: block: [0,0,0], thread: [26,0,0] Assertion `(!shared_inputs.is_dense || missing.GetQuantisedHess() == 0)` failed.
../src/tree/gpu_hist/evaluate_splits.cu:92: xgboost::tree::EvaluateSplitAgent<kBlockSize>::EvaluateSplitAgent(xgboost::tree::EvaluateSplitAgent<kBlockSize>::TempStorage *, int, const xgboost::tree::EvaluateSplitInputs &, const xgboost::tree::EvaluateSplitSharedInputs &, const xgboost::tree::TreeEvaluator::SplitEvaluator<xgboost::tree::GPUTrainingParam> &) [with kBlockSize = 32]: block: [0,0,0], thread: [27,0,0] Assertion `(!shared_inputs.is_dense || missing.GetQuantisedHess() == 0)` failed.
../src/tree/gpu_hist/evaluate_splits.cu:92: xgboost::tree::EvaluateSplitAgent<kBlockSize>::EvaluateSplitAgent(xgboost::tree::EvaluateSplitAgent<kBlockSize>::TempStorage *, int, const xgboost::tree::EvaluateSplitInputs &, const xgboost::tree::EvaluateSplitSharedInputs &, const xgboost::tree::TreeEvaluator::SplitEvaluator<xgboost::tree::GPUTrainingParam> &) [with kBlockSize = 32]: block: [0,0,0], thread: [28,0,0] Assertion `(!shared_inputs.is_dense || missing.GetQuantisedHess() == 0)` failed.
../src/tree/gpu_hist/evaluate_splits.cu:92: xgboost::tree::EvaluateSplitAgent<kBlockSize>::EvaluateSplitAgent(xgboost::tree::EvaluateSplitAgent<kBlockSize>::TempStorage *, int, const xgboost::tree::EvaluateSplitInputs &, const xgboost::tree::EvaluateSplitSharedInputs &, const xgboost::tree::TreeEvaluator::SplitEvaluator<xgboost::tree::GPUTrainingParam> &) [with kBlockSize = 32]: block: [0,0,0], thread: [29,0,0] Assertion `(!shared_inputs.is_dense || missing.GetQuantisedHess() == 0)` failed.
../src/tree/gpu_hist/evaluate_splits.cu:92: xgboost::tree::EvaluateSplitAgent<kBlockSize>::EvaluateSplitAgent(xgboost::tree::EvaluateSplitAgent<kBlockSize>::TempStorage *, int, const xgboost::tree::EvaluateSplitInputs &, const xgboost::tree::EvaluateSplitSharedInputs &, const xgboost::tree::TreeEvaluator::SplitEvaluator<xgboost::tree::GPUTrainingParam> &) [with kBlockSize = 32]: block: [0,0,0], thread: [30,0,0] Assertion `(!shared_inputs.is_dense || missing.GetQuantisedHess() == 0)` failed.
../src/tree/gpu_hist/evaluate_splits.cu:92: xgboost::tree::EvaluateSplitAgent<kBlockSize>::EvaluateSplitAgent(xgboost::tree::EvaluateSplitAgent<kBlockSize>::TempStorage *, int, const xgboost::tree::EvaluateSplitInputs &, const xgboost::tree::EvaluateSplitSharedInputs &, const xgboost::tree::TreeEvaluator::SplitEvaluator<xgboost::tree::GPUTrainingParam> &) [with kBlockSize = 32]: block: [0,0,0], thread: [31,0,0] Assertion `(!shared_inputs.is_dense || missing.GetQuantisedHess() == 0)` failed.
---------------------------------------------------------------------------
XGBoostError                              Traceback (most recent call last)
Cell In[1], line 12
     10 data["y"] = data.sum(axis=1) < 4
     11 dtrain = xgb.DMatrix(data[features], label=data["y"])
---> 12 model = xgb.train(
     13     dtrain=dtrain,
     14     params={
     15         "max_depth": 5,
     16         "learning_rate": 0.05,
     17         "objective": "binary:logistic",
     18         "tree_method": "gpu_hist",
     19         "colsample_bytree": 0.5,
     20         "colsample_bylevel": 0.5,
     21         "colsample_bynode": 0.5,  # Causes issues
     22         "reg_alpha": 0.05,
     23         "reg_lambda": 0.005,
     24         "seed": 66,
     25         "subsample": 0.5,
     26         "gamma": 0.2,
     27         "predictor": "auto",
     28         "eval_metric": "auc",
     29     },
     30     num_boost_round=150,
     31 )

File ~/repos/repo/.venv/lib/python3.8/site-packages/xgboost/core.py:620, in require_keyword_args.<locals>.throw_if.<locals>.inner_f(*args, **kwargs)
    618 for k, arg in zip(sig.parameters, args):
    619     kwargs[k] = arg
--> 620 return func(**kwargs)

File ~/repos/repo/.venv/lib/python3.8/site-packages/xgboost/training.py:185, in train(params, dtrain, num_boost_round, evals, obj, feval, maximize, early_stopping_rounds, evals_result, verbose_eval, xgb_model, callbacks, custom_metric)
    183 if cb_container.before_iteration(bst, i, dtrain, evals):
    184     break
--> 185 bst.update(dtrain, i, obj)
    186 if cb_container.after_iteration(bst, i, dtrain, evals):
    187     break

File ~/repos/repo/.venv/lib/python3.8/site-packages/xgboost/core.py:1918, in Booster.update(self, dtrain, iteration, fobj)
   1915 self._validate_dmatrix_features(dtrain)
   1917 if fobj is None:
-> 1918     _check_call(_LIB.XGBoosterUpdateOneIter(self.handle,
   1919                                             ctypes.c_int(iteration),
   1920                                             dtrain.handle))
   1921 else:
   1922     pred = self.predict(dtrain, output_margin=True, training=True)

File ~/repos/repo/.venv/lib/python3.8/site-packages/xgboost/core.py:279, in _check_call(ret)
    268 """Check the return value of C API call
    269 
    270 This function will raise exception when error occurs.
   (...)
    276     return value from API calls
    277 """
    278 if ret != 0:
--> 279     raise XGBoostError(py_str(_LIB.XGBGetLastError()))

XGBoostError: [22:43:45] ../src/tree/updater_gpu_hist.cu:798: Exception in gpu_hist: [22:43:45] ../src/c_api/../data/../common/common.h:46: ../src/tree/../collective/../common/device_helpers.cuh: 1395: cudaErrorAssert: device-side assert triggered
Stack trace:
  [bt] (0) /home/nandish-gupta/repos/repo/.venv/lib/python3.8/site-packages/xgboost/lib/libxgboost.so(+0x453f99) [0x7ffa1e186f99]
  [bt] (1) /home/nandish-gupta/repos/repo/.venv/lib/python3.8/site-packages/xgboost/lib/libxgboost.so(+0x458983) [0x7ffa1e18b983]
  [bt] (2) /home/nandish-gupta/repos/repo/.venv/lib/python3.8/site-packages/xgboost/lib/libxgboost.so(+0x72788c) [0x7ffa1e45a88c]
  [bt] (3) /home/nandish-gupta/repos/repo/.venv/lib/python3.8/site-packages/xgboost/lib/libxgboost.so(+0x73c4ce) [0x7ffa1e46f4ce]
  [bt] (4) /home/nandish-gupta/repos/repo/.venv/lib/python3.8/site-packages/xgboost/lib/libxgboost.so(+0x73d0c1) [0x7ffa1e4700c1]
  [bt] (5) /home/nandish-gupta/repos/repo/.venv/lib/python3.8/site-packages/xgboost/lib/libxgboost.so(+0x2a0dc9) [0x7ffa1dfd3dc9]
  [bt] (6) /home/nandish-gupta/repos/repo/.venv/lib/python3.8/site-packages/xgboost/lib/libxgboost.so(+0x2a1b1d) [0x7ffa1dfd4b1d]
  [bt] (7) /home/nandish-gupta/repos/repo/.venv/lib/python3.8/site-packages/xgboost/lib/libxgboost.so(+0x2dc5f7) [0x7ffa1e00f5f7]
  [bt] (8) /home/nandish-gupta/repos/repo/.venv/lib/python3.8/site-packages/xgboost/lib/libxgboost.so(XGBoosterUpdateOneIter+0x70) [0x7ffa1de60be0]



Stack trace:
  [bt] (0) /home/nandish-gupta/repos/repo/.venv/lib/python3.8/site-packages/xgboost/lib/libxgboost.so(+0x71dd19) [0x7ffa1e450d19]
  [bt] (1) /home/nandish-gupta/repos/repo/.venv/lib/python3.8/site-packages/xgboost/lib/libxgboost.so(+0x73d378) [0x7ffa1e470378]
  [bt] (2) /home/nandish-gupta/repos/repo/.venv/lib/python3.8/site-packages/xgboost/lib/libxgboost.so(+0x2a0dc9) [0x7ffa1dfd3dc9]
  [bt] (3) /home/nandish-gupta/repos/repo/.venv/lib/python3.8/site-packages/xgboost/lib/libxgboost.so(+0x2a1b1d) [0x7ffa1dfd4b1d]
  [bt] (4) /home/nandish-gupta/repos/repo/.venv/lib/python3.8/site-packages/xgboost/lib/libxgboost.so(+0x2dc5f7) [0x7ffa1e00f5f7]
  [bt] (5) /home/nandish-gupta/repos/repo/.venv/lib/python3.8/site-packages/xgboost/lib/libxgboost.so(XGBoosterUpdateOneIter+0x70) [0x7ffa1de60be0]
  [bt] (6) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7ffa8ee8edae]
  [bt] (7) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x22f) [0x7ffa8ee8e71f]
  [bt] (8) /usr/lib/python3.8/lib-dynload/_ctypes.cpython-38-x86_64-linux-gnu.so(_ctypes_callproc+0x8ce) [0x7ffa8f0a434e]

Thanks in advance!

Nandish Gupta
Senior AI Engineer at SolasAI

@trivialfis
Member

@RAMitchell Could you please help take a look when you are available?

@afogarty85

Same issue here. Which earlier version would be the best fallback? The latest nightly build has the same problem.

trivialfis reopened this Mar 6, 2023

@trivialfis
Member

Do you have a reproducible example for nightly?

@afogarty85

@hcho3
Collaborator

hcho3 commented Mar 6, 2023

@afogarty85

The 2.0.0 dev seems to be working for the toy example above and a slightly modified version below for categorical data:

import numpy as np
import pandas as pd
import xgboost as xgb

print('starting...')
data = pd.DataFrame(np.random.randint(low=0, high=100, size=(1000000, 45)))
data.columns = "x" + data.columns.astype(str)
features = data.columns
for col in features:
    data[col] = data[col].astype('category')
featureTypes = ['c' for col in features]
data["y"] = np.random.randint(low=0, high=2, size=data.shape[0])
dtrain = xgb.DMatrix(data[features], label=data["y"], feature_types=featureTypes, enable_categorical=True)
model = xgb.train(
    verbose_eval=1,
    dtrain=dtrain,
    params={
        "max_depth": 8,
        "learning_rate": 0.05,
        "objective": "binary:logistic",
        "tree_method": "gpu_hist",
        "colsample_bytree": 0.5,
        "colsample_bylevel": 0.5,
        "colsample_bynode": 0.5,
        "reg_alpha": 0.05,
        "reg_lambda": 0.005,
        "seed": 66,
        "max_cat_to_onehot": 1,
        "validate_parameters": 1,
        "grow_policy": 'lossguide',
        'sampling_method': 'gradient_based',
        "subsample": 0.5,
        "gamma": 0.2,
        "predictor": "auto",
        "eval_metric": "auc",
    },
    num_boost_round=1000,
)
print('training complete!')

@hcho3
Collaborator

hcho3 commented Mar 6, 2023

Great! I'll go ahead and close this issue.

hcho3 closed this as completed Mar 6, 2023

@afogarty85

I am not sure how to replicate it with dummy data yet, but I am getting GPU crashes with colsample_bylevel and colsample_bynode; colsample_bytree works. I will open a new issue if I can find a reproducer.
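
One way to narrow down which sampling stage triggers the crash is to train once per on/off combination of the three colsample_* parameters. The sketch below only generates the parameter sets; the helper name colsample_variants and the base parameters are illustrative and not taken from this thread. It relies on 1.0 being the value that disables column sampling at a given stage. Each variant is probably best run in a fresh Python process, since a device-side assert is likely to leave the CUDA context unusable for later runs.

```python
from itertools import product

COLSAMPLE_KEYS = ("colsample_bytree", "colsample_bylevel", "colsample_bynode")


def colsample_variants(base_params, sampled=0.5):
    """Yield (label, params) pairs covering every on/off combination.

    Hypothetical helper for bisecting a crash; not part of xgboost.
    """
    for flags in product((False, True), repeat=len(COLSAMPLE_KEYS)):
        params = dict(base_params)  # copy so the caller's dict is untouched
        for key, enabled in zip(COLSAMPLE_KEYS, flags):
            # 1.0 keeps every column, i.e. sampling is disabled at that stage.
            params[key] = sampled if enabled else 1.0
        enabled_keys = [k for k, on in zip(COLSAMPLE_KEYS, flags) if on]
        yield "+".join(enabled_keys) or "none", params


# Illustrative base parameters; substitute the ones from the failing run.
variants = list(colsample_variants({"tree_method": "gpu_hist", "max_depth": 5}))
for label, params in variants:
    print(label, params)
```

Passing each params dict to xgb.train with the failing data, and noting which labels crash, would show whether the failure depends on a single parameter or on a combination.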

@hcho3
Collaborator

hcho3 commented Mar 6, 2023

@afogarty85 Please open a new issue if you find problems with colsample_bylevel and colsample_bynode.

@Quetzalcohuatl

I have the same error with gpu_hist and colsample_bynode on xgb version 1.7.4.

Removing the colsample_bynode parameter fixed it for me. Alternatively, it seems that updating xgb would also fix it.

thanks for the issue!

@trivialfis
Member

This should be fixed in the latest patch release, 1.7.6.
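
For anyone who cannot upgrade right away, the workaround mentioned above (dropping colsample_bynode when training with gpu_hist) could be applied conditionally. This is a minimal sketch, not an official API: safe_params and parse_version are hypothetical helpers, and the version window is an assumption based on this thread (1.7.3 reported working, 1.7.4 reported failing, 1.7.6 reported fixed). Whether 1.7.5 is affected is not stated here, so the sketch conservatively treats it as affected.

```python
import re

# Assumptions drawn from this thread, not from release notes:
# 1.7.4 is the first affected release and 1.7.6 is the first fixed one.
FIRST_AFFECTED = (1, 7, 4)
FIRST_FIXED = (1, 7, 6)


def parse_version(version):
    """Turn a string such as '1.7.4' or '2.0.0-dev' into a tuple of three ints."""
    numbers = re.findall(r"\d+", version)[:3]
    numbers += ["0"] * (3 - len(numbers))  # pad short versions such as '1.7'
    return tuple(int(n) for n in numbers)


def safe_params(params, version):
    """Return params without colsample_bynode on assumed-affected versions.

    The input dict is never modified; a filtered copy is returned when needed.
    """
    affected = FIRST_AFFECTED <= parse_version(version) < FIRST_FIXED
    if affected and params.get("tree_method") == "gpu_hist":
        return {k: v for k, v in params.items() if k != "colsample_bynode"}
    return params


print(safe_params({"tree_method": "gpu_hist", "colsample_bynode": 0.5}, "1.7.4"))
```

In a training script this would be called as safe_params(params, xgb.__version__) before xgb.train. Note that dropping the parameter changes the model, because per-node column sampling is then disabled, so upgrading remains the cleaner fix.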
