Enable compiling with system cub. #7232
Conversation
Eventually we will have to move on to using the system cub; with CTK 11.4 and custom cub there are lots of warnings. I think this might be the last version where we can work around it.
Force-pushed from be6dbc0 to ab7d37d.
```diff
 });
 // shrink down to pair
 auto fptp_it_out = thrust::make_transform_output_iterator(
-    dh::tbegin(d_fptp), [=] __device__(Triple const &t) {
-      return thrust::make_pair(thrust::get<1>(t), thrust::get<2>(t));
+    dh::TypedDiscard<Triple>{}, [d_fptp] __device__(Triple const &t) {
```
Originally this reduced the triple to a pair, but cub in CUDA 11.0–11.2 doesn't handle the changed output type. So here I just write the result inside the lambda and use a discard iterator to keep the output type unchanged.
* For now we need to limit the number of items to INT32_MAX. This is caused by the latest RMM bringing cub into its include path, so when XGBoost is compiled with RMM the paths conflict.
Force-pushed from ab7d37d to 5a3a1ae.
@trivialfis Could you cherry-pick this PR to the 1.4.0 branch?
No, I'm planning a new release.
Enable compiling with system cub. (dmlc#7232) See merge request nvspark/xgboost!394
- This works only with CUDA 11.4; with CUDA 11.2 there are some obscure errors involving inclusive scan and custom iterators.