Mark CUDA 10.1 as unsupported. #4264

Closed
trivialfis opened this issue Mar 16, 2019 · 10 comments · Fixed by #4265

Comments

@trivialfis
Member

trivialfis commented Mar 16, 2019

The issue is the same as in #4223. NVCC's splitter might have some problems with pointers; running this in HostDeviceVectorImpl::Copy:

      LOG(DEBUG) << "other->Distribution(): " << other->Distribution();
      LOG(DEBUG) << "other.Distribution(): " << (*other).Distribution();

returns different results. The bug should be reproducible by simply running the unit tests with CUDA 10.1 on Ubuntu 18.10. Instead of working around it as I did last time, I think it's more appropriate to explicitly mark CUDA 10.1 as unsupported, or at least to mark the nvcc included in CUDA 10.1 as unsupported.
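One way to mark that nvcc as unsupported is a compile-time check. The following is only a minimal sketch, not what the project actually merged: `IsUnsupportedNvcc` is a hypothetical helper name, and the only thing taken from the toolchain is the pair of macros `__CUDACC_VER_MAJOR__` and `__CUDACC_VER_MINOR__`, which nvcc predefines.

```cpp
// Hypothetical helper (not part of XGBoost): true when the given nvcc
// version is the one reported as broken in this issue, i.e. CUDA 10.1.
constexpr bool IsUnsupportedNvcc(int major, int minor) {
  return major == 10 && minor == 1;
}

#if defined(__CUDACC__)
// Only evaluated when the translation unit is compiled by nvcc, which
// predefines __CUDACC_VER_MAJOR__ and __CUDACC_VER_MINOR__.
static_assert(!IsUnsupportedNvcc(__CUDACC_VER_MAJOR__, __CUDACC_VER_MINOR__),
              "The nvcc shipped with CUDA 10.1 is not supported, see #4264.");
#endif
```

With a host-only compiler the guard is skipped entirely, so the check costs nothing outside CUDA builds. Keeping the version test in a `constexpr` function makes it easy to extend if a later patch release turns out to be fine.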

Sadly, it's currently the only version that doesn't break my machine ... @RAMitchell WDYT?

@RAMitchell
Member

Yes, we may have to wait for another CUDA update. Do what you think is best.

@hcho3
Collaborator

hcho3 commented Mar 17, 2019

@trivialfis I was about to include CUDA 10.1 as one of the targets in my upcoming PR for CI refactor. Should we leave it out for the time being?

@trivialfis
Member Author

@hcho3 yes. Let's skip this version.

@trivialfis changed the title from "Mark CUDA 10.1 Unsupported." to "Mark CUDA 10.1 as unsupported." on Mar 17, 2019

@rongou
Contributor

rongou commented Mar 18, 2019

This problem seems to be specific to gcc. I tried clang-7 and it seems to work fine.

@trivialfis
Member Author

@rongou I'm not sure how you compiled XGBoost with clang, assuming you are referring to the non-Apple clang. There was a type deduction in the dmlc-core logging facility that trips up clang-7.

@rongou
Contributor

rongou commented Mar 19, 2019

@trivialfis Not sure. I built the JVM packages with CUDA enabled using clang-7 and they seem to run fine. What is the logging issue? Maybe it only affects the CLI or Python?

@trivialfis
Member Author

@rongou Drag to the bottom of the clang-tidy test: https://xgboost-ci.net/blue/organizations/jenkins/xgboost/detail/PR-4149/12/pipeline

The type of &std::free cannot be deduced. Same with clang. The problem might be in the std::free implementation if you are using libc++ instead of libstdc++. ; )
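For illustration, one way to sidestep this kind of deduction is to avoid taking the address of std::free at all. This is only a sketch under the assumption that the affected code owns a malloc-ed buffer through a std::unique_ptr whose deleter type is derived from &std::free; `FreeDeleter`, `CStringPtr` and `MakeCString` are hypothetical names, not dmlc-core's.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical deleter: calls std::free directly, so the type of
// &std::free never has to be deduced by the compiler.
struct FreeDeleter {
  void operator()(void* ptr) const noexcept { std::free(ptr); }
};

using CStringPtr = std::unique_ptr<char, FreeDeleter>;

// Hypothetical helper: copies `text` into a malloc-ed buffer that is
// released with std::free when the returned pointer goes out of scope.
inline CStringPtr MakeCString(const std::string& text) {
  char* buffer = static_cast<char*>(std::malloc(text.size() + 1));
  assert(buffer != nullptr);
  // c_str() is null-terminated, so size() + 1 bytes includes the terminator.
  std::memcpy(buffer, text.c_str(), text.size() + 1);
  return CStringPtr(buffer);
}
```

Because the deleter is a named functor, the pointer type is the same regardless of how libc++ or libstdc++ declares std::free, and the stateless deleter adds no storage to the unique_ptr.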

@rongou
Contributor

rongou commented Mar 19, 2019

Yeah I installed libc++ when I installed clang-7. I guess clang uses it by default? If you are using clang-tidy, maybe it's not a crazy idea to go all in with clang.

Anyway, the CUDA bug should be fixed in the next patch update, if all goes well.

@jamesdalg

If possible, can someone post detailed documentation on how to work around this issue with Visual Studio 2017 and CMake, using any version that currently works?

@rongou
Contributor

rongou commented Apr 16, 2019

You can use CUDA 10.0: https://developer.nvidia.com/cuda-10.0-download-archive
