This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

[BUG] cudaErrorInvalidDeviceFunction: invalid device function #1737

Closed
tomilov opened this issue Jul 3, 2022 · 3 comments

Comments

@tomilov
Contributor

tomilov commented Jul 3, 2022

An algorithm (transform in my case, but others can fail too) fails when the project is built as shared libraries. With static linking (CMake's BUILD_SHARED_LIBS set to OFF) everything works. The behavior does not depend on whether a __host__ __device__ lambda or a functor with a __host__ __device__ operator() is passed.
Error message:

terminate called after throwing an instance of 'thrust::system::system_error'
what(): parallel_for failed: cudaErrorInvalidDeviceFunction: invalid device function

Settings: CMAKE_CUDA_HOST_COMPILER=clang++, CMAKE_CUDA_ARCHITECTURES=86, CMAKE_CUDA_FLAGS=--expt-relaxed-constexpr --extended-lambda, CMAKE_CUDA_RUNTIME_LIBRARY is STATIC or SHARED to match the BUILD_SHARED_LIBS value, CMAKE_CUDA_RESOLVE_DEVICE_SYMBOLS is OFF, CMAKE_CUDA_SEPARABLE_COMPILATION is ON, CMAKE_POSITION_INDEPENDENT_CODE is ON. nvidia-smi reports CUDA 11.7. Thrust is configured via thrust_create_target(Thrust FROM_OPTIONS) with HOST_SYSTEM=CPP and DEVICE_SYSTEM=CUDA.

The problem arose when I moved the function setTriangles from a .cu file to a .cuh file and made it a function template. From that point on it is also used from another (second) .so file. I suspect this resulted in CUDA instrumentation code being generated in both .so files, and something goes wrong: either one of the .so files does not load its initialization code, or both do but they somehow conflict.

How can this be fixed?

@tomilov
Contributor Author

tomilov commented Jul 3, 2022

I ran into exactly this problem a couple of years ago: https://forums.developer.nvidia.com/t/invalid-device-function-error-when-using-thrust/75838. As you can see, I use __host__ __device__ everywhere. Using an ordinary functor does not change the result either.

@tomilov
Contributor Author

tomilov commented Jul 4, 2022

During debugging, some runs (without rebuilding) fail with this error and some succeed. How does the CUDA instrumentation code work? Is there lazy PTX/cubin loading of some kind? Could it be happening too late?
Is there a way to trace CUDA initialization?
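
One rough way to check from inside a given .so whether the runtime can resolve a kernel compiled into that library is to query it with cudaFuncGetAttributes, which reports cudaErrorInvalidDeviceFunction when no usable image is found for the current device. The sketch below is only an illustration: probe_kernel and kernel_image_available are hypothetical names, and the probe covers a kernel of your own, not the internal kernels that Thrust launches.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical empty kernel compiled into the library being checked.
__global__ void probe_kernel() {}

// Returns true if the CUDA runtime can resolve probe_kernel for the
// current device, i.e. this library's device code was registered and
// contains an image usable on that device.
bool kernel_image_available()
{
    cudaFuncAttributes attrs{};
    const cudaError_t err = cudaFuncGetAttributes(&attrs, probe_kernel);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "probe failed: %s: %s\n",
                     cudaGetErrorName(err), cudaGetErrorString(err));
        (void)cudaGetLastError();  // reset the runtime's last-error state
        return false;
    }
    // binaryVersion / ptxVersion describe the image the runtime selected.
    std::printf("probe ok: binaryVersion=%d ptxVersion=%d\n",
                attrs.binaryVersion, attrs.ptxVersion);
    return true;
}
```

Calling such a probe once from each shared library, before the first Thrust call, at least narrows down whether a library's own device code is visible to the runtime at that point.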

@alliepiper
Collaborator

Sorry for the late reply, I'm just catching up on github notifs today.

This is a known issue when using Thrust and CUB from shared libraries. See #1401 for more info and some workarounds. The "official" workaround is to use the macros in this header to wrap Thrust/CUB in a unique namespace per-library, but some users in #1401 have also reported compiler flags that worked for their situation.

Closing as a duplicate of #1401.
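
For readers landing here, a minimal sketch of the namespace-wrapping workaround mentioned above. It assumes the macro in question is THRUST_CUB_WRAPPED_NAMESPACE (present in recent Thrust releases; check your version's thrust/detail/config/namespace.h). The names lib_a_wrapped, lib_a and negate_all are hypothetical; each shared library would pick its own unique wrapper name.

```cuda
// Assumption: THRUST_CUB_WRAPPED_NAMESPACE is supported by the Thrust
// version in use. It must be defined before ANY Thrust or CUB header is
// included, e.g. via a per-target compile definition in the build system.
#define THRUST_CUB_WRAPPED_NAMESPACE lib_a_wrapped

#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/transform.h>

namespace lib_a {

// With the macro defined, Thrust lives in lib_a_wrapped::thrust rather
// than ::thrust, so the kernels instantiated in this library get symbols
// that differ from those instantiated in another library.
namespace th = lib_a_wrapped::thrust;

// Hypothetical library function: negates every element in place.
void negate_all(th::device_vector<float>& values)
{
    th::transform(values.begin(), values.end(), values.begin(),
                  th::negate<float>());
}

}  // namespace lib_a
```

A second library would define a different wrapper name (say lib_b_wrapped) for its own translation units. Note that Thrust types then differ between the libraries, so they should not appear in interfaces shared across them.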

@alliepiper closed this as not planned on Jul 25, 2022