[BUG] cudaErrorInvalidDeviceFunction: invalid device function #1737

tomilov · 2022-07-03T08:12:30Z

An algorithm (transform in my case, but others can fail too) fails in a shared-library build. With static linking (CMake's BUILD_SHARED_LIBS set to OFF) everything works. The behavior does not depend on whether a __host__ __device__ lambda or a functor with a __host__ __device__ operator() is passed.
Error message:

terminate called after throwing an instance of 'thrust::system::system_error'
what(): parallel_for failed: cudaErrorInvalidDeviceFunction: invalid device function

Settings:
- CMAKE_CUDA_HOST_COMPILER=clang++
- CMAKE_CUDA_ARCHITECTURES=86
- CMAKE_CUDA_FLAGS=--expt-relaxed-constexpr --extended-lambda
- CMAKE_CUDA_RUNTIME_LIBRARY is STATIC or SHARED, matching the BUILD_SHARED_LIBS value
- CMAKE_CUDA_RESOLVE_DEVICE_SYMBOLS is OFF
- CMAKE_CUDA_SEPARABLE_COMPILATION is ON
- CMAKE_POSITION_INDEPENDENT_CODE is ON
- nvidia-smi reports CUDA 11.7
- Thrust is configured with thrust_create_target(Thrust FROM_OPTIONS), with HOST_SYSTEM=CPP and DEVICE_SYSTEM=CUDA

The problem arose when I moved the function setTriangles from a .cu file to a .cuh header and made it a function template. From that point on it was also used from a second .so file. I suspect this caused CUDA instrumentation code to be generated in both .so files, and something then goes wrong: either one of the .so files does not load its initialization code, or both load it but somehow conflict with each other.

How can this be fixed?


tomilov · 2022-07-03T08:15:36Z

I faced exactly this problem a couple of years ago: https://forums.developer.nvidia.com/t/invalid-device-function-error-when-using-thrust/75838. As you can see, I use __host__ __device__ everywhere. Using an ordinary functor instead does not change the result either.

tomilov · 2022-07-04T05:14:42Z

While debugging, some runs (without a rebuild) fail with the error and some succeed. How does the CUDA instrumentation code work? Is there some kind of lazy PTX/cubin loading? Could it be happening too late?
Is there a way to trace CUDA initialization?

alliepiper · 2022-07-25T21:51:12Z

Sorry for the late reply, I'm just catching up on github notifs today.

This is a known issue when using Thrust and CUB from shared libraries. See #1401 for more info and some workarounds. The "official" workaround is to use the macros in this header to wrap Thrust/CUB in a unique namespace per-library, but some users in #1401 have also reported compiler flags that worked for their situation.

Closing as a duplicate of #1401.
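For readers landing here from a CMake-based setup like the one in the original report, the per-library namespace workaround could be sketched as follows. This is only a sketch under assumptions: the target names lib_a and lib_b, the source file names, and the namespace names are hypothetical, and THRUST_CUB_WRAPPED_NAMESPACE is assumed to be the macro the referenced header provides (check the header shipped with your Thrust version). The Thrust target name follows the thrust_create_target(Thrust FROM_OPTIONS) call from the report.

```cmake
# Hypothetical targets: two shared libraries that both instantiate Thrust algorithms.
add_library(lib_a SHARED a.cu)
add_library(lib_b SHARED b.cu)

target_link_libraries(lib_a PRIVATE Thrust)
target_link_libraries(lib_b PRIVATE Thrust)

# Assumption: defining THRUST_CUB_WRAPPED_NAMESPACE before any Thrust/CUB header
# is included wraps thrust:: and cub:: in the given namespace. Passing it as a
# compile definition ensures every translation unit of the target sees it.
# Each library gets its own (hypothetical) namespace name so that the kernels
# they instantiate end up as distinct symbols.
target_compile_definitions(lib_a PRIVATE THRUST_CUB_WRAPPED_NAMESPACE=lib_a_thrust)
target_compile_definitions(lib_b PRIVATE THRUST_CUB_WRAPPED_NAMESPACE=lib_b_thrust)
```

With such a definition in place, thrust would presumably live at lib_a_thrust::thrust inside lib_a, so source code has to call it from within that namespace or qualify it explicitly. If Thrust types appear in a library's public headers, consumers of those headers would need the same definition, in which case PRIVATE is likely not the right scope.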

alliepiper closed this as not planned Jul 25, 2022

daniellengyel mentioned this issue Aug 9, 2023

[QST] Thrust::system:system_error rapidsai/raft#1721

Closed
