CUDA_ERROR_NOT_INITIALIZED from reduce call #1842

Open
jonasdelacour opened this issue Sep 12, 2024 · 1 comment

jonasdelacour commented Sep 12, 2024

I previously posted about an issue related to static default policies in the oneapi/dpl headers, which was fixed by defining the macro ONEDPL_USE_PREDEFINED_POLICIES to 0 before including the headers.

I have now hit the same error in a slightly different way:

#define ONEDPL_USE_PREDEFINED_POLICIES 0
#include <oneapi/dpl/algorithm>
#include <oneapi/dpl/execution>
#include <unistd.h>
#include <sys/wait.h>
#include <vector>

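int main(){
    {
        // The result of fork() is not checked, so both the parent and the
        // child run everything below, including the GPU work.
        pid_t pid = fork();
        sycl::queue Q(sycl::gpu_selector_v);
        auto policy = oneapi::dpl::execution::make_device_policy(Q);
        std::vector<int> vec = {1, 2, 3, 4, 5};
        oneapi::dpl::reduce(policy, vec.begin(), vec.end());
        // In the parent this waits for the child; in the child pid is 0.
        waitpid(pid, NULL, 0);
    }

    {
        // Second fork, reached by both processes from the first block.
        // Removing this block makes the error disappear.
        pid_t pid = fork();
        waitpid(pid, NULL, 0);
    }
    return 0;
}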

This again produces the error:

UR CUDA ERROR:
        Value:           3
        Name:            CUDA_ERROR_NOT_INITIALIZED
        Description:     initialization error
        Function:        setContext
        Source Location: /tmp/tmp.HhqPmzG672/intel-llvm-mirror/build/_deps/unified-runtime-src/source/adapters/cuda/context.hpp:142

Native API failed. Native API returns: -5 (PI_ERROR_OUT_OF_RESOURCES)

UR CUDA ERROR:
        Value:           3
        Name:            CUDA_ERROR_NOT_INITIALIZED
        Description:     initialization error
        Function:        setContext
        Source Location: /tmp/tmp.HhqPmzG672/intel-llvm-mirror/build/_deps/unified-runtime-src/source/adapters/cuda/context.hpp:142

Native API failed. Native API returns: -5 (PI_ERROR_OUT_OF_RESOURCES)

Without the code in the second set of braces, the program no longer produces any runtime errors. Crucially, the program also runs cleanly if the call to oneapi::dpl::reduce is removed. This suggests to me that the function somehow leaks shared pointers to a sycl::queue, or something similar, which would prevent the release of the underlying CUDA context.
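
For context on the process structure of the reproducer above: neither block branches on the return value of fork(), so every process that reaches a fork() call carries on through the rest of main. The sketch below strips out all SYCL and oneDPL calls and counts, through a pipe, how many processes reach the end of the two blocks. The function name is made up for illustration, and unlike the reproducer it only calls waitpid on the parent side. It says nothing about which of those processes trigger the CUDA error.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Illustrative only: mirrors the fork structure of the reproducer with all
// GPU work removed. Returns the number of processes that reach the end of
// the two blocks, or -1 if the pipe cannot be created.
int count_processes_reaching_end() {
    int fds[2];
    if (pipe(fds) != 0) return -1;
    const pid_t root = getpid();

    {   // First block: in the reproducer, the GPU work follows this fork
        // and is executed by both the parent and the child.
        pid_t pid = fork();
        if (pid > 0) waitpid(pid, nullptr, 0);
    }

    {   // Second block: executed by every process that left the first block.
        pid_t pid = fork();
        if (pid > 0) waitpid(pid, nullptr, 0);
    }

    // Every process that gets here leaves one byte in the pipe.
    const char mark = 'x';
    ssize_t written = write(fds[1], &mark, 1);
    (void)written;

    // Forked processes stop here; only the original process counts the bytes.
    if (getpid() != root) _exit(0);

    close(fds[1]);
    int count = 0;
    char c;
    while (read(fds[0], &c, 1) == 1) ++count;
    close(fds[0]);
    return count;
}
```

On a POSIX system where both fork() calls succeed, this returns 4: the original process, the child from the first block, and one further child forked by each of them in the second block.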

ICPX version:

Intel(R) oneAPI DPC++/C++ Compiler 2024.2.1 (2024.2.1.20240711)

Compilation command:

icpx minimum_crash.cc -fsycl -fsycl-targets=nvptx64-nvidia-cuda
@mmichel11
Contributor

Hi, @jonasdelacour

I was able to reproduce your issue with the CUDA backend. I was also able to reproduce it independently of oneDPL, using only pure SYCL code:

#include <unistd.h>
#include <sys/wait.h>
#include <vector>
#include <sycl/sycl.hpp>

int main(){
    {
        pid_t pid = fork();
        sycl::queue Q(sycl::gpu_selector_v);
        std::vector<int> vec = {1, 2, 3, 4, 5};
        {
            sycl::buffer<int> buf(vec.data(), vec.size());
            Q.submit([&](sycl::handler& cgh) {
                        auto acc = buf.get_access<sycl::access::mode::read_write>(cgh);
                        cgh.parallel_for(sycl::range<1>(vec.size()), [=](sycl::item<1> it) {
                            acc[it] += 1;
                        });
                    });
        }
        waitpid(pid, NULL, 0);
    }

    {
        pid_t pid = fork();
        waitpid(pid, NULL, 0);
    }
    return 0;
}

This produces the same error:

UR CUDA ERROR:
        Value:           3
        Name:            CUDA_ERROR_NOT_INITIALIZED
        Description:     initialization error
        Function:        setContext
        Source Location: /tmp/tmp.IpEAV9Rdzp/intel-llvm-mirror/build/_deps/unified-runtime-src/source/adapters/cuda/context.hpp:142

Native API failed. Native API returns: -5 (PI_ERROR_OUT_OF_RESOURCES)

UR CUDA ERROR:
        Value:           3
        Name:            CUDA_ERROR_NOT_INITIALIZED
        Description:     initialization error
        Function:        setContext
        Source Location: /tmp/tmp.IpEAV9Rdzp/intel-llvm-mirror/build/_deps/unified-runtime-src/source/adapters/cuda/context.hpp:142

Native API failed. Native API returns: -5 (PI_ERROR_OUT_OF_RESOURCES)

This seems to be a general problem with the SYCL CUDA backend rather than an issue within oneDPL, so I would recommend filing an issue at https://github.com/intel/llvm.
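
In the meantime, one possible way to keep forking and GPU work apart is sketched below. It rests on an assumption that is not confirmed in this thread: that the error comes from a process forked after its parent had already initialized the CUDA backend. Under that assumption, the device work would run inside a short-lived child, so the parent never creates a sycl::queue itself. `run_in_child` is a hypothetical helper, not part of SYCL or oneDPL, and the sketch has not been verified against the CUDA backend.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper (not a SYCL or oneDPL API): runs `work` in a forked
// child and returns the child's exit status (0..255), or -1 on failure.
// `work` must be callable with no arguments and return an int.
template <typename Work>
int run_in_child(Work work) {
    pid_t pid = fork();
    if (pid < 0) return -1;           // fork failed

    if (pid == 0) {
        // Child only: the queue and any kernels would be created in here.
        int status = work();
        // _exit skips atexit handlers and static destructors in the child.
        _exit(status & 0xff);
    }

    // Parent only: wait for the child and report how it ended.
    int wstatus = 0;
    if (waitpid(pid, &wstatus, 0) != pid) return -1;
    return WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : -1;
}
```

A call such as `run_in_child([] { /* create the queue, submit work */ return 0; })` would keep the parent free of device state. Results would still have to be passed back to the parent explicitly, for example through a pipe or a file.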
