[SYCL][CUDA][HIP] Support zero range kernel for cuda and hip backends. #7044

mmoadeli · 2022-10-13T09:33:43Z

Fixes issue #6963 by allowing zero-range kernels on the CUDA and HIP backends.

steffenlarsen

Thank you for submitting a fix for this, @mmoadeli! This definitely seems like a bug, given the SYCL 2020 specification states that:

When the global size is zero, the kernel function is not executed, the local size is ignored, and any dependencies are satisfied.

I do however wonder if it would be better to handle this at the runtime level rather than in the individual plugins. Based on the new test I assume this is also currently not handled by the L0 plugin. @smaslov-intel - What do you think?

steffenlarsen · 2022-10-13T10:29:57Z

sycl/plugins/cuda/pi_cuda.cpp

@@ -2608,6 +2608,10 @@ pi_result cuda_piEnqueueKernelLaunch(
  assert(work_dim > 0);
  assert(work_dim < 4);

+  if (*global_work_size == 0) {
+    return PI_SUCCESS;
+  }


Doing an early exit here means that we do not create an event. I fear that could cause unexpected problems. Same goes for HIP.

@steffenlarsen I agree with you about the event.
If I instead move the condition (if (*global_work_size != 0)) so that it guards only the call to PI_CHECK_ERROR(cuLaunchKernel(, the event handling is preserved. Do you have any reservations about doing that?
I have not tried L0, but I have tried the opencl:cpu and esimd backends (and host in an earlier version of DPC++), and they work without any modifications.
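The guard proposed above can be sketched in isolation. This is a minimal model, not the plugin code: `MockEvent`, `MockQueue`, `MockResult`, and `mock_enqueue_kernel_launch` are hypothetical stand-ins invented for illustration, and the counter increment merely marks the spot where the real `cuLaunchKernel` call would sit. The point it tries to show is that only the launch is skipped for a zero-sized range, while the event is still created and handed back.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical stand-ins for the plugin's types; not the real PI or CUDA API.
struct MockEvent {
  bool recorded = false;
};

struct MockQueue {
  int launches = 0; // how many times the native launch would have been called
};

enum MockResult { MOCK_SUCCESS = 0 };

MockResult mock_enqueue_kernel_launch(MockQueue &queue, unsigned work_dim,
                                      const std::size_t *global_work_size,
                                      MockEvent **event) {
  assert(work_dim > 0);
  assert(work_dim < 4);

  // Create the event up front, whether or not anything will be launched.
  std::unique_ptr<MockEvent> ret_event;
  if (event) {
    ret_event.reset(new MockEvent{});
  }

  // Only the launch itself is skipped for a zero-sized range.
  if (*global_work_size != 0) {
    ++queue.launches; // stands in for PI_CHECK_ERROR(cuLaunchKernel(...))
  }

  if (event) {
    ret_event->recorded = true;
    *event = ret_event.release(); // the caller takes ownership
  }
  return MOCK_SUCCESS;
}
```

With a global size of zero, this model leaves `launches` untouched but still returns a recorded event. It does not model the wait list, so it says nothing about whether dependencies are satisfied.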

Either that or you could copy the event creation to here, something like:

if (event) {
  std::unique_ptr<_pi_event> retImplEv{nullptr};
  retImplEv = std::unique_ptr<_pi_event>(_pi_event::make_native(
      PI_COMMAND_TYPE_NDRANGE_KERNEL, command_queue));
  retImplEv->start();
  retImplEv->record();
  *event = retImplEv.release();
}

I'm not sure that would be a correct implementation. Although a call with NDRange == 0 should do nothing in terms of running the kernel, the event it produces should behave as if a kernel had run:

Completion of such an event should guarantee completion of the events passed in event_wait_list.

The event can be used in an event_wait_list of a subsequent enqueue.

So, I think that the more correct implementation would be just calling cuda_piEnqueueEventsWaitWithBarrier if NDRange == 0.

That is a good point. Now that there are multiple queues, recording an event is not enough to act like a barrier.
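The two requirements discussed in this thread can be modelled with a small standalone sketch. Everything here is an assumption made for illustration: `MockEvent`, `MockQueue`, `mock_enqueue_barrier`, and `mock_enqueue_kernel_launch` are hypothetical names, `mock_enqueue_barrier` only stands in for the role `cuda_piEnqueueEventsWaitWithBarrier` would play, and the model ignores streams and multiple queues entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical event model: an event finishes only after its dependencies.
struct MockEvent {
  std::vector<MockEvent *> deps;
  bool done = false;

  void wait() {
    for (MockEvent *dep : deps) {
      dep->wait(); // completion of this event implies completion of deps
    }
    done = true;
  }
};

struct MockQueue {
  int launches = 0; // counts real kernel launches only
};

// Stand-in for a barrier enqueue: no kernel runs, but the returned event
// depends on everything in the wait list.
std::unique_ptr<MockEvent>
mock_enqueue_barrier(MockQueue &, const std::vector<MockEvent *> &wait_list) {
  auto ev = std::make_unique<MockEvent>();
  ev->deps = wait_list;
  return ev;
}

std::unique_ptr<MockEvent>
mock_enqueue_kernel_launch(MockQueue &queue, std::size_t global_size,
                           const std::vector<MockEvent *> &wait_list) {
  if (global_size == 0) {
    // Zero-sized range: delegate to the barrier instead of launching.
    return mock_enqueue_barrier(queue, wait_list);
  }
  ++queue.launches;
  auto ev = std::make_unique<MockEvent>();
  ev->deps = wait_list;
  return ev;
}
```

In this model, waiting on the event returned for a zero-sized launch completes every event in the wait list, and the returned event can itself be placed in the `deps` of a later event, which mirrors the two properties listed above.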

steffenlarsen · 2022-10-13T10:32:00Z

sycl/test/basic_tests/range_zero_size.cpp

+  queue q;
+  q.submit(
+      [&](handler &cgh) { cgh.parallel_for(range<1>(0), [=](id<1> i) {}); });
+}


Device-dependent testing should be in https://github.com/intel/llvm-test-suite rather than in the in-tree LIT tests. Could you please move this test to there?

If you agree, I will remove the device-dependent part of the test. The same behaviour should be seen on all backends.

Even if that is the case, there is no guarantee that a device is available, so it could fail to create a queue.

romanovvlad · 2022-10-13T10:36:44Z

Thank you for submitting a fix for this, @mmoadeli! This definitely seems like a bug, given the SYCL 2020 specification states that:

When the global size is zero, the kernel function is not executed, the local size is ignored, and any dependencies are satisfied.

I do however wonder if it would be better to handle this at the runtime level rather than in the individual plugins. Based on the new test I assume this is also currently not handled by the L0 plugin. @smaslov-intel - What do you think?

Probably it's not worth optimizing, but the SYCL RT can emulate the required behavior by submitting an RT barrier command only, while plugins can do something better.

mmoadeli · 2022-11-04T14:23:32Z

@romanovvlad @steffenlarsen I'd be happy to address any potential issues on this PR to have it merged.

steffenlarsen

Sorry for the delay! Changes look good. Where did you move the tests to?

mmoadeli · 2022-11-04T14:47:41Z

Sorry for the delay! Changes look good. Where did you move the tests to?
@steffenlarsen I had a test here, but I was advised to move it to llvm-test-suite. I was waiting for this PR's approval and potential merge before adding the test to llvm-test-suite.

The test can be found in first commit 719c6b8

steffenlarsen · 2022-11-04T14:55:14Z

It would be preferable if you can open a PR on the test-suite with the test. We can trigger testing with it on this PR, albeit only on L0 and OCL.

mmoadeli · 2022-11-04T15:19:17Z

@steffenlarsen Test Suite PR 1363

steffenlarsen · 2022-11-04T15:35:25Z

/verify with intel/llvm-test-suite#1363

faust403 · 2023-04-24T15:49:40Z

I have the same problem:

terminate called after throwing an instance of 'sycl::_V1::runtime_error'
what(): Native API failed. Native API returns: -30 (PI_ERROR_INVALID_VALUE) -30 (PI_ERROR_INVALID_VALUE)

code:
Queue.submit([&](sycl::handler& Handler) -> void {
  Handler.parallel_for<class ParallelFor>(sycl::range<1>{ Threads }, [=](sycl::id<1> Id) -> void { });
}).wait();

It does not fail when I remove parallel_for from the submit. How can I fix it? The queue is created with cpu_selector_v.

[SYCL][CUDA][HIP] Suppor zero range kernel for cuda and hip backends.

719c6b8

mmoadeli requested review from a team as code owners October 13, 2022 09:33

mmoadeli requested a review from steffenlarsen October 13, 2022 09:33

steffenlarsen changed the title ~~[SYCL][CUDA][HIP] Suppor zero range kernel for cuda and hip backends.~~ [SYCL][CUDA][HIP] Support zero range kernel for cuda and hip backends. Oct 13, 2022

steffenlarsen reviewed Oct 13, 2022

mmoadeli added 2 commits October 17, 2022 10:36

Merge branch 'sycl' into zero_range

1eabb73

[SYCL][Test] Address the event handling issue for cuda and hip range zero. - Removes range_zero_size.cpp, this to be added in to sycl-test-suit repo.

70e06e3

mmoadeli mentioned this pull request Oct 17, 2022

[SYCL][CUDA][HIP] Crashes with range zero SYCL kernel #6963

Closed

steffenlarsen approved these changes Nov 4, 2022

mmoadeli mentioned this pull request Nov 4, 2022

[SYCL] Add test for zero size range kernels. intel/llvm-test-suite#1363

Merged

mmoadeli added 2 commits November 5, 2022 00:06

Merge remote-tracking branch 'upstream/sycl' into zero_range

6678c49

Merge branch 'zero_range' of github.com:mmoadeli/llvm into zero_range

e7a69d8

bader requested a review from romanovvlad November 7, 2022 12:04

romanovvlad reviewed Nov 7, 2022

sycl/plugins/cuda/pi_cuda.cpp

sycl/plugins/hip/pi_hip.cpp

mmoadeli and others added 2 commits November 7, 2022 12:46

Update sycl/plugins/hip/pi_hip.cpp

a76ac27

Co-authored-by: Romanov Vlad <vlad.romanov@intel.com>

Update sycl/plugins/cuda/pi_cuda.cpp

02bf9ae

Co-authored-by: Romanov Vlad <vlad.romanov@intel.com>

romanovvlad approved these changes Nov 7, 2022

[SYCL] Fix style issues.

207f2b2

pvchupin merged commit a395886 into intel:sycl Nov 10, 2022

mmoadeli deleted the zero_range branch July 7, 2023 10:43
