
[SYCL][CUDA][HIP] Support zero range kernel for cuda and hip backends. #7044

Merged: 8 commits into intel:sycl on Nov 10, 2022

Conversation

@mmoadeli (Contributor) commented Oct 13, 2022

  • Fixes issue #6963 by allowing zero-range kernels for the CUDA and HIP backends.

@mmoadeli requested review from a team as code owners October 13, 2022 09:33
@steffenlarsen changed the title from "[SYCL][CUDA][HIP] Suppor zero range kernel for cuda and hip backends." to "[SYCL][CUDA][HIP] Support zero range kernel for cuda and hip backends." Oct 13, 2022

@steffenlarsen (Contributor) left a comment

Thank you for submitting a fix for this, @mmoadeli! This definitely seems like a bug, given the SYCL 2020 specification states that:

When the global size is zero, the kernel function is not executed, the local size is ignored, and any dependencies are satisfied.

I do however wonder if it would be better to handle this at the runtime level rather than in the individual plugins. Based on the new test I assume this is also currently not handled by the L0 plugin. @smaslov-intel - What do you think?
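
For context, the case under discussion is simply a submission with a zero global size. A minimal self-contained sketch (the kernel name is arbitrary):

#include <sycl/sycl.hpp>

int main() {
  sycl::queue q;
  // Per SYCL 2020, this kernel must not execute, the local size is ignored,
  // and the submission still completes so dependencies on it are satisfied.
  q.submit([&](sycl::handler &cgh) {
    cgh.parallel_for<class ZeroRangeKernel>(sycl::range<1>(0),
                                            [=](sycl::id<1>) {});
  });
  q.wait();
  return 0;
}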

@@ -2608,6 +2608,10 @@ pi_result cuda_piEnqueueKernelLaunch(
   assert(work_dim > 0);
   assert(work_dim < 4);

+  if (*global_work_size == 0) {
+    return PI_SUCCESS;
+  }

Contributor:
Doing an early exit here means that we do not create an event. I fear that could cause unexpected problems. Same goes for HIP.

@mmoadeli (Contributor Author), Oct 14, 2022:

@steffenlarsen I agree with you on this regarding the event.
If I move the condition (if (*global_work_size != 0)) to guard the call to PI_CHECK_ERROR(cuLaunchKernel(...)), it will preserve the functionality related to event handling. Do you have any reservations about doing that?
I have not tried L0, but the opencl:cpu and esimd backends (and host in an earlier version of DPC++) have been tried and work without the need for any modifications.
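
As an illustration only, that alternative might look roughly like the following inside cuda_piEnqueueKernelLaunch; retError, cuFunc, blocksPerGrid, threadsPerBlock, local_size, cuStream and argIndices are assumed stand-ins for locals the function already computes, not verified names from pi_cuda.cpp:

// Keep the existing event creation/recording untouched and only skip the
// actual device launch when the global work size is zero.
if (*global_work_size != 0) {
  retError = PI_CHECK_ERROR(cuLaunchKernel(
      cuFunc, blocksPerGrid[0], blocksPerGrid[1], blocksPerGrid[2],
      threadsPerBlock[0], threadsPerBlock[1], threadsPerBlock[2],
      local_size, cuStream, const_cast<void **>(argIndices.data()),
      nullptr));
}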

Contributor:

Either that or you could copy the event creation to here, something like:

if (event) {
  std::unique_ptr<_pi_event> retImplEv{nullptr};
  retImplEv = std::unique_ptr<_pi_event>(_pi_event::make_native(
      PI_COMMAND_TYPE_NDRANGE_KERNEL, command_queue));
  retImplEv->start();
  retImplEv->record();
  *event = retImplEv.release();
}

Contributor:

I'm not sure that would be a correct implementation. While a call with NDRange == 0 should do nothing in terms of running the kernel, the event produced should behave as if we had run a kernel:

  1. Completion of such an event should guarantee completion of the events passed in event_wait_list.
  2. The event can be used in an event_wait_list of a subsequent enqueue.

So, I think the more correct implementation would be to just call cuda_piEnqueueEventsWaitWithBarrier if NDRange == 0.
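
A rough sketch of that suggestion (not the exact merged change; parameter names follow the standard PI enqueue signature):

// Inside cuda_piEnqueueKernelLaunch: a zero global range launches nothing,
// but forwarding to the barrier enqueue keeps the event semantics intact:
// the returned event completes only after event_wait_list and can be used
// to order subsequent commands.
if (*global_work_size == 0) {
  return cuda_piEnqueueEventsWaitWithBarrier(
      command_queue, num_events_in_wait_list, event_wait_list, event);
}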

Contributor:

That is a good point. Now that there are multiple queues, recording an event is not enough to act like a barrier.
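
To make the cross-queue point concrete, a self-contained sketch assuming the fix is in place, shared USM support, and two queues sharing one context (kernel names are arbitrary):

#include <cassert>
#include <sycl/sycl.hpp>

int main() {
  sycl::device dev{sycl::default_selector_v};
  sycl::context ctx{dev};
  sycl::queue q1{ctx, dev}, q2{ctx, dev};

  int *data = sycl::malloc_shared<int>(1, dev, ctx);
  *data = 0;

  // Real work on q1.
  sycl::event produce = q1.single_task<class Produce>([=] { *data = 42; });

  // Zero-range submission depending on the work above: no kernel runs, but
  // its event must still mean "produce has completed".
  sycl::event zero = q1.submit([&](sycl::handler &cgh) {
    cgh.depends_on(produce);
    cgh.parallel_for<class ZeroRange>(sycl::range<1>(0), [=](sycl::id<1>) {});
  });

  // A second queue relies only on the zero-range event for ordering.
  q2.submit([&](sycl::handler &cgh) {
     cgh.depends_on(zero);
     cgh.single_task<class Consume>([=] { *data += 1; });
   }).wait();

  assert(*data == 43);
  sycl::free(data, ctx);
  return 0;
}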

queue q;
q.submit(
    [&](handler &cgh) { cgh.parallel_for(range<1>(0), [=](id<1> i) {}); });

Contributor:

Device-dependent testing should be in https://github.com/intel/llvm-test-suite rather than in the in-tree LIT tests. Could you please move this test there?

@mmoadeli (Contributor Author), Oct 13, 2022:

If you agree, the device-dependent part of the test can be removed. The same behaviour should be seen on all backends.

Contributor:

Even if that is the case, there is no guarantee that a device is available, so it could fail to create a queue.

@romanovvlad (Contributor):

Replying to @steffenlarsen's question about whether this should be handled at the runtime level rather than in the individual plugins:

Probably it's not worth optimizing, but the SYCL RT can emulate the required behavior by submitting an RT barrier command only, while plugins can do something better.

…zero.

- Removes range_zero_size.cpp; this is to be added to the llvm-test-suite repo.

@mmoadeli (Contributor Author) commented Nov 4, 2022

@romanovvlad @steffenlarsen I'd be happy to address any potential issues on this PR to have it merged.

@steffenlarsen (Contributor) left a comment

Sorry for the delay! Changes look good. Where did you move the tests to?

@mmoadeli (Contributor Author) commented Nov 4, 2022

@steffenlarsen I had a test here, but I was advised to move it to llvm-test-suite. I was waiting for this one's approval and potential merge before adding the test to llvm-test-suite.

The test can be found in the first commit, 719c6b8.

@steffenlarsen (Contributor):

It would be preferable if you could open a PR on the test-suite with the test. We can trigger testing with it on this PR, albeit only on L0 and OCL.

@mmoadeli (Contributor Author) commented Nov 4, 2022

@steffenlarsen (Contributor):

/verify with intel/llvm-test-suite#1363

@bader requested a review from romanovvlad November 7, 2022 12:04
sycl/plugins/cuda/pi_cuda.cpp (review thread outdated, resolved)
sycl/plugins/hip/pi_hip.cpp (review thread outdated, resolved)
mmoadeli and others added 2 commits November 7, 2022 12:46
Co-authored-by: Romanov Vlad <vlad.romanov@intel.com>
@pvchupin merged commit a395886 into intel:sycl Nov 10, 2022

@faust403 commented Apr 24, 2023

I have got the same problem:

terminate called after throwing an instance of 'sycl::_V1::runtime_error'
what(): Native API failed. Native API returns: -30 (PI_ERROR_INVALID_VALUE) -30 (PI_ERROR_INVALID_VALUE)

Code:

Queue.submit([&](sycl::handler& Handler) -> void {
  Handler.parallel_for<class ParallelFor>(sycl::range<1>{ Threads },
                                          [=](sycl::id<1> Id) -> void {});
}).wait();

It does not fail when I remove the parallel_for from the submit. How can I fix it? The queue uses cpu_selector_v.

@mmoadeli deleted the zero_range branch July 7, 2023 10:43