[SYCL][ESIMD][EMU] Atomic update #6661
Conversation
dongkyunahn-intel commented on Aug 29, 2022
- __esimd_svm_atomic0/1/2
- cmpxchg argument order fix (new value first, expected value second; see the sketch after this list)
- atomic_add/sub/min/max/cmpxchg update
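For reference, a minimal sketch of what the corrected operand order means on the host side. The helper name atomic_cmpxchg and its exact signature are assumptions for illustration; only the new-value-first, expected-value-second ordering comes from this PR, and CmpxchgTy is the integer-bits mapping used elsewhere in the diff.

```cpp
// Illustrative sketch only: atomic_cmpxchg is a hypothetical helper name.
// The ESIMD cmpxchg operand order is (new value, expected value), while the
// GCC/Clang builtin takes (&expected, desired), so the operands must be
// re-mapped when lowering to the builtin.
template <typename Ty>
Ty atomic_cmpxchg(Ty *ptr, Ty new_val, Ty expected) {
  CmpxchgTy<Ty> expected_bits = sycl::bit_cast<CmpxchgTy<Ty>>(expected);
  CmpxchgTy<Ty> new_bits = sycl::bit_cast<CmpxchgTy<Ty>>(new_val);
  __atomic_compare_exchange_n(reinterpret_cast<CmpxchgTy<Ty> *>(ptr),
                              &expected_bits, new_bits, /*weak=*/false,
                              __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
  // On failure the builtin writes the actual value into expected_bits, so in
  // both cases this returns the value that was at *ptr before the operation.
  return sycl::bit_cast<Ty>(expected_bits);
}
```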
'dword_atomic_smoke.cpp' from intel/llvm-test-suite fails with an infinite loop. Still debugging.
The test is disabled in intel/llvm-test-suite since it fails on GPU as well - intel/llvm-test-suite#1185
template <typename Ty> Ty atomic_load(Ty *ptr) {
#ifdef _WIN32
  // TODO: Windows will be supported soon
  __ESIMD_UNSUPPORTED_ON_HOST;
#else
-  return __atomic_load(ptr, __ATOMIC_SEQ_CST);
+  return __atomic_load_n((CmpxchgTy<Ty> *)ptr, __ATOMIC_SEQ_CST);
The signature is type __atomic_load_n(type *ptr, int memorder), so AFAIU return __atomic_load_n((CmpxchgTy<Ty> *)ptr, __ATOMIC_SEQ_CST); does a type conversion from CmpxchgTy<Ty> to Ty before returning the result, which is wrong. Do you have tests for this? They should have caught this problem.
Suggested change:
-  return __atomic_load_n((CmpxchgTy<Ty> *)ptr, __ATOMIC_SEQ_CST);
+  return sycl::bit_cast<Ty>(__atomic_load_n((CmpxchgTy<Ty> *)ptr, __ATOMIC_SEQ_CST));
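To make the point concrete, a small standalone illustration (not from the PR) of the difference between numeric conversion and bit reinterpretation when CmpxchgTy<float> is a 32-bit integer type:

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
  float f = 1.0f;                        // bit pattern 0x3F800000
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));  // the raw bits __atomic_load_n would return
  float converted = bits;                // numeric conversion: 1065353216.0f
  float reinterpreted;
  std::memcpy(&reinterpreted, &bits, sizeof(bits)); // bit_cast-style result: 1.0f
  std::printf("converted=%f reinterpreted=%f\n", converted, reinterpreted);
  return 0;
}
```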
I don't have a unit test for this. I'll put TODO comments with __ESIMD_UNSUPPORTED_ON_HOST.
/verify with intel/llvm-test-suite#1259
@@ -1879,7 +1808,9 @@ __ESIMD_INTRIN void __esimd_lsc_fence(__ESIMD_DNS::simd_mask_storage_t<N> pred)
  ;
#else // __SYCL_DEVICE_ONLY__
{
-  __ESIMD_UNSUPPORTED_ON_HOST;
+  // In ESIMD_EMULATOR device interface, write operations are applied
This is extremely dangerous, and can be a source of obscure bugs.
The emulator uses multiple threads to emulate GPU threads, so the memory fence can't be a no-op.
Fixed - the built-in __atomic_thread_fence() is called.
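For context, a minimal sketch of the host-side fix being described, assuming a plain sequentially consistent fence is sufficient for the emulator's host threads (the real intrinsic has additional template parameters that are omitted here):

```cpp
// Sketch of the ESIMD_EMULATOR host fallback for a memory fence: because the
// emulator runs GPU threads as host threads, the fence must actually order
// memory operations across those threads instead of being a no-op.
inline void emu_memory_fence() {            // hypothetical helper name
  __atomic_thread_fence(__ATOMIC_SEQ_CST);  // GCC/Clang built-in used by the fix
}
```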
/verify with intel/llvm-test-suite#1274
1 similar comment
/verify with intel/llvm-test-suite#1274
Failures from ESIMD_EMULATOR are already handled in intel/llvm-test-suite#1274.
@@ -30,52 +39,87 @@ template <typename Ty> Ty atomic_store(Ty *ptr, Ty val) {
  // TODO: Windows will be supported soon
  __ESIMD_UNSUPPORTED_ON_HOST;
#else
-  __atomic_store(ptr, val, __ATOMIC_SEQ_CST);
+  Ty ret = atomic_load<Ty>((CmpxchgTy<Ty> *)ptr);
It looks strange that we need atomic_load as part of the atomic_store implementation, but that's how Gen ISA is :)
No action is needed for this comment.
@@ -30,52 +39,87 @@ template <typename Ty> Ty atomic_store(Ty *ptr, Ty val) {
  // TODO: Windows will be supported soon
  __ESIMD_UNSUPPORTED_ON_HOST;
#else
-  __atomic_store(ptr, val, __ATOMIC_SEQ_CST);
+  Ty ret = atomic_load<Ty>((CmpxchgTy<Ty> *)ptr);
+  __atomic_store_n((CmpxchgTy<Ty> *)ptr, val, __ATOMIC_SEQ_CST);
Nit: strictly speaking, reinterpret_cast should be used. Can be fixed later.
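A minimal sketch of what that host fallback looks like with the reviewer's nits applied (reinterpret_cast and sycl::bit_cast); CmpxchgTy, __ESIMD_UNSUPPORTED_ON_HOST, and atomic_load are taken from the diff above, and the exact surrounding code is assumed:

```cpp
template <typename Ty> Ty atomic_store(Ty *ptr, Ty val) {
#ifdef _WIN32
  // TODO: Windows will be supported soon
  __ESIMD_UNSUPPORTED_ON_HOST;
#else
  auto *bits_ptr = reinterpret_cast<CmpxchgTy<Ty> *>(ptr);
  // Gen ISA atomic store returns the value that is overwritten, hence the
  // atomic_load before the store.
  Ty old_val = atomic_load<Ty>(ptr);
  __atomic_store_n(bits_ptr, sycl::bit_cast<CmpxchgTy<Ty>>(val),
                   __ATOMIC_SEQ_CST);
  return old_val;
#endif
}
```

An alternative would be the __atomic_exchange_n built-in, which swaps in the new value and returns the previous one in a single atomic step; the load-then-store form above mirrors the structure shown in the diff.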
  do {
    _old = *ptr;
    _new = std::max(_old, val);
    _old_bits = *(CmpxchgTy<Ty> *)&_old;
Here and in many more places:
Nit: sycl::bit_cast should be used instead. Can be fixed later.
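A minimal sketch of the same loop with the suggested sycl::bit_cast instead of pointer punning; the variable names follow the diff, while CmpxchgTy and the enclosing atomic_max helper are assumed from context:

```cpp
template <typename Ty> Ty atomic_max(Ty *ptr, Ty val) {
  auto *bits_ptr = reinterpret_cast<CmpxchgTy<Ty> *>(ptr);
  Ty _old, _new;
  CmpxchgTy<Ty> _old_bits, _new_bits;
  do {
    _old = *ptr;                                    // snapshot current value
    _new = std::max(_old, val);                     // candidate new value
    _old_bits = sycl::bit_cast<CmpxchgTy<Ty>>(_old);
    _new_bits = sycl::bit_cast<CmpxchgTy<Ty>>(_new);
    // Retry if another host thread changed *ptr between the read and the CAS.
  } while (!__atomic_compare_exchange_n(bits_ptr, &_old_bits, _new_bits,
                                        /*weak=*/false, __ATOMIC_SEQ_CST,
                                        __ATOMIC_SEQ_CST));
  return _old;
}
```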
lsc_slm/surf/usm will be fixed once the argument order for cmpxchg is confirmed. Other failures are handled in intel/llvm-test-suite#1274.
/verify with intel/llvm-test-suite#1274
1 similar comment
/verify with intel/llvm-test-suite#1274
- __esimd_svm_atomic0/1/2 - cmpxchg argument order fix (new value first, expected value second) - atomic_add/sub/min/max/cmpxchg update
Co-authored-by: kbobrovs <Konstantin.S.Bobrovsky@intel.com>
- atomic intrinsic update and return value fix
- To be enabled later with unit test
- To prevent potential 'odr' failures from 'atomic_*' functions instantiated with same template argument
Force-pushed from 584db3c to b6ae426
Failures from 'SYCL / Linux / ESIMD Emu LLVM Test Suite (pull_request_target)'
These tests are modified in intel/llvm-test-suite#1274
/verify