Add ParticleIDWrapper::make_invalid()
#3735
ParticleIDWrapper::make_invalid()
A cheaper and explicit way to swap validity sign on particle ids. Not the same as `id = -id`, but also reversible.
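For readers following along, here is a minimal sketch of the idea, assuming the leftmost bit of the 64-bit id word carries the validity flag. `PackedId`, `valid_bit`, and `round_trip_keeps_payload` are illustrative names, and the polarity (1 = valid) is an assumption; this is not the AMReX implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a packed particle id; NOT the AMReX class.
// Assumption: bit 63 (the leftmost bit) is the validity flag, 1 = valid.
struct PackedId
{
    static constexpr std::uint64_t valid_bit = std::uint64_t(1) << 63;

    std::uint64_t m_idata = valid_bit;

    bool is_valid () const noexcept { return (m_idata & valid_bit) != 0; }

    // Clear or set only the flag bit; all other bits are left untouched,
    // which is what makes the operation reversible.
    void make_invalid () noexcept { m_idata &= ~valid_bit; }
    void make_valid () noexcept { m_idata |= valid_bit; }
};

// Flag a particle and restore it again, checking that the remaining
// bits (the payload) survive the round trip.
inline bool round_trip_keeps_payload (std::uint64_t payload) noexcept
{
    PackedId id;
    id.m_idata = payload | PackedId::valid_bit;   // start as a valid particle
    std::uint64_t const before = id.m_idata;

    id.make_invalid();                            // e.g., flag it in one kernel
    bool const flagged = !id.is_valid();

    id.make_valid();                              // e.g., restore it in a later kernel
    return flagged && id.is_valid() && id.m_idata == before;
}
```

Each call is a single mask operation on the stored word, so no unpacking and repacking of the id is involved in this sketch.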
@WeiqunZhang @atmyers @AlexanderSinn ready for review now - let me know if this looks legit
We will wait till the next release tomorrow.
It's just adding and not changing stuff, so it should be pretty safe, but I will also only need it after the release tomorrow, so no rush.
As another optimization, I explored using 32bit registers via tricks like:
```cpp
bool is_valid () const noexcept
{
    // the leftmost bit is our id's inverse sign
    auto const * const i32 = (uint32_t*)&m_idata;
    return *i32 >> 31;
}
```
This does what one expects on CPU (DWORD over QWORD, 32bit register used over 64bit one) and on CUDA GPUs (SM_80) it demotes a 64bit load to a 32bit one & reduces one
Replacing a 64 bit load with a 32 bit one? Don’t do this: the 64 bit load would be coalesced, but the 32 bit load would not, because there would be a gap to the next thread. The 32 bit version might be slower.
You are right. Yeah, I thought to load coalesced and then copy into a 32bit register and do the rest of the ops there... but this micro-optimization does not seem worth it.
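For reference, the "load 64 bits, then work on a 32bit half" variant mentioned above can be sketched with shifts instead of a pointer cast. The helper names are hypothetical and not part of AMReX; the sketch only assumes the flag sits in the leftmost bit of the 64-bit word. Note that a cast to the first 32-bit word in memory would address the lower half on little-endian targets, whereas the shift form picks the upper half regardless of byte order.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: test the leftmost bit of a 64-bit word through its
// upper 32-bit half. The value is taken as a full 64-bit word first and
// narrowed afterwards, so no 32-bit memory access is involved.
inline bool top_bit_via_high_word (std::uint64_t idata) noexcept
{
    auto const hi = static_cast<std::uint32_t>(idata >> 32); // upper half
    return (hi >> 31) != 0u;
}

// Reference: direct 64-bit test of the same bit.
inline bool top_bit_direct (std::uint64_t idata) noexcept
{
    return (idata >> 63) != 0u;
}
```

Whether a compiler turns either form into fewer instructions or registers is not shown here; as concluded in the thread, the gain is likely too small to matter.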
Summary
A cheaper way to swap validity sign on particle ids, as needed to select and track particles from one kernel to another (e.g., boundary condition treatment, re-emission physics, scraping of particles, etc.).
With our current encoding, `ParticleIDWrapper::make_invalid()` is the same as `id = -id`, but cheaper.
Improvements:
Additional background
Host Code
https://godbolt.org/z/KPjzExWz1
CUDA Device Code
PTX: https://godbolt.org/z/6En5rK14o
SASS for SM_80: https://godbolt.org/z/d6zYfxaKG
`id = -id`: now saves 4 registers 🎉
Interesting: there are still no 64bit shifts / but shuffles in CUDA hardware...
Checklist
The proposed changes: