Attempt at including offsets in kernel launch #399
Thanks for the initial implementation. I will have to think about this a bit.
@timholy might also be able to offer advice. IIUC, you are trying to implement an exterior/interior iteration split like
@vchuravy following this, as the ability to handle ranges passed to kernels is also a feature we would need (FD MPI code) to allow communication/computation overlap (similar to what @simone-silvestri pointed out).
You can do this right now, as you would with CUDA.jl/AMDGPU.jl, by projecting a smaller `ndrange` onto your custom index space. This is more about whether we can do something like that automatically. I think @lcw had some code that does this for his DG code.
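A minimal sketch of the manual approach described above, assuming the KernelAbstractions.jl 0.9-style launch syntax and the CPU backend. The kernel name `shifted_increment!` and the variables `interior` and `offset` are hypothetical and chosen only for illustration. The kernel is launched over a smaller `ndrange` and shifts its global indices by hand into the interior of the array:

```julia
using KernelAbstractions

# Hypothetical kernel: the global index runs over 1:ndrange in each dimension
# and is shifted manually by `offset` into the custom index space.
@kernel function shifted_increment!(A, offset)
    i, j = @index(Global, NTuple)
    @inbounds A[i + offset[1], j + offset[2]] += 1
end

backend = CPU()
A = zeros(Int, 8, 8)

# Custom index space: the interior of A, leaving a one-cell boundary untouched.
interior = (2:7, 2:7)
offset   = first.(interior) .- 1   # (1, 1): shift from 1-based launch indices
ndrange  = length.(interior)       # (6, 6): the smaller launch range

kernel! = shifted_increment!(backend, 64)
kernel!(A, offset; ndrange = ndrange)
KernelAbstractions.synchronize(backend)
```

After the launch, only `A[2:7, 2:7]` has been incremented; the boundary cells are left for a separate kernel, which is the kind of split that would allow communication and computation to overlap.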
Yeah, having something more automated would be nice. @utkinis may have a small MWE of what we did recently, which would also be handy to have in KA (similar to what is proposed here).
This PR tries to include offsets in kernel launches so that the `Global` indices returned by `@index(Global, NTuple)` and `@index(Global, Linear)` are offset by an `offset` argument.

In the example launch, the last argument `(-1, -2)` gives the offsets applied to the global indices.

Offsetting of global indices is currently restricted to kernels launched with a static size.
@vchuravy I found it a bit difficult to implement arbitrary indices because of the division into blocks, which would have to be rethought. In other words, this is the easiest (and probably not the most general) implementation of offsets. Let me know if you would rather it be implemented another way.
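To illustrate why a constant offset fits easily on top of the block division, here is a small plain-Julia model. It is not KernelAbstractions' internal code, and `offset_global_index` is a hypothetical helper. It assumes the usual decomposition in which a global index is built from a workgroup index and a local index, and then adds the offset:

```julia
# Hypothetical helper, not part of KernelAbstractions: models a global index
# assembled from a 1-based workgroup index and local index, then shifted.
function offset_global_index(group::NTuple{N,Int}, loc::NTuple{N,Int},
                             groupsize::NTuple{N,Int},
                             offset::NTuple{N,Int}) where {N}
    return ntuple(d -> (group[d] - 1) * groupsize[d] + loc[d] + offset[d], N)
end

# A 2D launch over ndrange (4, 6) with workgroups of size (2, 3)
# and offsets (-1, -2).
ndrange, groupsize, offset = (4, 6), (2, 3), (-1, -2)
ngroups = cld.(ndrange, groupsize)   # (2, 2) workgroups

# Enumerate every (workgroup, local) pair and collect the shifted indices.
indices = [offset_global_index((gi, gj), (li, lj), groupsize, offset)
           for gi in 1:ngroups[1], gj in 1:ngroups[2],
               li in 1:groupsize[1], lj in 1:groupsize[2]]

covered  = sort(vec(indices))
expected = sort(vec([(i, j) for i in 0:3, j in -1:4]))
```

In this model the block structure is unchanged and the whole index space is simply translated, so `covered` equals `expected`. Arbitrary (non-contiguous or per-dimension irregular) index sets would not map onto a uniform block grid this way, which is consistent with the difficulty mentioned above.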