Allow JIT Compile to, and Link from, LTOIR for cuda source input #48
Co-authored-by: Graham Markall <535640+gmarkall@users.noreply.github.com>
# TODO: when linker is configured to generate LTOIR, assembly is not
# directly visible via the Linker API. We should provide `-ptx` as
# an additional flag to nvjitlink to generate PTX. However, this
# could break the linking pipeline. We need to investigate this further.
To get PTX when doing LTO, get_cubin() will need to run the linker twice: once without "-ptx" (as it does now) and once again with "-ptx" (additionally) to get the PTX. It can then dump it in a similar way to how it's dumped here.
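The two-pass approach described above could look roughly like the following sketch. `StubLinker`, `add_ltoir`, `complete`, and `link_lto` are hypothetical names invented for illustration; they are not the numba-cuda or nvjitlink API. Only the `-ptx` flag and the idea of linking twice come from this thread.

```python
from typing import List, Optional, Sequence, Tuple


class StubLinker:
    """Hypothetical stand-in for the real linker (NOT the numba-cuda API)."""

    def __init__(self, flags: Sequence[str]):
        self.flags = tuple(flags)
        self.inputs: List[bytes] = []

    def add_ltoir(self, blob: bytes) -> None:
        self.inputs.append(blob)

    def complete(self) -> bytes:
        # With "-ptx" the stub pretends to emit PTX, otherwise a cubin.
        kind = b"PTX" if "-ptx" in self.flags else b"CUBIN"
        return kind + b":" + b"+".join(self.inputs)


def link_lto(
    inputs: Sequence[bytes],
    flags: Sequence[str],
    want_ptx: bool,
    linker_factory=StubLinker,
) -> Tuple[bytes, Optional[str]]:
    """Link once for the cubin; optionally link again with "-ptx" for PTX."""

    def run(extra: Sequence[str] = ()) -> bytes:
        # A fresh linker per pass, so the "-ptx" pass cannot disturb the first.
        linker = linker_factory(tuple(flags) + tuple(extra))
        for blob in inputs:
            linker.add_ltoir(blob)
        return linker.complete()

    cubin = run()  # first pass: flags unchanged, as the pipeline does now
    ptx = run(("-ptx",)).decode("utf-8") if want_ptx else None
    return cubin, ptx
```

Using a separate linker instance for the second pass keeps the cubin path identical to the current behaviour, which is the concern raised in the TODO about breaking the linking pipeline.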
As a set of changes on top of #23 this looks good - if you can add the logic to dump assembly correctly when using LTO (and some tests) I think that should be enough for this PR. Do you see anything else that might be needed?
Without looking very deep into the code base, where is
Ok sounds good.
numba_cuda/numba/cuda/codegen.py
ptx = linker.get_linked_ptx().decode('utf-8')

if config.DUMP_ASSEMBLY:
    print(("ASSEMBLY (AFTER LTO) %s" % self._name).center(80, '-'))
If dump_assembly=1, the pipeline will dump PTX twice: once before LTO, and once after LTO. I think it might be helpful to keep both results, so I added (AFTER LTO) here for clarity.
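As a rough illustration of keeping both dumps distinguishable, here is a minimal sketch. The helper names `banner` and `dump_lines` are hypothetical, and the plain "ASSEMBLY" title for the pre-LTO dump is an assumption; only the "ASSEMBLY (AFTER LTO)" title and the `.center(80, '-')` formatting come from the diff above.

```python
from typing import List


def banner(title: str, name: str, width: int = 80) -> str:
    """Centre a '<title> <name>' heading in a line of dashes."""
    return ("%s %s" % (title, name)).center(width, "-")


def dump_lines(name: str, ptx_before: str, ptx_after: str, enabled: bool) -> List[str]:
    """Return the lines to print for both PTX dumps, or nothing if disabled."""
    if not enabled:
        return []
    return [
        banner("ASSEMBLY", name),  # assumed title for the pre-LTO dump
        ptx_before,
        banner("ASSEMBLY (AFTER LTO)", name),  # title used in the diff
        ptx_after,
    ]


if __name__ == "__main__":
    for line in dump_lines("my_kernel", "// ptx before", "// ptx after", True):
        print(line)
```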
Good point, I think it's missing - I think when #23 adds tests that would have become apparent, and it will be required.
In 159e77e I added the line to make LTO enabled by default.
Closing in favour of #62, which is based on a recent
Depends on #23
This PR adds support for JIT-compiling kernels and FFI functions to LTOIR and linking from LTOIR, allowing better optimization when foreign functions are used in Numba-cuda.