Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow JIT Compile to, and Link from, LTOIR for cuda source input #48

Closed
wants to merge 29 commits into from

Conversation

isVoid
Copy link
Collaborator

@isVoid isVoid commented Aug 23, 2024

Depends on #23

This PR adds the functionality supporting kernel and FFI functions being JIT Compile to and link from LTOIR, allowing better optimization when foreign function is used in Numba-cuda.

Copy link

copy-pr-bot bot commented Aug 23, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@isVoid isVoid changed the base branch from main to develop August 23, 2024 05:20
@isVoid isVoid changed the title Allow JIT to, and Link from, LTOIR for cuda source input Allow JIT Compile to, and Link from, LTOIR for cuda source input Aug 23, 2024
# TODO: when linker is configured to generate LTOIR, assembly is not
# directly visible via the Linker API. We should provide `-ptx` as
# an additional flag to nvjitlink to generate PTX. However, this
# could break the linking pipeline. We need to investigate this further.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To get PTX when doing LTO, get_cubin() will need to run the linker twice, once without "-ptx" (as it does now) and once again with "-ptx" (additionally) to get the PTX - it can then dump it in a similar way to how it's dumped here.

Copy link
Collaborator

@gmarkall gmarkall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a set of changes on top of #23 this looks good - if you can add the logic to dump assembly correctly when using LTO (and some tests) I think that should be enough for this PR. Do you see anything else that might be needed?

@gmarkall gmarkall added the 4 - Waiting on author Waiting for author to respond to review label Aug 23, 2024
@isVoid
Copy link
Collaborator Author

isVoid commented Aug 23, 2024

Do you see anything else that might be needed?

Without looking very deep into the code base, where is Linker(lto=True) set in the pipeline? Are they enabled by default through @cuda.jit?

As a set of changes on top of #23 this looks good - if you can add the logic to dump assembly correctly when using LTO (and some tests) I think that should be enough for this PR.

Ok sounds good.

ptx = linker.get_linked_ptx().decode('utf-8')

if config.DUMP_ASSEMBLY:
print(("ASSEMBLY (AFTER LTO) %s" % self._name).center(80, '-'))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If dump_assembly=1, the pipeline will dump PTX twice: once before LTO, and once after LTO. I think it might be helpful to keep both results. So I added (AFTER LTO) here for clarity.

@gmarkall
Copy link
Collaborator

Without looking very deep into the code base, where is Linker(lto=True) set in the pipeline? Are they enabled by default through @cuda.jit?

Good point, I think it's missing - I think when #23 adds tests that would have become apparent, and it will be required.

@gmarkall gmarkall mentioned this pull request Sep 9, 2024
@gmarkall gmarkall added 4 - Waiting on reviewer Waiting for reviewer to respond to author and removed 4 - Waiting on author Waiting for author to respond to review labels Sep 9, 2024
@isVoid
Copy link
Collaborator Author

isVoid commented Sep 26, 2024

Without looking very deep into the code base, where is Linker(lto=True) set in the pipeline? Are they enabled by default through @cuda.jit?

Good point, I think it's missing - I think when #23 adds tests that would have become apparent, and it will be required.

In 159e77e I added the line to make lto enabled by default.

@gmarkall gmarkall added the develop A PR targeted at the develop branch that will need moving to main label Oct 21, 2024
@gmarkall gmarkall added this to the v0.0.18 milestone Oct 21, 2024
@gmarkall
Copy link
Collaborator

Closing in favour of #62, which is based on a recent main.

@gmarkall gmarkall closed this Oct 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
4 - Waiting on reviewer Waiting for reviewer to respond to author develop A PR targeted at the develop branch that will need moving to main
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants