Allow JIT Compile to, and Link from, LTOIR for cuda source input #48

isVoid · 2024-08-23T05:16:40Z

Depends on #23

This PR adds support for JIT-compiling kernels and FFI functions to LTOIR and linking from LTOIR, which allows better optimization when foreign functions are used in numba-cuda.

Co-authored-by: Graham Markall <535640+gmarkall@users.noreply.github.com>

copy-pr-bot · 2024-08-23T05:16:44Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

gmarkall · 2024-08-23T08:04:47Z

numba_cuda/numba/cuda/cudadrv/driver.py

+            # TODO: when linker is configured to generate LTOIR, assembly is not
+            # directly visible via the Linker API. We should provide `-ptx` as
+            # an additional flag to nvjitlink to generate PTX. However, this
+            # could break the linking pipeline. We need to investigate this further.


To get PTX when doing LTO, get_cubin() will need to run the linker twice: once without "-ptx" (as it does now) to produce the cubin, and once more with "-ptx" added to produce the PTX. It can then dump the PTX in a similar way to how it's dumped here.
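A minimal sketch of the two-pass idea described above, assuming a stand-in linker. `StubLinker`, `add_ltoir`, `complete`, and `link_twice` are hypothetical names invented for illustration and are not the numba-cuda or pynvjitlink API; the `"-lto"` base option is likewise an assumption. The only point is that the same LTOIR inputs are linked twice, and the second pass adds `"-ptx"`.

```python
from dataclasses import dataclass, field


@dataclass
class StubLinker:
    """Hypothetical stand-in for a linker object; not a real API."""
    options: tuple = ()
    inputs: list = field(default_factory=list)

    def add_ltoir(self, data: bytes) -> None:
        self.inputs.append(data)

    def complete(self) -> bytes:
        # Pretend that "-ptx" switches the output from a cubin to PTX text.
        kind = b"PTX" if "-ptx" in self.options else b"CUBIN"
        return kind + b":" + b"+".join(self.inputs)


def link_twice(ltoir_inputs, base_options=("-lto",)):
    """Link the same inputs twice: once for the cubin, once for the PTX."""
    def run(options):
        linker = StubLinker(options=tuple(options))
        for item in ltoir_inputs:
            linker.add_ltoir(item)
        return linker.complete()

    cubin = run(base_options)            # first pass: options unchanged
    ptx = run((*base_options, "-ptx"))   # second pass: same inputs plus "-ptx"
    return cubin, ptx


cubin, ptx = link_twice([b"kernel", b"ffi"])
print(cubin)
print(ptx)
```

A fresh linker object is created for each pass in this sketch, so the PTX pass cannot disturb the state used to produce the cubin.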

gmarkall

As a set of changes on top of #23 this looks good - if you can add the logic to dump assembly correctly when using LTO (and some tests) I think that should be enough for this PR. Do you see anything else that might be needed?

isVoid · 2024-08-23T11:38:45Z

> Do you see anything else that might be needed?

Without looking very deeply into the code base, where is Linker(lto=True) set in the pipeline? Is it enabled by default through @cuda.jit?

> As a set of changes on top of #23 this looks good - if you can add the logic to dump assembly correctly when using LTO (and some tests) I think that should be enough for this PR.

OK, sounds good.

isVoid · 2024-08-26T01:10:56Z

numba_cuda/numba/cuda/codegen.py

+            ptx = linker.get_linked_ptx().decode('utf-8')
+
+            if config.DUMP_ASSEMBLY:
+                print(("ASSEMBLY (AFTER LTO) %s" % self._name).center(80, '-'))


If dump_assembly=1 is set, the pipeline will dump PTX twice: once before LTO and once after LTO. I think it is helpful to keep both results, so I added (AFTER LTO) to the second heading for clarity.
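For illustration, a small sketch of how the two headings would differ, reusing the `.center(80, '-')` formatting from the diff above. The helper names `assembly_banner` and `dump_assembly`, the kernel name, and the placeholder PTX strings are all hypothetical; the wording of the pre-LTO heading is an assumption, since only the post-LTO one appears in this diff.

```python
def assembly_banner(name, after_lto=False, width=80):
    """Build a dash-padded, centred heading for an assembly dump."""
    label = "ASSEMBLY (AFTER LTO)" if after_lto else "ASSEMBLY"
    return ("%s %s" % (label, name)).center(width, "-")


def dump_assembly(name, ptx, after_lto=False):
    """Return the heading followed by the PTX text."""
    return "\n".join([assembly_banner(name, after_lto), ptx])


before = dump_assembly("my_kernel", "// PTX before LTO")
after = dump_assembly("my_kernel", "// PTX after LTO", after_lto=True)
print(before)
print(after)
```

With both dumps kept, the distinct headings make it possible to tell the two PTX listings apart when scanning the output.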

gmarkall · 2024-08-27T09:58:42Z

> Without looking very deeply into the code base, where is Linker(lto=True) set in the pipeline? Is it enabled by default through @cuda.jit?

Good point, I think it's missing. That would have become apparent once #23 adds tests, and it will be required.

isVoid · 2024-09-26T09:46:18Z

> > Without looking very deeply into the code base, where is Linker(lto=True) set in the pipeline? Is it enabled by default through @cuda.jit?
>
> Good point, I think it's missing. That would have become apparent once #23 adds tests, and it will be required.

In 159e77e I added the line that enables LTO by default.

gmarkall · 2024-10-28T19:06:24Z

Closing in favour of #62, which is based on a recent main.

brandon-b-miller and others added 26 commits July 12, 2024 11:47

off the ground

521be20

cleanup

0f9bc4a

Merge remote-tracking branch 'upstream/develop' into develop

b25db1f

enough to launch a kernel

1c3517f

pass through kwargs

cbcbbab

patch_cuda once

4406809

refactor

dc887b6

Merge remote-tracking branch 'upstream/develop' into develop

7d17759

merge latest/resolve conflicts

b9898ec

style and other fixes

60d4ca7

Merge remote-tracking branch 'upstream/develop' into develop

db32cfa

merge latest/resolve conflict

bc424ae

cleanup

56db9c8

bifurcate error messages

c57053c

partially address reviews

363b86d

move add_file_guess_ext logic to Linker base class

32164e9

refactor __new__ logic

c3b9084

address reviews

2c940ee

refactor config logic

a8c38b6

continue addressing reviews

421fdfb

rename errors

16314a7

minor cleanup

41d85a9

Apply suggestions from code review

f7939b6

Co-authored-by: Graham Markall <535640+gmarkall@users.noreply.github.com>

address reviews

91f06a8

bug fixes and map ltoir to CU_JIT_INPUT_NVVM

0541dcf

initial porting of LTOIR related changes

b28acd3

isVoid changed the base branch from main to develop August 23, 2024 05:20

isVoid changed the title ~~Allow JIT to, and Link from, LTOIR for cuda source input~~ Allow JIT Compile to, and Link from, LTOIR for cuda source input Aug 23, 2024

gmarkall reviewed Aug 23, 2024

gmarkall requested changes Aug 23, 2024

gmarkall added the "4 - Waiting on author" label (waiting for author to respond to review) Aug 23, 2024

isVoid mentioned this pull request Aug 23, 2024

LTO Support NVIDIA/numbast#33

Open

dump LTO-ed PTX if dump-assembly=1

e50e8d8

isVoid commented Aug 26, 2024

style

c474146

isVoid mentioned this pull request Aug 28, 2024

Add Bfloat16 Benchmark and Benchmark Suite NVIDIA/numbast#71

Open

gmarkall mentioned this pull request Sep 9, 2024

Use pynvjitlink for MVC #23

Closed

gmarkall added the "4 - Waiting on reviewer" label (waiting for reviewer to respond to author) and removed the "4 - Waiting on author" label (waiting for author to respond to review) Sep 9, 2024

remove redundant check for config assembly, and enable lto by default

159e77e

gmarkall added the "develop" label (a PR targeted at the develop branch that will need moving to main) Oct 21, 2024

gmarkall added this to the v0.0.18 milestone Oct 21, 2024

gmarkall mentioned this pull request Oct 22, 2024

#48 rebased: "Allow JIT Compile to, and Link from, LTOIR for cuda source input" #60

Closed

gmarkall closed this Oct 28, 2024
