Allow CUDA source inputs compiled to LTOIR, and enable pynvjitlinker to link inputs that contains LTOIR #62

isVoid · 2024-10-23T15:37:26Z

This PR supersedes #60 due to a write-permission issue.

isVoid · 2024-10-28T01:45:06Z

I'm not able to reproduce this segfault on my V100 machine:

test_namedunituple (numba.cuda.tests.cudapy.test_array_args.TestCudaArrayArg.test_namedunituple) ... Fatal Python error: Segmentation fault
...

Extension modules: numpy._core._multiarray_umath, numpy._core._multiarray_tests, numpy.linalg._umath_linalg, numba.core.typeconv._typeconv, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, numba._helperlib, numba._dynfunc, numba._dispatcher, numba.core.runtime._nrt_python, numba.np.ufunc._internal, numba.experimental.jitclass._box, numba.mviewbuf, pynvjitlink._nvjitlinklib, numba.types.itertools (total: 22)
ci/test_conda_pynvjitlink.sh: line 72:  2238 Segmentation fault      (core dumped) ENABLE_PYNVJITLINK=1 NUMBA_CUDA_TEST_BIN_DIR=$NUMBA_CUDA_TEST_BIN_DIR python -m numba.runtests numba.cuda.tests -v
/__w/numba-cuda/numba-cuda

isVoid · 2024-10-30T02:12:09Z

In afcce87 I added a flag, ignore_nonlto, to the linker so that only LTO-able objects are added when the flag is enabled. This keeps the driver in the correct state when the -ptx flag is set. It is also the desired behavior, since Numba now dumps the optimized PTX only for the LTO-able portion added to the linker, and raises a warning for any source that isn't optimizable.
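As a rough illustration of the filtering described above, here is a minimal, self-contained sketch. The `FILE_EXTENSION_MAP` below is a stand-in with placeholder string values (the real map in driver.py holds the driver's own input-type values), and `select_link_inputs` is a hypothetical helper, not a function in numba-cuda.

```python
import os
import warnings

# Stand-in for FILE_EXTENSION_MAP in driver.py. The values are illustrative
# strings, not the driver's actual input-type enums.
FILE_EXTENSION_MAP = {
    "o": "object",
    "ptx": "ptx",
    "a": "library",
    "cubin": "cubin",
    "fatbin": "fatbin",
    "ltoir": "ltoir",
}


def select_link_inputs(paths, ignore_nonlto=False):
    """Return the (path, kind) pairs that would be added to the link.

    Hypothetical helper: with ignore_nonlto=True, anything whose kind is
    not LTOIR is skipped with a warning instead of being added.
    """
    selected = []
    for path in paths:
        ext = os.path.splitext(path)[1][1:]  # drop the leading "."
        if ext not in FILE_EXTENSION_MAP:
            raise RuntimeError(
                f"Don't know how to link file with extension .{ext}"
            )
        kind = FILE_EXTENSION_MAP[ext]
        if ignore_nonlto and kind != FILE_EXTENSION_MAP["ltoir"]:
            warnings.warn(f"Not adding {path}: it is not LTO-able")
            continue
        selected.append((path, kind))
    return selected
```

With `ignore_nonlto=True`, passing an `.ltoir` file and a `.ptx` file keeps only the former and warns about the latter; with the flag off, both are kept.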

isVoid · 2024-10-30T09:29:45Z

A subtle point here is that lto=True is only enabled for cuda>12.0, and is only tested there, because this feature depends on pynvjitlink, which is only tested in a CTK 12.5 environment.

isVoid · 2024-10-30T09:34:53Z

numba_cuda/numba/cuda/cudadrv/driver.py

@@ -2704,6 +2710,13 @@ def add_file_guess_ext(self, path_or_code):
                        "Don't know how to link file with extension "
                        f"{ext}"
                    )
+                if ignore_nonlto and kind != FILE_EXTENSION_MAP["ltoir"]:


There's a nuance here: a fatbin object can also contain only LTOIR. Detecting that, though, requires building additional bindings for cuobjdump and outputting the result.

In bac11f6 I created a new fatbin object that contains both LTOIR and SASS and fed it into the linker for testing. I added a helper function that invokes cuobjdump to extract the object types from the fatbin / object. We allow any fatbin / object that contains LTOIR to be passed into the linker.
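A helper of that kind might look roughly like the sketch below. It assumes that running `cuobjdump <path>` with no options prints one `Fatbin <type> code` header per entry. The function names `parse_code_types` and `inspect_obj_content` are illustrative and not necessarily those used in the PR, and the exact type label cuobjdump prints for LTOIR entries is not assumed here.

```python
import re
import subprocess

# Assumed header format of cuobjdump's default listing, one per entry.
ENTRY_PATTERN = re.compile(r"Fatbin (.*) code")


def parse_code_types(listing):
    """Collect the distinct entry types named in a cuobjdump listing."""
    code_types = set()
    for line in listing.splitlines():
        match = ENTRY_PATTERN.match(line)
        if match:
            code_types.add(match.group(1))
    return code_types


def inspect_obj_content(objpath):
    """Run cuobjdump on a fatbin / object and return its entry types."""
    try:
        out = subprocess.run(
            ["cuobjdump", objpath], check=True, capture_output=True
        )
    except FileNotFoundError as e:
        # Raised when the cuobjdump executable is not on PATH.
        raise RuntimeError(
            "cuobjdump was not found; install the CUDA toolkit and "
            "make sure it is on your PATH."
        ) from e
    return parse_code_types(out.stdout.decode("utf-8"))
```

Keeping the parsing separate from the subprocess call lets the text handling be exercised without a CUDA toolkit installed.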

numba_cuda/numba/cuda/cudadrv/driver.py

gmarkall · 2024-11-29T22:26:45Z

After a merge from main, this passes all tests for me locally on Windows.

gmarkall · 2024-12-02T12:15:45Z

numba_cuda/numba/cuda/cudadrv/driver.py

        """
        Add a file or LinkableCode object to the link. If a file is
        passed, the type will be inferred from the extension. A LinkableCode
        object represents a file already in memory.
+
+        When `ignore_nonlto` is set to true, do not add code that are will not


Suggested change

When `ignore_nonlto` is set to true, do not add code that are will not

When `ignore_nonlto` is set to true, do not add code that will not

numba_cuda/numba/cuda/cudadrv/driver.py

gmarkall

This tests OK for me locally on Windows. I've pushed changes to the Windows test binary build script that mirror the Makefile changes.

I think this is good to merge, pending:

- Completion of CI
- @isVoid checking integration with Numbast again locally, with the recent changes.

numba_cuda/numba/cuda/codegen.py

isVoid and others added 12 commits October 22, 2024 10:58

initial porting of LTOIR related changes

e0e8ae9

dump LTO-ed PTX if dump-assembly=1

d227fcb

remove redundant check for config assembly, and enable lto by default

88bdde0

Fix style check

1eb9986

lto disabled by default

d5806b6

add test file, and augment nvrtc bindings with LTOIR API

a7005a8

Also enable the rest of the linkings

0f41616

collapse the two tests in one

1ce439b

raise warning for inputs that has non-LTOIR objects

21216f1

style fixes

9ea2fd4

enable LTO nvrtc function only for 11.*

27b304d

enable LTO linker items only for cuda 12

b42c67d

gmarkall mentioned this pull request Oct 28, 2024

Allow JIT Compile to, and Link from, LTOIR for cuda source input #48

Closed

gmarkall added this to the nvmath-python support milestone Oct 29, 2024

Conditionally add LTO-able objects for PTX prints

afcce87

isVoid commented Oct 30, 2024

isVoid added 2 commits October 30, 2024 22:40

Examine fatbin input that contains LTOIR

bac11f6

add docstring to helper

300348d

isVoid changed the title ~~Allow CUDA source inputs compiled to LTOIR, and enable pynvjitlinker to link inputs that contains LTOIR"~~ Allow CUDA source inputs compiled to LTOIR, and enable pynvjitlinker to link inputs that contains LTOIR Oct 31, 2024

add cuda-cuobjdump to dependency

302259c

gmarkall reviewed Nov 1, 2024

numba_cuda/numba/cuda/cudadrv/driver.py

isVoid added 2 commits November 4, 2024 09:36

raise when cuobjdump is not installed

224eef4

style

ce434e5

isVoid mentioned this pull request Nov 19, 2024

LTO Support NVIDIA/numbast#33

Open

Merge remote-tracking branch 'NVIDIA/main' into fea-cu-lto

7d07e7b

gmarkall added the 4 - Waiting on reviewer Waiting for reviewer to respond to author label Nov 29, 2024

gmarkall reviewed Dec 2, 2024

numba_cuda/numba/cuda/cudadrv/driver.py

gmarkall added 2 commits December 2, 2024 16:22

Add multi-device fatbin compilation to Windows test binary build script

1ed4dad

Fix small typo in docstring

8617731

gmarkall approved these changes Dec 2, 2024

gmarkall added 4 - Waiting on author Waiting for author to respond to review and removed 4 - Waiting on reviewer Waiting for reviewer to respond to author labels Dec 2, 2024

isVoid commented Dec 4, 2024

numba_cuda/numba/cuda/codegen.py

gmarkall modified the milestones: nvmath-python support, v0.0.21 Dec 5, 2024

gmarkall added 5 - Ready to merge Testing and reviews complete, ready to merge and removed 4 - Waiting on author Waiting for author to respond to review labels Dec 6, 2024

gmarkall merged commit 779782d into NVIDIA:main Dec 6, 2024
31 checks passed

gmarkall mentioned this pull request Dec 6, 2024

[FEA] Support JITting and Linking LTOIR from cuda source inputs #45

Closed
