Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow CUDA source inputs compiled to LTOIR, and enable pynvjitlinker to link inputs that contains LTOIR #62

Merged
merged 21 commits into from
Dec 6, 2024

Conversation

isVoid
Copy link
Collaborator

@isVoid isVoid commented Oct 23, 2024

This PR supercedes #60 due to write permission issue.

@isVoid
Copy link
Collaborator Author

isVoid commented Oct 28, 2024

I'm not able to reproduce this segfault on my v100 machine:

test_namedunituple (numba.cuda.tests.cudapy.test_array_args.TestCudaArrayArg.test_namedunituple) ... Fatal Python error: Segmentation fault
...

Extension modules: numpy._core._multiarray_umath, numpy._core._multiarray_tests, numpy.linalg._umath_linalg, numba.core.typeconv._typeconv, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, numba._helperlib, numba._dynfunc, numba._dispatcher, numba.core.runtime._nrt_python, numba.np.ufunc._internal, numba.experimental.jitclass._box, numba.mviewbuf, pynvjitlink._nvjitlinklib, numba.types.itertools (total: 22)
ci/test_conda_pynvjitlink.sh: line 72:  2238 Segmentation fault      (core dumped) ENABLE_PYNVJITLINK=1 NUMBA_CUDA_TEST_BIN_DIR=$NUMBA_CUDA_TEST_BIN_DIR python -m numba.runtests numba.cuda.tests -v
/__w/numba-cuda/numba-cuda

@isVoid
Copy link
Collaborator Author

isVoid commented Oct 30, 2024

In afcce87 I added an additional flag ignore_nonlto to the linker so that only LTO-able objects are added when the flag is enabled. This allows the driver is in the correct state when -ptx flag is set. And this is also the desired behavior since Numba now dumps the optimized PTX only for the portion that are LTO-abled added to the linker, and raise warning for any source that aren't optimizable.

@isVoid
Copy link
Collaborator Author

isVoid commented Oct 30, 2024

A subtle case here is that lto=True is only enabled for cuda>12.0 and is tested so. Because this feature depends on pynvjitlink, which is only tested in CTK12.5 environment.

@@ -2704,6 +2710,13 @@ def add_file_guess_ext(self, path_or_code):
"Don't know how to link file with extension "
f"{ext}"
)
if ignore_nonlto and kind != FILE_EXTENSION_MAP["ltoir"]:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a nuance here where fatbin object can also only contains LTOIR. Though that requires building additional bindings for cuobjdump and outputting the result.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In bac11f6 I made a new created a new fatbin object that contains both LTOIR and SASS and feed into the linker for testing. I added a helper function that invokes cuobjdump to extract object types from the fatbin / object. We allow and fatbin / object that contains LTOIR to be passed into the linker.

@isVoid isVoid changed the title Allow CUDA source inputs compiled to LTOIR, and enable pynvjitlinker to link inputs that contains LTOIR" Allow CUDA source inputs compiled to LTOIR, and enable pynvjitlinker to link inputs that contains LTOIR Oct 31, 2024
@gmarkall
Copy link
Collaborator

After a merge from main, this passes all tests for me locally on Windows.

@gmarkall gmarkall added the 4 - Waiting on reviewer Waiting for reviewer to respond to author label Nov 29, 2024
"""
Add a file or LinkableCode object to the link. If a file is
passed, the type will be inferred from the extension. A LinkableCode
object represents a file already in memory.

When `ignore_nonlto` is set to true, do not add code that are will not
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
When `ignore_nonlto` is set to true, do not add code that are will not
When `ignore_nonlto` is set to true, do not add code that will not

Copy link
Collaborator

@gmarkall gmarkall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This tests OK for me locally on Windows. I've pushed changes to the Windows test binary build script that mirror the Makefile changes.

I think this is good to merge, pending:

  • Completion of CI
  • @isVoid checking integration with Numbast again locally, with the recent changes.

@gmarkall gmarkall added 4 - Waiting on author Waiting for author to respond to review and removed 4 - Waiting on reviewer Waiting for reviewer to respond to author labels Dec 2, 2024
@gmarkall gmarkall added 5 - Ready to merge Testing and reviews complete, ready to merge and removed 4 - Waiting on author Waiting for author to respond to review labels Dec 6, 2024
@gmarkall gmarkall merged commit 779782d into NVIDIA:main Dec 6, 2024
31 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to merge Testing and reviews complete, ready to merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants