have cuda_library output RDC #125

garymm · 2023-07-06T21:41:21Z

This allows a cuda_library that was built with rdc=True to be
depended upon by another such library. This is convenient as such a
library can be consumed by either another cuda_library OR a
cc_library.

Note: if we have this, I'm not sure why we need a separate cuda_objects
rule anymore. Perhaps it can be removed in a later PR.

Change-Id: I1014d28a0ab3a9c76b788821211b13c4a9956d2a

cloudhan · 2023-07-11T06:01:55Z

This somehow allows cc_library to depend on an incomplete cuda_library, is there a way to avoid it?

garymm · 2023-07-11T17:27:14Z

This somehow allows cc_library to depend on an incomplete cuda_library, is there a way to avoid it?

Can you please elaborate? Or illustrate with an example? If rdc=True, the cuda_library will always perform a device link step, so it should be complete.

cloudhan · 2023-07-12T01:13:38Z

cuda/private/rules/cuda_library.bzl

@@ -87,12 +91,19 @@ def _cuda_library_impl(ctx):
            pic_lib = pic_libs,
            objects = objects,
            pic_objects = pic_objects,
+            rdc_objects = rdc_objects,


Are you missing rdc_pic_objects here?

cloudhan · 2023-07-12T01:19:08Z

cuda/private/rules/cuda_library.bzl

@@ -104,7 +115,7 @@ cuda_library = rule(
        "hdrs": attr.label_list(allow_files = ALLOW_CUDA_HDRS),
        "deps": attr.label_list(providers = [[CcInfo], [CudaInfo]]),
        "alwayslink": attr.bool(default = False),
-        "rdc": attr.bool(default = False, doc = "whether to perform relocateable device code linking, otherwise, normal device link."),
+        "rdc": attr.bool(default = False, doc = "Whether to produce and consume relocateable device code. Transitive deps that contain device code must all either be cuda_objects or cuda_library that has rdc = True. If False, all device code must be in the same translation unit. May have performance implications. See https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#using-separate-compilation-in-cuda."),


Could you please break this line a little bit, way too long for it.

cloudhan · 2023-07-12T02:04:26Z

If rdc=True, the cuda_library will always perform a device link step, so it should be complete.

Indeed.

The only limitation left is this cannot support circular object dependency due to the mandatory device link, which by itself is really stupid so I don't think user will need it generally.

cloudhan · 2023-07-12T02:11:10Z

Backlog for memo here:

Originally, it is modeled as

host code	device code
cc_library for reusable cpu objects	cuda_objects for reusable device objects
cc_binary for linked cpu executable	cuda_library for linked device executable

For non-rdc and non-dlto case, the device executable is automatically linked already. rdc will enforce device_link then the compiler will ensure the device executable is linked.

This is something like cc_binary expose all objects for reusing with other cc_binarys. But this is not possible because the main entrypoint symbol conflict. For device case, the entrypoint is determined by a kernel launch, so this is not a problem.

Thus, allow cuda_library to expose the underlying objects for reusing between cuda_library but not cc_library will be harmless then.

This allows a `cuda_library` that was built with `rdc=True` to be depended upon by another such library. This is convenient as such a library can be consumed by either another `cuda_library` OR a `cc_library`. Note: if we have this, I'm not sure why we need a separate cuda_objects rule anymore. Perhaps it can be removed in a later PR. Change-Id: I1014d28a0ab3a9c76b788821211b13c4a9956d2a

cloudhan

Thanks for the improvement.

garymm · 2023-07-13T15:56:55Z

@cloudhan would you be OK with me deleting cuda_objects? Seems useless now.

cloudhan · 2023-07-14T00:05:08Z

No

garymm · 2023-07-14T14:29:27Z

Why?

Compile the newly added example in #125 with `_wrapper_device_link` with rdc causes artifact name conflict. ``` cuda_library( name = "b", srcs = ["b.cu"], hdrs = ["b.cuh"], rdc = True, ) ``` The reason is that our compile always compile to <target_output_dir>/_objs: cuda_library(name="b") --> artifact base name is `b` during `compile` with `compile` for following srcs "b.cu" ------------------> artifact `<target_output_dir>/_objs/b.rdc.o` `fatbin.cu` ------rdc----> artifact `<target_output_dir>/_objs/b.rdc.o` After this PR: "b.cu" ------------------> artifact `<target_output_dir>/_objs/b.rdc.o` `fatbin.cu` ------rdc----> artifact `<target_output_dir>/_objs/_dlink/b.rdc.o`

cloudhan · 2023-09-13T01:13:54Z

OK, so when building a shared library, we inevitably -Wl,--whole-archive the static library. Let cuda_library do transitive device link is not harmful for the device code, but is not the case for interfacing with host code.

the cuda_library can contain incomplete device object but can be consumed by the cc_* rules
multiple cuda_library targets that form a chain will contain same symbols when doing --whole-archive linking.

So this will cause misusing easily and is unfixable in some condition. I will revert this PR.

garymm · 2023-09-13T17:04:18Z

Can you add an example that illustrates the problem?
I don't understand exactly how 1 or 2 can happen.

cloudhan · 2023-09-14T00:48:13Z

1 is not very important, because those imcomplete objects are not exposed by CcInfo. For 2

rules_cuda/cuda/private/rules/cuda_library.bzl

Lines 38 to 48 in 22a46e6

    
           if use_rdc: 
        
               transitive_objects = depset(transitive = [dep[CudaInfo].rdc_objects for dep in attr.deps if CudaInfo in dep]) 
        
               transitive_pic_objects = depset(transitive = [dep[CudaInfo].rdc_pic_objects for dep in attr.deps if CudaInfo in dep]) 
        
               objects = depset(transitive = [objects, transitive_objects]) 
        
               rdc_objects = objects 
        
               pic_objects = depset(transitive = [pic_objects, transitive_pic_objects]) 
        
               rdc_pic_objects = pic_objects 
        
               dlink_object = depset([device_link(ctx, cuda_toolchain, cc_toolchain, objects, common, pic = False, rdc = use_rdc)]) 
        
               dlink_pic_object = depset([device_link(ctx, cuda_toolchain, cc_toolchain, pic_objects, common, pic = True, rdc = use_rdc)]) 
        
               objects = depset(transitive = [objects, dlink_object]) 
        
               pic_objects = depset(transitive = [pic_objects, dlink_pic_object])

Lets consider the issue 162, build the shared library will cause

/usr/bin/ld.gold: error: bazel-out/k8-fastbuild/bin/tmp/_objs/mul/mul.rdc.pic.o: multiple definition of 'mul(double*, double const*, double const*, int)'
/usr/bin/ld.gold: bazel-out/k8-fastbuild/bin/tmp/_objs/mul/mul.rdc.pic.o: previous definition here
/usr/bin/ld.gold: error: bazel-out/k8-fastbuild/bin/tmp/_objs/mul/mul.rdc.pic.o: multiple definition of 'mul_impl(double*, double const*, double const*, int)'
/usr/bin/ld.gold: bazel-out/k8-fastbuild/bin/tmp/_objs/mul/mul.rdc.pic.o: previous definition here
/usr/bin/ld.gold: error: bazel-out/k8-fastbuild/bin/tmp/_objs/mul/mul.rdc.pic.o: multiple definition of '__fatbinwrap_4280babd_6_mul_cc_a09afee1'
/usr/bin/ld.gold: bazel-out/k8-fastbuild/bin/tmp/_objs/mul/mul.rdc.pic.o: previous definition here
/usr/bin/ld.gold: error: bazel-out/k8-fastbuild/bin/tmp/_objs/mul/mul.rdc.pic.o: multiple definition of '__device_stub__Z8mul_implPdPKdS1_i(double*, double const*, double const*, int)'
/usr/bin/ld.gold: bazel-out/k8-fastbuild/bin/tmp/_objs/mul/mul.rdc.pic.o: previous definition here
/usr/bin/ld.gold: error: bazel-out/k8-fastbuild/bin/tmp/_objs/mul/mul_dlink.rdc.pic.o: multiple definition of '__cudaRegisterLinkedBinary_4280babd_6_mul_cc_a09afee1'
/usr/bin/ld.gold: bazel-out/k8-fastbuild/bin/tmp/_objs/muladd/muladd_dlink.rdc.pic.o: previous definition here

| target | transitive_objects | objects (exposed via CcInfo) | rdc_objects (exposed via CudaInfo)
|--------|--------------------|------------------------------|------------------------------------
| mul    |                    | mul.o,mul_dlink.o            | mul.o
| muladd | mul.o              | muladd.o,mul.o,muladd_dlink.o| muladd.o,mul.o

And cc_shared_library(name = "t", deps = [":mul", ":muladd"]) is valid because from user side, they want symbols in target mul to be included in target t, so they whole archive link mul, and this applies to muladd respectively.

cloudhan · 2023-09-15T17:13:03Z

@garymm #164 (comment) This should address all the issue we are facing. Let me fix this.

garymm requested review from cloudhan, jsharpe and ryanleary as code owners July 6, 2023 21:41

garymm force-pushed the garymm/transitive-cuda branch from 535c076 to ce78428 Compare July 6, 2023 22:55

cloudhan reviewed Jul 12, 2023

View reviewed changes

garymm force-pushed the garymm/transitive-cuda branch from ce78428 to 104b3bc Compare July 12, 2023 17:21

cloudhan approved these changes Jul 13, 2023

View reviewed changes

cloudhan merged commit 555f88f into bazel-contrib:main Jul 13, 2023
12 checks passed

garymm deleted the garymm/transitive-cuda branch July 13, 2023 15:56

cloudhan mentioned this pull request Sep 12, 2023

Can't link a shared library using clang #162

Closed

cloudhan mentioned this pull request Sep 17, 2023

Improve RDC #167

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

have cuda_library output RDC #125

have cuda_library output RDC #125

garymm commented Jul 6, 2023

cloudhan commented Jul 11, 2023

garymm commented Jul 11, 2023

cloudhan Jul 12, 2023 •

edited

Loading

garymm Jul 12, 2023

cloudhan Jul 12, 2023 •

edited

Loading

garymm Jul 12, 2023

cloudhan commented Jul 12, 2023

cloudhan commented Jul 12, 2023

cloudhan left a comment

garymm commented Jul 13, 2023

cloudhan commented Jul 14, 2023

garymm commented Jul 14, 2023

cloudhan commented Sep 13, 2023 •

edited

Loading

garymm commented Sep 13, 2023

cloudhan commented Sep 14, 2023

cloudhan commented Sep 15, 2023

have cuda_library output RDC #125

have cuda_library output RDC #125

Conversation

garymm commented Jul 6, 2023

cloudhan commented Jul 11, 2023

garymm commented Jul 11, 2023

cloudhan Jul 12, 2023 • edited Loading

Choose a reason for hiding this comment

garymm Jul 12, 2023

Choose a reason for hiding this comment

cloudhan Jul 12, 2023 • edited Loading

Choose a reason for hiding this comment

garymm Jul 12, 2023

Choose a reason for hiding this comment

cloudhan commented Jul 12, 2023

cloudhan commented Jul 12, 2023

cloudhan left a comment

Choose a reason for hiding this comment

garymm commented Jul 13, 2023

cloudhan commented Jul 14, 2023

garymm commented Jul 14, 2023

cloudhan commented Sep 13, 2023 • edited Loading

garymm commented Sep 13, 2023

cloudhan commented Sep 14, 2023

cloudhan commented Sep 15, 2023

cloudhan Jul 12, 2023 •

edited

Loading

cloudhan Jul 12, 2023 •

edited

Loading

cloudhan commented Sep 13, 2023 •

edited

Loading