Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[EPIC]: Hide the visibility of Thrust/CUB symbols in shared object libraries #166

Closed
6 tasks done
elstehle opened this issue Jul 5, 2023 · 3 comments · Fixed by #443
Closed
6 tasks done

[EPIC]: Hide the visibility of Thrust/CUB symbols in shared object libraries #166

elstehle opened this issue Jul 5, 2023 · 3 comments · Fixed by #443
Assignees
Labels
bug Something isn't working right. cub For all items related to CUB

Comments

@elstehle
Copy link
Collaborator

elstehle commented Jul 5, 2023

Is this a duplicate?

Area

CUB

Is your feature request related to a problem? Please describe.

Due to kernels having weak external linkage, we have run into problems in the past, like NVIDIA/cub#545

More recently, we ran into another issue that may have the same root cause, where two shared libraries define symbols for the same kernels and the kernels never actually get launched - with no cuda error being reported.
NVIDIA/cub#719

Describe the solution you'd like

Ensuring that our kernels do not have external linkage should resolve the issue. However we want to ensure some unforeseen side-effects may cause issues or regressions.

Tasks

Describe alternatives you've considered

No response

Additional context

No response

@miscco miscco added cub For all items related to CUB bug Something isn't working right. labels Jul 12, 2023
@jrhemstad
Copy link
Collaborator

jrhemstad commented Aug 4, 2023

Okay, I had to prove to myself once and for all the difference between static and __attribute__((visibility("hidden"))).

TL;DR: When building a shared object, they have the same end result of making the symbol unique to the shared object. The difference is that static symbols are unique to each TU that is linked together to form a .so. A hidden symbol is not unique to each TU (but is still hidden in the final .so).

To prove this to myself, I have the following:

// B.cpp
template <typename T>
ANNOTATION void foo(){ }

ANNOTATION void B(){
	foo<int>();
}

// C.cpp
template <typename T>
ANNOTATION void foo(){ }

ANNOTATION void C(){
	foo<int>();
}

I build a libBC.so three different ways:

g++ -DANNOTATION="" -c B.cpp
g++ -DANNOTATION="" -c C.cpp
g++ -shared -o libBC.so B.o C.o

g++ -DANNOTATION="static" -c B.cpp
g++ -DANNOTATION="static" -c C.cpp
g++ -shared -o libBC_static.so B.o C.o

g++ -DANNOTATION="__attribute__((visibility(\"hidden\")))" -c B.cpp
g++ -DANNOTATION="__attribute__((visibility(\"hidden\")))" -c C.cpp
g++ -shared -o libBC_hidden.so B.o C.o

Inspecting the symbol table we see:

nm -C libBC.so
0000000000001119 T B()
0000000000001134 T C()
0000000000001129 W void foo<int>()
nm -C libBC_static.so
0000000000001109 t void foo<int>()
0000000000001124 t void foo<int>()
00000000000010f9 t B()
0000000000001114 t C()
nm -C libBC_hidden.so
00000000000010f9 t B()
0000000000001114 t C()
0000000000001109 t void foo<int>()

Important takeaways:

  • foo<int>() is a weak symbol in libBC.so (meaning the symbol is not hidden)
  • In libBC_static.so, foo<int>() is hidden (t means it is in the text section), but there are two copies of the symbol, one from B.o and another from C.o
  • In libBC_hidden.so, foo<int>() is hidden, but there is only one copy of the symbol.

Therefore, my conclusion is that we should annotate everything in CUB with __attribute__((visibility("hidden"))) and I don't see a reason to use static at all?

libcu++ already has a _LIBCUDACXX_HIDDEN macro that we could use for this purpose: https://github.com/NVIDIA/cccl/blob/b6f7b962c29f1ebde16888bcea50b41247ff06e9/libcudacxx/include/cuda/std/detail/libcxx/include/__config#L1290C32-L1290C74

@jrhemstad
Copy link
Collaborator

jrhemstad commented Aug 4, 2023

Note, that this annotation needs to be used not just on CUB's kernels, but all CUB template functions.

@jrhemstad
Copy link
Collaborator

jrhemstad commented Aug 4, 2023

It turns out hidden is not sufficient for all cases. Consider:

   nvcc -c -arch=sm_52  a.cu
   nvcc -c -arch=sm_70  b.cu
   nvcc -c main.cpp
   nvcc -o test_static main.o a.o b.o 

If a.o and b.o reference the same kernel symbol, then this will have unexpected results where you may end up invoking either the sm_52 or sm_70 version of the kernel.

As a result, we should annotate __global__ functions as static.

However, we'd need to special case it so we don't make a __global__ function static with -rdc=true. Only make it have hidden visibility. Nevermind, since we control all the definition/declerations of all CUB kernels, we aren't trying to split them. So no need for this switch logic.

In short, I think we want this macro:

#if defined(__CUDACC__)
#       define CUB_KERNEL \
            static __global__
#else
#   define CUB_KERNEL
#endif

In addition we use _LIBCUDACXX_HIDDEN on all non-kernel template functions.

As far as testing goes to ensure all our symbols are hidden as expected, I think a relatively easy thing we could do is write a script that dumps the symbol table for all your existing tests with nm -C and then checks that all CUB symbols have T/t visibility.

The namespace is included in the symbol name dumped by nm, so we could just grep for any symbol with CUB:: and verify it's visibility

0000000000001129 t A::bar()
000000000000114d t void A::foo<int>()

@jrhemstad jrhemstad changed the title [FEA]: Make sure CUB kernels do not have external linkage [FEA]: Hide the visibility of CUB symbols in shared object libraries Aug 9, 2023
@jrhemstad jrhemstad changed the title [FEA]: Hide the visibility of CUB symbols in shared object libraries [FEA]: Hide the visibility of Thrust/CUB symbols in shared object libraries Aug 9, 2023
@jrhemstad jrhemstad assigned gevtushenko and unassigned elstehle Aug 9, 2023
@jrhemstad jrhemstad changed the title [FEA]: Hide the visibility of Thrust/CUB symbols in shared object libraries [EPIC]: Hide the visibility of Thrust/CUB symbols in shared object libraries Aug 9, 2023
PointKernel added a commit to NVIDIA/cuCollections that referenced this issue Jan 19, 2024
This marks all kernels in CUCO as `static` so that they have internal
linkage and won't conflict when used by multiple DSOs.

I didn't see a single shared/common header in cuco where I could place a
`CUCO_KERNEL` macro so I modified each instance instead.
While `cccl` went with a `__attribute__ ((visibility ("hidden")))`
approach to help reduce RDC size, this approach seemed very invasive for
cuco. This is due to the fact that we would need to pragma push and pop
both gcc warnings and nvcc warnings in each cuco header so that we don't
introduce any warnings. This is needed as the compiler incorrectly state
that the `__attribute__ ((visibility ("hidden")))` has no side-effect.

Context:
rapidsai/cudf#14726
NVIDIA/cccl#166
rapidsai/raft#1722

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Yunsong Wang <yunsongw@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working right. cub For all items related to CUB
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

4 participants