
cudaPackages.cudatoolkit: switch to autoPatchelf #178440

Merged · 2 commits · Apr 2, 2023

Conversation

@SomeoneSerge (Contributor) commented Jun 21, 2022

Description of changes

Rewrites the cudatoolkit expression to use autoPatchelf instead of manually constructing and writing the rpath.
Using autoPatchelf ensures that we're at least not missing dependencies that upstream has marked as "needed".

This is a narrow-scoped part of #178439
In particular, this PR ensures "correctness" (it amends missing rpaths) but increases the actual closure size.
A follow-up PR should split the outputs to reduce closure sizes while preserving "correctness".
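
For illustration, a minimal sketch of the shape of the change (a simplified, hypothetical excerpt from the mkDerivation call, not the actual diff):

  # Before: the rpath was constructed by hand and written during fixup
  postFixup = ''
    for f in $out/lib/*.so; do
      patchelf --set-rpath "${lib.makeLibraryPath [ stdenv.cc.cc ]}" "$f"
    done
  '';

  # After: autoPatchelfHook walks the installed ELF files and resolves
  # every DT_NEEDED entry against the libraries found in buildInputs
  nativeBuildInputs = [ autoPatchelfHook ];
  buildInputs = [ stdenv.cc.cc ];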

Things done
  • Built on platform(s)
    • x86_64-linux

CC @NixOS/cuda-maintainers

@SomeoneSerge changed the title from "cudaPackages.cudatoolkit: siwtch to autoPatchelf" to "cudaPackages.cudatoolkit: switch to autoPatchelf" Jun 21, 2022
@SomeoneSerge (Contributor, Author)

  dontPatchELF = true;  # disables the stdenv fixup that runs patchelf --shrink-rpath
  dontStrip = true;     # keeps NVIDIA's prebuilt binaries unstripped

These two are just hanging around. I'm almost certain dontPatchELF can be removed; I'm less sure about dontStrip.

@samuela (Member) left a comment


Looks like a great upgrade to cudaPackages.cudatoolkit! I'm running nixpkgs-review now...

Review comment on pkgs/development/compilers/cudatoolkit/common.nix (outdated, resolved)
@samuela (Member) commented Jun 27, 2022

Result of nixpkgs-review pr 178440 run on x86_64-linux

2 packages marked as broken and skipped:
  • python310Packages.caffeWithCuda
  • truecrack-cuda
5 packages failed to build:
  • caffeWithCuda
  • ethminer (ethminer-cuda)
  • gpu-screen-recorder
  • gpu-screen-recorder-gtk
  • python39Packages.caffeWithCuda
33 packages built:
  • colmapWithCuda
  • cudaPackages.cuda-samples
  • cudatoolkit (cudaPackages.cudatoolkit, cudatoolkit_11)
  • cudaPackages.cutensor
  • cudaPackages.nccl
  • forge
  • gpu-burn
  • gromacsCudaMpi
  • gwe
  • katagoWithCuda
  • librealsenseWithCuda
  • magma
  • nvtop
  • nvtop-nvidia
  • python310Packages.TheanoWithCuda
  • python310Packages.cupy
  • python310Packages.jaxlibWithCuda
  • python310Packages.numbaWithCuda
  • python310Packages.pycuda
  • python310Packages.pynvml
  • python310Packages.pyrealsense2WithCuda
  • python310Packages.pytorchWithCuda
  • python39Packages.TheanoWithCuda
  • python39Packages.cupy
  • python39Packages.jaxlibWithCuda
  • python39Packages.numbaWithCuda
  • python39Packages.pycuda
  • python39Packages.pynvml
  • python39Packages.pyrealsense2WithCuda
  • python39Packages.pytorchWithCuda
  • python39Packages.tensorflowWithCuda
  • xgboostWithCuda
  • xpraWithNvenc

@samuela (Member) commented Jun 27, 2022

Here are the errors:

error: builder for '/nix/store/iw6f89qja74akzqv9gl22vi26qpdrqlz-ethminer-0.19.0.drv' failed with exit code 2;
       last 10 log lines:
       > /nix/store/bv8qjsgd8ngjbazj3h5swfwb0sydy14n-cli11-2.2.0/include/CLI/App.hpp:594:35: note:   no known conversion for argument 2 from 'unsigned int' to 'CLI::callback_t' {aka 'std::function<bool(const std::vector<std::__cxx11::basic_string<char> >&)>'}
       >   594 |                        callback_t option_callback,
       >       |                        ~~~~~~~~~~~^~~~~~~~~~~~~~~
       > /nix/store/bv8qjsgd8ngjbazj3h5swfwb0sydy14n-cli11-2.2.0/include/CLI/App.hpp:701:13: note: candidate: 'CLI::Option* CLI::App::add_option(std::string)'
       >   701 |     Option *add_option(std::string option_name) {
       >       |             ^~~~~~~~~~
       > /nix/store/bv8qjsgd8ngjbazj3h5swfwb0sydy14n-cli11-2.2.0/include/CLI/App.hpp:701:13: note:   candidate expects 1 argument, 4 provided
       > make[2]: *** [ethminer/CMakeFiles/ethminer.dir/build.make:76: ethminer/CMakeFiles/ethminer.dir/main.cpp.o] Error 1
       > make[1]: *** [CMakeFiles/Makefile2:516: ethminer/CMakeFiles/ethminer.dir/all] Error 2
       > make: *** [Makefile:156: all] Error 2
       For full logs, run 'nix log /nix/store/iw6f89qja74akzqv9gl22vi26qpdrqlz-ethminer-0.19.0.drv'.
error: builder for '/nix/store/qli4pxxmhqbqic19qa0hwr3i3ixvc58a-cudatoolkit-10.1.243.drv' failed with exit code 1;
       last 10 log lines:
       > auto-patchelf: 7 dependencies could not be satisfied
       > warn: auto-patchelf ignoring missing libcuda.so.1 wanted by /nix/store/242ijwn14sjvpsl3694jk5j8fbc8hbpv-cudatoolkit-10.1.243/targets/x86_64-linux/lib/libcuinj64.so.10.1.243
       > error: auto-patchelf could not satisfy dependency libGLU.so.1 wanted by /nix/store/242ijwn14sjvpsl3694jk5j8fbc8hbpv-cudatoolkit-10.1.243/extras/demo_suite/oceanFFT
       > error: auto-patchelf could not satisfy dependency libglut.so.3 wanted by /nix/store/242ijwn14sjvpsl3694jk5j8fbc8hbpv-cudatoolkit-10.1.243/extras/demo_suite/oceanFFT
       > error: auto-patchelf could not satisfy dependency libGLU.so.1 wanted by /nix/store/242ijwn14sjvpsl3694jk5j8fbc8hbpv-cudatoolkit-10.1.243/extras/demo_suite/randomFog
       > error: auto-patchelf could not satisfy dependency libglut.so.3 wanted by /nix/store/242ijwn14sjvpsl3694jk5j8fbc8hbpv-cudatoolkit-10.1.243/extras/demo_suite/randomFog
       > error: auto-patchelf could not satisfy dependency libGLU.so.1 wanted by /nix/store/242ijwn14sjvpsl3694jk5j8fbc8hbpv-cudatoolkit-10.1.243/extras/demo_suite/nbody
       > error: auto-patchelf could not satisfy dependency libglut.so.3 wanted by /nix/store/242ijwn14sjvpsl3694jk5j8fbc8hbpv-cudatoolkit-10.1.243/extras/demo_suite/nbody
       > auto-patchelf failed to find all the required dependencies.
       > Add the missing dependencies to --libs or use `--ignore-missing="foo.so.1 bar.so etc.so"`.
       For full logs, run 'nix log /nix/store/qli4pxxmhqbqic19qa0hwr3i3ixvc58a-cudatoolkit-10.1.243.drv'.
error: 1 dependencies of derivation '/nix/store/d0dsn8qz3ym77lfg43ksi13wp9qxyb9c-cudatoolkit-10-cudnn-7.6.5.drv' failed to build
error: 2 dependencies of derivation '/nix/store/d4j3zxg2rgvp290bxs3swcfdqkda6k2s-caffe-1.0.drv' failed to build
error: 2 dependencies of derivation '/nix/store/db9d8zi0jvnm0kh90ml6m9b06qg5zsyb-caffe-1.0.drv' failed to build
error: builder for '/nix/store/61qxgja8gs96d78kw1bzlrr6fk3ygdpp-cudatoolkit-10.2.89.drv' failed with exit code 1;
       last 10 log lines:
       > error: auto-patchelf could not satisfy dependency libQt5WebEngineCore.so.5 wanted by /nix/store/n9xpr40wamx3iswvixvglixc8sl5d5pv-cudatoolkit-10.2.89/nsight-compute-2019.5.0/host/linux-desktop-glibc_2_11_3-x64/libexec/QtWebEngineProcess
       > warn: auto-patchelf ignoring missing libcuda.so.1 wanted by /nix/store/n9xpr40wamx3iswvixvglixc8sl5d5pv-cudatoolkit-10.2.89/targets/x86_64-linux/lib/libcuinj64.so.10.2.89
       > error: auto-patchelf could not satisfy dependency libGLU.so.1 wanted by /nix/store/n9xpr40wamx3iswvixvglixc8sl5d5pv-cudatoolkit-10.2.89/extras/demo_suite/oceanFFT
       > error: auto-patchelf could not satisfy dependency libglut.so.3 wanted by /nix/store/n9xpr40wamx3iswvixvglixc8sl5d5pv-cudatoolkit-10.2.89/extras/demo_suite/oceanFFT
       > error: auto-patchelf could not satisfy dependency libGLU.so.1 wanted by /nix/store/n9xpr40wamx3iswvixvglixc8sl5d5pv-cudatoolkit-10.2.89/extras/demo_suite/randomFog
       > error: auto-patchelf could not satisfy dependency libglut.so.3 wanted by /nix/store/n9xpr40wamx3iswvixvglixc8sl5d5pv-cudatoolkit-10.2.89/extras/demo_suite/randomFog
       > error: auto-patchelf could not satisfy dependency libGLU.so.1 wanted by /nix/store/n9xpr40wamx3iswvixvglixc8sl5d5pv-cudatoolkit-10.2.89/extras/demo_suite/nbody
       > error: auto-patchelf could not satisfy dependency libglut.so.3 wanted by /nix/store/n9xpr40wamx3iswvixvglixc8sl5d5pv-cudatoolkit-10.2.89/extras/demo_suite/nbody
       > auto-patchelf failed to find all the required dependencies.
       > Add the missing dependencies to --libs or use `--ignore-missing="foo.so.1 bar.so etc.so"`.
       For full logs, run 'nix log /nix/store/61qxgja8gs96d78kw1bzlrr6fk3ygdpp-cudatoolkit-10.2.89.drv'.
error: 1 dependencies of derivation '/nix/store/v7ymn5dd6m2z9lg96dn6vqr9r4hc162i-gpu-screen-recorder-1.0.0.drv' failed to build
error: 1 dependencies of derivation '/nix/store/c3gac5rhf14272mqnwy7xszglq8gg3ag-gpu-screen-recorder-gtk-0.1.0.drv' failed to build
error: 5 dependencies of derivation '/nix/store/jby3dh2knf5wrinpjb8zr3m3xwr24pmq-review-shell.drv' failed to build

Looks like cudatoolkit 10.1 and 10.2 are broken. Not sure if we're still trying to keep those working? I assume a number of packages still rely on them, though.

@samuela (Member) commented Jun 27, 2022

OTOH the failures are only in the extras/demo_suite/* folder. We could just remove those binaries or skip them, as sketched below.
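
Either would be a small change (a sketch, untested here; autoPatchelfIgnoreMissingDeps is the attribute behind the --ignore-missing flag mentioned in the log above):

  # Option A: drop the GL demo binaries before autoPatchelf runs
  postInstall = ''
    rm -rf $out/extras/demo_suite
  '';

  # Option B: tolerate just the libraries the demos want
  autoPatchelfIgnoreMissingDeps = [ "libGLU.so.1" "libglut.so.3" ];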

@stale (bot) added the 2.status: stale label Jan 7, 2023
@SomeoneSerge self-assigned this Apr 1, 2023
@stale (bot) removed the 2.status: stale label Apr 1, 2023
Commits:
  • ...to ensure correctness (in the sense that all DT_NEEDED libraries are verified to be discoverable through the runpaths)
  • ...the same logic is handled by autoPatchelf
@SomeoneSerge (Contributor, Author)

I rebased onto current master and ran:

❯ nix-build with-my-cuda.nix -A cudaPackages_10.cudatoolkit.out -A cudaPackages_10_1.cudatoolkit.lib -A cudaPackages.cudatoolkit
/nix/store/80q92g4mw49cifxfzhk3xhfmhcq7635p-cudatoolkit-10.2.89
/nix/store/vfkbm851wjfpw1pc2mdxq5v7d49plkps-cudatoolkit-10.1.243-lib
/nix/store/chzf3k3s07wd9i7xgzg6ha667bjhpc51-cudatoolkit-11.7.0

...nixpkgs-review would be nice, but I probably can't run it any time soon

@SomeoneSerge (Contributor, Author)

I think it would be pragmatic to just merge, relying on autoPatchelf having verified all the declared dependencies. There may be hidden dlopen() errors, but those are more likely in tools than in the libraries used by our ML stack. We can address these errors as they appear.

@MrFoxPro commented Apr 5, 2023

Good day, @SomeoneSerge. I recently updated my nixpkgs channel configuration, and the xmrig-cuda library, which depends on CUDA, just stopped working with this error: failed to open libnvrtc-builtins.so.11.7, even though the file is present in the nvidia_x11 output: /nix/store/9pp2hm8y83zi523shr6lli1jsaqd6krg-nvidia-x11-525.89.02-6.1.15/lib/libnvidia-ml.so

I fixed it by downgrading nixpkgs to 7018cf78c618e0a8ec4369c587319f51cb7b19b0
You can see my derivation here: https://github.com/MrFoxPro/nix/blob/cuda-bug/drv/xmrig-cuda.nix
It builds fine, but fails at runtime.

Any ideas how this could be related to these changes? How can it be fixed?

@SomeoneSerge (Contributor, Author)

@MrFoxPro Hey-hey, and a good day to you too!

First off, I see that you're linking to nvidia_x11 directly, which is something we try to avoid in nixpkgs: we deploy libcuda.so at /run/opengl-driver/lib, because it's driver-dependent. You might want to replace that with autoAddOpenGLRunpathHook.

As for the libnvrtc error and whether the library isn't being found or is being rejected by the dynamic linker, we'll need to see more logs. I'd start by running xmrig with the LD_DEBUG=libs environment variable set. The error could indeed be related to this PR, because we now set runpaths more consistently. The only regression we've noticed ourselves so far is the one linked from the pytorch PR. Obviously, though, we can only really see how we affect packages that are in nixpkgs, and we can sometimes break things out-of-tree even if we're careful 🙃
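
Concretely, the substitution would look something like this (a sketch against your out-of-tree derivation; the hook appends /run/opengl-driver/lib to each ELF file's runpath so that libcuda.so is taken from the host's driver at run time):

  # instead of baking the driver's userspace libraries into the closure:
  #   buildInputs = [ linuxPackages.nvidia_x11 ];
  # add the hook shipped with the CUDA package set:
  nativeBuildInputs = [ cudaPackages.autoAddOpenGLRunpathHook ];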

I also see that there is an xmrig derivation in nixpkgs, only without cuda support yet. Maybe you could open a PR adding CUDA support to that derivation, and we could navigate from there?

@SomeoneSerge (Contributor, Author) commented Apr 5, 2023

@MrFoxPro On a side note, though... I don't know what you mean by the "PMC Balloon", but to me it sounds a little bit provocative, and not in a good way. This is going off-topic, though.

@MrFoxPro commented Apr 5, 2023

First off, I see that you're linking to nvidia_x11 directly, which is something we try to avoid in nixpkgs: we deploy libcuda.so at /run/opengl-driver/lib, because it's driver-dependent. You might want to replace that with autoAddOpenGLRunpathHook. […]

I'm not sure about /run/opengl-driver/lib. Does this exist only when hardware.opengl.enable is true? I mean, I'm running the miner on my machine headlessly and starting a fake X server only for overclocking via nvidia-settings, so I'm not sure why this option should be mandatory then.

@MrFoxPro commented Apr 5, 2023

@MrFoxPro On a side note, though... I don't know what you mean by the "PMC Balloon" […]

Just a meme :) You're welcome to join https://t.me/ru_nixos, btw, so we can discuss it more closely if you want.

@SomeoneSerge (Contributor, Author)

I'm not sure about /run/opengl-driver/lib. Does this exist only when hardware.opengl.enable is true? […]

The name hardware.opengl.enable is historical legacy and subject to change: #141803. One is expected to use hardware.opengl.enable (and videoDrivers = [ "nvidia" ], iirc) even in headless mode, so that programs from nixpkgs know to use a libcuda.so that is compatible with your system's driver. This does not imply enabling the X server. In fact, you don't necessarily need the X server even to use OpenGL (cf. EGL). For why we need to deploy libcuda.so impurely, cf. this comment: #224294 (comment)
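
For reference, the headless setup described above boils down to something like this (a sketch; hardware.opengl is the historical option name discussed above and may be renamed):

  {
    # deploys a driver-compatible libcuda.so to /run/opengl-driver/lib
    hardware.opengl.enable = true;
    # selects the NVIDIA driver; this does not, by itself, start an X server
    services.xserver.videoDrivers = [ "nvidia" ];
  }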

But let's draft a PR and move the conversation there!

just a meme :)

Alright, alright, I didn't mean to imply anything. Just that there are times when waving a white flag high above your head before approaching people is suddenly a very common-sense thing to do, lest you catch friendly fire.

@MrFoxPro commented Apr 5, 2023

@SomeoneSerge let's discuss it in #224848
