Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cudaPackages_12.cudatoolkit: hotfix after switching to autoPatchelfHook #224986

Merged

Conversation

SomeoneSerge
Copy link
Contributor

@SomeoneSerge SomeoneSerge commented Apr 6, 2023

Description of changes

"Regression" introduced in #178440, I forgot to test cudaPackages_12
Error reported in the comment: #224646 (comment)

I had to do something ugly about the qt dependencies, because NVidia builds against an out-of-date libtiff which we haven't got. Once we have a proper solution to #224646 we can remove that hack from the installPhase. Obviously, we need to verify if we this work at runtime as well.

For more context: this errors shows that cudaPackages_12.cudatoolkit and cudaPackages_11_8.cudatoolkit have technically always been "broken", since they shipped executables and/or libraries that weren't properly patchelf-ed. What #224646 changes is that this kind of omissions cannot go unnoticed

The legacy cudaPackages.cudatoolkit getting uglier is also "OK": one more reason to migrate to the redist cudaPackages.{cuda_,lib}*

CC @aaronmondal CC @NixOS/cuda-maintainers

Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 23.05 Release Notes (or backporting 22.11 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

@SomeoneSerge
Copy link
Contributor Author

Result of nixpkgs-review pr 224986 --extra-nixpkgs-config '{ cudaCapabilities = [ "8.6" ]; }' run on x86_64-linux 1

11 packages failed to build:
  • cudaPackages.tensorrt
  • cudaPackages.tensorrt.dev
  • python310Packages.tensorrt
  • python310Packages.tensorrt.dist
  • python310Packages.theanoWithCuda
  • python310Packages.theanoWithCuda.dist
  • python311Packages.tensorrt
  • python311Packages.tensorrt.dist
  • python311Packages.theanoWithCuda
  • python311Packages.theanoWithCuda.dist
  • truecrack-cuda
52 packages built:
  • caffeWithCuda
  • caffeWithCuda.bin
  • colmapWithCuda
  • cudaPackages.cudatoolkit
  • cudaPackages.cudatoolkit.doc
  • cudaPackages.cudatoolkit.lib
  • cudaPackages.cutensor
  • cudaPackages.cutensor.dev
  • forge
  • gpu-burn
  • gpu-screen-recorder
  • gpu-screen-recorder-gtk
  • gromacsCudaMpi
  • gwe
  • hip-nvidia
  • hip-nvidia.doc
  • katagoWithCuda
  • librealsenseWithCuda
  • librealsenseWithCuda.dev
  • nvtop
  • nvtop-nvidia
  • python310Packages.caffeWithCuda
  • python310Packages.caffeWithCuda.bin
  • python310Packages.cupy
  • python310Packages.cupy.dist
  • python310Packages.jaxlibWithCuda
  • python310Packages.jaxlibWithCuda.dist
  • python310Packages.numbaWithCuda
  • python310Packages.numbaWithCuda.dist
  • python310Packages.pycuda
  • python310Packages.pycuda.dist
  • python310Packages.pynvml
  • python310Packages.pynvml.dist
  • python310Packages.pyrealsense2WithCuda
  • python310Packages.pyrealsense2WithCuda.dev
  • python310Packages.torchWithCuda
  • python310Packages.torchWithCuda.dev
  • python310Packages.torchWithCuda.dist
  • python310Packages.torchWithCuda.lib
  • python311Packages.cupy
  • python311Packages.cupy.dist
  • python311Packages.jaxlibWithCuda
  • python311Packages.jaxlibWithCuda.dist
  • python311Packages.pycuda
  • python311Packages.pycuda.dist
  • python311Packages.pynvml
  • python311Packages.pynvml.dist
  • python311Packages.pyrealsense2WithCuda
  • python311Packages.pyrealsense2WithCuda.dev
  • xgboostWithCuda
  • xpraWithNvenc
  • xpraWithNvenc.dist

@SomeoneSerge
Copy link
Contributor Author

Since I broke both cudaPackages_11_8 and cudaPackages_12, should we consider recurseIntoAttrs for minor releases of cudaPackages? This would add a lot of noise to nix search, but IIUC would force nixpkgs-review to build all these flavours of cuda

@SomeoneSerge SomeoneSerge force-pushed the cudatoolkit-autopatchelf-followup branch from 361abc3 to 6dcf460 Compare April 6, 2023 13:41
@SomeoneSerge SomeoneSerge marked this pull request as ready for review April 6, 2023 13:41
@aaronmondal
Copy link
Contributor

Just tested it. Seems to work 😊

pkgs/development/compilers/cudatoolkit/common.nix Outdated Show resolved Hide resolved
Comment on lines +224 to +229
${lib.optionalString (lib.versionAtLeast version "11.8")
# error: auto-patchelf could not satisfy dependency libtiff.so.5 wanted by /nix/store/.......-cudatoolkit-12.0.1/host-linux-x64/Plugins/imageformats/libqtiff.so
# we only ship libtiff.so.6, so let's use qt plugins built by Nix.
# TODO: don't copy, come up with a symlink-based "merge"
''
rsync ${lib.getLib qt6Packages.qtimageformats}/lib/qt-6/plugins/ $out/host-linux-x64/Plugins/ -aP
''}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I follow the logic of this comment... how does using Qt plugins built by nix alleviate the .so.5 vs .so.6 version issue?

Also, why use rsync instead of simply ln -s? Ideally we'd like to avoid copying files b/w outputs.

Copy link
Contributor Author

@SomeoneSerge SomeoneSerge Apr 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how does using Qt plugins built by nix

...because Nix builds Qt plugins against nixpkgs' libtiff and they end up linking to .so.6

Also, why use rsync instead of simply ln -s? Ideally we'd like to avoid copying files b/w outputs.

No good reason at all, other than I wanted a "synchronize these two directory trees recursively" operation, and rsync is one. Either way, we want to be removing all of these plugins, and for all versions of cuda, so we'll have rewritten this patch before we can close #224646

@SomeoneSerge SomeoneSerge force-pushed the cudatoolkit-autopatchelf-followup branch from 5d0d4c8 to b100b0a Compare April 6, 2023 19:30
Copy link
Member

@samuela samuela left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM shall we merge?

@SomeoneSerge
Copy link
Contributor Author

@samuela yup, let's go

@samuela samuela merged commit 1a14c8d into NixOS:master Apr 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants