Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cudaPackages.autoAddOpenGLRunpathHook: don't skip shared libraries #250639

Merged
merged 1 commit into from
Aug 22, 2023

Conversation

SomeoneSerge
Copy link
Contributor

@SomeoneSerge SomeoneSerge commented Aug 22, 2023

Description of changes

Hotfix for #245789 (comment):

I was wrong about [the] .interp [section of an ELF file]: only dynamically linked executables [, as opposed to libraries,] got it, as far as I now understand

On the other hand, testing for --print-rpath (the .dynamic section [of the ELF]) gives:

❯ patchelf --print-rpath /nix/store/yzi9jfm4faias1zmf91iwlrd90r433rz-cuda_cudart-11.8.89/lib/libcudart.so.11.8.89
$ORIGIN
❯ # On a static binary:
❯ patchelf --print-rpath ./result/bin/wpa_supplicant
patchelf: cannot find section '.dynamic'. The input file is most likely statically linked

Things done

❯ nix build -f with-my-cuda.nix cudaPackages.cuda_cudart 
❯ patchelf --print-rpath result/lib/libcudart.so.11.0
/run/opengl-driver/lib:$ORIGIN

Things NOT done

I didn't test if @de11n's original use-case (static libraries) is still handled correctly

We should write an automated test which would verify that the hook

  • when used with a shared (i.e. dynamic) library in the outputs does actually prefix the runpath;
  • when used with a static library and NIX_DEBUG=1 does produce a message about the library presumed static, and does indeed leave the file intact;
  • same for executables

We should also set up GPU-enabled tests, including one that would verify torch.cuda.is_available()

@ConnorBaker
Copy link
Contributor

Testing now, will merge if it fixes it. Thank you for the help debugging and the quick fix :)

@ConnorBaker ConnorBaker self-requested a review August 22, 2023 00:28
@ConnorBaker
Copy link
Contributor

Worked for me!

Using nix-cuda-test with commit 3074628f4a0ece4928e032f9e2f2f1307b6ed22d: https://github.com/ConnorBaker/nix-cuda-test/tree/3074628f4a0ece4928e032f9e2f2f1307b6ed22d.

The parent commit of the PR change which caused the breakage works as expected:

[connorbaker@nixos-desktop:~/nix-cuda-test]$ nix run .#torch-cuda-is-available --override-input nixpkgs github:nixos/nixpkgs/6a830314d4da9242e61a485f050d4fe6c4d1dba6
warning: Git tree '/home/connorbaker/nix-cuda-test' is dirty
warning: not writing modified lock file of flake 'git+file:///home/connorbaker/nix-cuda-test':
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/cf795c35039b544a66a29a9ad3cd4d37bd1d6cb3' (2023-08-21)
  → 'github:nixos/nixpkgs/6a830314d4da9242e61a485f050d4fe6c4d1dba6' (2023-07-27)
True

The change which broke things is broke as expected:

[connorbaker@nixos-desktop:~/nix-cuda-test]$ nix run .#torch-cuda-is-available --override-input nixpkgs github:nixos/nixpkgs/03e72e46cbbde146046bbc85fca41e90bcebad57
warning: Git tree '/home/connorbaker/nix-cuda-test' is dirty
warning: not writing modified lock file of flake 'git+file:///home/connorbaker/nix-cuda-test':
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/cf795c35039b544a66a29a9ad3cd4d37bd1d6cb3' (2023-08-21)
  → 'github:nixos/nixpkgs/03e72e46cbbde146046bbc85fca41e90bcebad57' (2023-08-06)
False

And with this PR, it's fixed again!

[connorbaker@nixos-desktop:~/nix-cuda-test]$ nix run .#torch-cuda-is-available --override-input nixpkgs github:nixos/nixpkgs/065f90d25c2265f6daf2fe2ba4c56c579eba3d48
warning: Git tree '/home/connorbaker/nix-cuda-test' is dirty
warning: not writing modified lock file of flake 'git+file:///home/connorbaker/nix-cuda-test':
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/cf795c35039b544a66a29a9ad3cd4d37bd1d6cb3' (2023-08-21)
  → 'github:nixos/nixpkgs/065f90d25c2265f6daf2fe2ba4c56c579eba3d48' (2023-08-22)
True

@ConnorBaker ConnorBaker merged commit 46e5286 into NixOS:master Aug 22, 2023
11 checks passed
@de11n
Copy link

de11n commented Aug 22, 2023

Thanks for the fix!

@de11n
Copy link

de11n commented Aug 22, 2023

I didn't test if @de11n's original use-case (static libraries) is still handled correctly

My original implementation used --print-rpath so I'm confident it works for my use cases. I apparently did not sufficiently test the switch to --print-interpreter.

@@ -2,17 +2,21 @@
# Run addOpenGLRunpath on all dynamically linked, ELF files
echo "Sourcing auto-add-opengl-runpath-hook"

elfHasDynamicSection() {
patchelf --print-rpath "$1" >& /dev/null
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file uses 2-space tabs. But not worth a big rebuild just to fix that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad. Though I guess occasional inconsistencies are expected when there's no automation to enforce the conventions

@SomeoneSerge
Copy link
Contributor Author

My original implementation used --print-rpath

The more silly of me... :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants