cudaPackages: fix #220357; use -Xfatbin=-compress-all; prune default cudaCapabilities #220402

Merged
samuela merged 1 commit into NixOS:master from fix/cuda-nvcc-compress-fatbins
Mar 13, 2023

Conversation

ConnorBaker
Contributor

@ConnorBaker ConnorBaker commented Mar 9, 2023

Description of changes
  • adds -Xfatbin=-compress-all to NVCC_PREPEND_FLAGS to help ensure the generated libraries aren't so large that they fail to link
    • has the added benefit of making the NARs much smaller, and reducing closure size as well
  • prunes the default cudaCapabilities without removing any of the existing capabilities by introducing a dontDefaultAfter attribute to gpus.nix
    • this way, users can specifically target supported capabilities we don't build for by default
  • excludes 8.7 from the default capabilities because it corresponds to the Orin series of embedded devices, which are very different from the other targets
  • minor cleanup of gpus.nix: maxCudaVersion is now either a string or null, where null indicates that there is no maximum supported version
    • this allows us to add new versions of CUDA without having to manually bump a bunch of maxCudaVersion attributes in gpus.nix every time
  • Fixes Build failure: magma #220357
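
The pruning rule described in the list above could be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the entries in `gpus`, the helper names (`versionAtLeast`, `isSupported`, `isDefault`, `capabilitiesOf`) and every version string are assumptions chosen for the example. Only the attribute names `dontDefaultAfter` and `maxCudaVersion` come from the description.

```nix
let
  # Hypothetical entries; the real gpus.nix has more fields and may use other values.
  gpus = [
    { computeCapability = "3.5"; dontDefaultAfter = "11.0"; maxCudaVersion = "11.8"; }
    { computeCapability = "6.0"; dontDefaultAfter = null; maxCudaVersion = null; }
    { computeCapability = "8.6"; dontDefaultAfter = null; maxCudaVersion = null; }
  ];

  # Assumed CUDA version being packaged.
  cudaVersion = "11.7";

  # True when version a is greater than or equal to version b.
  versionAtLeast = a: b: builtins.compareVersions a b >= 0;

  # A null maxCudaVersion means "no upper bound".
  isSupported = gpu:
    gpu.maxCudaVersion == null || versionAtLeast gpu.maxCudaVersion cudaVersion;

  # A null dontDefaultAfter means "always part of the defaults" (assumed semantics).
  isDefault = gpu:
    isSupported gpu
    && (gpu.dontDefaultAfter == null || versionAtLeast gpu.dontDefaultAfter cudaVersion);

  capabilitiesOf = map (gpu: gpu.computeCapability);
in
{
  supported = capabilitiesOf (builtins.filter isSupported gpus);
  default = capabilitiesOf (builtins.filter isDefault gpus);
}
```

Under these assumed values, `supported` evaluates to `[ "3.5" "6.0" "8.6" ]` and `default` to `[ "6.0" "8.6" ]`: capability 3.5 stays selectable but is no longer built by default.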

We're experiencing the same problems Apache MXNet did due to the number of capabilities we currently build for: apache/mxnet#19123. Unfortunately, just using -Xfatbin=-compress-all isn't enough -- the number of CUDA capabilities we support is still so large that, without some pruning, Magma will fail to build.
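
To illustrate why the number of capabilities matters, here is a small sketch that turns a capability list into one `-gencode` flag per target. The helper names (`dropDot`, `gencodeFlag`, `nvccFlags`) are hypothetical, and this thread does not show how nixpkgs actually assembles its flags. The two lists are the default capabilities quoted in this description before and after the change.

```nix
let
  before = [ "3.5" "3.7" "5.0" "5.2" "5.3" "6.0" "6.1" "6.2" "7.0" "7.2" "7.5" "8.0" "8.6" "8.7" ];
  after = [ "6.0" "6.1" "6.2" "7.0" "7.2" "7.5" "8.0" "8.6" ];

  # "8.6" -> "86"
  dropDot = builtins.replaceStrings [ "." ] [ "" ];

  # One -gencode flag per capability (hypothetical helper).
  gencodeFlag = cap: "-gencode=arch=compute_${dropDot cap},code=sm_${dropDot cap}";

  # Compression flag from this PR followed by one target per capability.
  nvccFlags = caps:
    builtins.concatStringsSep " " ([ "-Xfatbin=-compress-all" ] ++ map gencodeFlag caps);
in
{
  targetsBefore = builtins.length before; # 14
  targetsAfter = builtins.length after;   # 8
  flagsAfter = nvccFlags after;
}
```

Every extra entry adds another compiled copy of each kernel to the fat binary, which is why both compression and a shorter default list help here.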

I think it'd be fair to stop building for Kepler and Maxwell by default since they're 11 and 9 years old, respectively. Kepler has been deprecated for most of CUDA 11 and is fully removed as of CUDA 12. Maxwell is still supported in CUDA 12, but Maxwell devices are increasingly old and rare.

Before:

nix-repl> legacyPackages.x86_64-linux.cudaPackages.cudaFlags.cudaCapabilities
[ "3.5" "3.7" "5.0" "5.2" "5.3" "6.0" "6.1" "6.2" "7.0" "7.2" "7.5" "8.0" "8.6" "8.7" ]

After:

nix-repl> legacyPackages.x86_64-linux.cudaPackages.cudaFlags.cudaCapabilities
[ "6.0" "6.1" "6.2" "7.0" "7.2" "7.5" "8.0" "8.6" ]
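
Because the pruned capabilities are only dropped from the defaults, a user who still needs one of them should be able to request it explicitly. A minimal sketch, assuming the same config.nix mechanism used elsewhere in this description; the choice of "5.2" (a Maxwell capability) is only an example, and whether a given package still builds for it depends on the CUDA version in use:

```nix
# ~/.config/nixpkgs/config.nix
{
  allowUnfree = true;
  cudaSupport = true;
  # "5.2" is no longer in the default list, so it is requested explicitly here.
  cudaCapabilities = [ "5.2" "8.6" ];
}
```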

An added benefit of this PR: the closures and NAR serializations of CUDA binaries and libraries are smaller.

Closure size with cudaCapabilities = [ "8.6" ];

From

/nix/store/ib9ckb796i7nqj0fy6iankvhyfc6az70-magma-2.7.1 429.4M 2.6G

to

/nix/store/r9dc4yf6nq17dl8bdav4hy7q0vd80s64-magma-2.7.1 233.6M 2.4G

With the following example (which, admittedly, targets just a single capability), the size of the Magma NAR is nearly halved, from 429.4M to 233.6M.
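
As a quick check of the "nearly halved" figure, using only the two NAR sizes quoted above (units as reported, no new measurements):

```latex
\[
\frac{429.4\,\mathrm{M} - 233.6\,\mathrm{M}}{429.4\,\mathrm{M}}
  = \frac{195.8}{429.4}
  \approx 0.456
\]
```

That is a reduction of roughly 46% in NAR size, while the closure size goes from 2.6G to 2.4G.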

With

# ~/.config/nixpkgs/config.nix 
{
  allowUnfree = true;
  cudaSupport = true;
  cudaCapabilities = [ "8.6" ];
  cudaForwardCompat = false;
}

the before closure:

/nix/store/qmnr18aqd08zdkhka695ici96k6nzirv-libunistring-1.0         	   1.7M	   1.7M
/nix/store/vv6rlzln7vhxk519rdsrzmhhlpyb5q2m-libidn2-2.3.2            	 254.1K	   2.0M
/nix/store/76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224           	  28.9M	  30.8M
/nix/store/205vsmbfhq1q2vhgskpqyymqvba4mscp-gcc-11.3.0-lib           	   7.5M	  38.3M
/nix/store/2w4k8nvdyiggz717ygbbxchpnxrqc6y9-gcc-12.2.0-lib           	   7.8M	  38.6M
/nix/store/x6nnam5hk44mljbk782rcbd92jlnz8r6-pcre-8.45                	 514.4K	  31.3M
/nix/store/4vkv3rzky44hp2b8r13d8hr4ykvqhvwh-gnugrep-3.7              	 773.2K	  32.1M
/nix/store/5ynbf6wszmggr0abwifdagrixgnya5vy-bash-5.2-p15             	   1.6M	  32.4M
/nix/store/mg9l7phyhvi16p9g8g3g8fbyj4mr79gq-zlib-1.2.13              	 125.6K	  31.0M
/nix/store/6g6d4la7xsizvr8qg91f56jiqx149iqq-binutils-2.40-lib        	   2.7M	  33.7M
/nix/store/izdfacm4wfmfjbgl1k3qlfjslkkrb2kg-gfortran-12.2.0-lib      	  10.7M	  41.6M
/nix/store/yzbam1rxs24z2apzpdpgqwi5fcwbid6z-openblas-0.3.21          	  26.1M	  67.7M
/nix/store/azvwwb4994qm7b1fcz8gc3gnlf8zwzi4-blas-3                   	  53.3M	 121.0M
/nix/store/ia34rbsa6d2dalzk5f7hy5jp9zazpv24-gmp-with-cxx-6.2.1       	 729.2K	  39.3M
/nix/store/b8wwwcng8c0snvmcawvhiny4b2gr6yhh-mpfr-4.2.0               	 774.0K	  40.1M
/nix/store/jn9kg98dsaajx4mh95rb9r5rf2idglqh-attr-2.5.1               	  78.8K	  30.9M
/nix/store/bw9s084fzmb5h40x98mfry25blj4cr9r-acl-2.3.1                	 108.9K	  31.0M
/nix/store/jvl8dr21nrwhqywwxcl8di4j55765gvy-gmp-with-cxx-stage4-6.2.1	 730.4K	  39.3M
/nix/store/bg8f47vihykgqcgblxkfk9sbvc4dnksa-coreutils-9.1            	   1.4M	  40.9M
/nix/store/lp8qrhb6hs42jwbapzq20l05jf4kyicq-glibc-2.35-224-bin       	   3.0M	  33.8M
/nix/store/wyzaa007anaxxhmrcbffm597v14n2mxs-linux-headers-6.1        	   6.0M	   6.0M
/nix/store/pqnd39aq2sksad2zvswjcpkqdc7ig3f9-glibc-2.35-224-dev       	   2.2M	  42.0M
/nix/store/fhzz4yrdy17czwc9i4swhlpcp445inzb-binutils-2.40            	  28.2M	  69.6M
/nix/store/xnrwkaidhxjsb70c2bnrl6j0mmr0y3qg-expand-response-params   	  16.4K	  30.9M
/nix/store/qkshx57xqsr4g4v8ga0jn8jrgnyaam3f-binutils-wrapper-2.40    	  48.5K	  84.8M
/nix/store/f6172g5agk13134pdx5hl1qjmw8a4sdw-libmpc-1.3.1             	 273.4K	  40.4M
/nix/store/fdfcva0q4zgnm1gpc7wmz4cgq7c3hxx1-isl-0.20                 	   2.5M	  41.8M
/nix/store/vns64fmwpqz8himgfdy8h7i29h9gbryc-gcc-11.3.0               	 180.6M	 242.2M
/nix/store/ds6ivg31k3l0pjhhf3s769bkpmafa54g-gcc-wrapper-11.3.0       	  55.2K	 278.4M
/nix/store/l0d6c6wa5sk4wx70x77k3d5clmny7sw6-cuda_cudart-11.7.60      	   6.2M	   6.2M
/nix/store/mxcbimi0gs021wwyfik63km6v26pzhp8-libcublas-11.10.1.25     	   1.1G	   1.2G
/nix/store/ss8jw175bhs3cmlzmmhsbpqihy457ids-libcusparse-11.7.3.50    	 488.1M	 488.1M
/nix/store/rbm214ji7jqk58g8shhkmhvl6xflblz0-cuda_cupti-11.7.50       	  88.4M	  88.4M
/nix/store/virnlrgvysrn1nl6bra798fy6x731nj0-cuda_nvprof-11.7.50      	   9.9M	 129.1M
/nix/store/zrpivbb39kiyij6dns4pz4wh8nik6m3y-cuda_nvcc-11.7.64        	 114.8M	 393.2M
/nix/store/mpj8ij172g466nncmnfqpz694pa746dr-cuda-redist-11.7         	  63.0K	   2.1G
/nix/store/ib9ckb796i7nqj0fy6iankvhyfc6az70-magma-2.7.1              	 429.4M	   2.6G

and afterwards:

/nix/store/qmnr18aqd08zdkhka695ici96k6nzirv-libunistring-1.0         	   1.7M	   1.7M
/nix/store/vv6rlzln7vhxk519rdsrzmhhlpyb5q2m-libidn2-2.3.2            	 254.1K	   2.0M
/nix/store/76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224           	  28.9M	  30.8M
/nix/store/205vsmbfhq1q2vhgskpqyymqvba4mscp-gcc-11.3.0-lib           	   7.5M	  38.3M
/nix/store/2w4k8nvdyiggz717ygbbxchpnxrqc6y9-gcc-12.2.0-lib           	   7.8M	  38.6M
/nix/store/x6nnam5hk44mljbk782rcbd92jlnz8r6-pcre-8.45                	 514.4K	  31.3M
/nix/store/4vkv3rzky44hp2b8r13d8hr4ykvqhvwh-gnugrep-3.7              	 773.2K	  32.1M
/nix/store/5ynbf6wszmggr0abwifdagrixgnya5vy-bash-5.2-p15             	   1.6M	  32.4M
/nix/store/mg9l7phyhvi16p9g8g3g8fbyj4mr79gq-zlib-1.2.13              	 125.6K	  31.0M
/nix/store/6g6d4la7xsizvr8qg91f56jiqx149iqq-binutils-2.40-lib        	   2.7M	  33.7M
/nix/store/izdfacm4wfmfjbgl1k3qlfjslkkrb2kg-gfortran-12.2.0-lib      	  10.7M	  41.6M
/nix/store/yzbam1rxs24z2apzpdpgqwi5fcwbid6z-openblas-0.3.21          	  26.1M	  67.7M
/nix/store/azvwwb4994qm7b1fcz8gc3gnlf8zwzi4-blas-3                   	  53.3M	 121.0M
/nix/store/ia34rbsa6d2dalzk5f7hy5jp9zazpv24-gmp-with-cxx-6.2.1       	 729.2K	  39.3M
/nix/store/b8wwwcng8c0snvmcawvhiny4b2gr6yhh-mpfr-4.2.0               	 774.0K	  40.1M
/nix/store/jn9kg98dsaajx4mh95rb9r5rf2idglqh-attr-2.5.1               	  78.8K	  30.9M
/nix/store/bw9s084fzmb5h40x98mfry25blj4cr9r-acl-2.3.1                	 108.9K	  31.0M
/nix/store/jvl8dr21nrwhqywwxcl8di4j55765gvy-gmp-with-cxx-stage4-6.2.1	 730.4K	  39.3M
/nix/store/bg8f47vihykgqcgblxkfk9sbvc4dnksa-coreutils-9.1            	   1.4M	  40.9M
/nix/store/lp8qrhb6hs42jwbapzq20l05jf4kyicq-glibc-2.35-224-bin       	   3.0M	  33.8M
/nix/store/wyzaa007anaxxhmrcbffm597v14n2mxs-linux-headers-6.1        	   6.0M	   6.0M
/nix/store/pqnd39aq2sksad2zvswjcpkqdc7ig3f9-glibc-2.35-224-dev       	   2.2M	  42.0M
/nix/store/fhzz4yrdy17czwc9i4swhlpcp445inzb-binutils-2.40            	  28.2M	  69.6M
/nix/store/xnrwkaidhxjsb70c2bnrl6j0mmr0y3qg-expand-response-params   	  16.4K	  30.9M
/nix/store/qkshx57xqsr4g4v8ga0jn8jrgnyaam3f-binutils-wrapper-2.40    	  48.5K	  84.8M
/nix/store/f6172g5agk13134pdx5hl1qjmw8a4sdw-libmpc-1.3.1             	 273.4K	  40.4M
/nix/store/fdfcva0q4zgnm1gpc7wmz4cgq7c3hxx1-isl-0.20                 	   2.5M	  41.8M
/nix/store/vns64fmwpqz8himgfdy8h7i29h9gbryc-gcc-11.3.0               	 180.6M	 242.2M
/nix/store/ds6ivg31k3l0pjhhf3s769bkpmafa54g-gcc-wrapper-11.3.0       	  55.2K	 278.4M
/nix/store/l0d6c6wa5sk4wx70x77k3d5clmny7sw6-cuda_cudart-11.7.60      	   6.2M	   6.2M
/nix/store/mxcbimi0gs021wwyfik63km6v26pzhp8-libcublas-11.10.1.25     	   1.1G	   1.2G
/nix/store/q8dcnsj9nh10z4vzfpdsmqgxr2q3ppkj-cuda_nvcc-11.7.64        	 114.8M	 393.2M
/nix/store/ss8jw175bhs3cmlzmmhsbpqihy457ids-libcusparse-11.7.3.50    	 488.1M	 488.1M
/nix/store/rbm214ji7jqk58g8shhkmhvl6xflblz0-cuda_cupti-11.7.50       	  88.4M	  88.4M
/nix/store/virnlrgvysrn1nl6bra798fy6x731nj0-cuda_nvprof-11.7.50      	   9.9M	 129.1M
/nix/store/val7mwg8inz92ijwylcb6pm3mwr5qidx-cuda-redist-11.7         	  63.0K	   2.1G
/nix/store/r9dc4yf6nq17dl8bdav4hy7q0vd80s64-magma-2.7.1              	 233.6M	   2.4G
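
The single-capability configuration above can also be passed inline instead of through ~/.config/nixpkgs/config.nix, which is convenient when comparing two checkouts. A minimal sketch, assuming `<nixpkgs>` resolves to the checkout under test; the file name `default.nix` is hypothetical:

```nix
# default.nix (hypothetical file name)
let
  pkgs = import <nixpkgs> {
    config = {
      allowUnfree = true;
      cudaSupport = true;
      # Same settings as the config.nix shown above.
      cudaCapabilities = [ "8.6" ];
      cudaForwardCompat = false;
    };
  };
in
pkgs.magma
```

Building this with `nix-build default.nix` produces a `./result` link. Listings in the shape shown above (path, NAR size, closure size) are what a command such as `nix path-info -rsSh ./result` prints, though the description does not state which command was used.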

Closure size with cudaCapabilities = [ "6.0" "6.1" "6.2" "7.0" "7.2" "7.5" "8.0" "8.6" ];

NOTE: Even with the reduced number of capabilities this PR introduces, without -Xfatbin=-compress-all Magma still fails to build.

Master failed to build, so there's only an "after" for this example:

/nix/store/dhqykhps4khbzr8aks84qlyxnhqxi066-magma-2.7.1 1.7G 3.9G

With a config to mimic the cudaCapabilities this PR would generate

# ~/.config/nixpkgs/config.nix 
{
  allowUnfree = true;
  cudaSupport = true;
  cudaCapabilities = [ "6.0" "6.1" "6.2" "7.0" "7.2" "7.5" "8.0" "8.6" ];
}

the build of Magma against master failed with the linking error again! The build against this PR, however, succeeded:

/nix/store/qmnr18aqd08zdkhka695ici96k6nzirv-libunistring-1.0         	   1.7M	   1.7M
/nix/store/vv6rlzln7vhxk519rdsrzmhhlpyb5q2m-libidn2-2.3.2            	 254.1K	   2.0M
/nix/store/76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224           	  28.9M	  30.8M
/nix/store/205vsmbfhq1q2vhgskpqyymqvba4mscp-gcc-11.3.0-lib           	   7.5M	  38.3M
/nix/store/2w4k8nvdyiggz717ygbbxchpnxrqc6y9-gcc-12.2.0-lib           	   7.8M	  38.6M
/nix/store/x6nnam5hk44mljbk782rcbd92jlnz8r6-pcre-8.45                	 514.4K	  31.3M
/nix/store/4vkv3rzky44hp2b8r13d8hr4ykvqhvwh-gnugrep-3.7              	 773.2K	  32.1M
/nix/store/5ynbf6wszmggr0abwifdagrixgnya5vy-bash-5.2-p15             	   1.6M	  32.4M
/nix/store/mg9l7phyhvi16p9g8g3g8fbyj4mr79gq-zlib-1.2.13              	 125.6K	  31.0M
/nix/store/6g6d4la7xsizvr8qg91f56jiqx149iqq-binutils-2.40-lib        	   2.7M	  33.7M
/nix/store/izdfacm4wfmfjbgl1k3qlfjslkkrb2kg-gfortran-12.2.0-lib      	  10.7M	  41.6M
/nix/store/yzbam1rxs24z2apzpdpgqwi5fcwbid6z-openblas-0.3.21          	  26.1M	  67.7M
/nix/store/azvwwb4994qm7b1fcz8gc3gnlf8zwzi4-blas-3                   	  53.3M	 121.0M
/nix/store/ia34rbsa6d2dalzk5f7hy5jp9zazpv24-gmp-with-cxx-6.2.1       	 729.2K	  39.3M
/nix/store/b8wwwcng8c0snvmcawvhiny4b2gr6yhh-mpfr-4.2.0               	 774.0K	  40.1M
/nix/store/jn9kg98dsaajx4mh95rb9r5rf2idglqh-attr-2.5.1               	  78.8K	  30.9M
/nix/store/bw9s084fzmb5h40x98mfry25blj4cr9r-acl-2.3.1                	 108.9K	  31.0M
/nix/store/jvl8dr21nrwhqywwxcl8di4j55765gvy-gmp-with-cxx-stage4-6.2.1	 730.4K	  39.3M
/nix/store/bg8f47vihykgqcgblxkfk9sbvc4dnksa-coreutils-9.1            	   1.4M	  40.9M
/nix/store/l0d6c6wa5sk4wx70x77k3d5clmny7sw6-cuda_cudart-11.7.60      	   6.2M	   6.2M
/nix/store/mxcbimi0gs021wwyfik63km6v26pzhp8-libcublas-11.10.1.25     	   1.1G	   1.2G
/nix/store/lp8qrhb6hs42jwbapzq20l05jf4kyicq-glibc-2.35-224-bin       	   3.0M	  33.8M
/nix/store/wyzaa007anaxxhmrcbffm597v14n2mxs-linux-headers-6.1        	   6.0M	   6.0M
/nix/store/pqnd39aq2sksad2zvswjcpkqdc7ig3f9-glibc-2.35-224-dev       	   2.2M	  42.0M
/nix/store/fhzz4yrdy17czwc9i4swhlpcp445inzb-binutils-2.40            	  28.2M	  69.6M
/nix/store/xnrwkaidhxjsb70c2bnrl6j0mmr0y3qg-expand-response-params   	  16.4K	  30.9M
/nix/store/qkshx57xqsr4g4v8ga0jn8jrgnyaam3f-binutils-wrapper-2.40    	  48.5K	  84.8M
/nix/store/f6172g5agk13134pdx5hl1qjmw8a4sdw-libmpc-1.3.1             	 273.4K	  40.4M
/nix/store/fdfcva0q4zgnm1gpc7wmz4cgq7c3hxx1-isl-0.20                 	   2.5M	  41.8M
/nix/store/vns64fmwpqz8himgfdy8h7i29h9gbryc-gcc-11.3.0               	 180.6M	 242.2M
/nix/store/ds6ivg31k3l0pjhhf3s769bkpmafa54g-gcc-wrapper-11.3.0       	  55.2K	 278.4M
/nix/store/q8dcnsj9nh10z4vzfpdsmqgxr2q3ppkj-cuda_nvcc-11.7.64        	 114.8M	 393.2M
/nix/store/ss8jw175bhs3cmlzmmhsbpqihy457ids-libcusparse-11.7.3.50    	 488.1M	 488.1M
/nix/store/rbm214ji7jqk58g8shhkmhvl6xflblz0-cuda_cupti-11.7.50       	  88.4M	  88.4M
/nix/store/virnlrgvysrn1nl6bra798fy6x731nj0-cuda_nvprof-11.7.50      	   9.9M	 129.1M
/nix/store/val7mwg8inz92ijwylcb6pm3mwr5qidx-cuda-redist-11.7         	  63.0K	   2.1G
/nix/store/dhqykhps4khbzr8aks84qlyxnhqxi066-magma-2.7.1              	   1.7G	   3.9G

Closure size with cudaCapabilities = [ "7.5" "8.0" "8.6" ];

From

/nix/store/8q090bm2r3n4b58ygsbsnlj68rxb92vx-magma-2.7.1 1.2G 3.4G

to

/nix/store/1f78c0ihfrxdfbmc56jp4s9ql4kdd3bw-magma-2.7.1 659.9M 2.8G

And one last comparison, this time building against only three capabilities so master can succeed as well. With this configuration:

# ~/.config/nixpkgs/config.nix 
{
  allowUnfree = true;
  cudaSupport = true;
  cudaCapabilities = [ "7.5" "8.0" "8.6" ];
}

Before closure:

/nix/store/qmnr18aqd08zdkhka695ici96k6nzirv-libunistring-1.0         	   1.7M	   1.7M
/nix/store/vv6rlzln7vhxk519rdsrzmhhlpyb5q2m-libidn2-2.3.2            	 254.1K	   2.0M
/nix/store/76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224           	  28.9M	  30.8M
/nix/store/205vsmbfhq1q2vhgskpqyymqvba4mscp-gcc-11.3.0-lib           	   7.5M	  38.3M
/nix/store/2w4k8nvdyiggz717ygbbxchpnxrqc6y9-gcc-12.2.0-lib           	   7.8M	  38.6M
/nix/store/x6nnam5hk44mljbk782rcbd92jlnz8r6-pcre-8.45                	 514.4K	  31.3M
/nix/store/4vkv3rzky44hp2b8r13d8hr4ykvqhvwh-gnugrep-3.7              	 773.2K	  32.1M
/nix/store/5ynbf6wszmggr0abwifdagrixgnya5vy-bash-5.2-p15             	   1.6M	  32.4M
/nix/store/mg9l7phyhvi16p9g8g3g8fbyj4mr79gq-zlib-1.2.13              	 125.6K	  31.0M
/nix/store/6g6d4la7xsizvr8qg91f56jiqx149iqq-binutils-2.40-lib        	   2.7M	  33.7M
/nix/store/izdfacm4wfmfjbgl1k3qlfjslkkrb2kg-gfortran-12.2.0-lib      	  10.7M	  41.6M
/nix/store/yzbam1rxs24z2apzpdpgqwi5fcwbid6z-openblas-0.3.21          	  26.1M	  67.7M
/nix/store/azvwwb4994qm7b1fcz8gc3gnlf8zwzi4-blas-3                   	  53.3M	 121.0M
/nix/store/l0d6c6wa5sk4wx70x77k3d5clmny7sw6-cuda_cudart-11.7.60      	   6.2M	   6.2M
/nix/store/mxcbimi0gs021wwyfik63km6v26pzhp8-libcublas-11.10.1.25     	   1.1G	   1.2G
/nix/store/ss8jw175bhs3cmlzmmhsbpqihy457ids-libcusparse-11.7.3.50    	 488.1M	 488.1M
/nix/store/rbm214ji7jqk58g8shhkmhvl6xflblz0-cuda_cupti-11.7.50       	  88.4M	  88.4M
/nix/store/virnlrgvysrn1nl6bra798fy6x731nj0-cuda_nvprof-11.7.50      	   9.9M	 129.1M
/nix/store/jn9kg98dsaajx4mh95rb9r5rf2idglqh-attr-2.5.1               	  78.8K	  30.9M
/nix/store/bw9s084fzmb5h40x98mfry25blj4cr9r-acl-2.3.1                	 108.9K	  31.0M
/nix/store/jvl8dr21nrwhqywwxcl8di4j55765gvy-gmp-with-cxx-stage4-6.2.1	 730.4K	  39.3M
/nix/store/bg8f47vihykgqcgblxkfk9sbvc4dnksa-coreutils-9.1            	   1.4M	  40.9M
/nix/store/lp8qrhb6hs42jwbapzq20l05jf4kyicq-glibc-2.35-224-bin       	   3.0M	  33.8M
/nix/store/wyzaa007anaxxhmrcbffm597v14n2mxs-linux-headers-6.1        	   6.0M	   6.0M
/nix/store/pqnd39aq2sksad2zvswjcpkqdc7ig3f9-glibc-2.35-224-dev       	   2.2M	  42.0M
/nix/store/fhzz4yrdy17czwc9i4swhlpcp445inzb-binutils-2.40            	  28.2M	  69.6M
/nix/store/xnrwkaidhxjsb70c2bnrl6j0mmr0y3qg-expand-response-params   	  16.4K	  30.9M
/nix/store/qkshx57xqsr4g4v8ga0jn8jrgnyaam3f-binutils-wrapper-2.40    	  48.5K	  84.8M
/nix/store/ia34rbsa6d2dalzk5f7hy5jp9zazpv24-gmp-with-cxx-6.2.1       	 729.2K	  39.3M
/nix/store/b8wwwcng8c0snvmcawvhiny4b2gr6yhh-mpfr-4.2.0               	 774.0K	  40.1M
/nix/store/f6172g5agk13134pdx5hl1qjmw8a4sdw-libmpc-1.3.1             	 273.4K	  40.4M
/nix/store/fdfcva0q4zgnm1gpc7wmz4cgq7c3hxx1-isl-0.20                 	   2.5M	  41.8M
/nix/store/vns64fmwpqz8himgfdy8h7i29h9gbryc-gcc-11.3.0               	 180.6M	 242.2M
/nix/store/ds6ivg31k3l0pjhhf3s769bkpmafa54g-gcc-wrapper-11.3.0       	  55.2K	 278.4M
/nix/store/zrpivbb39kiyij6dns4pz4wh8nik6m3y-cuda_nvcc-11.7.64        	 114.8M	 393.2M
/nix/store/mpj8ij172g466nncmnfqpz694pa746dr-cuda-redist-11.7         	  63.0K	   2.1G
/nix/store/8q090bm2r3n4b58ygsbsnlj68rxb92vx-magma-2.7.1              	   1.2G	   3.4G

After closure:

/nix/store/qmnr18aqd08zdkhka695ici96k6nzirv-libunistring-1.0         	   1.7M	   1.7M
/nix/store/vv6rlzln7vhxk519rdsrzmhhlpyb5q2m-libidn2-2.3.2            	 254.1K	   2.0M
/nix/store/76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224           	  28.9M	  30.8M
/nix/store/205vsmbfhq1q2vhgskpqyymqvba4mscp-gcc-11.3.0-lib           	   7.5M	  38.3M
/nix/store/izdfacm4wfmfjbgl1k3qlfjslkkrb2kg-gfortran-12.2.0-lib      	  10.7M	  41.6M
/nix/store/yzbam1rxs24z2apzpdpgqwi5fcwbid6z-openblas-0.3.21          	  26.1M	  67.7M
/nix/store/azvwwb4994qm7b1fcz8gc3gnlf8zwzi4-blas-3                   	  53.3M	 121.0M
/nix/store/l0d6c6wa5sk4wx70x77k3d5clmny7sw6-cuda_cudart-11.7.60      	   6.2M	   6.2M
/nix/store/mxcbimi0gs021wwyfik63km6v26pzhp8-libcublas-11.10.1.25     	   1.1G	   1.2G
/nix/store/5ynbf6wszmggr0abwifdagrixgnya5vy-bash-5.2-p15             	   1.6M	  32.4M
/nix/store/x6nnam5hk44mljbk782rcbd92jlnz8r6-pcre-8.45                	 514.4K	  31.3M
/nix/store/4vkv3rzky44hp2b8r13d8hr4ykvqhvwh-gnugrep-3.7              	 773.2K	  32.1M
/nix/store/jn9kg98dsaajx4mh95rb9r5rf2idglqh-attr-2.5.1               	  78.8K	  30.9M
/nix/store/bw9s084fzmb5h40x98mfry25blj4cr9r-acl-2.3.1                	 108.9K	  31.0M
/nix/store/2w4k8nvdyiggz717ygbbxchpnxrqc6y9-gcc-12.2.0-lib           	   7.8M	  38.6M
/nix/store/jvl8dr21nrwhqywwxcl8di4j55765gvy-gmp-with-cxx-stage4-6.2.1	 730.4K	  39.3M
/nix/store/bg8f47vihykgqcgblxkfk9sbvc4dnksa-coreutils-9.1            	   1.4M	  40.9M
/nix/store/lp8qrhb6hs42jwbapzq20l05jf4kyicq-glibc-2.35-224-bin       	   3.0M	  33.8M
/nix/store/wyzaa007anaxxhmrcbffm597v14n2mxs-linux-headers-6.1        	   6.0M	   6.0M
/nix/store/pqnd39aq2sksad2zvswjcpkqdc7ig3f9-glibc-2.35-224-dev       	   2.2M	  42.0M
/nix/store/mg9l7phyhvi16p9g8g3g8fbyj4mr79gq-zlib-1.2.13              	 125.6K	  31.0M
/nix/store/6g6d4la7xsizvr8qg91f56jiqx149iqq-binutils-2.40-lib        	   2.7M	  33.7M
/nix/store/fhzz4yrdy17czwc9i4swhlpcp445inzb-binutils-2.40            	  28.2M	  69.6M
/nix/store/xnrwkaidhxjsb70c2bnrl6j0mmr0y3qg-expand-response-params   	  16.4K	  30.9M
/nix/store/qkshx57xqsr4g4v8ga0jn8jrgnyaam3f-binutils-wrapper-2.40    	  48.5K	  84.8M
/nix/store/ia34rbsa6d2dalzk5f7hy5jp9zazpv24-gmp-with-cxx-6.2.1       	 729.2K	  39.3M
/nix/store/b8wwwcng8c0snvmcawvhiny4b2gr6yhh-mpfr-4.2.0               	 774.0K	  40.1M
/nix/store/f6172g5agk13134pdx5hl1qjmw8a4sdw-libmpc-1.3.1             	 273.4K	  40.4M
/nix/store/fdfcva0q4zgnm1gpc7wmz4cgq7c3hxx1-isl-0.20                 	   2.5M	  41.8M
/nix/store/vns64fmwpqz8himgfdy8h7i29h9gbryc-gcc-11.3.0               	 180.6M	 242.2M
/nix/store/ds6ivg31k3l0pjhhf3s769bkpmafa54g-gcc-wrapper-11.3.0       	  55.2K	 278.4M
/nix/store/q8dcnsj9nh10z4vzfpdsmqgxr2q3ppkj-cuda_nvcc-11.7.64        	 114.8M	 393.2M
/nix/store/ss8jw175bhs3cmlzmmhsbpqihy457ids-libcusparse-11.7.3.50    	 488.1M	 488.1M
/nix/store/rbm214ji7jqk58g8shhkmhvl6xflblz0-cuda_cupti-11.7.50       	  88.4M	  88.4M
/nix/store/virnlrgvysrn1nl6bra798fy6x731nj0-cuda_nvprof-11.7.50      	   9.9M	 129.1M
/nix/store/val7mwg8inz92ijwylcb6pm3mwr5qidx-cuda-redist-11.7         	  63.0K	   2.1G
/nix/store/1f78c0ihfrxdfbmc56jp4s9ql4kdd3bw-magma-2.7.1              	 659.9M	   2.8G
Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 23.05 Release Notes (or backporting 22.11 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

@ConnorBaker ConnorBaker self-assigned this Mar 9, 2023
@ConnorBaker ConnorBaker force-pushed the fix/cuda-nvcc-compress-fatbins branch 3 times, most recently from a84ce56 to c6888a7 Compare March 10, 2023 02:19
@ConnorBaker ConnorBaker changed the title cudaPackages: always use -Xfatbin=-compress-all cudaPackages: use -Xfatbin=-compress-all; prune default cudaCapabilities; fix #220357 Mar 10, 2023
@ConnorBaker ConnorBaker changed the title cudaPackages: use -Xfatbin=-compress-all; prune default cudaCapabilities; fix #220357 cudaPackages: fix #220357; use -Xfatbin=-compress-all; prune default cudaCapabilities Mar 10, 2023
@ConnorBaker ConnorBaker marked this pull request as ready for review March 10, 2023 03:24
@ConnorBaker
Contributor Author

cc @NixOS/cuda-maintainers

@samuela
Member

samuela commented Mar 13, 2023

Result of nixpkgs-review pr 220402 run on x86_64-linux

7 packages marked as broken and skipped:
  • python310Packages.caffeWithCuda
  • python310Packages.distrax
  • python310Packages.mask-rcnn
  • python310Packages.optuna
  • python310Packages.rl-coach
  • python310Packages.rlax
  • truecrack-cuda
31 packages failed to build:
  • cudaPackages.tensorrt (cudaPackages.tensorrt_8_4_0)
  • gpt2tc
  • haskellPackages.tensorflow
  • haskellPackages.tensorflow-core-ops
  • haskellPackages.tensorflow-logging
  • haskellPackages.tensorflow-ops
  • libtensorflow
  • mathematica-cuda
  • python310Packages.baselines
  • python310Packages.dalle-mini
  • python310Packages.dm-sonnet
  • python310Packages.edward
  • python310Packages.elegy
  • python310Packages.flax
  • python310Packages.gpt-2-simple
  • python310Packages.n3fit
  • python310Packages.pot
  • python310Packages.pymanopt
  • python310Packages.scikit-tda
  • python310Packages.tensorflow (python310Packages.tensorflow-build ,python310Packages.tensorflowWithoutCuda)
  • python310Packages.tensorflow-datasets
  • python310Packages.tensorflow-probability
  • python310Packages.tensorrt
  • python310Packages.tflearn
  • python310Packages.treex
  • python310Packages.trfl
  • python310Packages.umap-learn
  • python310Packages.vqgan-jax
  • python311Packages.tensorrt
  • tests.pkg-config.defaultPkgConfigPackages.tensorflow
  • tts
36 packages built:
  • caffeWithCuda
  • colmapWithCuda
  • cudaPackages.cuda_nvcc
  • cudatoolkit (cudaPackages.cudatoolkit ,cudatoolkit_11)
  • cudaPackages.cutensor
  • cudaPackages.nccl
  • forge
  • gpu-burn
  • gpu-screen-recorder
  • gpu-screen-recorder-gtk
  • gromacsCudaMpi
  • gwe
  • hip-nvidia
  • katagoWithCuda
  • librealsenseWithCuda
  • magma (magma-cuda ,magma_2_7_1)
  • magma_2_6_2
  • nvtop
  • nvtop-nvidia
  • python310Packages.TheanoWithCuda
  • python310Packages.cupy
  • python310Packages.jaxlibWithCuda
  • python310Packages.numbaWithCuda
  • python310Packages.pycuda
  • python310Packages.pynvml
  • python310Packages.pyrealsense2WithCuda
  • python310Packages.tensorflowWithCuda
  • python310Packages.torchWithCuda
  • python311Packages.TheanoWithCuda
  • python311Packages.cupy
  • python311Packages.jaxlibWithCuda
  • python311Packages.pycuda
  • python311Packages.pynvml
  • python311Packages.pyrealsense2WithCuda
  • xgboostWithCuda
  • xpraWithNvenc

@samuela
Member

samuela commented Mar 13, 2023

Failures:

$ nix --experimental-features nix-command build --no-link --keep-going --no-allow-import-from-derivation --option build-use-sandbox relaxed -f /home/sam.ainsworth/.cache/nixpkgs-review/pr-220402/build.nix --max-jobs 32 --cores 48
error: builder for '/nix/store/rzi92yd2ckr8w8migm0xjndrwaf7ybbj-Mathematica_13.2.1_BNDL_LINUX.sh.drv' failed with exit code 1;
       last 8 log lines:
       >
       > ***
       > This nix expression requires that Mathematica_13.2.1_BNDL_LINUX.sh is
       > already part of the store. Find the file on your Mathematica CD
       > and add it to the nix store with nix-store --add-fixed sha256 <FILE>.
       >
       > ***
       >
       For full logs, run 'nix log /nix/store/rzi92yd2ckr8w8migm0xjndrwaf7ybbj-Mathematica_13.2.1_BNDL_LINUX.sh.drv'.
error: builder for '/nix/store/58j9s2fcvpg84p1i6z1l0g55wqgwr1ym-TensorRT-8.4.0.6.Linux.x86_64-gnu.cuda-11.6.cudnn8.3.tar.gz.drv' failed with exit code 1;
       last 10 log lines:
       > download the 8.4.0.6 Linux x86_64 TAR package for CUDA 11.7 from
       > https://developer.nvidia.com/tensorrt.
       >
       > Once you have downloaded the file, add it to the store with the following
       > command, and try building this derivation again.
       >
       > $ nix-store --add-fixed sha256 TensorRT-8.4.0.6.Linux.x86_64-gnu.cuda-11.6.cudnn8.3.tar.gz
       >
       > ***
       >
       For full logs, run 'nix log /nix/store/58j9s2fcvpg84p1i6z1l0g55wqgwr1ym-TensorRT-8.4.0.6.Linux.x86_64-gnu.cuda-11.6.cudnn8.3.tar.gz.drv'.
error: builder for '/nix/store/jn28csn3wzslc1ryjznq1b7rdgl94pil-python3.10-kmapper-2.0.1.drv' failed with exit code 1;
       last 10 log lines:
       >
       >   invalid value encountered in divide
       >
       > -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
       > =========================== short test summary info ============================
       > FAILED test/test_coverer.py::TestCover::test_radius_dist - sklearn.utils._param_validation.InvalidParameterError: The 'feature_range' ...
       > FAILED test/test_mapper.py::TestLens::test_sparse_array - TypeError: np.matrix is not supported. Please convert to a numpy array with...
       > FAILED test/test_mapper.py::TestLens::test_map_sparse - TypeError: np.matrix is not supported. Please convert to a numpy array with...
       > ============ 3 failed, 98 passed, 16 warnings in 164.78s (0:02:44) =============
       > /nix/store/c3f4jdwzn8fm9lp72m91ffw524bakp6v-stdenv-linux/setup: line 1593: pop_var_context: head of shell_variables not a function context
       For full logs, run 'nix log /nix/store/jn28csn3wzslc1ryjznq1b7rdgl94pil-python3.10-kmapper-2.0.1.drv'.
error: 1 dependencies of derivation '/nix/store/yxwq1zp77qhr2lr8c38qpfcs2681ndz5-mathematica-cuda-13.2.1.drv' failed to build
error: 1 dependencies of derivation '/nix/store/8qwcznrxwrw0bnq1bwjqypqnmkbgdqrw-cudatoolkit-11.7-tensorrt-8.4.0.6.drv' failed to build
error: 2 dependencies of derivation '/nix/store/wz48992h9gw9xyhpvkycz4iklmp6jyi5-python3.10-tensorrt-8.4.0.6.drv' failed to build
error: 2 dependencies of derivation '/nix/store/jzav6488djr5zkz39xz64v86nkms1146-python3.11-tensorrt-8.4.0.6.drv' failed to build
error: builder for '/nix/store/m6fvh2bb6m72hswaszaqaay2m8d1qlv0-tensorflow-2.11.0.drv' failed with exit code 137;
       last 10 log lines:
       > [2,295 / 5,696] Compiling tensorflow/core/kernels/scatter_op.cc; 366s local ... (48 actions, 33 running)
       > [2,477 / 5,696] Compiling tensorflow/compiler/xla/service/hlo_evaluator.cc; 177s local ... (48 actions, 33 running)
       > [2,825 / 5,697] Compiling stablehlo/dialect/StablehloOps.cpp; 384s local ... (48 actions, 33 running)
       > [3,152 / 5,697] Compiling tensorflow/compiler/mlir/tensorflow/ir/tf_ops.cc; 248s local ... (48 actions, 33 running)
       > [3,612 / 5,697] Compiling tensorflow/compiler/mlir/tensorflow/ir/tf_ops_a_m.cc; 585s local ... (47 actions, 32 running)
       > [3,889 / 5,697] Compiling tensorflow/compiler/mlir/tensorflow/ir/tf_ops_n_z.cc; 400s local ... (48 actions, 33 running)
       > [4,164 / 5,723] Compiling tensorflow/compiler/mlir/tensorflow/ir/tf_ops_n_z.cc; 935s local ... (48 actions, 32 running)
       > [4,546 / 5,723] Compiling tensorflow/compiler/mlir/tensorflow/ir/tf_ops_n_z.cc; 1545s local ... (48 actions, 33 running)
       > [4,767 / 5,723] Compiling tensorflow/core/kernels/segment_reduction_ops_impl_5.cc; 305s local ... (48 actions, 33 running)
       > /nix/store/c3f4jdwzn8fm9lp72m91ffw524bakp6v-stdenv-linux/setup: line 1637:  6042 Killed                  BAZEL_USE_CPP_ONLY_TOOLCHAIN=1 USER=homeless-shelter bazel --batch --output_base="$bazelOut" --output_user_root="$bazelUserRoot" build --curses=no -j $NIX_BUILD_CORES "${copts[@]}" "${host_copts[@]}" "${linkopts[@]}" "${host_linkopts[@]}" $bazelFlags --config=opt //tensorflow/tools/pip_package:build_pip_package //tensorflow/tools/lib_package:libtensorflow
       For full logs, run 'nix log /nix/store/m6fvh2bb6m72hswaszaqaay2m8d1qlv0-tensorflow-2.11.0.drv'.
error: 1 dependencies of derivation '/nix/store/5brdrq9jm1mcq0lp2s33cq13mzhx0i5z-check-pkg-config-tensorflow.drv' failed to build
error: 1 dependencies of derivation '/nix/store/15bfk4hdm226f2xq9z7dpk8smq3jc2yp-python3.10-tensorflow-2.11.0.drv' failed to build
error: 1 dependencies of derivation '/nix/store/n2ycg220lr6xrwg20494xr3pnklp0qji-tensorflow-0.2.0.1.drv' failed to build
error: 1 dependencies of derivation '/nix/store/cr5rkg91vhhbvfq7j96pvld65inh993w-python3-3.10.10-env.drv' failed to build
error: 1 dependencies of derivation '/nix/store/qdai4mmx298vja7vp7mmmp13j5r2wzrz-python3.10-baselines-0.1.6.drv' failed to build
error: 1 dependencies of derivation '/nix/store/ja46b8i1pdphzh9hbvjnrz3z1sds3c06-python3.10-edward-1.3.5.drv' failed to build
error: 1 dependencies of derivation '/nix/store/x0fxdc2q0hm7ba25kjp98mdzv32ar1rc-python3.10-flax-0.6.1.drv' failed to build
error: 1 dependencies of derivation '/nix/store/1lcfh0bk3ywb0rbxv680saylmllp3qqh-python3.10-gpt-2-simple-0.8.1.drv' failed to build
error: 1 dependencies of derivation '/nix/store/b9bmsswblxd4xvvxm3qgchxgfadwx0bm-python3.10-n3fit-4.0.drv' failed to build
error: 1 dependencies of derivation '/nix/store/9brizfjxal7w6pcrjhyxnl0akglhhqxi-python3.10-pot-0.8.2.drv' failed to build
error: 1 dependencies of derivation '/nix/store/r2hz6qf0jn7c8c9a4g9lmd1pwrqnjngb-python3.10-pymanopt-2.0.1.drv' failed to build
error: 1 dependencies of derivation '/nix/store/pa4c7r71y3h0gi6n4d4y1wg43jf1y2w7-python3.10-tensorflow-datasets-4.8.0.drv' failed to build
error: 1 dependencies of derivation '/nix/store/0kznpfx3rz7mk1j11ki27v87l9xjb5i8-python3.10-tflearn-0.5.0.drv' failed to build
error: 1 dependencies of derivation '/nix/store/3cqisd0bycgfrqw2d5pj7kk1bp1r6wlj-python3.10-umap-learn-0.5.3.drv' failed to build
error: 1 dependencies of derivation '/nix/store/pdsv1q3j1wk9jwbhr2kvac9qs5zv9nfs-tensorflow-core-ops-0.2.0.1.drv' failed to build
error: 1 dependencies of derivation '/nix/store/8c7lz80nm5ajahvhrd5sdsarnxwqbg4m-tensorflow_probability-0.19.0-py2.py3-none-any.whl.drv' failed to build
error: 1 dependencies of derivation '/nix/store/2sf54wc7diaq6h6fb67xqbvkw9d9lc6z-tts-0.11.1.drv' failed to build
error: 1 dependencies of derivation '/nix/store/r9z0gvk72psb00f3b9gx4ic968rszzx9-gpt2tc-2021-04-24.drv' failed to build
error: 1 dependencies of derivation '/nix/store/fkp1nskr88gc6v1c8hz2wbjzlwp60wvw-python3.10-dalle-mini-0.1.1.drv' failed to build
error: 1 dependencies of derivation '/nix/store/5qg7z696rz5vg8abj00n8azcqv3kznp4-python3.10-dm-sonnet-2.0.0.drv' failed to build
error: 2 dependencies of derivation '/nix/store/kalrg805wfgls42zk0nd8ckd1p956jaq-python3.10-scikit-tda-1.0.0.drv' failed to build
error: 2 dependencies of derivation '/nix/store/i2nbgjiipbsvvv069nfbrdnajs82zagz-python3.10-tensorflow_probability-0.19.0.drv' failed to build
error: 2 dependencies of derivation '/nix/store/k95lrj979nl8bnsziisin9c4giy621hq-python3.10-treex-0.6.11.drv' failed to build
error: 1 dependencies of derivation '/nix/store/y611hlmpzw2x39q9y228ai0r3lrs8awn-python3.10-vqgan-jax-unstable-2022-04-20.drv' failed to build
error: 2 dependencies of derivation '/nix/store/9pd0kggrmmab5nnr4zc305xgrsc6wq7n-tensorflow-ops-0.2.0.1.drv' failed to build
error: 2 dependencies of derivation '/nix/store/r7qxmw8nv1al2sliwihd5mv16rkmbrkk-python3.10-elegy-0.8.6.drv' failed to build
error: 2 dependencies of derivation '/nix/store/v82gbxmp3cifas6aqmfjk7w2kgnzv1sd-python3.10-trfl-1.2.0.drv' failed to build
error: 3 dependencies of derivation '/nix/store/l53kmviqsnw4am2sy1g01w9rgcxdkaxa-tensorflow-logging-0.2.0.1.drv' failed to build
error: 31 dependencies of derivation '/nix/store/sq60pbvf1js7a6r2b40dwjvxibkya0wx-env.drv' failed to build
error: 1 dependencies of derivation '/nix/store/gm0i6dqg89v35wayvbsxnzcnn2gli3rw-review-shell.drv' failed to build

These failures look reasonable to me. python310Packages.tensorflow was OOM-killed (exit code 137), but everything else is fine. python310Packages.tensorflowWithCuda built fine.

@samuela samuela merged commit 13939e2 into NixOS:master Mar 13, 2023
@ConnorBaker ConnorBaker deleted the fix/cuda-nvcc-compress-fatbins branch March 13, 2023 22:38
@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/tweag-nix-dev-update-45/26397/1

Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Build failure: magma
3 participants