fix CUDA build of recent TensorFlow easyconfigs when using compiler symlinks #18235

Merged: branfosj merged 1 commit into easybuilders:develop from the tf-cuda-fix branch on Jul 1, 2023


Flamefire
Contributor

@Flamefire Flamefire commented Jun 29, 2023

Add the TensorFlow-2.1.0_fix-cuda-build.patch to the TensorFlow-CUDA ECs to fix a build failure that occurs when the compilers are on symlinked paths, e.g. when ccache or RPATH wrappers are used.
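
For readers unfamiliar with the symlink problem, here is a minimal sketch of why symlinked compilers can be fragile. It assumes the common wrapper setup where `gcc` is a symlink to the wrapper binary and the wrapper decides what to run from the name it was invoked as. Whether this is exactly how the TensorFlow CUDA build fails is an assumption, and the function name and file names are hypothetical. The sketch only shows that resolving the symlink loses the invoked name.

```python
import os
import tempfile


def invoked_vs_resolved(wrapper_name="ccache", compiler_name="gcc"):
    """Return (name as invoked, name after symlink resolution).

    Hypothetical illustration: creates a fake wrapper file and a
    compiler-named symlink pointing at it, then compares both names.
    """
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for the real wrapper binary (e.g. ccache).
        wrapper = os.path.join(tmp, wrapper_name)
        with open(wrapper, "w") as fh:
            fh.write("#!/bin/sh\n")

        # The "compiler" is only a symlink to the wrapper.
        link = os.path.join(tmp, compiler_name)
        os.symlink(wrapper, link)

        invoked = os.path.basename(link)
        resolved = os.path.basename(os.path.realpath(link))
        return invoked, resolved


if __name__ == "__main__":
    print(invoked_vs_resolved())  # ('gcc', 'ccache')
```

A build step that stores the resolved path would end up calling the wrapper under its own name instead of as `gcc`, so the wrapper no longer knows which compiler to run.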

Fixes #17892

Followup to #17058

I don't think a full test build is required, as we already have experience with that patch and it was already used for 2.8.4. Since --stop=patch doesn't work for extensions, I verified that it applies to a git checkout of 2.9.1 and 2.11.0.
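
The comment does not say which command was used for that check. One way it could be done is sketched below with `git apply --check`, which tests a patch against a checkout without modifying any files. The helper names and all paths are hypothetical.

```python
import subprocess


def build_check_command(patch_file):
    """Build a `git apply --check` command (dry run, no files are changed)."""
    return ["git", "apply", "--check", patch_file]


def patch_applies(checkout_dir, patch_file):
    """Return True if the patch would apply cleanly to the git checkout.

    `patch_file` should be an absolute path, because the command runs
    with `checkout_dir` as its working directory.
    """
    result = subprocess.run(
        build_check_command(patch_file),
        cwd=checkout_dir,
        capture_output=True,
        text=True,
    )
    return result.returncode == 0
```

For example, `patch_applies("/path/to/tensorflow", "/path/to/TensorFlow-2.1.0_fix-cuda-build.patch")` would be run once per checked-out version tag. Note that EasyBuild itself applies patches with its own logic, so this is only an approximation of what happens during a real build.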

@branfosj
Member

Test report by @branfosj
FAILED
Build succeeded for 2 out of 3 (3 easyconfigs in total)
bear-pg0103u11a.bear.cluster - Linux RHEL 8.6, x86_64, Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz (icelake), 1 x NVIDIA NVIDIA A100-PCIE-40GB, 520.61.05, Python 3.6.8
See https://gist.github.com/branfosj/bc31080326642c2efab56a01b330906f for a full test report.

@Flamefire
Contributor Author

Flamefire commented Jun 30, 2023

> Test report by @branfosj FAILED

1 failed test in 1 EC without any further information. :-(
I'd still consider that a success for this PR, as it only affects compilation, which worked for all ECs.

@Flamefire
Contributor Author

Test report by @Flamefire
SUCCESS
Build succeeded for 3 out of 3 (3 easyconfigs in total)
taurusi8007 - Linux CentOS Linux 7.9.2009, x86_64, AMD EPYC 7352 24-Core Processor (zen2), 8 x NVIDIA NVIDIA A100-SXM4-40GB, 470.57.02, Python 2.7.5
See https://gist.github.com/Flamefire/d796d9e002bc1e156a3723c5e5d4b1a0 for a full test report.

@Flamefire
Contributor Author

Test report by @Flamefire
SUCCESS
Build succeeded for 3 out of 3 (3 easyconfigs in total)
taurusml3 - Linux RHEL 7.6, POWER, 8335-GTX (power9le), 6 x NVIDIA Tesla V100-SXM2-32GB, 440.64.00, Python 2.7.5
See https://gist.github.com/Flamefire/0342c180e876e32b874cf990c7d2e1ad for a full test report.

@SebastianAchilles
Member

Test report by @SebastianAchilles
SUCCESS
Build succeeded for 3 out of 3 (3 easyconfigs in total)
skl-rockylinux-88 - Linux Rocky Linux 8.8, x86_64, Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz (skylake), 1 x NVIDIA NVIDIA RTX A4000, 530.30.02, Python 3.6.8
See https://gist.github.com/SebastianAchilles/8506e590d2a13c69c23a6e919a3d0fab for a full test report.

@branfosj
Member

branfosj commented Jul 1, 2023

> Test report by @branfosj FAILED
>
> 1 failed test in 1 EC without any further information. :-( I'd still consider that a success for this PR as this is only affecting compilation which worked for all ECs

The same failure happens without this PR, so I am happy to proceed here.

@branfosj branfosj added this to the next release (4.7.3?) milestone Jul 1, 2023
@branfosj
Member

branfosj commented Jul 1, 2023

Going in, thanks @Flamefire!

@branfosj branfosj merged commit 9bba305 into easybuilders:develop Jul 1, 2023
@Flamefire Flamefire deleted the tf-cuda-fix branch July 1, 2023 14:33
@boegel boegel changed the title from "Fix CUDA build of TensorFlow when using compiler symlinks" to "Fix CUDA build of recent TensorFlow easyconfigs when using compiler symlinks" Jul 5, 2023
@boegel boegel changed the title from "Fix CUDA build of recent TensorFlow easyconfigs when using compiler symlinks" to "fix CUDA build of recent TensorFlow easyconfigs when using compiler symlinks" Jul 5, 2023
Successfully merging this pull request may close these issues.

Failed to build TensorFlow-2.11.0-foss-2022a-CUDA-11.7.0.eb