{lib}[foss/2022a] TensorFlow v2.9.1 w/ Python 3.10.4 #17092
Test report by @jfgrimm
@jfgrimm I'm a bit surprised by this failure:
Also in the source of the failing
Are you using any modifications to the toolchain, or anything else that pulls in MKL? Or have
@Flamefire My site doesn't modify toolchains, and I really shouldn't have any MKL stuff loaded.
Test report by @Flamefire
Test report by @Flamefire
Test report by @Flamefire
Test report by @Flamefire
I'd just like to second that I'm seeing this same failure on our system as well.
Test report by @Flamefire
Test report by @SebastianAchilles
I worked with @jfgrimm a bit, trying to catch a log of the failing tests, but it suddenly succeeded (i.e. no error anymore, all tests green). So if anyone (@VRehnberg?) is seeing those MKL-related failures consistently, maybe he can run a modified version with
I also opened a PR for TF 2.10.1 and TF 2.11.0.
The upstream issue I reported is tensorflow/tensorflow#59252.
Yeah, no idea what changed, but my builds (both CPU and CUDA) are happy now 🤷
I'm rerunning the builds now. They should be finished tomorrow, and I'll try to remember to upload them then.
Still getting the MKL test failure:
Tried to include the logging patch with
Nope, that didn't work. I guess it's yet another instance of easybuilders/easybuild-framework#3358 / easybuilders/easybuild-framework#2222.
It's best to download the files and modify them directly, e.g. via
The second attempt at including the patch went better; thanks for the tip on easily fetching the relevant files.
…foss-2022a-CUDA-11.7.0.eb and patches: TensorFlow-2.9.1_fix-PPC-Eigen-build.patch, TensorFlow-2.9.1_remove-duplicate-gpu-tests.patch, TensorFlow-2.9.1_remove-libclang-and-io-gcs-deps.patch, TensorFlow-2.9.1_support_flatbuffers_2.0.patch, TensorFlow-2.8.4_exclude-xnnpack-on-ppc.patch, TensorFlow-2.8.4_fix-PPC-JIT.patch, TensorFlow-2.8.4_resolve-gcc-symlinks.patch
dacfb36 to af2b983
b0583d6 to 3317a0d
@VRehnberg I think I found the issue: they were inconsistent in how they checked for MKL. One place used a runtime check for CPU features, while another only checked for defines. This was fixed in TF 2.11 by tensorflow/tensorflow@5ec3d2e. I included the patch in the new commit, so that version should now build for you. Can you verify?
A test report will hopefully come in a few hours with the results. |
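To make that failure mode concrete, here is a minimal Python sketch of the bug class described above. It is not TensorFlow's code and not the upstream fix. The build flag `BUILT_WITH_MKL`, the CPU-feature name `avx512f`, and all function names are hypothetical placeholders. It only shows why a check that trusts a compile-time define and a check that also probes the CPU at runtime can agree on one machine and disagree on another.

```python
# Hypothetical illustration only: none of these names are TensorFlow APIs,
# and "avx512f" is just a placeholder CPU feature.

# Stands in for a compile-time define baked into the build.
BUILT_WITH_MKL = True


def cpu_supports_mkl(cpu_flags: set[str]) -> bool:
    """Stand-in for a runtime CPU-feature check."""
    return "avx512f" in cpu_flags


def mkl_enabled_by_define_only() -> bool:
    """Code path A: trusts the compile-time define alone."""
    return BUILT_WITH_MKL


def mkl_enabled_by_runtime_check(cpu_flags: set[str]) -> bool:
    """Code path B: additionally asks the CPU at runtime."""
    return BUILT_WITH_MKL and cpu_supports_mkl(cpu_flags)


def paths_disagree(cpu_flags: set[str]) -> bool:
    """True when the two inconsistent checks reach different answers."""
    return mkl_enabled_by_define_only() != mkl_enabled_by_runtime_check(cpu_flags)


def mkl_enabled(cpu_flags: set[str]) -> bool:
    """Consistent version: one shared predicate used by every code path."""
    return BUILT_WITH_MKL and cpu_supports_mkl(cpu_flags)


if __name__ == "__main__":
    for flags in ({"sse4_2"}, {"sse4_2", "avx512f"}):
        print(sorted(flags), "disagree:", paths_disagree(flags),
              "shared check:", mkl_enabled(flags))
```

The design point is simply that every code path should call the same predicate, so the answer cannot depend on which path happens to ask.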
@Flamefire: Tests failed in GitHub Actions, see https://github.com/easybuilders/easybuild-easyconfigs/actions/runs/4948100242
bleep, bloop, I'm just a bot (boegelbot v20200716.01)
Test report by @VRehnberg
That again? Are you using the latest easyblock, i.e. EasyBuild 4.7.1 or easybuilders/easybuild-easyblocks#2854?
It looks like some other ECs using flatbuffers 2.0.6 were already merged, while TF 2.9 barely worked with 2.0.0. However, I was able to backport the changes, so it should work now.
Test report by @Flamefire
I'm still seeing the MKL failure on this one, though I don't think TensorFlow-2.11.0 had this problem; at least, I installed the CUDA version of 2.11.0 on our systems some time ago. Log for the latest failure:
@VRehnberg It looks like this PR wasn't merged yet, and your log file suggests you didn't build from this PR, so the required patch for the MKL failure isn't included. TF 2.11 already includes the patch (upstream), but here my backport patch is required.
@boegel Can this be merged?
Test report by @SebastianAchilles
lgtm
Going in, thanks @Flamefire! Failed with HTTP Error
…asyconfigs into 20230112164524_new_pr_TensorFlow291
Going in, thanks @Flamefire!
(created using eb --new-pr)
Based on #16008 by @alinelena and #16620 by @VRehnberg.
Note that there will be
This is due to Abseil being a possible $TF_SYSTEM_LIBS entry since 2.9, but it looks like it got broken between when that PR was opened and when 2.9 was released: tensorflow/tensorflow#53765 (comment). Might be worth resolving in the EasyBlock.