{ai}[foss/2022b] PyTorch v2.1.0 #19087
Test report by @branfosj |
@boegelbot please test @ jsc-zen2 |
@branfosj: Request for testing this PR well received on jsczen2l1.int.jsc-zen2.easybuild-test.cluster PR test command '
Test results coming soon (I hope)... - notification for comment with ID 1781528056 processed Message to humans: this is just bookkeeping information for me, |
@boegelbot please test @ generoso |
@branfosj: Request for testing this PR well received on login1 PR test command '
Test results coming soon (I hope)... - notification for comment with ID 1781531861 processed Message to humans: this is just bookkeeping information for me, |
Test report by @branfosj |
Test report by @boegelbot |
Test report by @boegelbot |
Test report by @Flamefire |
Test report by @Flamefire |
I see these failing (AMD EPYC 7313 16-Core Processor):
|
@Flamefire: Tests failed in GitHub Actions, see https://github.com/easybuilders/easybuild-easyconfigs/actions/runs/6773491023
bleep, bloop, I'm just a bot (boegelbot v20200716.01) |
@Flamefire That new PyTorch-2.0.1_disable-gcc12-warning.patch fails to apply on 2.1.0 |
Test report by @VRehnberg |
Test report by @VRehnberg |
…1.7.0_disable-dev-shm-test.patch, PyTorch-1.11.1_skip-test_init_from_local_shards.patch, PyTorch-1.12.1_add-hypothesis-suppression.patch, PyTorch-1.12.1_fix-test_cpp_extensions_jit.patch, PyTorch-1.12.1_fix-TestTorch.test_to.patch, PyTorch-1.12.1_skip-test_round_robin.patch, PyTorch-1.13.1_fix-gcc-12-warning-in-fbgemm.patch, PyTorch-1.13.1_fix-protobuf-dependency.patch, PyTorch-1.13.1_fix-warning-in-test-cpp-api.patch, PyTorch-1.13.1_skip-failing-singular-grad-test.patch, PyTorch-1.13.1_skip-tests-without-fbgemm.patch, PyTorch-2.0.1_avoid-test_quantization-failures.patch, PyTorch-2.0.1_fix-skip-decorators.patch, PyTorch-2.0.1_fix-ub-in-inductor-codegen.patch, PyTorch-2.0.1_fix-vsx-loadu.patch, PyTorch-2.0.1_no-cuda-stubs-rpath.patch, PyTorch-2.0.1_skip-failing-gradtest.patch, PyTorch-2.0.1_skip-test_shuffle_reproducibility.patch, PyTorch-2.0.1_skip-tests-skipped-in-subprocess.patch, PyTorch-2.1.0_fix-vsx-vector-shift-functions.patch, PyTorch-2.1.0_remove-test-requiring-online-access.patch, PyTorch-2.1.0_skip-diff-test-on-ppc.patch
Force-pushed from ba37611 to af59878
Test report by @casparvl |
Test failures:
All of those also failed for #19087 (comment) |
More detail on the failures: dynamo/test_dynamic_shapes:
inductor/test_mkldnn_pattern_matcher:
test_proxy_tensor:
distributed/elastic/multiprocessing/api_test
test_sparse_csr:
|
Test report by @Flamefire |
@Flamefire: Tests failed in GitHub Actions, see https://github.com/easybuilders/easybuild-easyconfigs/actions/runs/7141071902
bleep, bloop, I'm just a bot (boegelbot v20200716.01) |
Test report by @Flamefire |
Test report by @Flamefire |
@boegelbot please test @ generoso |
@branfosj: Request for testing this PR well received on login1 PR test command '
Test results coming soon (I hope)... - notification for comment with ID 1859087036 processed Message to humans: this is just bookkeeping information for me, |
@boegelbot please test @ jsc-zen2 |
@branfosj: Request for testing this PR well received on jsczen2l1.int.jsc-zen2.easybuild-test.cluster PR test command '
Test results coming soon (I hope)... - notification for comment with ID 1859088543 processed Message to humans: this is just bookkeeping information for me, |
Test report by @branfosj |
Test report by @boegelbot |
Test report by @boegelbot |
Test report by @akesandgren |
Test report by @VRehnberg Missing OS deps |
Test report by @VRehnberg |
That looks odd. I remember having seen the c10d failures too but not recently. More confusingly 1 is listed twice. Are you using the latest develop easyblock? Also I don't know what has failed in test_functional_api as I haven't seen that before. Could you upload the log please? |
@Flamefire Ah, no, this picked up the default easyblocks for eb 4.8.2 instead. Here are logs for a successful build and a failed build respectively if you're still interested |
FYI, running this with CUDA enabled (and the necessary packages) results in the following failed tests,
So not that bad... (Didn't have the "Skip flaky test in test_nn" fix in my version when doing this so that one might be fixed already) |
I'm wondering if that is fixed in 2.1.2. Does 2.1.0 fail for you too in test_nn? Can you try #19445 and check the log for that test too? BTW: I have the CUDA versions prepared locally and am running them too (still waiting for the results though) but we need the CPU versions ready first in order to reduce the failures and address them individually. That's why I haven't uploaded them yet. |
I've just restarted a test report for this PR. As you can see above, my previous build (prior to the test_nn fix) didn't see any problems. I just wanted to see how good/bad the CUDA version would be based on this PR, and it looks fairly good. |
The test_c10d tests seem to always fail, not because of any real issue but because they are seemingly run in a SLURM env:
So don't try to run it in a Slurm job env but "escape" it by either
Both work for me. test_functional_api might be the same issue. If it still fails outside the Slurm env I can add a patch to skip this test.
Once the build is done, please also check the build log to see whether that test failed, even if the test report is OK. We allow some tests to fail, so it might report SUCCESS even though that specific test failed. |
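For illustration only: the two "escape" options mentioned in the comment above are not preserved in this copy of the thread, so the following is a generic sketch and not necessarily either of them. It assumes that running a command in a child process whose environment has no `SLURM_*` variables is enough to keep the tests from detecting a Slurm job; the thread does not confirm that. The helper names `env_without_slurm` and `run_outside_slurm_env` are hypothetical.

```python
import os
import subprocess
import sys


def env_without_slurm(env=None):
    """Return a copy of env (default: os.environ) without SLURM_* variables.

    Assumption: only variables with the exact prefix "SLURM_" matter.
    """
    source = os.environ if env is None else env
    return {k: v for k, v in source.items() if not k.startswith("SLURM_")}


def run_outside_slurm_env(cmd):
    """Run cmd in a child process whose environment has no SLURM_* variables."""
    return subprocess.run(cmd, env=env_without_slurm(), check=False)


if __name__ == "__main__":
    # Demo: the child process prints how many SLURM_* variables it can see.
    probe = [
        sys.executable,
        "-c",
        "import os; print(sum(k.startswith('SLURM_') for k in os.environ))",
    ]
    run_outside_slurm_env(probe)
```

In practice the command passed to `run_outside_slurm_env` would be whatever build or test command is normally used; the probe above only demonstrates that the child environment is filtered.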
Test report by @akesandgren |
Test report by @casparvl |
Closing this after 2.1.2 in #19445 has been merged |
(created using eb --new-pr)
Requires (with rebuild)
And