enhance PyTorch easyblock to also capture tests failing with signal #2768

Merged

Flamefire
Contributor

@Flamefire Flamefire commented Aug 4, 2022

(created using eb --new-pr)

This now captures tests like:

	distributed/rpc/test_tensorpipe_agent failed!
	test_fx failed! Received signal: SIGSEGV

See https://github.com/pytorch/pytorch/blob/8be853025cb9fe7dd165924957b01984b46b9459/test/run_test.py#L915
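For illustration only, the two output forms above could be matched with a single pattern along the following lines. This is a minimal sketch, not the code from this PR: the names `FAILED_TEST_RE` and `find_failed_tests` and the sample output are hypothetical, and the sketch assumes each failure is reported on its own line in exactly one of the two forms shown.

```python
import re

# Hypothetical pattern (an assumption, not the easyblock's actual regex):
# a test name followed by "failed!", optionally followed by
# "Received signal: <NAME>" on the same line.
FAILED_TEST_RE = re.compile(
    r"^[ \t]*(?P<test>\S+) failed!(?: Received signal: (?P<signal>\w+))?[ \t]*$",
    re.MULTILINE,
)


def find_failed_tests(output):
    """Return a (test_name, signal_or_None) tuple for each failure line in output."""
    return [(m.group("test"), m.group("signal")) for m in FAILED_TEST_RE.finditer(output)]


# Made-up sample output using the two forms quoted above plus one passing line.
sample = (
    "\tdistributed/rpc/test_tensorpipe_agent failed!\n"
    "\ttest_fx failed! Received signal: SIGSEGV\n"
    "test_nn passed\n"
)

print(find_failed_tests(sample))
# [('distributed/rpc/test_tensorpipe_agent', None), ('test_fx', 'SIGSEGV')]
```

Making the signal part an optional group keeps one pattern for both forms, so a test killed by a signal ends up in the same list as a test that fails normally.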

@Flamefire Flamefire force-pushed the 20220804151713_new_pr_pytorch branch from 706d230 to 30b4e02 Compare August 4, 2022 13:40
@boegel boegel added this to the next release (4.6.1?) milestone Aug 6, 2022
@boegel boegel changed the title PyTorch: Also capture tests failing with signal enhance PyTorch easyblock to also capture tests failing with signal Aug 6, 2022
@boegel
Member

boegel commented Aug 7, 2022

Test report by @boegel

Overview of tested easyconfigs (in order)

  • SUCCESS PyTorch-1.11.0-foss-2021a-CUDA-11.3.1.eb

Build succeeded for 1 out of 1 (1 easyconfigs in total)
node3900.accelgor.os - Linux RHEL 8.4, x86_64, AMD EPYC 7413 24-Core Processor (zen3), 1 x NVIDIA NVIDIA A100-SXM4-80GB, 510.73.08, Python 3.6.8
See https://gist.github.com/57645bd053424943389adbd0d8e168fc for a full test report.

@boegel
Member

boegel commented Aug 7, 2022

List of failing tests is still reported correctly:

WARNING: 3 tests (out of 89226) failed:
* distributed/fsdp/test_fsdp_input
* distributed/test_c10d_gloo
* test_autograd

Member

@boegel boegel left a comment

lgtm

@boegel boegel merged commit 39785da into easybuilders:develop Aug 7, 2022
@Flamefire Flamefire deleted the 20220804151713_new_pr_pytorch branch August 10, 2022 08:31