[Flaky] Possible flaky tests from Paddle #9976

Closed
masahi opened this issue Jan 19, 2022 · 11 comments
Labels
needs-triage (PRs or issues that need to be investigated by maintainers to find the right assignees to address it), test: flaky

Comments

@masahi
Member

masahi commented Jan 19, 2022

https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/PR-9954/7/pipeline/ tests/python/frontend/paddlepaddle/test_forward.py::test_forward_math_api tests/scripts/setup-pytest-env.sh: line 35: 2565288 Segmentation fault (Saw it multiple times)

https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/PR-9954/9/pipeline/ FAILED tests/python/frontend/paddlepaddle/test_forward.py::test_forward_ones_like (Just got this the first time)

@jiangjiajun

@jiangjiajun
Contributor

This is really flaky... Is this reproducible?

@masahi
Member Author

masahi commented Jan 19, 2022

I don't know, have you tried running these tests many times on your end? It could be a CI env issue.

@heliqi
Contributor

heliqi commented Jan 20, 2022

After repeated attempts, I reproduced one of the problems. I'm working on a fix...

@heliqi
Contributor

heliqi commented Jan 20, 2022

Paddle's test_forward uses 'tvm.testing.enabled_targets()', and DEFAULT_TEST_TARGETS contains the 'nvptx' target.

When target == 'nvptx', the Relay op 'acos' fails to run. Are there many ops that are no longer supported by the 'nvptx' target?
I see that some of the 'nvptx' target tests have also been removed from the PR.

@masahi
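For illustration, here is a minimal sketch of filtering 'nvptx' out of a target list before running a test. It assumes the list holds (target, device) pairs, as tvm.testing.enabled_targets() is understood to return. The `filter_targets` helper and the sample list are hypothetical stand-ins, not part of TVM.

```python
def filter_targets(enabled, excluded=("nvptx",)):
    """Drop (target, device) pairs whose target kind is in `excluded`.

    `enabled` is assumed to look like the result of
    tvm.testing.enabled_targets(): a list of (target_string, device) pairs.
    """
    kept = []
    for target, device in enabled:
        # A target string may carry options ("cuda -arch=sm_80"),
        # so compare only the leading kind name.
        kind = target.split()[0]
        if kind not in excluded:
            kept.append((target, device))
    return kept


# Hypothetical stand-in for the real list; devices are plain strings here.
sample_targets = [("llvm", "cpu(0)"), ("cuda", "cuda(0)"), ("nvptx", "cuda(0)")]
filtered = filter_targets(sample_targets)
print(filtered)
```

A frontend test could then loop over the filtered list instead of the full default set.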

@masahi
Member Author

masahi commented Jan 20, 2022

Yeah, nvptx causes some issues (not supporting RTX 3000 series in recent LLVM etc). I think it's totally fine to customize what targets to test on per frontend, and per individual test.

But this is not the cause of the segfault and the other flaky tests in question, right?

@heliqi
Contributor

heliqi commented Jan 20, 2022

After I filtered out the 'nvptx' target, I reran the test (tests/python/frontend/paddlepaddle/test_forward.py::test_forward_math_api) several times without a segfault.

The error I reproduced earlier was not a segfault either, and I haven't reproduced the CI failure... 😅
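For anyone who wants to repeat a test many times locally, a rough sketch of a rerun loop follows. The `run_repeatedly` and `summarize` helpers are hypothetical. On POSIX systems, `subprocess` reports a negative return code when the child was killed by a signal, which is how a segfault would show up here.

```python
import subprocess
import sys


def run_repeatedly(cmd, times):
    """Run `cmd` `times` times and collect the return codes."""
    codes = []
    for _ in range(times):
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        codes.append(result.returncode)
    return codes


def summarize(codes):
    """Count passes, ordinary failures, and signal-induced crashes."""
    return {
        "runs": len(codes),
        "passed": sum(1 for c in codes if c == 0),
        "failed": sum(1 for c in codes if c > 0),
        # Negative return code: child terminated by a signal (POSIX only).
        "crashed": sum(1 for c in codes if c < 0),
    }
```

Usage would look something like `summarize(run_repeatedly([sys.executable, "-m", "pytest", "tests/python/frontend/paddlepaddle/test_forward.py::test_forward_math_api"], 20))`, run from the repository root with the test environment already set up.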

@electriclilies
Contributor

I noticed that some std::moves in uses of the WithFields COW constructors were causing potential use-after-frees. I haven't been following this thread that closely, but I figured that this might be the cause of some of the flaky segfaults we've been seeing recently. In #10009 I remove these std::moves; it might be worth seeing if it improves anything.

@areusch
Contributor

areusch commented Jun 1, 2022

this is continuing to happen on main: https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/main/3415/pipeline

The error is hard to find, but if you click "Show complete log", you see:

[2022-05-31T20:31:52.944Z] tests/python/frontend/paddlepaddle/test_forward.py::test_forward_math_api tests/scripts/setup-pytest-env.sh: line 49: 44482 Segmentation fault      (core dumped) TVM_FFI=${ffi_type} python3 -m pytest -o "junit_suite_name=${suite_name}" "--junit-xml=${TVM_PYTEST_RESULT_DIR}/${suite_name}.xml" "--junit-prefix=${ffi_type}" "$@"
[2022-05-31T20:31:52.944Z] + exit_code=139

@jiangjiajun any ideas here?
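As a side note on the `exit_code=139` in the log above: shells conventionally report 128 plus the signal number when a process is killed by a signal, and SIGSEGV is 11 on Linux. A small sketch that decodes such a status (the `describe_exit_code` helper is hypothetical):

```python
import signal


def describe_exit_code(code):
    """Explain a shell-style exit status in the range 0-255."""
    if code == 0:
        return "success"
    if code > 128:
        # Shell convention: 128 + N means "terminated by signal N".
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with status {code}"


print(describe_exit_code(128 + signal.SIGSEGV))
```

This only interprets the number; it says nothing about which test or library triggered the signal.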

@jiangjiajun
Contributor

jiangjiajun commented Jun 2, 2022

test_forward_math_api contains unit tests for many operators and loops many times, which may be one of the reasons. Let's split this big unit test first; that will also give us a clearer error message.

We will send a pull request that splits test_forward_math_api to try to fix this problem. @heliqi
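To make the splitting idea concrete, here is a rough sketch that generates one test function per operator instead of looping inside a single test. This is not the change made in the actual pull request. The operator list and `check_math_op` are hypothetical stand-ins that use Python's `math` module rather than Paddle or TVM.

```python
import math

# Hypothetical subset of operators; the real test covers many more.
MATH_OPS = ["acos", "cos", "exp", "fabs"]


def check_math_op(op_name):
    """Stand-in for the real per-operator check (hypothetical)."""
    fn = getattr(math, op_name)
    result = fn(0.5)
    assert isinstance(result, float), f"{op_name} did not return a float"


def _make_test(op_name):
    """Build a test function bound to a single operator."""
    def test():
        check_math_op(op_name)
    test.__name__ = f"test_forward_math_{op_name}"
    return test


# One named test per operator, so a failure report names the operator.
GENERATED_TESTS = {f"test_forward_math_{op}": _make_test(op) for op in MATH_OPS}
# Expose them at module level so a test runner such as pytest can collect them.
globals().update(GENERATED_TESTS)
```

With one test per operator, the last test name printed before a crash points at the operator that was running.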

@heliqi
Contributor

heliqi commented Jun 2, 2022

@areusch @jiangjiajun
#11537

@areusch areusch added the needs-triage PRs or issues that need to be investigated by maintainers to find the right assignees to address it label Oct 19, 2022
@driazati
Member

closing since we haven't seen this failure lately in CI as far as I know

6 participants