testSCCMLTests1_openj9_1 failed with openjdk11 Aarch64 #10741

Open
LongyuZhang opened this issue Sep 29, 2020 · 15 comments

@LongyuZhang
Contributor

Failure link:

https://ci.adoptopenjdk.net/job/Test_openjdk11_j9_sanity.functional_aarch64_linux/1

Test 26 of testSCCMLTests1_openj9_1 failed with a timeout on openjdk11 AArch64.

Failure output (captured from console output)

21:24:09  Testing: Test 26: CMVC 168131 : Create a non persistent cache
21:24:09  Test start time: 2020/09/29 01:24:08 Coordinated Universal Time
21:24:09  Running command: "/home/jenkins/workspace/Test_openjdk11_j9_sanity.functional_aarch64_linux/openjdkbinary/j2sdk-image/bin/java"  -Xcompressedrefs -Xjit -Xgcpolicy:gencon  -Xshareclasses:name=ShareClassesCMLTests,nonpersistent -version
21:24:09  Time spent starting: 1 milliseconds
21:34:08  ***[TEST INFO 2020/09/29 01:34:08] ProcessKiller detected a timeout after 600000 milliseconds!***
21:34:08  INFO: getUnixPID() has failed indicating this is not a UNIX System.'Debug on timeout' is currently only supported on Linux.
21:34:08  
21:34:08  
Cancelling nested steps due to timeout
06:34:24  Sending interrupt signal to process
06:34:26  Time spent executing: 33016381 milliseconds
06:34:26  Test result: FAILED
@hangshao0
Contributor

Is this failure intermittent or consistent? Does it fail on Java 8?

@hangshao0
Contributor

FYI @knn-k

@LongyuZhang
Contributor Author

LongyuZhang commented Sep 29, 2020

> Is this failure intermittent or consistent? Does it fail on Java 8?

The jdk11 AArch64 pipeline was only just enabled and has had a single build so far. I tested with a personal build and it passed, so I think the failure is intermittent.
The JDK 8 nightly has not been enabled yet because of limited machine resources; JDK 8 also passed in a personal build.

@llxia
Contributor

llxia commented Sep 29, 2020

@LongyuZhang Could you try using the same SDK as the test build (OpenJDK Runtime Environment Openj9 (build 11.0.9+8-202009282343))? The Grinder runs that you have use an older version from the Adopt API (OpenJDK Runtime Environment Openj9 (build 11.0.9+8-202009252344)).

@LongyuZhang
Contributor Author

> @LongyuZhang Could you try using the same SDK as the test build (OpenJDK Runtime Environment Openj9 (build 11.0.9+8-202009282343))? The Grinder runs that you have use an older version from the Adopt API (OpenJDK Runtime Environment Openj9 (build 11.0.9+8-202009252344)).

@llxia Thanks for the reminder. I have updated the SDK to the same version as the nightly build and tested on multiple machines.
Only the machine used by the nightly build (test-aws-ubuntu1804-armv8-1) also failed at Test 26 (I cancelled the run manually after it had hung for 1 hour). The other machines (test-packet-ubuntu1604-armv8-2, test-aws-rhel76-armv8-2, test-aws-rhel76-armv8-4, test-packet-ubuntu1604-armv8-1) all passed the test; see https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/4002 - 4005. So it is most likely a machine issue.

@pshipton
Member

FYI https://openj9.slack.com/archives/C8312LCV9/p1602551615048900

there’s a pthread_cond_signal bug affecting glibc 2.27. Worth being aware of this if unexplained deadlocks are occurring: https://sourceware.org/bugzilla/show_bug.cgi?id=25847

@andrew-m-leonard
Contributor

Re-occurred on GA jdk-11.0.9+11_openj9-0.23.0 : https://ci.adoptopenjdk.net/job/Test_openjdk11_j9_sanity.functional_aarch64_linux/19/consoleFull

@LongyuZhang
Contributor Author

Hi @andrew-m-leonard, I discussed with @llxia the testSCCMLTests1_openj9_1 failure you mentioned above in https://ci.adoptopenjdk.net/job/Test_openjdk11_j9_sanity.functional_aarch64_linux/19/consoleFull. It fails in tests 57-63, which is not the same as the Test 26 failure in this issue. Tests 57-63 were newly enabled by Hang Shao’s PR and have already passed in nightly builds #17 and #18.

Builds #19, #20 and #22, which you mentioned, failed because they were not triggered by the nightly pipeline. That exposes the known issue "Functional testing uses wrong test material in release testing", for which Lan has a WIP PR that has not been merged yet.

@andrew-m-leonard
Contributor

This happened twice last night, aarch64 and Windows:
adoptium/infrastructure#1579 (comment)
adoptium/infrastructure#1579 (comment)

Can we get more debug output added to getUnixPid(), please? How can that fail?

@andrew-m-leonard
Contributor

See: #11177

@pshipton
Member

> Can we get more debug output added to getUnixPid(), please? How can that fail?

FYI: As I recall, getUnixPid() is a hack that uses reflection to reach into the implementation and find the pid, because there was no API to get it in Java 8. I believe Java 11 does provide an API.

@knn-k
Contributor

knn-k commented Nov 15, 2020

Has the test server (test-aws-ubuntu1804-armv8-1) been rebooted recently?
If not, I would like it to be rebooted to see whether the failure disappears.

@andrew-m-leonard
Contributor

andrew-m-leonard commented Nov 16, 2020

@pshipton The logic here https://github.com/eclipse/openj9/blob/efdb86514d722cf83747d9d8badc449fe6121658/test/functional/cmdline_options_tester/src/Test.java#L415
will only work on jdk8, because UNIXProcess.java does not exist in jdk11+. For jdk11+ it should use the jdk11 API to get the pid.
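
A minimal sketch of that idea, assuming the goal is one helper that works on both jdk8 and jdk11+. `PidLookup` and `pidOf` are hypothetical names, not the code in Test.java. `Process.pid()` is the public API available since Java 9; it is looked up reflectively here only so that the class still compiles on jdk8, where the old private-field hack is kept as a fallback.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;

/** Hypothetical helper for illustration; not the actual code in Test.java. */
class PidLookup {

    /** Returns the pid of proc, or -1 if it cannot be determined. */
    static long pidOf(Process proc) {
        try {
            // Java 9+: Process.pid() is public API. Looking it up reflectively
            // keeps this class compilable on Java 8, where the method is absent.
            Method pidMethod = Process.class.getMethod("pid");
            return (Long) pidMethod.invoke(proc);
        } catch (ReflectiveOperationException e) {
            // No such method (Java 8) or pid() unsupported: try the fallback below.
        }
        try {
            // Java 8 on Unix: java.lang.UNIXProcess keeps the pid in a private int field.
            Field pidField = proc.getClass().getDeclaredField("pid");
            pidField.setAccessible(true);
            return pidField.getInt(proc);
        } catch (ReflectiveOperationException | RuntimeException e) {
            return -1; // e.g. Java 8 on Windows, where there is no such field
        }
    }

    public static void main(String[] args) throws Exception {
        Process proc = new ProcessBuilder("java", "-version").inheritIO().start();
        System.out.println("child pid: " + pidOf(proc));
        proc.waitFor();
    }
}
```

If the test material only ever needs to run on jdk9 or later, calling `proc.pid()` directly would be simpler than either reflective path.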

The failure on jdk8 Windows is likely because the ProcessKiller logic failed to kill the process. Windows processes can be stubborn about being killed, and we are seeing many orphaned testcase processes on Windows. We are looking at adding some post-testcase process cleanup to avoid this.
It would be beneficial in this situation, though, if proc.waitFor() did not wait forever for the process to finish, because it won't! So maybe add some arbitrary timeout, say 30 minutes?
https://github.com/eclipse/openj9/blob/efdb86514d722cf83747d9d8badc449fe6121658/test/functional/cmdline_options_tester/src/Test.java#L227
As it stands, it is causing the whole build pipeline to hang all night!
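
A bounded wait could look roughly like the sketch below. `BoundedWait` and `waitOrKill` are hypothetical names, and the 30-minute value is just the arbitrary figure suggested above. `Process.waitFor(long, TimeUnit)` and `Process.destroyForcibly()` both exist since Java 8, so this would not need a separate jdk11 code path.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper for illustration; not the actual code in Test.java. */
class BoundedWait {

    /**
     * Waits up to the given timeout for proc to exit.
     * Returns true if the process exited by itself. Returns false if the
     * timeout expired or the wait was interrupted; in that case the process
     * is asked to terminate forcibly.
     */
    static boolean waitOrKill(Process proc, long timeout, TimeUnit unit) {
        try {
            if (proc.waitFor(timeout, unit)) {
                return true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        proc.destroyForcibly(); // a request only; the caller should not block on it
        return false;
    }

    public static void main(String[] args) throws Exception {
        Process proc = new ProcessBuilder("java", "-version").inheritIO().start();
        boolean exited = waitOrKill(proc, 30, TimeUnit.MINUTES);
        System.out.println(exited ? "exit code " + proc.exitValue() : "timed out, kill requested");
    }
}
```

The helper deliberately does not wait again after `destroyForcibly()`: if the kill does not take effect, as described for Windows above, a second unbounded wait would reintroduce the hang.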

@pshipton
Member

Created #11196 for the getUnixPid() issue.

@pshipton
Member

Created #11197 for the cmdlinetests waiting forever.
