
[CI] LongGCDisruptionTests testNotBlockingUnsafeStackTraces failed #35686

Closed
droberts195 opened this issue Nov 19, 2018 · 10 comments · Fixed by #39261
Assignees: original-brownbear
Labels: :Distributed Indexing/Distributed, >test-failure

@droberts195 (Contributor)

org.elasticsearch.test.disruption.LongGCDisruptionTests testNotBlockingUnsafeStackTraces failed in https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+intake/239/console

The error is:

06:45:56 ERROR   32.0s J3 | LongGCDisruptionTests.testNotBlockingUnsafeStackTraces <<< FAILURES!
06:45:56    > Throwable #1: java.lang.RuntimeException: suspending node threads took too long
06:45:56    > 	at __randomizedtesting.SeedInfo.seed([398753F563ED0502:3B941B4A47C3BD6B]:0)
06:45:56    > 	at org.elasticsearch.test.disruption.LongGCDisruption.startDisrupting(LongGCDisruption.java:118)
06:45:56    > 	at org.elasticsearch.test.disruption.LongGCDisruptionTests.testNotBlockingUnsafeStackTraces(LongGCDisruptionTests.java:149)
06:45:56    > 	at java.lang.Thread.run(Thread.java:748)

This doesn't reproduce locally for me:

./gradlew :test:framework:test \
  -Dtests.seed=398753F563ED0502 \
  -Dtests.class=org.elasticsearch.test.disruption.LongGCDisruptionTests \
  -Dtests.method="testNotBlockingUnsafeStackTraces" \
  -Dtests.security.manager=true \
  -Dtests.locale=en-ZA \
  -Dtests.timezone=America/Fort_Nelson \
  -Dcompiler.java=11 \
  -Druntime.java=8
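
For context, a minimal sketch of the suspend-and-verify pattern this test exercises, assuming simplified names (SuspendSketch, UNSAFE_CLASSES, and the loop structure are illustrative, not the actual LongGCDisruption code): node threads are frozen with the deprecated Thread.suspend, each frozen thread's stack is checked for frames where suspension is unsafe, and the whole process is bounded by a deadline. The "suspending node threads took too long" failure above is that deadline firing.

```java
import java.util.Collections;
import java.util.Set;

class SuspendSketch {
    // Frames in which keeping a thread frozen is unsafe, e.g. inside class
    // loading, where a held lock could deadlock the rest of the JVM.
    private static final Set<String> UNSAFE_CLASSES =
        Collections.singleton("java.lang.ClassLoader");

    @SuppressWarnings("deprecation") // Thread.suspend/resume used deliberately
    static void suspendThreads(Set<Thread> nodeThreads, long timeoutMillis) {
        final long deadline = System.currentTimeMillis() + timeoutMillis;
        for (Thread thread : nodeThreads) {
            boolean suspended = false;
            while (suspended == false) {
                if (System.currentTimeMillis() > deadline) {
                    // The failure seen in the CI runs above.
                    throw new RuntimeException("suspending node threads took too long");
                }
                thread.suspend();
                if (stackIsSafe(thread.getStackTrace())) {
                    suspended = true;
                } else {
                    // Unsafe to keep this thread frozen here: resume and retry.
                    thread.resume();
                }
            }
        }
    }

    private static boolean stackIsSafe(StackTraceElement[] stack) {
        for (StackTraceElement frame : stack) {
            if (UNSAFE_CLASSES.contains(frame.getClassName())) {
                return false;
            }
        }
        return true;
    }
}
```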
droberts195 added the >test-failure and :Distributed Indexing/Distributed labels Nov 19, 2018
@elasticmachine (Collaborator)

Pinging @elastic/es-distributed

droberts195 changed the title LongGCDisruptionTests testNotBlockingUnsafeStackTraces failed → [CI] LongGCDisruptionTests testNotBlockingUnsafeStackTraces failed Nov 19, 2018
@ywelsch (Contributor) commented Nov 19, 2018

@original-brownbear can you have a look at this? Thank you.

original-brownbear self-assigned this Nov 19, 2018
@original-brownbear (Member)

On it :)

@droberts195 (Contributor, Author)

The same test also failed last Saturday on the 6.x branch: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.x+matrix-java-periodic/ES_BUILD_JAVA=java11,ES_RUNTIME_JAVA=java8,nodes=virtual&&linux/65/console

It's the same exception:

22:15:09 ERROR   32.0s J1 | LongGCDisruptionTests.testNotBlockingUnsafeStackTraces <<< FAILURES!
22:15:09    > Throwable #1: java.lang.RuntimeException: suspending node threads took too long
22:15:09    > 	at __randomizedtesting.SeedInfo.seed([233B903DA45135A3:2128D882807F8DCA]:0)
22:15:09    > 	at org.elasticsearch.test.disruption.LongGCDisruption.startDisrupting(LongGCDisruption.java:118)
22:15:09    > 	at org.elasticsearch.test.disruption.LongGCDisruptionTests.testNotBlockingUnsafeStackTraces(LongGCDisruptionTests.java:149)
22:15:09    > 	at java.lang.Thread.run(Thread.java:748)

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Nov 19, 2018
* The existing logging is not helpful enough to track down which threads hang; we need the hanging threads' stack traces too
* Relates elastic#35686
@original-brownbear (Member)

It is not impossible that we are simply getting unlucky here: with multiple JVMs running in parallel, the suspending thread may not get scheduled often enough to finish within the timeout. I added logging in #35702 that should make it clear where the problem lies.
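
The kind of diagnostic that helps here might look like the following sketch (illustrative only, assuming a dumpStacks helper name; the actual change lives in LongGCDisruption):

```java
import java.util.Set;

class ThreadDumpSketch {
    // Builds a log message with the stack trace of every thread that failed
    // to suspend before the deadline, so the CI logs show where they hang.
    static String dumpStacks(Set<Thread> stillRunning) {
        StringBuilder sb = new StringBuilder("threads that failed to suspend in time:\n");
        for (Thread thread : stillRunning) {
            sb.append(thread.getName()).append(":\n");
            for (StackTraceElement frame : thread.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }
}
```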

original-brownbear added a commit that referenced this issue Nov 20, 2018
* The existing logging is not helpful enough to track down which threads hang; we need the hanging threads' stack traces too
* Relates #35686
@original-brownbear (Member)

This is a really strange failure: it happened once on master and once on 6.x within a 24-hour period (Nov 19 and 20), and before that the last occurrence was in 2017, more than a year earlier.
=> I would vote for closing this. We now have better logging that should allow for tracking the issue down if needed, but I don't see what action I can take here unless it fails again.

@ywelsch (Contributor) commented Nov 28, 2018

We now have better logging that should allow for tracking the issue down if needed, but I don't see what action I can take here unless it fails again.

I agree

ywelsch closed this as completed Nov 28, 2018
@benwtrent (Member)

Failed again in build:

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.x+multijob-unix-compatibility/os=centos-7&&immutable/30/consoleFull

02:37:31 ERROR   31.9s J4 | LongGCDisruptionTests.testNotBlockingUnsafeStackTraces <<< FAILURES!
02:37:31    > Throwable #1: java.lang.RuntimeException: suspending node threads took too long
02:37:31    > 	at __randomizedtesting.SeedInfo.seed([6AB1F3531F128FA:4B8578A15DF9093]:0)
02:37:31    > 	at org.elasticsearch.test.disruption.LongGCDisruption.startDisrupting(LongGCDisruption.java:124)
02:37:31    > 	at org.elasticsearch.test.disruption.LongGCDisruptionTests.testNotBlockingUnsafeStackTraces(LongGCDisruptionTests.java:149)
02:37:31    > 	at java.lang.Thread.run(Thread.java:748)

@markharwood (Contributor)

Build stats tracking

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Feb 21, 2019
* The lambda invoked by the `lockedExecutor` eventually gets JITed, which runs a static initializer that we will, with a very small chance, suspend a thread in.
   * Fixed by creating the `Runnable` in the main test thread and using the same instance in all threads
* Closes elastic#35686
@original-brownbear (Member)

I think I was able to track this down and fix it in #39261 :)
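
A sketch of the idea behind the fix, with illustrative placeholders (LockedExecutor, counter, and startWorkers are assumptions, not the actual test code): creating the Runnable once on the main test thread forces its lambda class to be generated and initialized up front, so the disruption can never catch a worker thread in the middle of that one-time initialization.

```java
import java.util.concurrent.atomic.AtomicInteger;

class FixSketch {
    // Illustrative stand-in for the test's lockedExecutor.
    interface LockedExecutor {
        void execute(Runnable task);
    }

    static void startWorkers(LockedExecutor lockedExecutor, int threadCount) {
        AtomicInteger counter = new AtomicInteger();
        // The fix: create the Runnable once, here on the main test thread, so
        // any one-time class initialization happens before the disruption can
        // suspend anyone. Previously each worker executed a freshly created
        // lambda, whose first run could trigger that setup under disruption.
        Runnable work = counter::incrementAndGet;
        for (int i = 0; i < threadCount; i++) {
            new Thread(() -> lockedExecutor.execute(work)).start();
        }
    }
}
```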

original-brownbear added a commit that referenced this issue Feb 22, 2019
* Same fix as above; Closes #35686
weizijun pushed two commits to weizijun/elasticsearch that referenced this issue Feb 22, 2019 (same fix)
original-brownbear added a commit to original-brownbear/elasticsearch and a commit to the main repository that referenced this issue Feb 25, 2019 (backports of the same fix)