investigate flaky test-util-callbackify #25068

Closed
Trott opened this issue Dec 15, 2018 · 8 comments

Labels
flaky-test Issues and PRs related to the tests with unstable failures on the CI.

Comments

@Trott (Member) commented Dec 15, 2018

Like a number of flaky tests right now, this one looks like it is caused by #25007.

https://ci.nodejs.org/job/node-test-commit-aix/19751/nodes=aix61-ppc64/console

test-osuosl-aix61-ppc64_be-1

10:18:13 not ok 1990 parallel/test-util-callbackify
10:18:13   ---
10:18:13   duration_ms: 3.731
10:18:13   severity: fail
10:18:13   exitcode: 1
10:18:13   stack: |-
10:18:13     assert.js:86
10:18:13       throw new AssertionError(obj);
10:18:13       ^
10:18:13     
10:18:13     AssertionError [ERR_ASSERTION]: Expected values to be strictly equal:
10:18:13     
10:18:13     null !== 1
10:18:13     
10:18:13         at common.mustCall (/home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/test/parallel/test-util-callbackify.js:205:14)
10:18:13         at /home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/test/common/index.js:335:15
10:18:13         at ChildProcess.exithandler (child_process.js:301:5)
10:18:13         at ChildProcess.emit (events.js:189:13)
10:18:13         at maybeClose (internal/child_process.js:978:16)
10:18:13         at Process.ChildProcess._handle.onexit (internal/child_process.js:265:5)
10:18:13   ...
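
For context, the null !== 1 comes from the exit-code check near line 205 of test/parallel/test-util-callbackify.js: the test execs a fixture child that is expected to die with an uncaught exception and exit with code 1, but in the flaky runs err.code is null, which is what child_process reports when the child is killed by a signal rather than exiting normally. A minimal sketch of that kind of check, assuming execFile and common.mustCall as the stack trace suggests (the exact test code may differ):

```js
// Illustrative sketch only; the real check lives in
// test/parallel/test-util-callbackify.js around line 205.
'use strict';
const assert = require('assert');
const { execFile } = require('child_process');
const common = require('../common'); // Node core test helpers

const fixture = 'test/fixtures/uncaught-exceptions/callbackify1.js';

execFile(process.execPath, [fixture], common.mustCall((err, stdout, stderr) => {
  // Expected: the child terminates with an uncaught exception, so err.code is 1.
  // Flaky runs: err.code is null because the child died from a signal (a crash),
  // which produces the "null !== 1" assertion shown above.
  assert.strictEqual(err.code, 1);
}));
```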
Trott added the flaky-test label Dec 15, 2018
@Trott (Member, Author) commented Dec 30, 2018

https://ci.nodejs.org/job/node-test-commit-aix/20112/nodes=aix61-ppc64/console

test-osuosl-aix61-ppc64_be-2

00:54:27 not ok 2000 parallel/test-util-callbackify
00:54:27   ---
00:54:27   duration_ms: 7.620
00:54:27   severity: fail
00:54:27   exitcode: 1
00:54:27   stack: |-
00:54:27     assert.js:86
00:54:27       throw new AssertionError(obj);
00:54:27       ^
00:54:27     
00:54:27     AssertionError [ERR_ASSERTION]: Expected values to be strictly equal:
00:54:27     
00:54:27     null !== 1
00:54:27     
00:54:27         at common.mustCall (/home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/test/parallel/test-util-callbackify.js:205:14)
00:54:27         at /home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/test/common/index.js:377:15
00:54:27         at ChildProcess.exithandler (child_process.js:304:5)
00:54:27         at ChildProcess.emit (events.js:188:13)
00:54:27         at maybeClose (internal/child_process.js:978:16)
00:54:27         at Process.ChildProcess._handle.onexit (internal/child_process.js:265:5)
00:54:27   ...

@Trott (Member, Author) commented Dec 30, 2018

Probably caused by #25007.

Trott added a commit to Trott/io.js that referenced this issue Dec 30, 2018
Trott added a commit to Trott/io.js that referenced this issue Jan 2, 2019
Refs: nodejs#25068

PR-URL: nodejs#25284
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
addaleax pushed a commit that referenced this issue Jan 5, 2019
Refs: #25068

PR-URL: #25284
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
@gireeshpunathil (Member) commented:

Unable to reproduce in 1000 runs. Looks like this has vanished?

@Trott (Member, Author) commented Jan 9, 2019

> unable to reproduce in a 1000 runs. Looks like this has vanished?

@gireeshpunathil Here it is failing on FreeBSD in CI yesterday:

https://ci.nodejs.org/job/node-test-commit-freebsd/nodes=freebsd11-x64/23246/console

test-digitalocean-freebsd11-x64-2

22:55:45 not ok 1996 parallel/test-util-callbackify
22:55:45   ---
22:55:45   duration_ms: 0.357
22:55:45   severity: fail
22:55:45   exitcode: 1
22:55:45   stack: |-
22:55:45     assert.js:86
22:55:45       throw new AssertionError(obj);
22:55:45       ^
22:55:45     
22:55:45     AssertionError [ERR_ASSERTION]: Expected values to be strictly equal:
22:55:45     
22:55:45     null !== 1
22:55:45     
22:55:45         at common.mustCall (/usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd11-x64/test/parallel/test-util-callbackify.js:205:14)
22:55:45         at /usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd11-x64/test/common/index.js:376:15
22:55:45         at ChildProcess.exithandler (child_process.js:302:5)
22:55:45         at ChildProcess.emit (events.js:188:13)
22:55:45         at maybeClose (internal/child_process.js:978:16)
22:55:45         at Process.ChildProcess._handle.onexit (internal/child_process.js:265:5)
22:55:45   ...

To reproduce it, you might have more luck with something like `tools/test.py -j 16 --repeat 100 test/parallel/test-util-callbackify.js`. (But you also might not...)
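
As an aside, the common.mustCall frame that appears in each of these stack traces is the helper from test/common/index.js: it wraps a callback and fails the test on exit if the wrapper was not invoked the expected number of times. A simplified sketch of the idea (not the actual implementation, which tracks all wrappers centrally):

```js
// Simplified sketch of the mustCall idea; the real helper in
// test/common/index.js is more elaborate (it collects every wrapper and
// reports all of them when the process exits).
'use strict';
const assert = require('assert');

function mustCall(fn, expected = 1) {
  let actual = 0;
  process.on('exit', () => {
    // Fail if the wrapped callback ran a different number of times.
    assert.strictEqual(actual, expected);
  });
  return function(...args) {
    actual++;
    return fn.apply(this, args);
  };
}
```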

@gireeshpunathil (Member) commented:

Thanks @Trott for the info; I will try my luck on FreeBSD today.

@gireeshpunathil (Member) commented Jan 10, 2019

Able to reproduce on FreeBSD:

(gdb) thr 1
[Switching to thread 1 (process 101816)]#0  0x000000080323d277 in pthread_mutex_destroy ()
   from /lib/libthr.so.3
(gdb) where
#0  0x000000080323d277 in pthread_mutex_destroy () from /lib/libthr.so.3
#1  0x000000080323c7e6 in pthread_mutex_lock () from /lib/libthr.so.3
#2  0x000000000145b86d in v8::platform::tracing::TracingController::GetCategoryGroupEnabledInternal ()
#3  0x0000000000e60316 in v8::internal::CancelableLambdaTask<v8::internal::ArrayBufferCollector::FreeAllocations()::$_0>::RunInternal ()
#4  0x0000000000980fb6 in node::(anonymous namespace)::PlatformWorkerThread ()
#5  0x0000000803235c06 in pthread_create () from /lib/libthr.so.3
#6  0x0000000000000000 in ?? ()
(gdb) thr 2
[Switching to thread 2 (process 100871)]
(gdb) where
#0  0x00000000008fdeb5 in std::__1::__tree< ...> ::destroy ()
#1  0x00000000008fdebd in std::__1::__tree< ...> ::destroy ()
#2  0x00000000008fdec9 in std::__1::__tree< ...> ::destroy ()
#3  0x00000000008fdebd in std::__1::__tree< ...> ::destroy ()
#4  0x00000000008fdec9 in std::__1::__tree< ...> ::destroy ()
#5  0x00000000008fdec9 in std::__1::__tree< ...> ::destroy ()
#6  0x00000000008fdec9 in std::__1::__tree< ...> ::destroy ()
#8  0x00000000008fc5f3 in node::native_module::NativeModuleLoader::~NativeModuleLoader ()
#9  0x0000000803591911 in __cxa_finalize () from /lib/libc.so.7
#10 0x000000080351f001 in exit () from /lib/libc.so.7
#11 0x000000000091e5a7 in node::FatalException ()
#12 0x0000000000f886d2 in v8::internal::MessageHandler::ReportMessageNoExceptions ()
#13 0x0000000000f88578 in v8::internal::MessageHandler::ReportMessage ()
#14 0x0000000000f57d1f in v8::internal::Isolate::ReportPendingMessagesImpl ()
#15 0x0000000000e41e70 in v8::internal::(anonymous namespace)::Invoke ()
#16 0x0000000000e41b38 in v8::internal::Execution::Call ()
#17 0x0000000000abea61 in v8::Function::Call ()
#18 0x00000000008f9e3d in node::LoadEnvironment ()
#19 0x00000000008feb13 in node::Start ()
#20 0x00000000008fdd8a in node::Start ()
#21 0x00000000008fbfdb in node::Start ()
#22 0x00000000008ae095 in _start ()
#23 0x00000008022f0000 in ?? ()
#24 0x0000000000000000 in ?? ()

This is essentially the same as #24921, tracked through #25007, with two visible differences:

  • Thread 2 is on its exit path due to a fatal exception in JS land, which is expected since the child ('test/fixtures/uncaught-exceptions/callbackify1.js') deliberately throws an exception that leads to its termination (see the sketch below).
  • While thread 2 is thus cleaning up on its side, thread 1 is still progressing with tracing. In #24921 the TracingController object itself was stale due to the Agent destruction; here it got one or two steps further, but crashed when it attempted to access a mutex that had already been destroyed (by thread 2).

I will produce and analyze a couple more core files to see if there are other patterns.
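
A rough, illustrative sketch of what a callbackify fixture of that kind does (the actual contents of test/fixtures/uncaught-exceptions/callbackify1.js may differ):

```js
// Rough illustration only; the real fixture is
// test/fixtures/uncaught-exceptions/callbackify1.js.
'use strict';
const { callbackify } = require('util');

async function original() {
  return 'ok';
}

const callbackified = callbackify(original);
callbackified((err, value) => {
  // Throwing from the callback surfaces as an uncaught exception, so the
  // child is expected to terminate with exit code 1. In the flaky runs the
  // child instead crashed natively during that exit path.
  throw new Error('boom');
});
```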

@gireeshpunathil (Member) commented:

Got a bunch of those. While the victim thread always shows the same stack in tracing, the offender (exiting) thread is seen at different stages across the cores, with __cxa_finalize as the common frame:

pattern 2:

#0  0x0000000000974ba4 in node::options_parser::OptionsParser<node::PerProcessOptions>::~OptionsParser ()
#1  0x0000000803591911 in __cxa_finalize () from /lib/libc.so.7
#2  0x000000080351f001 in exit () from /lib/libc.so.7
#3  0x000000000091e5a7 in node::FatalException ()
#4  0x0000000000f886d2 in v8::internal::MessageHandler::ReportMessageNoExceptions ()
#5  0x0000000000f88578 in v8::internal::MessageHandler::ReportMessage ()
#6  0x0000000000f57d1f in v8::internal::Isolate::ReportPendingMessagesImpl ()
#7  0x0000000000e41e70 in v8::internal::(anonymous namespace)::Invoke ()
#8  0x0000000000e41b38 in v8::internal::Execution::Call ()
#9  0x0000000000abea61 in v8::Function::Call ()
#10 0x00000000008f9e3d in node::LoadEnvironment ()
#11 0x00000000008feb13 in node::Start ()
#12 0x00000000008fdd8a in node::Start ()
#13 0x00000000008fbfdb in node::Start ()
#14 0x00000000008ae095 in _start ()
#15 0x00000008022f0000 in ?? ()
#16 0x0000000000000000 in ?? ()

pattern 3:

#0  0x0000000803517544 in free () from /lib/libc.so.7
#1  0x0000000000974cf2 in node::options_parser::OptionsParser<node::PerProcessOptions>::~OptionsParser ()
#2  0x0000000803591911 in __cxa_finalize () from /lib/libc.so.7
#3  0x000000080351f001 in exit () from /lib/libc.so.7
#4  0x000000000091e5a7 in node::FatalException ()
#5  0x0000000000f886d2 in v8::internal::MessageHandler::ReportMessageNoExceptions ()
#6  0x0000000000f88578 in v8::internal::MessageHandler::ReportMessage ()
#7  0x0000000000f57d1f in v8::internal::Isolate::ReportPendingMessagesImpl ()
#8  0x0000000000e41e70 in v8::internal::(anonymous namespace)::Invoke ()
#9  0x0000000000e41b38 in v8::internal::Execution::Call ()
#10 0x0000000000abea61 in v8::Function::Call ()
#11 0x00000000008f9e3d in node::LoadEnvironment ()
#12 0x00000000008feb13 in node::Start ()
#13 0x00000000008fdd8a in node::Start ()
#14 0x00000000008fbfdb in node::Start ()
#15 0x00000000008ae095 in _start ()
#16 0x00000008022f0000 in ?? ()
#17 0x0000000000000000 in ?? ()

pattern 4:

#0  0x000000080350c197 in _pthread_mutex_init_calloc_cb () from /lib/libc.so.7
#1  0x00000008034f1cd7 in _malloc_thread_cleanup () from /lib/libc.so.7
#2  0x0000000803517224 in realloc () from /lib/libc.so.7
#3  0x00000008035175b1 in free () from /lib/libc.so.7
#4  0x0000000000974cd1 in node::options_parser::OptionsParser<node::PerProcessOptions>::~OptionsParser ()
#5  0x0000000803591911 in __cxa_finalize () from /lib/libc.so.7
#6  0x000000080351f001 in exit () from /lib/libc.so.7
#7  0x000000000091e5a7 in node::FatalException ()
#8  0x0000000000f886d2 in v8::internal::MessageHandler::ReportMessageNoExceptions ()
#9  0x0000000000f88578 in v8::internal::MessageHandler::ReportMessage ()
#10 0x0000000000f57d1f in v8::internal::Isolate::ReportPendingMessagesImpl ()
#11 0x0000000000e41e70 in v8::internal::(anonymous namespace)::Invoke ()
#12 0x0000000000e41b38 in v8::internal::Execution::Call ()
#13 0x0000000000abea61 in v8::Function::Call ()
#14 0x00000000008f9e3d in node::LoadEnvironment ()
#15 0x00000000008feb13 in node::Start ()
#16 0x00000000008fdd8a in node::Start ()
#17 0x00000000008fbfdb in node::Start ()
#18 0x00000000008ae095 in _start ()
#19 0x00000008022f0000 in ?? ()
#20 0x0000000000000000 in ?? ()

Just like #24921, this should be fixed by #25061.

refack pushed a commit to refack/node that referenced this issue Jan 14, 2019
Refs: nodejs#25068

PR-URL: nodejs#25284
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
BridgeAR pushed a commit to BridgeAR/node that referenced this issue Jan 16, 2019
Refs: nodejs#25068

PR-URL: nodejs#25284
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
@gireeshpunathil (Member) commented:

The test has been active for the last week with no failures, so, as expected, #25061 has fixed the underlying issue. Closing.

BethGriggs pushed a commit that referenced this issue Apr 28, 2019
Refs: #25068

PR-URL: #25284
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
BethGriggs pushed a commit that referenced this issue May 10, 2019
Refs: #25068

PR-URL: #25284
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
MylesBorins pushed a commit that referenced this issue May 16, 2019
Refs: #25068

PR-URL: #25284
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>