Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

investigate flaky buffer-free-callback addon test on AIX in CI #31061

Closed
Trott opened this issue Dec 22, 2019 · 21 comments
Closed

investigate flaky buffer-free-callback addon test on AIX in CI #31061

Trott opened this issue Dec 22, 2019 · 21 comments
Labels
buffer Issues and PRs related to the buffer subsystem. flaky-test Issues and PRs related to the tests with unstable failures on the CI.

Comments

@Trott
Copy link
Member

Trott commented Dec 22, 2019

https://ci.nodejs.org/job/node-test-commit-aix/27470/nodes=aix61-ppc64/console

test-osuosl-aix61-ppc64_be-2

00:10:52 not ok 2527 addons/buffer-free-callback/test
00:10:52   ---
00:10:52   duration_ms: 0.169
00:10:52   severity: crashed
00:10:52   exitcode: -5
00:10:52   stack: |-
00:10:52     
00:10:52     
00:10:52     #
00:10:52     # Fatal error in , line 0
00:10:52     # Check failed: result.second.
00:10:52     #
00:10:52     #
00:10:52     #
00:10:52     #FailureMessage Object: fffffffffffd0a0
00:10:52   ...

This was on a node-daily-master run, not a CI for a PR or anything like that.

@nodejs/platform-aix

@gireeshpunathil
Copy link
Member

00:10:52 # Fatal error in , line 0

this is weird, even the C++ debug variables (filename and line number) seem to have been corrupted!

@gireeshpunathil
Copy link
Member

#find . | xargs grep "CHECK" 2> /dev/null | grep "result" | grep "second"
#
#find . | xargs grep "CHECK" 2> /dev/null | grep "second"
./node_binding.cc:    CHECK_GE(it->second.refcount, 1);
./env-inl.h:  CHECK_EQ(insertion_info.second, true);
./api/environment.cc:    CHECK_EQ(it->second, size);
./node_platform.cc:  CHECK(insertion.second);
./node_platform.cc:  CHECK(insertion.second);
./node_platform.cc:  CHECK(it->second.second);
./node_platform.cc:  CHECK(data.second);
./node_options-inl.h:  CHECK_EQ(it->second.type, kBoolean);
./node_options-inl.h:  CHECK_EQ(it->second.type, kBoolean);
#

Is the result a variable name, or a typename?

@richardlau
Copy link
Member

#find . | xargs grep "CHECK" 2> /dev/null | grep "result" | grep "second"
#
#find . | xargs grep "CHECK" 2> /dev/null | grep "second"
./node_binding.cc:    CHECK_GE(it->second.refcount, 1);
./env-inl.h:  CHECK_EQ(insertion_info.second, true);
./api/environment.cc:    CHECK_EQ(it->second, size);
./node_platform.cc:  CHECK(insertion.second);
./node_platform.cc:  CHECK(insertion.second);
./node_platform.cc:  CHECK(it->second.second);
./node_platform.cc:  CHECK(data.second);
./node_options-inl.h:  CHECK_EQ(it->second.type, kBoolean);
./node_options-inl.h:  CHECK_EQ(it->second.type, kBoolean);
#

Is the result a variable name, or a typename?

@gireeshpunathil Strange the find/grep didn’t find

CHECK(result.second);
(not sure if others — I’m on a tablet and found that one through the GitHub web UI).

@richardlau
Copy link
Member

Ah looking again you ran find in src.

@richardlau
Copy link
Member

Also seen on a test build for the new AIX machines in the CI (nodejs/build#2110 (comment)): https://ci.nodejs.org/job/node-test-commit-aix/27426/nodes=aix71-ppc64/testReport/junit/(root)/test/addons_buffer_free_callback_test/
test-ibm-aix71-ppc64_be-1

@gireeshpunathil
Copy link
Member

thanks @richardlau !

so this would mean that we try to insert duplicate entries (of backing_stores) into the map. Don't know what that means.

There is a trace point in the neighborhood that might give more clues and that gets enabled with --trace_backing_store (FLAG_trace_backing_store) but the CLI parser does not accept such an option!

cc @nodejs/v8

@richardlau
Copy link
Member

Cc @addaleax and @thangktran as they’ve been doing some work recently to adapt Node.js to the changes V8 have made to backing store management.

@thangktran
Copy link
Contributor

@richardlau i submitted #31053 recently that might fix this problem.

@thangktran
Copy link
Contributor

@gireeshpunathil it's due to a behaviour change in V8 ArrayBuffer. Previously, when a static buffer (or one that outlives the ArrayBuffer) is used, the BackingStore will not be registered. Now that it is registered, problem might occur when new ArrayBuffer is allocated at the same place of previous ArrayBuffer that is still being tracked in BackingStore table.

@richardlau
Copy link
Member

Stress test on master: https://ci.nodejs.org/view/Stress/job/node-stress-single-test/29/nodes=aix61-ppc64/
80 failures in 1000 runs.

@richardlau
Copy link
Member

@richardlau
Copy link
Member

Stress run with #31053: https://ci.nodejs.org/view/Stress/job/node-stress-single-test/nodes=aix61-ppc64/31/

85 failures in 1000 runs so it doesn’t look like #31053 fixes this 😞.
cc @thangktran

@richardlau
Copy link
Member

Attempting to bisect.

1000 runs against 2dff8dd (commit before 4f523c2) had no failures: https://ci.nodejs.org/job/node-stress-single-test/32/nodes=aix61-ppc64/

Started 1000 runs against 4f523c2: https://ci.nodejs.org/job/node-stress-single-test/33/nodes=aix61-ppc64/

@richardlau
Copy link
Member

Attempting to bisect.

1000 runs against 2dff8dd (commit before 4f523c2) had no failures: https://ci.nodejs.org/job/node-stress-single-test/32/nodes=aix61-ppc64/

Started 1000 runs against 4f523c2: https://ci.nodejs.org/job/node-stress-single-test/33/nodes=aix61-ppc64/

Still running but 15 failures out of 100 runs so it looks like 4f523c2 is the cause and is not fixed by any of the follow up fixes (#30946, #31053).

@richardlau
Copy link
Member

There is a trace point in the neighborhood that might give more clues and that gets enabled with --trace_backing_store (FLAG_trace_backing_store) but the CLI parser does not accept such an option!

I think the flag is only available in debug builds:

DEFINE_DEBUG_BOOL(trace_backing_store, false, "trace backing store events")

@addaleax addaleax added the buffer Issues and PRs related to the buffer subsystem. label Dec 24, 2019
@addaleax
Copy link
Member

This can easily be made reproducible by removing the global.gc() calls in the test.

The reason that this fails is that the test allocates multiple Buffers with the same base address, which, as @thangktran explained, is no longer allowed.

For fixing this, it should be enough to either move away from a static buffer for the underlying memory, or to modify the offset argument in the JS test file so that the Buffers never have the same base address.

@thangktran Are you interested in opening a PR for this?

@thangktran
Copy link
Contributor

This can easily be made reproducible by removing the global.gc() calls in the test.

The reason that this fails is that the test allocates multiple Buffers with the same base address, which, as @thangktran explained, is no longer allowed.

For fixing this, it should be enough to either move away from a static buffer for the underlying memory, or to modify the offset argument in the JS test file so that the Buffers never have the same base address.

@thangktran Are you interested in opening a PR for this?

@addaleax sure i'd love to. I'm on it.

@gireeshpunathil
Copy link
Member

fwiw, consistent reproduce - on any platform:

$ cat test/addons/buffer-free-callback/foo.js

const common = require('../../common')
const binding = require(`./build/${common.buildType}/binding`)
let buf1 = binding.alloc(64, 1, 0)
let buf2 = binding.alloc(64, 1, 0)
#tools/test.py addons/buffer-free-callback/foo
=== release foo ===                   
Path: addons/buffer-free-callback/foo
#
# Fatal error in , line 0
# Check failed: result.second.
#
#
#
#FailureMessage Object: 0x7ffeefbfe5c0

the only missing puzzle now is, why the C++ assertion check is failing to get the file name and line number. @addaleax - do you know?

@addaleax
Copy link
Member

the only missing puzzle now is, why the C++ assertion check is failing to get the file name and line number. @addaleax - do you know?

V8 is intentionally omitting these from non-debug builds, see

#ifdef DEBUG
void V8_Fatal(const char* file, int line, const char* format, ...) {
#else
void V8_Fatal(const char* format, ...) {
const char* file = "";
int line = 0;
#endif

That being said, here’s a PR that would enable stack trace printing for V8 check failures, which should already help a ton: #31079

@Trott Trott added the flaky-test Issues and PRs related to the tests with unstable failures on the CI. label Dec 25, 2019
@richardlau
Copy link
Member

@gireeshpunathil it's due to a behaviour change in V8 ArrayBuffer. Previously, when a static buffer (or one that outlives the ArrayBuffer) is used, the BackingStore will not be registered. Now that it is registered, problem might occur when new ArrayBuffer is allocated at the same place of previous ArrayBuffer that is still being tracked in BackingStore table.

Are we calling this out anywhere as a notable change? If the behaviour change is breaking for one of our addon tests it’s potentially breaking for addons out in the wild.

@addaleax
Copy link
Member

addaleax commented Jan 4, 2020

@richardlau It’s definitely breaking addons in the wild, not just potentially. 😕

I can try to make sure we include it in the 14.0.0 changelog.

thangktran added a commit to thangktran/node that referenced this issue Jan 4, 2020
to avoid problem with the new behaviour of new V8 BackingStore API. By
changing the offset, the base address of each test case will be
different.

Fixes: nodejs#31061
@Trott Trott closed this as completed in 783f8c6 Jan 6, 2020
targos pushed a commit that referenced this issue Jan 6, 2020
To avoid problem with the behavior of new V8 BackingStore API,
change the offset. The base address of each test case will be
different.

Fixes: #31061

PR-URL: #31171
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: David Carlier <devnexen@gmail.com>
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
targos pushed a commit that referenced this issue Jan 14, 2020
To avoid problem with the behavior of new V8 BackingStore API,
change the offset. The base address of each test case will be
different.

Fixes: #31061

PR-URL: #31171
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: David Carlier <devnexen@gmail.com>
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
targos pushed a commit that referenced this issue Jan 14, 2020
To avoid problem with the behavior of new V8 BackingStore API,
change the offset. The base address of each test case will be
different.

Fixes: #31061

PR-URL: #31171
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: David Carlier <devnexen@gmail.com>
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
BethGriggs pushed a commit that referenced this issue Feb 6, 2020
To avoid problem with the behavior of new V8 BackingStore API,
change the offset. The base address of each test case will be
different.

Fixes: #31061

PR-URL: #31171
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: David Carlier <devnexen@gmail.com>
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
buffer Issues and PRs related to the buffer subsystem. flaky-test Issues and PRs related to the tests with unstable failures on the CI.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants