Revert #17914 #18238
Conversation
A couple more runs, just on Linux:
That test is particularly sensitive. I'd prefer not to revert these commits. The test likely needs to be updated instead.
@jasnell I think @joyeecheung is just trying to track down the cause rather than submit an official PR to revert. Right now we're not even sure which commit is causing it, as there's some contradictory evidence in past CI runs.
Yep, I get that. There are several things that could cause this. In my experience it has typically been the addition or removal of an AsyncWrap provider type, but there are several other reasons why this can fail in non-obvious ways due to the magic of how async_hooks are implemented. It can be difficult to track down.
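For context, a rough sketch (mine, not code from this PR) of why adding or removing an AsyncWrap provider type can break such a test: a test that enumerates async_hooks events sees a different set of provider types, so any hard-coded expectation drifts.

```js
'use strict';
// Hypothetical illustration: a hook that records which AsyncWrap provider
// types it observes. A test asserting an exact set of types will start
// failing as soon as a provider is added or removed in core.
const async_hooks = require('async_hooks');

const seen = new Set();
async_hooks.createHook({
  init(asyncId, type) { seen.add(type); }
}).enable();

setTimeout(() => {
  // The exact contents depend on the Node.js version and which providers exist.
  console.log([...seen]);
}, 10);
```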
@apapirovski @jasnell Yeah, I opened this because I had no idea how to run https://ci.nodejs.org/job/node-stress-single-test on specific SHAs, so I kind of have to do the bisecting this way... (Although I still have no idea how to run the flaky tests with PR refs; there are tons of EACCES errors on common.PIPE...)
Well, 4 CI runs on Linux and this test hasn't failed a single time on Alpine... it seems like the issue is somewhere in these commits and their interaction with that test. I still don't see any offending code, though... 🤔 Edit: it does seem like the Windows failure could be unrelated, though. So we potentially have two different bugs in that test.
That particular bit fails when something keeps the event loop open... ping @addaleax, who tracked this down previously. When we've seen this before, it was caused by very subtle differences in the timing of garbage collection. In one recent case, a change in node_perf.cc to how the gc callback was scheduled caused a bug that only became apparent because the completely unrelated http2 module happened to load just enough string constants that gc was triggered, and beforeExit fired twice. Fun, eh? @addaleax may be able to give some pointers on how to track this down.
@jasnell Yeah, I worked on that particular issue, so it was the first thing that came to mind, but I don't see any changes that are responsible for keeping the event loop open. I can't replicate locally either, which makes it all the more difficult to debug. On that note: is there a particular reason we're checking this in beforeExit? Although... anything that keeps the event loop open after beforeExit would be an issue regardless.
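A minimal sketch (not from the PR) of the failure mode being described: if anything revives the event loop after 'beforeExit' has fired, the loop drains again and 'beforeExit' fires a second time.

```js
'use strict';
// Each time the event loop drains, 'beforeExit' is emitted. Scheduling new
// work from the handler (standing in for a stray handle or GC side effect)
// brings the loop back to life, so the event fires again.
let count = 0;
process.on('beforeExit', () => {
  console.log(`beforeExit #${++count}`);
  if (count === 1) {
    setTimeout(() => {}, 10); // side effect that keeps the loop alive once more
  }
});
// Prints "beforeExit #1" and then "beforeExit #2" before the process exits.
```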
Unfortunately it could be any unref'd handle; it doesn't have to be limited to this change. For instance, in the http2 PR that triggered this, there was zero code in the PR that was directly responsible. Hmm... going to have to think about that a bit more. Re: why beforeExit is being used in the test, I'm not sure. You'd have to ask @trevnorris.
I think we're on the same page. I just find it concerning that we have another instance of some side effect that brings the loop alive. Will try to spend some time digging around to see what other code we have that can bring a loop alive as a side effect.
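As a reference point for what keeps a loop alive, here is a small sketch not tied to this PR: a ref'd handle keeps the process running, while unref'ing it lets the loop drain (and 'beforeExit' fire) even though the handle is still pending.

```js
'use strict';
// A pending interval normally keeps the event loop alive indefinitely.
const interval = setInterval(() => {}, 1000);

// unref() tells libuv this handle should not keep the loop alive, so the
// process can exit once no other work remains.
interval.unref();

process.on('beforeExit', () => {
  console.log('loop drained; exiting despite the pending interval');
});
```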
@joyeecheung @addaleax @jasnell I believe I have a fix in #18241 |
Closed now that #18241 landed |
See if this fixes #18190

Checklist
- make -j4 test (UNIX), or vcbuild test (Windows) passes

Affected core subsystem(s)