Revert #17914 #18238
Conversation
A couple more runs, just on Linux:
That test is particularly sensitive. I'd prefer not to revert these commits. The test likely needs to be updated instead.
@jasnell I think @joyeecheung is just trying to track down the cause rather than submit an official PR to revert. Right now we're not even sure which commit is causing it, as there's some contradictory evidence in past CI runs.
Yep, I get that. There are several things that could cause this. In my experience it has typically been the addition or removal of an AsyncWrap provider type, but there are several other reasons why this can fail in non-obvious ways due to the magic of how async_hooks are implemented. It can be difficult to track down.
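For context, a rough sketch (mine, not code from this PR) of why adding or removing an AsyncWrap provider type can break such a test: a test that enumerates async_hooks events sees a different set of provider types, so any hard-coded expectation drifts.

```js
'use strict';
// Hypothetical illustration: a hook that records which AsyncWrap provider
// types it observes. A test asserting an exact set of types will start
// failing as soon as a provider is added or removed in core.
const async_hooks = require('async_hooks');

const seen = new Set();
async_hooks.createHook({
  init(asyncId, type) { seen.add(type); }
}).enable();

setTimeout(() => {
  // The exact contents depend on the Node.js version and which providers exist.
  console.log([...seen]);
}, 10);
```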
@apapirovski @jasnell Yeah, I opened this because I had no idea how to run https://ci.nodejs.org/job/node-stress-single-test on specific SHAs, so I kind of have to do the bisecting this way... (Although I still have no idea how to run the flaky tests with PR refs; there are tons of EACCES errors on common.PIPE...)
Well, 4 CI runs on Linux and this test hasn't failed a single time on Alpine... it seems like the issue is somewhere in these commits and their interaction with that test. I still don't see any offending code, though... 🤔 Edit: it does seem like the Windows failure could be unrelated, though. So we potentially have two different bugs in that test.
That particular bit fails when something keeps the event loop open... ping @addaleax, who tracked this down previously. When we've seen this before, it was caused by very subtle differences in the timing of garbage collection. In one recent case, a change in node_perf.cc to how the gc callback was scheduled caused a bug that only became apparent because the completely unrelated http2 module happened to load just enough string constants that gc was triggered, and beforeExit fired twice. Fun, eh? @addaleax may be able to give some pointers on how to track this down.
@jasnell Yeah, I worked on that particular issue, so it was the first thing that came to mind, but I don't see any changes that are responsible for keeping the event loop open. I can't replicate locally either, which makes it all the more difficult to debug. On that note: is there a particular reason we're checking this in beforeExit? Although... anything that keeps the event loop open after beforeExit would be an issue regardless.
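A minimal sketch (not from the PR) of the failure mode being described: if anything revives the event loop after 'beforeExit' has fired, the loop drains again and 'beforeExit' fires a second time.

```js
'use strict';
// Each time the event loop drains, 'beforeExit' is emitted. Scheduling new
// work from the handler (standing in for a stray handle or GC side effect)
// brings the loop back to life, so the event fires again.
let count = 0;
process.on('beforeExit', () => {
  console.log(`beforeExit #${++count}`);
  if (count === 1) {
    setTimeout(() => {}, 10); // side effect that keeps the loop alive once more
  }
});
// Prints "beforeExit #1" and then "beforeExit #2" before the process exits.
```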
Unfortunately it could be any unref'd handle; it doesn't have to be limited to this change. For instance, in the http2 PR that triggered this, there was zero code in the PR that was directly responsible. Hmm... going to have to think about that a bit more. Re: why beforeExit is being used in the test, I'm not sure. You'd have to ask @trevnorris.
I think we're on the same page. I just find it concerning that we have another instance of some side effect that brings the loop alive. Will try to spend some time digging around to see what other code we have that can bring a loop alive as a side effect.
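As a reference point for what keeps a loop alive, here is a small sketch not tied to this PR: a ref'd handle keeps the process running, while unref'ing it lets the loop drain (and 'beforeExit' fire) even though the handle is still pending.

```js
'use strict';
// A pending interval normally keeps the event loop alive indefinitely.
const interval = setInterval(() => {}, 1000);

// unref() tells libuv this handle should not keep the loop alive, so the
// process can exit once no other work remains.
interval.unref();

process.on('beforeExit', () => {
  console.log('loop drained; exiting despite the pending interval');
});
```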
@joyeecheung @addaleax @jasnell I believe I have a fix in #18241 |
Closed now that #18241 landed |
See if this fixes #18190

Checklist
- make -j4 test (UNIX), or vcbuild test (Windows) passes

Affected core subsystem(s)