-
Notifications
You must be signed in to change notification settings - Fork 30k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test: make timers-blocking-callback more reliable #14831
Conversation
test-timers-blocking-callback may fail erroneously on resource-constrained machines due to the timing nature of the test. There is likely no way around the timing issue. This change tries to decrease the probability of the test failing erroneously by having it retry a small number of times on failure. Tested on 0.10.38 (which has a bug that this test was written for) and (modifying the test slightly to remove ES6 stuff) the test still seems to fail 100% of the time there, which is what we want/expect. Fixes: nodejs#14792
/cc @aqrln @misterdjules |
For anyone else wanting to verify this still fails 100% in Node.js 0.10.38 and passes 100% in Node.js 0.10.39, here's the version of this test that I used with ES6-isms removed: 'use strict';
/*
* This is a regression test for
* https://github.com/nodejs/node-v0.x-archive/issues/15447 and
* and https://github.com/nodejs/node-v0.x-archive/issues/9333.
*
* When a timer is added in another timer's callback, its underlying timer
* handle was started with a timeout that was actually incorrect.
*
* The reason was that the value that represents the current time was not
* updated between the time the original callback was called and the time
* the added timer was processed by timers.listOnTimeout. That led the
* logic in timers.listOnTimeout to do an incorrect computation that made
* the added timer fire with a timeout of scheduledTimeout +
* timeSpentInCallback.
*
* This test makes sure that a timer added by another timer's callback
* fires with the expected timeout.
*
* It makes sure that it works when the timers list for a given timeout is
* empty (see testAddingTimerToEmptyTimersList) and when the timers list
* is not empty (see testAddingTimerToNonEmptyTimersList).
*/
var assert = require('assert');
var Timer = process.binding('timer_wrap').Timer;
var TIMEOUT = 100;
var nbBlockingCallbackCalls;
var latestDelay;
var timeCallbackScheduled;
// These tests are somewhat probablistic so they may fail even when the bug is
// not present. However, they fail 100% of the time when the bug *is* present,
// so to increase reliability, allow for a small number of retries. (Keep it
// small because as currently written, one failure could result in multiple
// simultaneous retries of the test. Don't want to timer-bomb ourselves.
// Observed failures are infrequent anyway, so only a small number of retries
// is hopefully more than sufficient.)
var retries = 2;
function busyLoop(time) {
var startTime = Timer.now();
var stopTime = startTime + time;
while (Timer.now() < stopTime) {}
};
function initTest() {
nbBlockingCallbackCalls = 0;
latestDelay = 0;
timeCallbackScheduled = 0;
}
function blockingCallback(retry, callback) {
++nbBlockingCallbackCalls;
if (nbBlockingCallbackCalls > 1) {
latestDelay = Timer.now() - timeCallbackScheduled;
// Even if timers can fire later than when they've been scheduled
// to fire, they shouldn't generally be more than 100% late in this case.
// But they are guaranteed to be at least 100ms late given the bug in
// https://github.com/nodejs/node-v0.x-archive/issues/15447 and
// https://github.com/nodejs/node-v0.x-archive/issues/9333.
if (latestDelay > TIMEOUT * 2) {
if (retries > 0) {
retries--;
return retry(callback);
}
assert.fail('timeout delayed by more than 100ms (${latestDelay}ms)');
}
if (callback)
return callback();
} else {
// block by busy-looping to trigger the issue
busyLoop(TIMEOUT);
timeCallbackScheduled = Timer.now();
setTimeout(blockingCallback.bind(null, retry, callback), TIMEOUT);
}
}
function testAddingTimerToEmptyTimersList(callback) {
initTest();
// Call setTimeout just once to make sure the timers list is
// empty when blockingCallback is called.
setTimeout(
blockingCallback.bind(null, testAddingTimerToEmptyTimersList, callback),
TIMEOUT
);
}
function testAddingTimerToNonEmptyTimersList() {
initTest();
// Call setTimeout twice with the same timeout to make
// sure the timers list is not empty when blockingCallback is called.
setTimeout(
blockingCallback.bind(null, testAddingTimerToNonEmptyTimersList),
TIMEOUT
);
setTimeout(
blockingCallback.bind(null, testAddingTimerToNonEmptyTimersList),
TIMEOUT
);
}
// Run the test for the empty timers list case, and then for the non-empty
// timers list one.
testAddingTimerToEmptyTimersList(
testAddingTimerToNonEmptyTimersList
); |
retries--; | ||
return retry(callback); | ||
} | ||
assert.fail(`timeout delayed by more than 100ms (${latestDelay}ms)`); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe
assert.fail(`timeout delayed by more than 100% (${latestDelay}ms)`);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure the bug that this is a regression test for was one that delayed callbacks by 100ms, not 100%. But I can't find the info. Maybe @misterdjules knows if I'm mistaken about that or not.
(It does suggest that the TIMEOUT * 2
line might be better as TIMEOUT + 100
. And maybe this needs a comment explaining that TIMEOUT
needs to be short.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought TIMEOUT * 2
at first, but then read L57, also the test is (latestDelay > TIMEOUT * 2)
implies 100%
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, I'm wrong, it really should be 100%. Will change, thanks.
TIMEOUT | ||
); | ||
setTimeout( | ||
blockingCallback.bind(null, testAddingTimerToNonEmptyTimersList), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CR: could you add a common.mustCall()
to assert the async task actually runs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not a callback. That's a "use this as a retry function if the test fails". So it almost never gets called.
EDIT: In the above comment, I assume you mean a common.mustCall()
around testAddingTimerToNonEmptyTimersList
. If you meant around blockingCallback.bind(...)
, I'd rather not. setTimeout()
holds the event loop open (and thus the process running) until the callback is called. If it fails to run, that's a The Whole World Is Broken And On Fire kind of problem and I don't think we need to flag it in this test. I don't think we tend to wrap setTimeout()
stuff in common.mustCall()
generally and I wouldn't want to create an impression that moving in that direction is desirable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blockingCallback
takes a 2nd arg callback
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree that it's a "Whole World Is Broken And On Fire" sitch, but adding it might help readability in that it will flag that this should be the end of the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems fine to me
probably cc @misterdjules tbh |
let latestDelay; | ||
let timeCallbackScheduled; | ||
|
||
// These tests are somewhat probablistic so they may fail even when the bug is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: somehow the term "probabilistic" seems a bit vague, and doesn't seem to add to the rest of the sentence ("they may fail even when the bug is not present"), so I'd be inclined to remove that term. Not a big deal though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@misterdjules I updated the comment to remove the term "probablistic" and to hopefully add a bit more clarity about what the problem is.
// https://github.com/nodejs/node-v0.x-archive/issues/9333.. | ||
assert(latestDelay < TIMEOUT * 2); | ||
// https://github.com/nodejs/node-v0.x-archive/issues/9333. | ||
if (latestDelay > TIMEOUT * 2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The negation of < is >=, not >.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack. Fixed.
// not present. However, they fail 100% of the time when the bug *is* present, | ||
// so to increase reliability, allow for a small number of retries. (Keep it | ||
// small because as currently written, one failure could result in multiple | ||
// simultaneous retries of the test. Don't want to timer-bomb ourselves. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as currently written, one failure could result in multiple simultaneous retries of the test
Is there a way to rewrite that part of the test so that's not the case anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I've fixed that now. Will run another stress test, but in the meantime, PTAL. Thanks!
@refack Updated the message per your nit. LGTY? |
FreeBSD CI failure is unrelated. CI is good. @refack Is this good by you? |
Landed in c473d80 |
test-timers-blocking-callback may fail erroneously on resource-constrained machines due to the timing nature of the test. There is likely no way around the timing issue. This change tries to decrease the probability of the test failing erroneously by having it retry a small number of times on failure. Tested on 0.10.38 (which has a bug that this test was written for) and (modifying the test slightly to remove ES6 stuff) the test still seems to fail 100% of the time there, which is what we want/expect. PR-URL: #14831 Fixes: #14792 Reviewed-By: Refael Ackermann <refack@gmail.com> Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com> Reviewed-By: James M Snell <jasnell@gmail.com>
test-timers-blocking-callback may fail erroneously on resource-constrained machines due to the timing nature of the test. There is likely no way around the timing issue. This change tries to decrease the probability of the test failing erroneously by having it retry a small number of times on failure. Tested on 0.10.38 (which has a bug that this test was written for) and (modifying the test slightly to remove ES6 stuff) the test still seems to fail 100% of the time there, which is what we want/expect. PR-URL: #14831 Fixes: #14792 Reviewed-By: Refael Ackermann <refack@gmail.com> Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com> Reviewed-By: James M Snell <jasnell@gmail.com>
test-timers-blocking-callback may fail erroneously on resource-constrained machines due to the timing nature of the test. There is likely no way around the timing issue. This change tries to decrease the probability of the test failing erroneously by having it retry a small number of times on failure. Tested on 0.10.38 (which has a bug that this test was written for) and (modifying the test slightly to remove ES6 stuff) the test still seems to fail 100% of the time there, which is what we want/expect. PR-URL: #14831 Fixes: #14792 Reviewed-By: Refael Ackermann <refack@gmail.com> Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com> Reviewed-By: James M Snell <jasnell@gmail.com>
test-timers-blocking-callback may fail erroneously on resource-constrained machines due to the timing nature of the test. There is likely no way around the timing issue. This change tries to decrease the probability of the test failing erroneously by having it retry a small number of times on failure. Tested on 0.10.38 (which has a bug that this test was written for) and (modifying the test slightly to remove ES6 stuff) the test still seems to fail 100% of the time there, which is what we want/expect. PR-URL: #14831 Fixes: #14792 Reviewed-By: Refael Ackermann <refack@gmail.com> Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com> Reviewed-By: James M Snell <jasnell@gmail.com>
test-timers-blocking-callback may fail erroneously on
resource-constrained machines due to the timing nature of the test.
There is likely no way around the timing issue. This change tries to
decrease the probability of the test failing erroneously by having it
retry a small number of times on failure.
Tested on 0.10.38 (which has a bug that this test was written for) and
(modifying the test slightly to remove ES6 stuff) the test still seems
to fail 100% of the time there, which is what we want/expect.
Fixes: #14792
Checklist
make -j4 test
(UNIX), orvcbuild test
(Windows) passesAffected core subsystem(s)
test timers