test: make timers-blocking-callback more reliable #14831

Trott · 2017-08-14T22:48:43Z

test-timers-blocking-callback may fail erroneously on
resource-constrained machines due to the timing nature of the test.

There is likely no way around the timing issue. This change tries to
decrease the probability of the test failing erroneously by having it
retry a small number of times on failure.
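The bounded-retry idea can be sketched as a standalone helper. runWithRetries is a hypothetical name and shape chosen for illustration. The PR itself instead passes a retry function through blockingCallback and keeps a module-level retries counter. The sketch assumes a modern (ES2015+) Node.js runtime.

```javascript
'use strict';

// Hypothetical helper (not part of Node.js core or of this PR): run a
// timing-sensitive check and allow a bounded number of retries before
// reporting failure.
//   attempt(cb)       runs one try and calls cb(err); a falsy err means success.
//   retries           how many extra tries are allowed after failures.
//   done(err, tries)  is called once with the final outcome.
function runWithRetries(attempt, retries, done) {
  let tries = 0;
  function tryOnce() {
    tries++;
    attempt((err) => {
      if (!err) return done(null, tries);
      if (retries > 0) {
        retries--;
        return tryOnce();  // start the next try only after this one settled
      }
      done(err, tries);
    });
  }
  tryOnce();
}

// Usage: an attempt that fails twice (say, a timer fired too late) and
// then succeeds.
let failuresLeft = 2;
runWithRetries(
  (cb) => cb(failuresLeft-- > 0 ? new Error('timer fired too late') : null),
  2,
  (err, tries) => console.log(err ? err.message : `passed after ${tries} tries`)
);
// prints "passed after 3 tries"
```

Keeping the retry budget small bounds the worst-case runtime of a test that is genuinely broken.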

Tested on 0.10.38 (which has the bug this test was written for): after
modifying the test slightly to remove ES6 syntax, the test still seems
to fail 100% of the time there, which is what we want/expect.

Fixes: #14792

Checklist

make -j4 test (UNIX), or vcbuild test (Windows) passes
commit message follows commit guidelines

Affected core subsystem(s)

test timers


Trott · 2017-08-14T22:48:58Z

/cc @aqrln @misterdjules

Trott · 2017-08-14T22:50:15Z

For anyone else wanting to verify this still fails 100% in Node.js 0.10.38 and passes 100% in Node.js 0.10.39, here's the version of this test that I used with ES6-isms removed:

'use strict';

/*
 * This is a regression test for
 * https://github.com/nodejs/node-v0.x-archive/issues/15447 and
 * https://github.com/nodejs/node-v0.x-archive/issues/9333.
 *
 * When a timer is added in another timer's callback, its underlying timer
 * handle was started with a timeout that was actually incorrect.
 *
 * The reason was that the value that represents the current time was not
 * updated between the time the original callback was called and the time
 * the added timer was processed by timers.listOnTimeout. That led the
 * logic in timers.listOnTimeout to do an incorrect computation that made
 * the added timer fire with a timeout of scheduledTimeout +
 * timeSpentInCallback.
 *
 * This test makes sure that a timer added by another timer's callback
 * fires with the expected timeout.
 *
 * It makes sure that it works when the timers list for a given timeout is
 * empty (see testAddingTimerToEmptyTimersList) and when the timers list
 * is not empty (see testAddingTimerToNonEmptyTimersList).
 */


var assert = require('assert');
var Timer = process.binding('timer_wrap').Timer;

var TIMEOUT = 100;

var nbBlockingCallbackCalls;
var latestDelay;
var timeCallbackScheduled;

// These tests are somewhat probabilistic so they may fail even when the bug is
// not present. However, they fail 100% of the time when the bug *is* present,
// so to increase reliability, allow for a small number of retries. (Keep it
// small because as currently written, one failure could result in multiple
// simultaneous retries of the test. Don't want to timer-bomb ourselves.
// Observed failures are infrequent anyway, so only a small number of retries
// is hopefully more than sufficient.)
var retries = 2;

function busyLoop(time) {
  var startTime = Timer.now();
  var stopTime = startTime + time;
  while (Timer.now() < stopTime) {}
}

function initTest() {
  nbBlockingCallbackCalls = 0;
  latestDelay = 0;
  timeCallbackScheduled = 0;
}

function blockingCallback(retry, callback) {
  ++nbBlockingCallbackCalls;

  if (nbBlockingCallbackCalls > 1) {
    latestDelay = Timer.now() - timeCallbackScheduled;
    // Even if timers can fire later than when they've been scheduled
    // to fire, they shouldn't generally be more than 100% late in this case.
    // But they are guaranteed to be at least 100ms late given the bug in
    // https://github.com/nodejs/node-v0.x-archive/issues/15447 and
    // https://github.com/nodejs/node-v0.x-archive/issues/9333.
    if (latestDelay >= TIMEOUT * 2) {
      if (retries > 0) {
        retries--;
        return retry(callback);
      }
      assert.fail('timeout delayed by at least 100ms (' + latestDelay + 'ms)');
    }
    if (callback)
      return callback();
  } else {
    // block by busy-looping to trigger the issue
    busyLoop(TIMEOUT);

    timeCallbackScheduled = Timer.now();
    setTimeout(blockingCallback.bind(null, retry, callback), TIMEOUT);
  }
}

function testAddingTimerToEmptyTimersList(callback) {
  initTest();
  // Call setTimeout just once to make sure the timers list is
  // empty when blockingCallback is called.
  setTimeout(
    blockingCallback.bind(null, testAddingTimerToEmptyTimersList, callback),
    TIMEOUT
  );
}

function testAddingTimerToNonEmptyTimersList() {
  initTest();
  // Call setTimeout twice with the same timeout to make
  // sure the timers list is not empty when blockingCallback is called.
  setTimeout(
    blockingCallback.bind(null, testAddingTimerToNonEmptyTimersList),
    TIMEOUT
  );
  setTimeout(
    blockingCallback.bind(null, testAddingTimerToNonEmptyTimersList),
    TIMEOUT
  );
}

// Run the test for the empty timers list case, and then for the non-empty
// timers list one.
testAddingTimerToEmptyTimersList(
  testAddingTimerToNonEmptyTimersList
);
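The arithmetic in the test's header comment (a timer added from another timer's callback fires after scheduledTimeout + timeSpentInCallback) can be illustrated with a toy model. This is a simplified assumption about how a stale current-time value skews the re-arm computation, not the actual 0.10 timers.listOnTimeout code. rearmTimeout and the sample numbers are hypothetical, and the sketch uses ES2015 syntax, so it targets a modern Node.js rather than 0.10.

```javascript
'use strict';

// Toy model (NOT Node's actual timers code): a timers list re-arms its
// handle for the first pending timer with `msecs - (now - timerStart)`.
function rearmTimeout(msecs, now, timerStart) {
  return msecs - (now - timerStart);
}

const TIMEOUT = 100;          // scheduledTimeout
const spentInCallback = 100;  // time the first callback spent busy-looping
const loopStart = 1000;       // arbitrary "now" when the list was processed

// The added timer is stamped with the real time at which setTimeout()
// ran, i.e. after the callback had already busy-looped.
const addedTimerStart = loopStart + spentInCallback;

// Buggy: `now` is the stale value captured before the callback ran, so
// (now - timerStart) is negative and the timeout grows.
const buggy = rearmTimeout(TIMEOUT, loopStart, addedTimerStart);
// Fixed: `now` is refreshed after the callback returns.
const fixed = rearmTimeout(TIMEOUT, loopStart + spentInCallback, addedTimerStart);

console.log(buggy);  // 200 (scheduledTimeout + timeSpentInCallback)
console.log(fixed);  // 100 (scheduledTimeout)
```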

Trott · 2017-08-14T22:52:26Z

CI: https://ci.nodejs.org/job/node-test-pull-request/9663/

refack · 2017-08-15T15:56:11Z

test/sequential/test-timers-blocking-callback.js

+        retries--;
+        return retry(callback);
+      }
+      assert.fail(`timeout delayed by more than 100ms (${latestDelay}ms)`);


Maybe

assert.fail(`timeout delayed by more than 100% (${latestDelay}ms)`);

~~I'm pretty sure the bug that this is a regression test for was one that delayed callbacks by 100ms, not 100%. But I can't find the info. Maybe @misterdjules knows if I'm mistaken about that or not.~~

~~(It does suggest that the TIMEOUT * 2 line might be better as TIMEOUT + 100. And maybe this needs a comment explaining that TIMEOUT needs to be short.)~~

I thought TIMEOUT * 2 at first, but then I read L57. Also, the check (latestDelay > TIMEOUT * 2) implies 100%.

Nope, I'm wrong, it really should be 100%. Will change, thanks.
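To make the 100ms-versus-100% distinction concrete: the check against TIMEOUT * 2 means a failing timer fired at least 100% late relative to its own timeout. That equals 100ms only because TIMEOUT is 100. The sketch below computes the percentage; percentLate and the 230ms sample value are hypothetical, for illustration only.

```javascript
'use strict';

// Hypothetical helper: how late a timer fired, as a percentage of the
// timeout it was scheduled with. A delay of TIMEOUT * 2 is exactly 100% late.
function percentLate(delay, timeout) {
  return ((delay - timeout) * 100) / timeout;
}

const TIMEOUT = 100;
const latestDelay = 230;  // example measurement in ms, not real data

if (latestDelay >= TIMEOUT * 2) {
  // Template literals need backticks; inside single quotes, ${latestDelay}
  // would be printed literally instead of being interpolated.
  console.log(`timeout delayed by ${percentLate(latestDelay, TIMEOUT)}% ` +
              `(${latestDelay}ms for a ${TIMEOUT}ms timer)`);
}
// prints "timeout delayed by 130% (230ms for a 100ms timer)"
```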

refack · 2017-08-15T16:11:49Z

test/sequential/test-timers-blocking-callback.js

+    TIMEOUT
+  );
+  setTimeout(
+    blockingCallback.bind(null, testAddingTimerToNonEmptyTimersList),


CR: could you add a common.mustCall() to assert that the async task actually runs?

That's not a callback. That's a "use this as a retry function if the test fails". So it almost never gets called.

EDIT: In the above comment, I assume you mean a common.mustCall() around testAddingTimerToNonEmptyTimersList. If you meant around blockingCallback.bind(...), I'd rather not. setTimeout() holds the event loop open (and thus the process running) until the callback is called. If it fails to run, that's a The Whole World Is Broken And On Fire kind of problem and I don't think we need to flag it in this test. I don't think we tend to wrap setTimeout() stuff in common.mustCall() generally and I wouldn't want to create an impression that moving in that direction is desirable.

blockingCallback takes a second argument, callback.

Agree that it's a "Whole World Is Broken And On Fire" sitch, but adding it might help readability in that it will flag that this should be the end of the test.

Fishrock123

seems fine to me

Fishrock123 · 2017-08-15T17:56:02Z

probably cc @misterdjules tbh

misterdjules · 2017-08-15T18:48:54Z

test/sequential/test-timers-blocking-callback.js

+let latestDelay;
+let timeCallbackScheduled;
+
+// These tests are somewhat probablistic so they may fail even when the bug is


nit: somehow the term "probabilistic" seems a bit vague, and doesn't seem to add to the rest of the sentence ("they may fail even when the bug is not present"), so I'd be inclined to remove that term. Not a big deal though.

@misterdjules I updated the comment to remove the term "probablistic" and to hopefully add a bit more clarity about what the problem is.

misterdjules · 2017-08-15T19:00:27Z

test/sequential/test-timers-blocking-callback.js

-    // https://github.com/nodejs/node-v0.x-archive/issues/9333..
-    assert(latestDelay < TIMEOUT * 2);
+    // https://github.com/nodejs/node-v0.x-archive/issues/9333.
+    if (latestDelay > TIMEOUT * 2) {


The negation of < is >=, not >.

Ack. Fixed.
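The boundary case behind this review comment can be checked directly. Per the test's comment, the bug makes the timer at least 100ms late, so with TIMEOUT = 100 a buggy run can land exactly on latestDelay === TIMEOUT * 2. With `>`, that run would neither retry nor fail, even though the original assert(latestDelay < TIMEOUT * 2) rejected it. The predicate names below are hypothetical.

```javascript
'use strict';

const TIMEOUT = 100;

// Original assertion: pass when latestDelay < TIMEOUT * 2.
const passesOriginal = (delay) => delay < TIMEOUT * 2;
// Candidate retry/fail conditions derived from it.
const failsWithGt = (delay) => delay > TIMEOUT * 2;    // wrong negation
const failsWithGte = (delay) => delay >= TIMEOUT * 2;  // correct negation

for (const delay of [199, 200, 201]) {
  console.log(delay, passesOriginal(delay), failsWithGt(delay), failsWithGte(delay));
}
// 199 true false false
// 200 false false true   (with `>`, this delay would slip through)
// 201 false true true
```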

misterdjules · 2017-08-15T19:17:04Z

test/sequential/test-timers-blocking-callback.js

+// not present. However, they fail 100% of the time when the bug *is* present,
+// so to increase reliability, allow for a small number of retries. (Keep it
+// small because as currently written, one failure could result in multiple
+// simultaneous retries of the test. Don't want to timer-bomb ourselves.


as currently written, one failure could result in multiple simultaneous retries of the test

Is there a way to rewrite that part of the test so that's not the case anymore?

I think I've fixed that now. Will run another stress test, but in the meantime, PTAL. Thanks!
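One way to rule out overlapping retries is sketched below. It is written independently of the change that actually landed, whose details are not shown in this thread. Each round gets an id, only the first failure in a round triggers a retry, and callbacks from superseded rounds are ignored. createRunner, startRound and report are hypothetical names, and the timers are simulated by calling the stored reporters by hand.

```javascript
'use strict';

// Hypothetical sketch, not the code that landed in the PR: however many of
// a round's callbacks observe a failure, start at most one retry, and
// ignore callbacks left over from a superseded round.
function createRunner(maxRetries, startRound, onFinalFailure) {
  let round = 0;
  let retriesLeft = maxRetries;

  function run() {
    const myRound = ++round;
    let settled = false;  // has this round already retried or failed?

    // Each timer callback of this round reports whether its check passed.
    function report(ok) {
      if (myRound !== round || settled) return;  // stale or already handled
      if (ok) return;  // this callback was fine; others may still report
      settled = true;
      if (retriesLeft > 0) {
        retriesLeft--;
        run();  // starts a new round, which supersedes this one
      } else {
        onFinalFailure(myRound);
      }
    }
    startRound(report);
  }

  run();
  return () => round;  // current round number, for illustration
}

// Usage with a fake scheduler: each round's reporter is stored so the
// "timer callbacks" can be triggered by hand.
const reporters = [];
let finalFailures = 0;
const roundCount = createRunner(2, (report) => reporters.push(report),
                                () => { finalFailures++; });
reporters[0](false);  // first failure in round 1: starts round 2
reporters[0](false);  // second failure in round 1: ignored, no extra retry
console.log(roundCount(), reporters.length);  // 2 2
```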

Trott · 2017-08-18T04:20:53Z

@refack Updated the message per your nit. LGTY?

Trott · 2017-08-18T04:40:35Z

CI: https://ci.nodejs.org/job/node-test-pull-request/9727/

Trott · 2017-08-18T18:19:44Z

FreeBSD CI failure is unrelated. CI is good.

@refack Is this good by you?

Trott · 2017-08-20T22:42:32Z

Landed in c473d80

test-timers-blocking-callback may fail erroneously on resource-constrained machines due to the timing nature of the test.

There is likely no way around the timing issue. This change tries to decrease the probability of the test failing erroneously by having it retry a small number of times on failure.

Tested on 0.10.38 (which has a bug that this test was written for) and (modifying the test slightly to remove ES6 stuff) the test still seems to fail 100% of the time there, which is what we want/expect.

PR-URL: #14831
Fixes: #14792
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>

Trott added test Issues and PRs related to the tests. timers Issues and PRs related to the timers subsystem / setImmediate, setInterval, setTimeout. labels Aug 14, 2017

nodejs-github-bot added the test Issues and PRs related to the tests. label Aug 14, 2017

Trott mentioned this pull request Aug 14, 2017

Investigate test-timers-blocking-callback #14792

Closed

refack suggested changes Aug 15, 2017

Fishrock123 approved these changes Aug 15, 2017

misterdjules reviewed Aug 15, 2017

squash: message nit

c208131

Trott added 3 commits August 17, 2017 21:23

squash: update comment

fce2712

squash: > to >=

ec83819

squash: do not timer-bomb ourselves

0f352fd

jasnell approved these changes Aug 18, 2017

refack approved these changes Aug 19, 2017

Trott closed this Aug 20, 2017

MylesBorins mentioned this pull request Sep 10, 2017

v8.5.0 proposal #15308

Merged

MylesBorins added the land-on-v6.x label Sep 20, 2017

MylesBorins mentioned this pull request Sep 20, 2017

v6.11.4 proposal #15506

Merged

Trott deleted the issue-14792 branch January 13, 2022 22:46
