Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: make timers-blocking-callback more reliable #14831

Closed
wants to merge 5 commits into from

Conversation

Trott
Copy link
Member

@Trott Trott commented Aug 14, 2017

test-timers-blocking-callback may fail erroneously on
resource-constrained machines due to the timing nature of the test.

There is likely no way around the timing issue. This change tries to
decrease the probability of the test failing erroneously by having it
retry a small number of times on failure.

Tested on 0.10.38 (which has a bug that this test was written for) and
(modifying the test slightly to remove ES6 stuff) the test still seems
to fail 100% of the time there, which is what we want/expect.

Fixes: #14792

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • commit message follows commit guidelines
Affected core subsystem(s)

test timers

test-timers-blocking-callback may fail erroneously on
resource-constrained machines due to the timing nature of the test.

There is likely no way around the timing issue. This change tries to
decrease the probability of the test failing erroneously by having it
retry a small number of times on failure.

Tested on 0.10.38 (which has a bug that this test was written for) and
(modifying the test slightly to remove ES6 stuff) the test still seems
to fail 100% of the time there, which is what we want/expect.

Fixes: nodejs#14792
@Trott Trott added test Issues and PRs related to the tests. timers Issues and PRs related to the timers subsystem / setImmediate, setInterval, setTimeout. labels Aug 14, 2017
@nodejs-github-bot nodejs-github-bot added the test Issues and PRs related to the tests. label Aug 14, 2017
@Trott
Copy link
Member Author

Trott commented Aug 14, 2017

/cc @aqrln @misterdjules

@Trott
Copy link
Member Author

Trott commented Aug 14, 2017

For anyone else wanting to verify this still fails 100% in Node.js 0.10.38 and passes 100% in Node.js 0.10.39, here's the version of this test that I used with ES6-isms removed:

'use strict';

/*
 * This is a regression test for
 * https://github.com/nodejs/node-v0.x-archive/issues/15447 and
 * and https://github.com/nodejs/node-v0.x-archive/issues/9333.
 *
 * When a timer is added in another timer's callback, its underlying timer
 * handle was started with a timeout that was actually incorrect.
 *
 * The reason was that the value that represents the current time was not
 * updated between the time the original callback was called and the time
 * the added timer was processed by timers.listOnTimeout. That led the
 * logic in timers.listOnTimeout to do an incorrect computation that made
 * the added timer fire with a timeout of scheduledTimeout +
 * timeSpentInCallback.
 *
 * This test makes sure that a timer added by another timer's callback
 * fires with the expected timeout.
 *
 * It makes sure that it works when the timers list for a given timeout is
 * empty (see testAddingTimerToEmptyTimersList) and when the timers list
 * is not empty (see testAddingTimerToNonEmptyTimersList).
 */


var assert = require('assert');
var Timer = process.binding('timer_wrap').Timer;

var TIMEOUT = 100;

var nbBlockingCallbackCalls;
var latestDelay;
var timeCallbackScheduled;

// These tests are somewhat probablistic so they may fail even when the bug is
// not present. However, they fail 100% of the time when the bug *is* present,
// so to increase reliability, allow for a small number of retries. (Keep it
// small because as currently written, one failure could result in multiple
// simultaneous retries of the test. Don't want to timer-bomb ourselves.
// Observed failures are infrequent anyway, so only a small number of retries
// is hopefully more than sufficient.)
var retries = 2;

function busyLoop(time) {
  var startTime = Timer.now();
  var stopTime = startTime + time;
  while (Timer.now() < stopTime) {}
};

function initTest() {
  nbBlockingCallbackCalls = 0;
  latestDelay = 0;
  timeCallbackScheduled = 0;
}

function blockingCallback(retry, callback) {
  ++nbBlockingCallbackCalls;

  if (nbBlockingCallbackCalls > 1) {
    latestDelay = Timer.now() - timeCallbackScheduled;
    // Even if timers can fire later than when they've been scheduled
    // to fire, they shouldn't generally be more than 100% late in this case.
    // But they are guaranteed to be at least 100ms late given the bug in
    // https://github.com/nodejs/node-v0.x-archive/issues/15447 and
    // https://github.com/nodejs/node-v0.x-archive/issues/9333.
    if (latestDelay > TIMEOUT * 2) {
      if (retries > 0) {
        retries--;
        return retry(callback);
      }
      assert.fail('timeout delayed by more than 100ms (${latestDelay}ms)');
    }
    if (callback)
      return callback();
  } else {
    // block by busy-looping to trigger the issue
    busyLoop(TIMEOUT);

    timeCallbackScheduled = Timer.now();
    setTimeout(blockingCallback.bind(null, retry, callback), TIMEOUT);
  }
}

function testAddingTimerToEmptyTimersList(callback) {
  initTest();
  // Call setTimeout just once to make sure the timers list is
  // empty when blockingCallback is called.
  setTimeout(
    blockingCallback.bind(null, testAddingTimerToEmptyTimersList, callback),
    TIMEOUT
  );
}

function testAddingTimerToNonEmptyTimersList() {
  initTest();
  // Call setTimeout twice with the same timeout to make
  // sure the timers list is not empty when blockingCallback is called.
  setTimeout(
    blockingCallback.bind(null, testAddingTimerToNonEmptyTimersList),
    TIMEOUT
  );
  setTimeout(
    blockingCallback.bind(null, testAddingTimerToNonEmptyTimersList),
    TIMEOUT
  );
}

// Run the test for the empty timers list case, and then for the non-empty
// timers list one.
testAddingTimerToEmptyTimersList(
  testAddingTimerToNonEmptyTimersList
);

@Trott
Copy link
Member Author

Trott commented Aug 14, 2017

retries--;
return retry(callback);
}
assert.fail(`timeout delayed by more than 100ms (${latestDelay}ms)`);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe

      assert.fail(`timeout delayed by more than 100% (${latestDelay}ms)`);

Copy link
Member Author

@Trott Trott Aug 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure the bug that this is a regression test for was one that delayed callbacks by 100ms, not 100%. But I can't find the info. Maybe @misterdjules knows if I'm mistaken about that or not.

(It does suggest that the TIMEOUT * 2 line might be better as TIMEOUT + 100. And maybe this needs a comment explaining that TIMEOUT needs to be short.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought TIMEOUT * 2 at first, but then read L57, also the test is (latestDelay > TIMEOUT * 2) implies 100%

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, I'm wrong, it really should be 100%. Will change, thanks.

TIMEOUT
);
setTimeout(
blockingCallback.bind(null, testAddingTimerToNonEmptyTimersList),
Copy link
Contributor

@refack refack Aug 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CR: could you add a common.mustCall() to assert the async task actually runs

Copy link
Member Author

@Trott Trott Aug 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not a callback. That's a "use this as a retry function if the test fails". So it almost never gets called.

EDIT: In the above comment, I assume you mean a common.mustCall() around testAddingTimerToNonEmptyTimersList. If you meant around blockingCallback.bind(...), I'd rather not. setTimeout() holds the event loop open (and thus the process running) until the callback is called. If it fails to run, that's a The Whole World Is Broken And On Fire kind of problem and I don't think we need to flag it in this test. I don't think we tend to wrap setTimeout() stuff in common.mustCall() generally and I wouldn't want to create an impression that moving in that direction is desirable.

Copy link
Contributor

@refack refack Aug 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blockingCallback takes a 2nd arg callback

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that it's a "Whole World Is Broken And On Fire" sitch, but adding it might help readability in that it will flag that this should be the end of the test.

Copy link
Contributor

@Fishrock123 Fishrock123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems fine to me

@Fishrock123
Copy link
Contributor

probably cc @misterdjules tbh

let latestDelay;
let timeCallbackScheduled;

// These tests are somewhat probablistic so they may fail even when the bug is

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: somehow the term "probabilistic" seems a bit vague, and doesn't seem to add to the rest of the sentence ("they may fail even when the bug is not present"), so I'd be inclined to remove that term. Not a big deal though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@misterdjules I updated the comment to remove the term "probablistic" and to hopefully add a bit more clarity about what the problem is.

// https://github.com/nodejs/node-v0.x-archive/issues/9333..
assert(latestDelay < TIMEOUT * 2);
// https://github.com/nodejs/node-v0.x-archive/issues/9333.
if (latestDelay > TIMEOUT * 2) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The negation of < is >=, not >.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack. Fixed.

// not present. However, they fail 100% of the time when the bug *is* present,
// so to increase reliability, allow for a small number of retries. (Keep it
// small because as currently written, one failure could result in multiple
// simultaneous retries of the test. Don't want to timer-bomb ourselves.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as currently written, one failure could result in multiple simultaneous retries of the test

Is there a way to rewrite that part of the test so that's not the case anymore?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I've fixed that now. Will run another stress test, but in the meantime, PTAL. Thanks!

@Trott
Copy link
Member Author

Trott commented Aug 18, 2017

@refack Updated the message per your nit. LGTY?

@Trott
Copy link
Member Author

Trott commented Aug 18, 2017

@Trott
Copy link
Member Author

Trott commented Aug 18, 2017

FreeBSD CI failure is unrelated. CI is good.

@refack Is this good by you?

@Trott
Copy link
Member Author

Trott commented Aug 20, 2017

Landed in c473d80

@Trott Trott closed this Aug 20, 2017
Trott added a commit that referenced this pull request Aug 20, 2017
test-timers-blocking-callback may fail erroneously on
resource-constrained machines due to the timing nature of the test.

There is likely no way around the timing issue. This change tries to
decrease the probability of the test failing erroneously by having it
retry a small number of times on failure.

Tested on 0.10.38 (which has a bug that this test was written for) and
(modifying the test slightly to remove ES6 stuff) the test still seems
to fail 100% of the time there, which is what we want/expect.

PR-URL: #14831
Fixes: #14792
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
MylesBorins pushed a commit that referenced this pull request Sep 10, 2017
test-timers-blocking-callback may fail erroneously on
resource-constrained machines due to the timing nature of the test.

There is likely no way around the timing issue. This change tries to
decrease the probability of the test failing erroneously by having it
retry a small number of times on failure.

Tested on 0.10.38 (which has a bug that this test was written for) and
(modifying the test slightly to remove ES6 stuff) the test still seems
to fail 100% of the time there, which is what we want/expect.

PR-URL: #14831
Fixes: #14792
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
@MylesBorins MylesBorins mentioned this pull request Sep 10, 2017
MylesBorins pushed a commit that referenced this pull request Sep 12, 2017
test-timers-blocking-callback may fail erroneously on
resource-constrained machines due to the timing nature of the test.

There is likely no way around the timing issue. This change tries to
decrease the probability of the test failing erroneously by having it
retry a small number of times on failure.

Tested on 0.10.38 (which has a bug that this test was written for) and
(modifying the test slightly to remove ES6 stuff) the test still seems
to fail 100% of the time there, which is what we want/expect.

PR-URL: #14831
Fixes: #14792
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
MylesBorins pushed a commit that referenced this pull request Sep 20, 2017
test-timers-blocking-callback may fail erroneously on
resource-constrained machines due to the timing nature of the test.

There is likely no way around the timing issue. This change tries to
decrease the probability of the test failing erroneously by having it
retry a small number of times on failure.

Tested on 0.10.38 (which has a bug that this test was written for) and
(modifying the test slightly to remove ES6 stuff) the test still seems
to fail 100% of the time there, which is what we want/expect.

PR-URL: #14831
Fixes: #14792
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
@MylesBorins MylesBorins mentioned this pull request Sep 20, 2017
@Trott Trott deleted the issue-14792 branch January 13, 2022 22:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
test Issues and PRs related to the tests. timers Issues and PRs related to the timers subsystem / setImmediate, setInterval, setTimeout.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Investigate test-timers-blocking-callback
7 participants