This repository has been archived by the owner on May 13, 2019. It is now read-only.

How to motivate people to fix flaky tests #21

Closed

orangemocha opened this issue Feb 26, 2016 · 5 comments

Comments

@orangemocha
Contributor

(continued from nodejs/build#248)

The current documented policy for flaky tests (https://github.com/nodejs/node/wiki/Flaky-tests#what-to-do-when-you-encounter-a-new-flaky-test) calls for opening an issue to track a test when you mark it as flaky, and assigning that issue to the next release milestone.

One part that I think could use some improvement is clarifying who will take responsibility for fixing the flaky test, and how to motivate people to do it. The person who marks a test as flaky is usually the collaborator who determined that the test is not failing because of the current pull request's changes. That person is not particularly motivated to fix the test, and is not necessarily the most qualified for the particular test that is failing.

In a dev team working for one company, you could probably just assign the issue to the test author/owner. I am not sure that this would work in an open source project.

So how do we motivate collaborators to investigate and fix these failures? Here are some options we could consider:

  1. We stick to the current policy of assigning them to the next release milestone. The main problem with this is that we might see the list too late in the release cycle, and decide to punt. To counter that, we could try to increase awareness of issues that are flagged for a milestone, throughout the milestone (for example by sending periodic reports). I think this might be useful even beyond flaky tests.
  2. We somehow set a timeout for how long a test can be marked as flaky, or a limit to the number of tests marked as flaky, or both. When the limit is reached, we block merging all pull requests until the situation is brought back under control. This makes the whole team accountable for fixing flaky tests. I recall @domenic saying that this is what the v8 team does.
  3. We try to motivate collaborators by some sort of reward mechanism, like keeping a scoreboard with how many issues they have resolved in a given period (and assign extra points for flaky tests).
  4. We come up with an algorithm for assigning the flaky test issue to a specific collaborator, in addition to assigning a milestone (sketched below). For example, the assignee could be the last active collaborator to have modified the test logic. If the search doesn't yield a suitable candidate, we could fall back to round-robin selection from a pool of collaborators, ideally choosing the pool based on the area of the test.
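
As a rough illustration of option 4, here is a minimal sketch; the collaborator pool, the test path, and the use of git author names as stand-ins for GitHub handles are all assumptions:

```js
'use strict';
// Sketch: pick an assignee for a flaky-test issue (option 4 above).
const { execSync } = require('child_process');

// Hypothetical pool of active collaborators for this test's area.
const pool = ['alice', 'bob', 'carol'];
let rrIndex = 0;

function pickAssignee(testPath) {
  // Last active collaborator to have modified the test logic,
  // per git history (author name as a stand-in for a GitHub handle).
  const lastAuthor = execSync(`git log -1 --format=%an -- ${testPath}`, {
    encoding: 'utf8',
  }).trim();
  if (pool.includes(lastAuthor)) return lastAuthor;
  // No suitable candidate: fall back to round-robin over the pool.
  return pool[rrIndex++ % pool.length];
}

console.log(pickAssignee('test/parallel/test-example-flaky.js'));
```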
@claudiorodriguez

I hope you don't mind me putting my two cents in here, but I've fixed a few flaky tests and lately I've been looking into fixing more. My takeaway is:

  • It would help a lot to include the actual log lines produced by the test in question in the issue. The CI UI can be a bit cumbersome for newcomers. Perhaps the finder of the flaky test isn't the most motivated to fix it, but I think asking for some relevant log output might go a long way in helping people take on the issue.
  • Given that flaky tests almost always happen on platforms I wouldn't describe as typical end-user environments, the biggest step is actually reproducing the error (a rough reproduction loop is sketched below). Creating a document with some general hints on this, with assistance from whoever manages the CI platform, would be a great help, as would listing frequent sources of flakiness (timeouts, dropped sockets, port in use, etc.) and how to debug them. An issue template for flaky tests might be a good idea.
  • Finally, the flakiness of several tests can be attributed to the platform itself (some CentOS flavour on the CI has caused many flaky tests because of dropped sockets, IIRC). In that case the CI platform itself should be fixed if at all possible; otherwise the affected tests should be identified, and the problem and possible solutions documented.

I for one am interested in helping fix flaky tests, but some more information on how to reproduce results from particular platforms would be welcome.
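
As a rough sketch of what such a reproduction loop could look like, this re-runs one test file against the local node binary and captures its output; the test path and iteration count are illustrative:

```js
'use strict';
// Re-run a single test many times to try to reproduce flakiness,
// capturing log output so it can be pasted into the tracking issue.
const { spawnSync } = require('child_process');

const test = 'test/parallel/test-example-flaky.js'; // hypothetical test
const runs = 100;
let failures = 0;

for (let i = 0; i < runs; i++) {
  const result = spawnSync(process.execPath, [test], { encoding: 'utf8' });
  if (result.status !== 0) {
    failures++;
    console.error(`Run ${i} failed:\n${result.stderr}`);
  }
}

console.log(`${failures}/${runs} runs failed`);
```

Reproducing under the same load or platform conditions as the CI machine is what actually matters, so a loop like this is only a starting point.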

@Trott
Member

Trott commented Mar 24, 2016

I'm noticing an uptick in tagging @nodejs/testing when a flaky test fails. In and of itself, that's OK, but it seems more often than not to be the extent of what is done, and that may be the opposite of OK. How do we feel about that? I can see different ways of looking at it, but I'm finding myself feeling this way:

This makes it easy for people to ignore flaky tests and also ignore actual failures by assuming they are flaky. Tag @nodejs/testing and move on! This is bad and we should stop it. Additionally, it's not scalable. And it wrongly puts the onus of fixing flaky tests on the testing working group when that responsibility should lie with all collaborators.

What to do about it? I'm not sure.

Here's an example: nodejs/node#5314 (comment) (Not picking on you @jasnell, it's just the latest I've come across. And you're more likely to do this sort of thing because you work to push along an unusually large number of PRs so if anyone's going to do it and have it be OK, it's you.)

Labeling this issue for the agenda for the next meeting.

@Trott
Member

Trott commented Mar 24, 2016

One possible thing to do is have a short, easy-to-understand bullet point or checklist of things to do when you find a possibly-flaky test and refer to it when this happens. Off the top of my head:

  • Investigate the test to confirm that the failure is unrelated to your change.
  • Re-run CI on the platform where the test failed. If the test fails consistently, then the test is not flaky. Something is broken. It may be the core code, it may be the test, but something needs fixing.
  • If it is flaky, do two things:
    • Open an issue titled "Investigate flaky TESTNAME". Include a link to the failure in CI.
    • Open a PR to mark the test as flaky in the appropriate *.status file (see the example below).
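
For reference, a *.status entry marking a test as flaky on a single platform looks roughly like this; the test name and the platform section are illustrative:

```
[$system==win32]
test-example-flaky: PASS,FLAKY
```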

One thing we may need to codify is who is responsible for doing those last two things. Like, in the example above, is it James? Is it the person who opened the PR? I mean, it's everybody's responsibility. But if it's everybody's responsibility, it's easy for nobody to actually do it, you know? But we don't want to punish people for noting a flaky test by making them do a bunch of other stuff. Maybe we could ease the opening of the issue with some automation or something? I don't know what the answer is here, or even if there is a good one.
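
If we did want to automate opening the issue, a minimal sketch might look like this; the label, CI link, and test name are placeholders, and it assumes a GITHUB_TOKEN environment variable plus a Node version with global fetch:

```js
'use strict';
// Sketch: open an "Investigate flaky TESTNAME" issue via the GitHub REST API.

async function openFlakyIssue(testName, ciLink) {
  const res = await fetch('https://api.github.com/repos/nodejs/node/issues', {
    method: 'POST',
    headers: {
      'Authorization': `token ${process.env.GITHUB_TOKEN}`,
      'Accept': 'application/vnd.github+json',
      'User-Agent': 'flaky-test-helper', // GitHub's API requires a User-Agent
    },
    body: JSON.stringify({
      title: `Investigate flaky ${testName}`,
      body: `Failure in CI: ${ciLink}`,
      labels: ['flaky-test'], // hypothetical label
    }),
  });
  if (!res.ok) throw new Error(`GitHub API returned ${res.status}`);
  return (await res.json()).html_url;
}

openFlakyIssue('test-example-flaky', 'https://ci.nodejs.org/job/...')
  .then(console.log, console.error);
```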

@mhdawson
Member

It's been a long time since the last discussion, but I think we can ask for "Open an issue titled 'Investigate flaky TESTNAME'. Include a link to the failure in CI." as the minimal responsibility we all have as collaborators.

In terms of the PR to mark the test as flaky, I'm thinking that can depend on how flaky it is. In some cases it may be better not to mark it, as marking makes the problem more invisible and possibly makes it less likely that somebody will decide to try and fix it.

I'm not sure if in general flaky tests are a good first contribution, but marking those that are may be a good way to get more attention on them.

@Trott
Member

Trott commented Mar 13, 2018

It seems like perhaps this should be closed. Feel free to re-open (or leave a comment requesting that it be re-opened) if you disagree. I'm just tidying up and not acting on a super-strong opinion or anything like that.

@Trott Trott closed this as completed Mar 13, 2018