Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce support for retrying failed flaky tests #1558

Open
arcuri82 opened this issue Aug 20, 2018 · 78 comments
Open

Introduce support for retrying failed flaky tests #1558

arcuri82 opened this issue Aug 20, 2018 · 78 comments

Comments

@arcuri82
Copy link
Contributor

Overview

Feature request to add native support to rerun failing flaky tests

Feature request

Is there any plan to support the ability of rerunning failed flaky tests? For example, it would be great to have something like @Flaky(rerun=3), which would rerun a failing flaky test up to n times.

In the past, in Surefire/Failsafe we could use rerunFailingTestsCount. However, it does not work in JUnit 5. As JUnit 5.0.0 was released nearly 1 year ago, it is unclear if and when it will be supported in Maven.

And, even if it will be supported in Maven in the future, it could be good to have a special annotation to specify that only some tests are expected to be flaky, and not all of them.

At the moment, the workaround is to stick with JUnit 4, which is a kind of a shame as JUnit 5 has a lot of interesting features :( (which I can only use in projects with no flaky tests)

Related SO question.

@sbrannen sbrannen changed the title Feature-request: Support for rerunning failed flaky tests Introduce support for retrying failed flaky tests Aug 21, 2018
@sbrannen
Copy link
Member

sbrannen commented Aug 21, 2018

No, to my knowledge there are not currently any plans to implement such a feature.

But... I think it's a very good idea.

I've often wanted something like Spring Retry's @Retryable support for test methods. Of course, Spring Retry supports a much larger feature set than we would ever need/want in JUnit Jupiter.

Thanks for providing the link to the discussion on Stack Overflow. There are two interesting implementations there that can serve as inspiration.

Ideally, I'd like to see something like a @RetryableTest annotation that allows one to specify:

  • maximum number of retries
  • configurable exceptions that initiate a retry, defaulting to {Throwable.class}
  • potentially a minimum success rate

@junit-team/junit-lambda, thoughts?

@sbrannen
Copy link
Member

Tentatively slated for 5.4 M1 for team discussion

@jbduncan
Copy link
Contributor

configurable exceptions that initiate a retry, defaulting to {Throwable.class}

Hmm, how about defaulting to Exception.class or Throwable.class-except-blacklisted-exceptions?

My thinking is that in the off-chance an OutOfMemoryError is thrown, it should really be propagated to allow the JVM to shutdown as cleanly as it possibly can. 🤔

@sormuras
Copy link
Member

I don't like flaky tests (read: checks). Who does?
I don't like the idea of "fixing" flaky tests by (naiv, smart, conditional, [what|for]-ever) re-execution.
I don't want to support that in Jupiter.

Easy work-around:

@Test void test() { REPEAT flaky() UNTIL flaky-signals-green OR break-out }
boolean flaky() { ... return true : false ... }

Perhaps (!) such an annotation could live in an extension providing library: like https://junit-pioneer.org or another one. If we need to expose a new extension point tailored to make the "retry" functionality possible, well, I'm fine with that.

@arcuri82
Copy link
Contributor Author

@sormuras I do not like flaky tests either :) but unfortunately they are a reality :(

not an issue with unit tests (unless you work with randomized algorithms), but they are a big problem for E2E tests. There, it is hard to escape from flakyness. Even if a test wrongly fails 1 out 1000 times, when you have hundreds/thousands of E2E tests, there is a high probability that at least 1 fails, failing the whole build.

Of course one can use a try/catch in a loop, but that is a lot of boilerplate that has to be repeated each time for each test.

In the past, it was not an issue with JUnit 4, as this was externally handled with Maven. But this is not the case any more. In my case, this is a major showstopper for using JUnit 5. In some projects we tried JUnit 5 and were forced to revert to 4. In others, we use JUnit 5 for unit tests, but then have a (Maven) module for the E2E using JUnit 4.

@nipafx
Copy link
Contributor

nipafx commented Aug 21, 2018

Coincidentally, I recently hacked an extension that repeats failed tests:

class RepeatFailedTestTests {

	private static int FAILS_ONLY_ON_FIRST_INVOCATION;

	@RepeatFailedTest(3)
	void failsNever_executedOnce_passes() { }

	@RepeatFailedTest(3)
	void failsOnlyOnFirstInvocation_executedTwice_passes() {
		FAILS_ONLY_ON_FIRST_INVOCATION++;
		if (FAILS_ONLY_ON_FIRST_INVOCATION == 1) {
			throw new IllegalArgumentException();
		}
	}

	@RepeatFailedTest(3)
	void failsAlways_executedThreeTimes_fails() {
		throw new IllegalArgumentException();
	}

}

If JUnit 5 decides that flaky tests have no place in a proper testing framework, we pioneers don't have such scruples. 😉

@sbrannen
Copy link
Member

Nice, @nicolaiparlog! 👍

I'm guessing you didn't know about the Rerunner Jupiter extension? 🤔

@nipafx
Copy link
Contributor

nipafx commented Aug 21, 2018

Nope. 😆 Most extensions I write come from excursions into the extension model, not from actual use cases (hence the lack of research).

Looks like @arcuri82 has at least one, maybe soon two options to achieve his goal of rerunning flaky tests in JUnit 5 - no need for native support. 😉

@sormuras
Copy link
Member

Related to "flaky tests": https://twitter.com/michaelbolton/status/1032125004480237568

@smoyer64
Copy link
Contributor

He should sooooo change his name - https://www.youtube.com/watch?v=ADgS_vMGgzY

@marcphilipp
Copy link
Member

According to @anonygoose in #1079 (comment) the Rerunner extension is "dangerously buggy, and inactive".

@anonygoose Could you please post an example that showcases its bugginess?

@anonygoose
Copy link

anonygoose commented Sep 4, 2018

Hey Marc,

import io.github.artsok.RepeatedIfExceptionsTest;

public class RerunnerBugTest {

    @RepeatedIfExceptionsTest(repeats = 3)
    void test1() {
        System.out.println("Running test 1");
    }

    @RepeatedIfExceptionsTest(repeats = 3)
    void test2() throws Exception {
        System.out.println("Running test 2");
        throw new Exception("How does this not error?  It never runs!");
    }

    @RepeatedIfExceptionsTest(repeats = 3)
    void test3() {
        System.out.println("Running test 3");
    }

}

The above set of tests will run test1, and then skip test2 and test3. As soon as one test annotated with the annotation passes, it will skip any other tests annotated with it.

Uploaded the full example here: https://github.com/anonygoose/rerunner-bug-example

@artsok
Copy link

artsok commented Sep 4, 2018

Hi guys :)

I can make Pull Request with ReRun flasky mechanism, but i need information :)

@anonygoose
Copy link

Getting official support for a retry feature would be great.

Ideally, I'd like to see something like a @RetryableTest annotation that allows one to specify:

  • maximum number of retries
  • configurable exceptions that initiate a retry, defaulting to {Throwable.class}
  • potentially a minimum success rate

Each of these points are configurable on the current Rerunner extension, and are useful. Carrying them over to any official support seems sensible.

Further consideration may need giving to:

  • Making retries work across the entire life-cycle, wherever the exception occurs (BeforeEach, AfterEach, etc)
  • What the proper outcome of the test is. Current rerun implementations end up with a lot of tests being reported as skipped. This is a bit messy when trying to interpret results.

@nipafx
Copy link
Contributor

nipafx commented Sep 6, 2018

It would be nice to know whether the JUnit 5 team is considering implementing this.

  • If so, I will not continue working on my experiments.
  • If not, this issue can be closed and @artsok and I can decide how to move forward with his extension and my snippets.

@artsok: Would you consider integrating your extension into JUnit Pioneer? You would get more visibility and more hands to fix issues or implement features.

@sormuras
Copy link
Member

sormuras commented Sep 6, 2018

...now on the list for tomorrow's team call.

@benken-parasoft
Copy link

benken-parasoft commented Sep 6, 2018

dangerously buggy, and inactive

I wanted to second this. I came to a similar conclusion a few weeks ago. I had to give up on other attempts, not feeling they were entirely safe, clean, light-weight, or stable. Instead, I ended up making my own copy of RepeatedTestExtension then made it use a custom TestTemplateInvocationContext Iterator with a TestExecutionExceptionHandler that conditionally throws a TestAbortedException. This seems to be the general approach I've seen floating around but at least I have my own light-weight solution that I have control over. I wish artsok and pioneer the best of luck but it would definitely be nice to have a clean, official, stable, and trusted solution.

@filiphr
Copy link
Contributor

filiphr commented Sep 7, 2018

Just to add my 2 cents to this issue. We also have such flaky tests and were using the Surefire rerunFailingTestsCount in order to circumvent this. However, since our migration to JUnit Jupiter we noticed that this does not work anymore 😄. It would be great if there is something official from JUnit or maybe the appropriate plugins (Maven, Gradle) that would still have this support.

If you ask me I think that it is a bit better if we can do this expressively in the test itself and not have it global for the entire build.

@marcphilipp
Copy link
Member

Team Decision: As a first step, we think this should be maintained externally as an extension based on @TestTemplate. We might potentially introduce support for rerunning flaky tests in core, but then it should support all kinds of testable methods, including @ParameterizedTests. We'd be happy to provide guidance and review such an extension, whether in the rerunner project, junit-pioneer or another project.

@marcphilipp marcphilipp modified the milestones: 5.4 M1, General Backlog Sep 7, 2018
@marcphilipp marcphilipp removed this from the 5.x Backlog milestone Jun 19, 2021
@hakanai
Copy link

hakanai commented Jun 25, 2021

I looked at Gradle's flaky re-runner and I'm not sure that having the setting be project-global is very great, especially when the vast majority of the tests are expected to be rock-solid.

Did you see that you can specify custom annotations or class name pattern for those tests that should be retried?

I did not, no. Does it also let you have a different number of retries per test?

I also took a closer look at both rerunner-jupiter and pioneer's way of doing it, but because neither have a property to put the reason for marking the test as flaky, we're still stuck with our own implementation for the time being. But both of those and also our own current way of doing it all have the issue that the @Test annotation can't also be on the same method. This also means that the @Flaky annotation can't be put on the class, e.g. if all the tests in one class are flaky. :(

If only an InvocationInterceptor could just call proceed() twice, this would be much, much easier.

@marcphilipp
Copy link
Member

I did not, no. Does it also let you have a different number of retries per test?

No, the number of retries can only be configured globally.

If only an InvocationInterceptor could just call proceed() twice, this would be much, much easier.

I stand by what I said previously in #1558 (comment).

@pavelicii
Copy link

Any update on when this will be implemented?

@SandersAaronD
Copy link

It would be relatively easy to implement this in the platform. However, it might not satisfy the needs of all users since static state, threads that leaked from prior tests and might have caused the flakiness doesn't get reset.

I'd like to add +1 for doing this.

To add to the use cases: We have a bunch of E2E tests. Some are working, some are flaky, and some are broken. In any given test run, it's difficult to tell which are flaky and which are actually broken, so I have to rerun them manually to sort those out. Right now I will run my entire test suite 3-5 times and manually look at the reports side by side to see which tests fail uniformly, and which fail only sometimes.

A broken test indicates, probably, a logic error somewhere. A flaky test indicates a latency, availability or concurrency issue. They are very different things, and I need to go looking for the root issue in different places, and being able to see the difference matters.

We already have to manage static state and threading issues when doing repeated runs of a test or using test concurrency in junit, it's not clear to me that facing those issues is a good reason not to put an opt-in feature in since those are already common for use cases already in the package.

@hakanai
Copy link

hakanai commented Oct 15, 2021 via email

@SandersAaronD
Copy link

Latency and availability, certainly. But a concurrency issue is a logic error.

On Fri, 15 Oct 2021 at 04:55, Aaron Sanders @.***> wrote: It would be relatively easy to implement this in the platform. However, it might not satisfy the needs of all users since static state, threads that leaked from prior tests and might have caused the flakiness doesn't get reset. I'd like to add +1 for doing this. To add to the use cases: We have a bunch of E2E tests. Some are working, some are flaky, and some are broken. In any given test run, it's difficult to tell which are flaky and which are actually broken, so I have to rerun them manually to sort those out. Right now I will run my entire test suite 3-5 times and manually look at the reports side by side to see which tests fail uniformly, and which fail only sometimes. A broken test indicates, probably, a logic error somewhere. A flaky test indicates a latency, availability or concurrency issue. They are very different things, and I need to go looking for the root issue in different places, and being able to see the difference matters. We already have to manage static state and threading issues when doing repeated runs of a test or using test concurrency in junit, it's not clear to me that facing those issues is a good reason not to put an opt-in feature in since those are already common for use cases already in the package. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#1558 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAKRZAYUQXKLY7L7R6KG23UG4KQNANCNFSM4FQSYPTQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

Fair, but still very different to bug-hunt for.

@SandersAaronD
Copy link

I did not, no. Does it also let you have a different number of retries per test?

No, the number of retries can only be configured globally.

If only an InvocationInterceptor could just call proceed() twice, this would be much, much easier.

I stand by what I said previously in #1558 (comment).

You can specify a custom task that only runs tests for certain tags and set the retry block in the gradle plugin there. I have retries set of my regression tests but not smoke tests, for example, since they are triggered from different tasks.

@marcphilipp
Copy link
Member

Have you all tried JUnit Pioneer's support for retrying failed tests?

@pavelicii
Copy link

pavelicii commented Oct 18, 2021

@marcphilipp

Have you all tried JUnit Pioneer's support for retrying failed tests?

As @hakanai said, this is not a good solution, especially for those who use @ParameterizedTest:

rerunner-jupiter's and JUnit Pioneer's annotations didn't really do it for us either - they both introduced a new annotation for flaky tests which replaces the @Test annotation. This is less than great for two separate reasons:
(1) When we want to disable a test, we don't go and replace the @Test annotation with a @DisabledTest annotation - we add the @Disabled annotation next to the @Test annotation. Having flakiness annotations behave differently is inconsistent.
(2) When you want to mark a @ParameterizedTest as flaky, now you need yet another annotation. (And this will compound the more things you want to annotate about the test, or the more ways you have of running a test.)

@Michael1993
Copy link

Technically, I think we (JUnit Pioneer) could support that. You would be able to write something like this:

@Test
@Flaky(maxRetries = 3)
void test() {
  // some flaky test code here
}

It would create results like this:
image

Personally, I think that looks a bit ugly... but if this is good enough and there is demand for it, we will host it (I'll ask the other maintainers but I'm fairly confident they'll agree).

@ksobolew
Copy link

Yeah, this is inconsistent with the general pattern. More "regular" way would be to add @FlakyTest, @FlakyParameterizedTest etc., which may be even worse.

@wyhasany
Copy link

It passed more then 3 years and there is no reliable solution to retry @ParameterizedTest or @TestFactory. There are only ones working with build tools (surefire for maven and gradle retry plugin). Maybe console-launcher could be configurable to rerun failing tests?

@iNikem
Copy link

iNikem commented Oct 17, 2022

@nipafx Do I understand correctly, that JUnit Pioneer does not support retrying ParameterizedTest?

@nipafx
Copy link
Contributor

nipafx commented Nov 10, 2022

@iNikem Yes, that is the correct. I explain the reason in a related issue.

@Avinm
Copy link

Avinm commented Mar 30, 2023

It passed more then 3 years and there is no reliable solution to retry @ParameterizedTest or @testfactory.

I am here to echo this on behalf of several of my team members. Would there be anything we could consider to inch closer to resolving this? (and essentially junit-pioneer/junit-pioneer#586).

I could try contributing towards an implementation if junit owners can align on and provide a direction.
FWIW, I think testng has IRetryAnalyzer which helps to neatly solve this problem.

Ref: https://testng.org/doc/documentation-main.html#rerunning

@benken-parasoft
Copy link

I ended up making my own copy of RepeatedTestExtension

JUnit 5.10.0 now has "Failure threshold for @RepeatedTest." No more need for my hacked copy of RepeatedTestExtension. Thanks!

@JonathanBerkeley
Copy link

JonathanBerkeley commented Aug 23, 2023

Adding successThreshold to @RepeatedTest would solve this.

E.G.
@RepeatedTest(value = 3, successThreshold = 1)

@carusology
Copy link

An additional use case has come up when writing LLM-based AI integration tests.

When an LLM-based AI is asked to return a response with well-formatted data, it can do so reliably the vast majority of the time. However, it isn't truly deterministic - even if provided the exact same user and system same prompts. Having first class support for retrying would way to accommodate this unpredictability.

It would be even better if it could retry based upon specific exception types. For example, it could only retry parsing errors because the LLM didn't respond in the expected well-formatted data specified in the system prompts.

@pan3793
Copy link

pan3793 commented Nov 1, 2023

I'm wondering how to achieve the scalatest's eventually functionality in JUnit5? This is extremely useful to tackle flaky test assertions or wait some resources to start before moving to next steps.

test("test case contains flaky assertion") {
  // ... some logic maybe with assertion
  
  eventually(timeout(30 seconds), interval(100 millis)) {
     // assertion that may fail
  }

  // ... some logic maybe with assertion
}

Ref: http://doc.scalatest.org/1.8/org/scalatest/concurrent/Eventually.html

@quiram
Copy link

quiram commented Nov 2, 2023

Hi @pan3793,

I think for this the easiest is using a tool like awaitility. We use it in our projects for precisely this reason and it works fairly well.

@ArvindJoshi-okta
Copy link

All of the above mentioned workaround options have annotations (for rerunning flaky test) at a method level. I am looking for an option to specify the flakiness retry for the entire code base, could be at class level on a base class. Any suggestions for that? Thanks!

@Avinm
Copy link

Avinm commented Jul 23, 2024

@ArvindJoshi-okta:
I was able to achieve it (in an admittedly hacky manner) using an InvocationInterceptor:

public void interceptTestTemplateMethod(Invocation<Void> invocation,
                                            ReflectiveInvocationContext<Method> invocationContext,
                                            ExtensionContext extensionContext) throws Throwable {
    try {
            invocation.proceed();
    } catch (Throwable initialEx) {
       if(shouldRetry(initialEx) {
               invocationContext.getExecutable().invoke(
                        invocationContext.getTarget().get(), invocationContext.getArguments().toArray()
              ); // Invokes your method again, consider looping with try catch if multiple retries are needed
       }
    }
}

One downfall of this approach is that it does not execute the @Before* methods again though

@ArvindJoshi-okta
Copy link

Thanks, @Avinm! I have tried similar options and yes, running the Befores is crucial for reruns.

@benken-parasoft
Copy link

Adding successThreshold to @RepeatedTest would solve this.

I just wanted to highlight this again. It seems reasonable.

@nipafx
Copy link
Contributor

nipafx commented Jul 30, 2024

Adding successThreshold to @RepeatedTest would solve this.

I want to offer a counterpoint to this sentiment. At the moment @RepeatedTests only succeed if all executions succeed, which makes the annotation fundamentally unfit to carry flaky tests across the finishing line. Adding successThreshold changes that just as fundamentally and opens the door to an entirely new use case. A use case that, as we have discovered in JUnit Pioneer, quickly demands follow-up additions:

  • fixed pause times
  • backoff strategies
  • differentiate by exception type
  • probably more I don't remember right now

There's nothing wrong with these features but IMO it doesn't make sense to take the first step down that road if one is not willing to travel it further.

So, as I see it, the decision is not "can we add successThreshold to @RepeatedTest?" but "do we want to turn @RepeatedTest into an annotation that (properly) supports rerunning flaky tests?".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests