Too many failing checks / implement smarter CI #3662
Comments
Thanks for the suggestion. I have had a variety of ideas like this in mind for a while, based on recommendations from other repositories. Adding labels to check for, and skipping the benchmark runs on documentation-only fixes, would be fine. The benchmarks job is the longest one and we don't need it to run everywhere, especially when we're not pushing changes to an external repository as we do with the cron job. I'm aware that SciPy's CI configuration has a lot of such things we can seek inspiration from.

There is indeed a lot of redundancy with things like repeated workflows and workflow files (the scheduled tests and the regular CI tests could be integrated into the same file, #3518 is still open, the benchmarks-related jobs could be spread across fewer files, etc.).
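To illustrate the documentation-only skip, here is a minimal sketch of a GitHub Actions trigger block, assuming a hypothetical benchmarks workflow; the `docs/**` and markup-file patterns are placeholders rather than PyBaMM's actual layout:

```yaml
# Hypothetical trigger for a benchmarks workflow: skip the run entirely
# when a pull request only touches documentation or markup files.
name: Benchmarks

on:
  pull_request:
    paths-ignore:
      - "docs/**"
      - "**/*.md"
      - "**/*.rst"

jobs:
  benchmarks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run benchmarks
        # placeholder for the actual benchmark command
        run: echo "run the benchmark suite here"
```

One caveat with this approach: if a workflow that can be skipped is also marked as required in branch protection, PRs that skip it can end up waiting on a check that never reports, so the required-checks list needs to be kept in sync.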
I understand the perils of skipping certain checks for certain PRs. But then, is it possible to ensure that checks don't fail without a valid reason? This would mean investigating each workflow and the ways in which it may be failing erroneously.
I suppose the usual practice is to run tests in such a way that they are as strict as they can be (fail on any deprecation warnings detected, and so on) and that they are not flaky (do not fail without reason). We regularly see benchmark failures due to the absence of fixed CI runners. One way to address this could be to do a statistical analysis of the past benchmark data available on https://pybamm.org/benchmarks and adjust the failure thresholds based on the distribution obtained, so that a spurious failure is seen only once or twice in a hundred pull request runs. The other way is to improve the benchmark times themselves by improving the solver. Both of these are hard to do, but one thing that can be done is to not fail the workflow and instead let it emit a "status required" message, i.e., not let it get marked with a ❌.

As for other workflows, most of them don't fail at random, except for the link checker, which sometimes does so on certain links (we have a lot of them throughout the source code).
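For the "don't mark the PR with a ❌" idea, one option in GitHub Actions is `continue-on-error`; a minimal sketch, assuming the benchmarks run as a single job (the step contents are placeholders):

```yaml
jobs:
  benchmarks:
    runs-on: ubuntu-latest
    # A failure inside this job does not fail the workflow run as a whole,
    # so the PR is not blocked, but the job's own log still records the failure.
    continue-on-error: true
    steps:
      - uses: actions/checkout@v4
      - name: Run benchmarks
        run: echo "run the benchmark suite here"  # placeholder
```

The trade-off is that genuine regressions become easier to overlook, so pairing this with a summary or PR comment that surfaces the benchmark result would help.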
Thanks @adityaraute for opening this; it would really save resources and potentially make development faster. I had also been thinking that we should [skip docs] unless there is a documentation-related change, and that the benchmarks can be skipped in most PRs. +1 for @agriyakhetarpal's suggestion. Can we maybe also add complexity and priority labels for this?
I agree, though let's keep this open for a while first for more discussion. For now, we have these scenarios:
We can create file patterns based on these points, though for some of them it might make sense to do so after #3641 is complete. Please let me know if there are more scenarios that should be covered.
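A sketch of how file patterns like these could gate jobs, using the third-party `dorny/paths-filter` action; the filter names and globs below are assumptions for illustration, not the repository's actual structure:

```yaml
jobs:
  changes:
    runs-on: ubuntu-latest
    outputs:
      src: ${{ steps.filter.outputs.src }}
      docs: ${{ steps.filter.outputs.docs }}
    steps:
      - uses: actions/checkout@v4
      - uses: dorny/paths-filter@v3
        id: filter
        with:
          filters: |
            src:
              - "pybamm/**"
              - "tests/**"
            docs:
              - "docs/**"

  unit-tests:
    needs: changes
    # run the test matrix only when source or test files have changed
    if: needs.changes.outputs.src == 'true'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: echo "run pytest here"  # placeholder
```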
Might be helpful for detecting changes across a range of commits on a pull request:
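As a rough illustration of detecting changes across the whole range of commits on a PR (rather than only the most recent push), here is a sketch assuming GitHub Actions and a full-history checkout; the base and head SHAs come from the `pull_request` event payload:

```yaml
jobs:
  detect-changes:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          # fetch the full history so the base and head commits are both available
          fetch-depth: 0
      - name: List files changed across the entire pull request
        run: |
          git diff --name-only \
            "${{ github.event.pull_request.base.sha }}" \
            "${{ github.event.pull_request.head.sha }}"
```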
@agriyakhetarpal Can we consider this resolved now? There have been a bunch of improvements.
Not for now, I would say – the recent improvements have been great (#4125, #4099), but we want to do much more on this in the coming months. @prady0t will be starting out with restructuring some parts of the workflows aside from the |
@agriyakhetarpal, should we put together a checklist of the tasks one would need to do, and a starting point?
I think so – the easiest thing to start would be to implement some patterns in the |
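One such pattern could be a label-based opt-out, so that maintainers can skip expensive jobs explicitly; a sketch in which the "skip benchmarks" label name is hypothetical:

```yaml
jobs:
  benchmarks:
    # skip this job whenever the PR carries the (hypothetical) "skip benchmarks" label
    if: ${{ !contains(github.event.pull_request.labels.*.name, 'skip benchmarks') }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: echo "run the benchmark suite here"  # placeholder
```

Since labels are applied by maintainers, this keeps the decision to skip a job visible and auditable in the PR timeline.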
A good place for anyone who wants to start with this is this resource: https://learn.scientific-python.org/development/guides/gha-basic/#advanced-usage |
Description
I was scrolling through the PRs and noticed that most of them fail some check or other, often the same ones.
I've attached a screenshot where a missing package led to a check failure, but there are several similar instances.
Motivation
If checks fail for reasons unrelated to the changes at hand, I wonder whether that makes the check or action itself irrelevant, at least to the majority of PRs.
Possible Implementation
Reduce the number of actions/checks attached to certain workflows, such that only the ones that are essential for a PR are run.
Additionally, investigate why certain actions fail repeatedly and rectify any issues in the core repo build.
Additional context
An example of an action running Ubuntu tests on PRs