Resolve Flaky CI builds #980

Closed
martincostello opened this issue Dec 16, 2022 · 24 comments
Labels
CI/build · Hacktoberfest (Suggested contribution for Hacktoberfest) · stale (Stale issues or pull requests)

Comments

@martincostello
Member

Our AppVeyor CI build has been flaky for a while now, which unnecessarily complicates making changes. Because the AppVeyor permissions differ from those of the GitHub repository, things like retrying a build are difficult to do even with write access to the repo.

As a stop-gap measure, I added a Windows CI using GitHub Actions in #979 to give an alternative view of whether the project is building successfully.

The question is, what should we do to push forward and get CI stable again so that contributors can be confident in changes they submit?

Two possible approaches include:

  1. Fix the AppVeyor CI and update the permissions so those with write access to the repo can do things like retry builds.
  2. Move away from AppVeyor, and use GitHub Actions instead.

Cards on the table, I'm in favour of item 2 because I use Actions exclusively for my own open source projects and at work, and I'm a big fan of having one CI/CD system rather than several. It also has the benefit of allowing us to build and test on Linux, macOS and Windows in a single CI/CD system.

I am however happy to go with the majority position if that is to keep the status quo of AppVeyor.

If we decide to move to GitHub Actions, the following things are missing from the basic workflow I've already set up:

  1. The build doesn't yet work for macOS and Linux.
  2. The created NuGet packages are missing the appropriate SemVer.
  3. The build isn't currently triggered by tags (and doesn't do any tagging either).
  4. Something like GitHubActionsTestLogger needs adding to show test results in the build logs.
  5. NuGet packages aren't published anywhere.
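
As an illustration of items 2 and 3, a tag-triggered build could derive the package version from the Git ref that GitHub Actions exposes in the `GITHUB_REF` environment variable (for a tag push it has the form `refs/tags/<tag>`). The sketch below is only an assumption about how that might look: the `version_from_ref` helper, the `v` tag prefix, and the simplified SemVer pattern are all hypothetical and not taken from the actual workflow.

```python
import re

# Simplified SemVer 2.0.0 pattern: MAJOR.MINOR.PATCH with optional
# pre-release and build metadata. It does not enforce every rule of the
# specification (for example, leading zeros in numeric pre-release parts).
_SEMVER = re.compile(
    r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-[0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*)?"
    r"(?:\+[0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*)?"
)


def version_from_ref(ref, prefix="refs/tags/v"):
    """Return the version for a tag ref such as refs/tags/v1.2.3, else None.

    The "v" prefix is an assumed tagging convention, not a confirmed one.
    """
    if not ref.startswith(prefix):
        return None  # Not a tag build (for example, a branch push).
    candidate = ref[len(prefix):]
    return candidate if _SEMVER.fullmatch(candidate) else None
```

A build script could call `version_from_ref(os.environ.get("GITHUB_REF", ""))` and fall back to a pre-release version when it returns `None`.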
@martincostello
Member Author

Discussed with @joelhulen and @SimonCropp, and (probably tomorrow) I will:

  • Remove the AppVeyor CI
  • Investigate fixing the build to include Linux and macOS
  • Investigate adding codecov coverage metrics to give us a coverage baseline for the future
  • Investigate how we could create a workflow for making publishing easier while still keeping the existing gates on what gets released and who can do so

@joelhulen
Member

100% in agreement to move away from AppVeyor in favor of GitHub Actions. We implemented AppVeyor before GitHub Actions was a thing. Plus, AppVeyor gave us free build minutes each month since this is an Open Source project. Since then, GitHub Actions has become a viable (and, in many ways, better) alternative. Moving to GH Actions will also open the door to implementing policies like code coverage, improving deployment options, etc.

Thanks, Martin!

martincostello added a commit to martincostello/Polly that referenced this issue Jan 6, 2023
Remove the AppVeyor CI.
Contributes to App-vNext#980.
@martincostello
Member Author

@joelhulen I've deleted the AppVeyor config, but it's still trying to build things. I guess there's something you need to turn off somewhere in the AppVeyor website.

martincostello added a commit to martincostello/Polly that referenced this issue Jan 6, 2023
Add GitHubActionsTestLogger for builds in GitHub Actions.
Contributes to App-vNext#980.
@joelhulen
Member

@martincostello I'll look into it now.

@joelhulen
Member

@martincostello I've deleted the Polly project from AppVeyor.

As for GitHub, we want to create two deployments: GitHub releases and nuget.org. I can help set up the NuGet piece when you're ready.

@joelhulen
Member

As for the releases, those need to be gated so that an admin needs to approve the deployments.

@martincostello
Member Author

Cheers Joel - I'm working on the CI now in #995, then I'll sort out xplat, then once that's done look at the publish/release flow.

@martincostello
Member Author

The CI is now working correctly with respect to:

  • Versioning
  • Test results
  • Deterministic builds
  • Code coverage (we're currently at 71% in codecov.io)

Next up I'll look at sorting out the Linux and macOS builds.

@joelhulen
Member

Great progress, Martin. Thanks!

@martincostello
Member Author

@martintmk This still happens a little bit, but is mostly resolved. Are there any specific tests you're aware of that we need to de-flake?

@martincostello added this to the v8.0.0 milestone Jun 16, 2023
@martincostello added the v8 label (Issues related to the new version 8 of the Polly library) Jun 16, 2023
@martintmk
Contributor

@martincostello, the problem I see quite often is that the code coverage check fails for unknown reasons.

We should investigate that.

@martincostello
Member Author

I've seen that one in other projects, I think it's a coverlet issue.

@martincostello
Member Author

We still get this a bit, but I think whatever is breaking it is external.

Should we close or leave this open @martintmk?

@martintmk
Contributor

Dunno, the situation is not ideal as there are a lot of build failures, especially the code-scan one:

https://github.com/App-vNext/Polly/actions/workflows/codeql-analysis.yml

If we can stabilize it, it would improve things.

@martincostello
Member Author

martincostello commented Sep 27, 2023

I get that in two of my projects too - the C# compiler seems to crash for some reason with no useful information (I've tried to get more detail from another repo and get nothing).

Again, it's external so I don't think there's anything we can do to fix it ourselves.

@martincostello removed the v8 label (Issues related to the new version 8 of the Polly library) Sep 27, 2023
@martincostello removed this from the v8.0.0 milestone Sep 27, 2023
@martincostello
Member Author

We can leave it open, but I've removed the v8 milestones.

@martintmk
Contributor

> Again, it's external so I don't think there's anything we can do to fix it ourselves.

Can we do retries? Maybe try up to 3 times. It's especially annoying for the bot-approver; I think 2 or 3 of the automatic PRs failed because of this.

@martincostello
Member Author

I don't know how to do that without having something external like a bot automatically retriggering them.
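
For illustration only, one alternative to re-triggering a whole workflow is to retry the flaky command inside the job itself. The sketch below assumes a hypothetical `run_with_retries` helper wrapped around whatever command fails intermittently. It is not part of the Polly build, and it would not help when the failure happens inside a third-party action rather than in a command the workflow runs directly.

```python
import subprocess
import sys
import time


def run_with_retries(command, attempts=3, delay_seconds=0.0):
    """Run command until it exits with 0 or the attempts are used up.

    Returns 0 on success, otherwise the exit code of the last attempt.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    exit_code = 1
    for attempt in range(1, attempts + 1):
        exit_code = subprocess.run(command).returncode
        if exit_code == 0:
            return 0
        print(
            f"Attempt {attempt} of {attempts} failed with exit code {exit_code}",
            file=sys.stderr,
        )
        if attempt < attempts:
            time.sleep(delay_seconds)  # Optional pause before the next try.
    return exit_code
```

For example, `run_with_retries(["dotnet", "build"], attempts=3)` would run the build up to three times. The trade-off is that retries can hide genuine flakiness in our own tests, so it would probably make sense only for failures known to be external.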

@martincostello
Member Author

This failure looks like flakiness we need to fix though: https://github.com/App-vNext/Polly/actions/runs/6325048079/job/17175769906?pr=1641#step:5:400

@martincostello added the Hacktoberfest label (Suggested contribution for Hacktoberfest) Sep 28, 2023
@martincostello
Member Author

FYI I've logged an issue with Roslyn about the code-ql failures we keep getting: dotnet/roslyn#70368

@martincostello removed their assignment Oct 31, 2023
@martincostello
Member Author

Since updating to .NET 8, we seem to be getting flaky tests related to telemetry for .NET Framework due to some concurrent state somewhere being updated: example

@martincostello
Member Author

Might have to skip that test on .NET Framework - it doesn't seem to want to pass at all now 😓

Contributor

This issue is stale because it has been open for 60 days with no activity. It will be automatically closed in 14 days if no further updates are made.

@github-actions bot added the stale label (Stale issues or pull requests) Jan 16, 2024
@martincostello
Member Author

Things seem to have been stable for a while now, so I'm going to close this. We can open a new one if it comes back.
