Flaky bot being timed out before it can report issue #1507

Closed
anguillanneuf opened this issue Mar 12, 2021 · 4 comments
Labels
bot: flakybot type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@anguillanneuf

python-pubsublite samples were quietly failing from 2/23 to 3/9, but Flakybot didn't report any failures because the tests timed out (>180 minutes) before Flakybot ran.

TL;DR: The PR that broke these samples came from python-pubsub on 2/22. It caused a method that cancels a streaming pull future from a Pub/Sub Lite subscribe call to hang.

Is there something that Flakybot could do to catch such failures in the future?

@bcoe bcoe added the type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. label Mar 12, 2021
@tbpg
Contributor

tbpg commented Mar 17, 2021

Thanks for the report. I've been thinking about this, and I'm not sure of a good solution.

One idea is to set a bash trap so that flakybot always runs. From the log, though, I'm not sure it would have run:

ERROR: Aborting VM command due to timeout of 10800 seconds

We can use flakybot to say "the build failed" even when it doesn't give a specific test case that failed. But, in this case, I don't think that would help. We know which case failed; the VM just gets shut down.
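
For reference, the bash trap idea could look roughly like the sketch below. It is only an illustration: `run_tests` and `report_results` are hypothetical stand-ins, not the real build script or the real flakybot invocation. The EXIT trap calls the reporter whenever the wrapped subshell ends, whether the tests passed or failed.

```shell
#!/usr/bin/env bash
# Sketch only: run_tests and report_results are hypothetical stand-ins,
# not the real Kokoro build script or the real flakybot invocation.

run_tests() {
  # Placeholder for the actual test command; fails on purpose here.
  return 1
}

report_results() {
  # Placeholder for calling flakybot on whatever logs exist so far.
  echo "reporting (exit status: $1)"
}

main() (
  # The function body is a subshell, so the EXIT trap fires when it ends,
  # whether run_tests passed, failed, or the subshell exited early.
  trap 'report_results "$?"' EXIT
  run_tests
)

main || echo "build finished with status $?"
```

A trap like this cannot fire if the process receives SIGKILL or the VM itself is shut down, which is consistent with the doubt above about whether it would have run in this case.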

@JustinBeckwith
Contributor

Hmmm. Brainstorming time - I wonder if we could retroactively look at CheckSuiteRuns on a cron for tests that run on the default branch? Yeah, we don't have access to run code if the kokoro VM loses its gourd, but GitHub would still see a failed status check, right? I think there's some nuance to how the GitHub API works with Kokoro registered jobs vs GitHub Action jobs, so some research would be needed.
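
A rough sketch of what such a cron check might do, assuming the job in question actually shows up as a check suite on the commit (which is the part that needs research). `OWNER`, `REPO`, the `GITHUB_TOKEN` variable, and all function names are placeholders, and the set of conclusions treated as broken is an assumption. It lists check suites for a ref through the GitHub REST API and prints the ones that failed, timed out, or were cancelled.

```shell
#!/usr/bin/env bash
# Sketch only: OWNER, REPO, and the ref are placeholders, GITHUB_TOKEN is
# assumed to be set, and all function names here are hypothetical.

# Return 0 if a check suite conclusion should be treated as a broken build.
is_bad_conclusion() {
  case "$1" in
    failure|timed_out|cancelled) return 0 ;;
    *) return 1 ;;
  esac
}

# Print "<id> <conclusion>" for each check suite on a ref (needs curl and jq).
list_check_suites() {
  local owner="$1" repo="$2" ref="$3"
  curl -sS \
    -H "Accept: application/vnd.github+json" \
    -H "Authorization: Bearer ${GITHUB_TOKEN}" \
    "https://api.github.com/repos/${owner}/${repo}/commits/${ref}/check-suites" |
    jq -r '.check_suites[] | "\(.id) \(.conclusion)"'
}

# Read "<id> <conclusion>" lines and print the suites that look broken;
# a cron job could open an issue for each line printed.
report_broken_suites() {
  local id conclusion
  while read -r id conclusion; do
    if is_bad_conclusion "$conclusion"; then
      echo "check suite ${id} concluded: ${conclusion}"
    fi
  done
}

# Usage (not run here): list_check_suites OWNER REPO main | report_broken_suites
```

Suites that are still running have a null conclusion and are skipped here, so a run that hangs until the VM is killed would only be caught if something eventually records a conclusion for it.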

@chingor13
Contributor

Note: nightly jobs are not guaranteed to create status checks on the commits they run on unless we start a policy that they should.

@tmatsuo
Contributor

tmatsuo commented May 5, 2022

Here is my 2 cents.

The nightly build can fail even before flakybot can do anything useful. We should have a way of checking nightly build status without flakybot. Closing, but feel free to reopen with a rationale.

@tmatsuo tmatsuo closed this as completed May 5, 2022