tests seem to hang on Mac on GitHub Actions #79

Closed
mvdan opened this issue Sep 8, 2020 · 5 comments · Fixed by #400
Labels
kind/bug A bug in existing code (including security flaws)

Comments

@mvdan
Contributor

mvdan commented Sep 8, 2020

For example: https://github.com/ipld/go-ipld-prime/pull/77/checks?check_run_id=1083075042

This seems to happen rather frequently; we've seen six of these in a single week in which twenty or so workflows ran.

GitHub confirmed this doesn't seem like a bug on their end, because the process is still running. But how can that be, since a test package execution should time out after ten minutes by default? I'm pretty sure that's enforced via kill signals, too.

I'm opening this issue to investigate.

@alepauly

alepauly commented Sep 8, 2020

> GitHub confirmed this doesn't seem like a bug on their end, because the process is still running.

@mvdan Just to clarify: we haven't ruled out an issue on our end, but the problem here seems to be that something is holding on to the runner for the full 6 hours, at which point the service times it out. We'll keep digging on the service side, but if you can help us by confirming on the workflow side that nothing is keeping the agent process busy, that would be great. Please let us know what you find, and I'll update here as soon as we know more. Thanks!

@mvdan
Contributor Author

mvdan commented Sep 8, 2020

Thanks! PR #77 seems to make the hang far more likely, because it makes some of the heavier tests run in parallel. I'm going to push some extra debugging changes to that PR to try to see what's happening when it hangs.

@mvdan
Contributor Author

mvdan commented Sep 16, 2020

We never found out the "why", but I assume it has something to do with all the concurrent builds that get triggered, which honestly aren't a great idea to begin with. For some reason, a bunch of go build calls were hanging or crashing in some way that confused go test.

In any case, though, the fact that go test didn't obey its default 10m timeout is a Go bug. See golang/go#24050 (comment).

@marten-seemann marten-seemann added the kind/bug A bug in existing code (including security flaws) label Aug 16, 2021
@mvdan mvdan removed their assignment Mar 30, 2022
@BigLep

BigLep commented Apr 5, 2022

2022-04-05 conversation: @rvagg will talk to @mvdan about this and confirm that this is the only thing blocking unified CI.

rvagg added a commit that referenced this issue Apr 6, 2022
There's an underlying bug (or set of them) here that is causing problems
on macos. A temporary fix to get macos into CI is to disable the
parallelism but we really need to find the real cause and fix that.

Ref: #79
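The commit message doesn't show the change itself. One way such a temporary switch could look, assuming it is keyed on `runtime.GOOS` (the commit only says the parallelism is disabled, and the helper name `parallelAllowed` is hypothetical, not taken from #400):

```go
package main

import (
	"fmt"
	"runtime"
)

// parallelAllowed reports whether tests should call t.Parallel() on goos.
// Hypothetical helper. TODO(#79): drop the darwin exception once the root
// cause of the macOS hang is found.
func parallelAllowed(goos string) bool {
	return goos != "darwin"
}

func main() {
	// In a _test.go file the call site would read:
	//	if parallelAllowed(runtime.GOOS) {
	//		t.Parallel()
	//	}
	fmt.Printf("GOOS=%s: parallel tests allowed = %v\n", runtime.GOOS, parallelAllowed(runtime.GOOS))
}
```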
rvagg added a commit that referenced this issue Apr 7, 2022
There's an underlying bug (or set of them) here that is causing problems
on macos. A temporary fix to get macos into CI is to disable the
parallelism but we really need to find the real cause and fix that.

Ref: #79
@rvagg
Member

rvagg commented Apr 19, 2022

Going to close this since #400 has been merged to deal with the consequences of this problem, although not really its cause; there are TODOs in the code now that point to this issue.

@rvagg rvagg closed this as completed Apr 19, 2022
@rvagg rvagg moved this to 🎉 Done in IPLD team's weekly tracker Apr 19, 2022
rvagg added a commit that referenced this issue Apr 27, 2022
There's an underlying bug (or set of them) here that is causing problems
on macos. A temporary fix to get macos into CI is to disable the
parallelism but we really need to find the real cause and fix that.

Ref: #79
Projects
Status: 🎉 Done

5 participants