
GitHub Actions - sometimes... our 'host is unreachable' #2890

Closed
1 of 7 tasks
trel opened this issue Mar 9, 2021 · 23 comments

Comments

@trel

trel commented Mar 9, 2021

Description

We are using GitHub Actions to install packages from our own apt/yum repository, hosted on a public VM as raw directories served by Apache. We are not seeing any problems from anywhere else... however...

Sometimes - roughly 25% of the time - the GitHub Action cannot reach https://unstable.irods.org.

DNS resolves correctly; the IP address is right.

Area for Triage:

Containers

Question, Bug, or Feature?:

Bug

Virtual environments affected

  • Ubuntu 16.04
  • Ubuntu 18.04
  • Ubuntu 20.04
  • macOS 10.15
  • macOS 11.0
  • Windows Server 2016 R2
  • Windows Server 2019

Image version

Version: 20210302.0

Expected behavior

Expected to be able to see/use our server.

Actual behavior

From:

https://github.com/irods/irods/blob/master/.github/workflows/build-irods.yml#L31

We see this in the Action logs:

W: Failed to fetch https://unstable.irods.org/apt/dists/bionic/InRelease  Could not connect to unstable.irods.org:443 (152.54.5.173), connection timed out
W: Some index files failed to download. They have been ignored, or old ones used instead.
E: Unable to locate package irods-externals-*
E: Couldn't find any package by glob 'irods-externals-*'
E: Couldn't find any package by regex 'irods-externals-*'

Repro steps

Commits to https://github.com/irods/irods trigger builds - sometimes they fail. Manual retries eventually connect and complete their work.

This has the feel of a firewall somewhere between GitHub and our VM that is throttling connections, perhaps IP-based. Is there any way to detect / determine this?
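For reference, a rough diagnostic sketch that could be dropped into the workflow as a throwaway step; the timeout and probe counts are arbitrary, and mtr-tiny is assumed to be installable on the Ubuntu runner:

# time each phase of a single HTTPS connection to the repository host
curl -sS -o /dev/null --connect-timeout 30 \
     -w 'dns=%{time_namelookup}s connect=%{time_connect}s total=%{time_total}s http=%{http_code}\n' \
     https://unstable.irods.org/
# trace the path to see where packets stop when the connect hangs
sudo apt-get install -y mtr-tiny
mtr -rwz -c 20 unstable.irods.org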

@AlenaSviridenko
Contributor

Hi @trel,
we host our images in Azure. Please create a VM there, and if the issue appears on that VM as well, please contact Azure support.

@trel
Author

trel commented Mar 10, 2021

I have reproduced the failure with fewer moving parts - just a curl call.

https://github.com/trel/irods/pull/9/checks?check_run_id=2079836230

I'll look into trying a similar curl in vanilla Azure.
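A sketch of the kind of loop that can quantify the failure rate from inside a runner (the attempt count, timeout, and sleep are arbitrary, not values from the linked check):

fail=0
for i in $(seq 1 20); do
  # same target the workflow hits; count attempts that cannot connect in time
  curl -sS -o /dev/null --connect-timeout 10 https://unstable.irods.org/ || fail=$((fail+1))
  sleep 5
done
echo "failed ${fail} of 20 attempts"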

@trel
Author

trel commented Mar 15, 2021

In some further testing, curl and wget always work fine. Only when hitting these servers with apt update do they fail to connect.

Could this be an "apt and https" issue? As in, is the proxy not configured correctly for apt over https within some of the Azure containers?
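If a misconfigured proxy were the cause, it should show up in apt's own configuration; a minimal check, assuming stock apt paths inside the runner or container:

# any Acquire::http(s)::Proxy setting apt would actually use
apt-config dump | grep -iE 'proxy' || true
grep -riE 'proxy' /etc/apt/apt.conf /etc/apt/apt.conf.d/ 2>/dev/null || true
# proxy environment variables honored by apt and curl
env | grep -iE '^(http_proxy|https_proxy|no_proxy)=' || true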

@trel
Author

trel commented Mar 15, 2021

#2919 possibly related?

@Darleev
Contributor

Darleev commented Mar 15, 2021

Hello @trel,
Issue #2919 is related to a specific openSUSE repository problem.
I tried to find information about the error in your workflow:

W: Some index files failed to download. They have been ignored, or old ones used instead.

Everything I was able to find points to a temporary error with an external apt mirror, i.e. some problem with the https://unstable.irods.org/apt/ repository itself, so I believe it makes sense to check this from that side as well.

@trel
Author

trel commented Mar 15, 2021

That side is my side :)

We've checked and see no connectivity issues from anywhere else in the world. It's also happening on https://packages.irods.org, a different VM with a similar setup.

We're still investigating, but at this time it does feel like a firewall/proxy issue in the container itself (most of the time it works cleanly, and the same commands worked for some time via Travis before we moved to GitHub Actions).

@Darleev
Contributor

Darleev commented Mar 15, 2021

Hello @trel,
Thank you for the information. We are checking the issue from our side as well; I'll try to deploy clean local/Azure machines and reproduce the issue one more time.
I'll keep you informed.

@trel
Author

trel commented Mar 16, 2021

Just saw the same issue via yum on a CentOS7 container, rather than with apt on Ubuntu...

https://unstable.irods.org/yum/pool/centos7/x86_64/irods-externals-avro1.7.7-0-1.0-1.x86_64.rpm: [Errno 12] Timeout on https://unstable.irods.org/yum/pool/centos7/x86_64/irods-externals-avro1.7.7-0-1.0-1.x86_64.rpm: (28, 'Connection timed out after 30001 milliseconds')

In addition...

I am noticing more failures in the middle of my day (UTC-0500).
Mornings rarely see these timeouts, and evenings succeed more often than the workday does.
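One possible mitigation for the CentOS 7 container (not something from this thread) is to give yum a longer timeout and more retries; a sketch, with the package name taken from the failing RPM in the log above:

# per-invocation override, no edit to /etc/yum.conf needed
yum --setopt=timeout=60 --setopt=retries=10 install -y irods-externals-avro1.7.7-0
# or persist the settings for later yum calls in the same job
printf 'timeout=60\nretries=10\n' >> /etc/yum.conf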

@Darleev
Contributor

Darleev commented Mar 16, 2021

Hello @trel,
I'm trying to reproduce the issue locally and on self-hosted agents, but I'm hitting this error:

E: The repository 'https://unstable.irods.org/apt focal Release' does not have a Release file.
N: Updating from such a repository can't be done securely and is therefore disabled by default.

There are no changes on my side; I just followed the Ubuntu instructions from the official site: https://unstable.irods.org. Could you please check it? I need to reproduce the issue one more time to check for possible network problems from our side.
We look forward to hearing from you.

@trel
Author

trel commented Mar 16, 2021

Hi,

We don't have an Ubuntu 20.04 release out yet - please try a bionic (ubuntu:18.04) VM/container.
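For reference, a rough bionic repro sketch; the signing-key filename and the repo line are assumptions modeled on the packages.irods.org instructions, so substitute whatever https://unstable.irods.org actually documents:

docker run --rm ubuntu:18.04 bash -c '
  apt-get update && apt-get install -y wget gnupg apt-transport-https ca-certificates
  # the key URL below is a placeholder for the documented signing key
  wget -qO - https://unstable.irods.org/irods-signing-key.asc | apt-key add -
  echo "deb [arch=amd64] https://unstable.irods.org/apt/ bionic main" \
    > /etc/apt/sources.list.d/irods-unstable.list
  apt-get update
'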

@Darleev
Contributor

Darleev commented Mar 16, 2021

@trel Thank you, I assumed as much, but Ubuntu 20.04 was checked in the initial request, which is why I asked.
I'll keep you posted.

@trel
Author

trel commented Mar 16, 2021

I will uncheck that box - I had added that check mark when it was 'just a curl call'.

And then forgot to uncheck it when I learned it was the apt/yum calls instead.

Thanks.

@Darleev
Contributor

Darleev commented Mar 22, 2021

@trel I've reproduced the issue several times, but could not find any correlation with region or specific machine configuration. We've filed an internal issue with the Azure network engineering team for further investigation.
I'll keep you updated.

@trel
Author

trel commented Mar 22, 2021

Excellent - thank you.

@lbruun

lbruun commented Mar 23, 2021

I've seen something similar, i.e. connection problems when making outbound connections to the public internet from within GitHub Actions. The problem seems to have escalated over the past month(s). Like @trel, my suspicion is the same: there seems to be a "first touch penalty" for creating outbound connections (perhaps the penalty is paid on a per-destination basis, I don't know).

Therefore, one piece of advice is to check your connect timeout. Our case was simple HTTP downloads. We were using a 5-second connect timeout; after increasing it to 30 seconds the problem went away, or at least could no longer be reproduced. For yum, however, the default is already 30 seconds as far as I can tell. I thought I would share the findings anyway: it feels as if the runner needs some kind of network warmup before outbound connections are stable.
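A minimal sketch of that workaround for plain downloads, assuming curl (the 30-second value is the one that worked in the case above; the retry options are extra):

curl -fsSL --connect-timeout 30 --retry 5 --retry-delay 10 \
     -o /dev/null https://unstable.irods.org/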

@lukepighetti

Not sure if this is related but just today we started getting "This check failed" errors on our two linux checks without any guidance as to what the issue might be. No logs or anything is running.

@OmgImAlexis

OmgImAlexis commented Mar 26, 2021

Not sure if this is related but just today we started getting "This check failed" errors on our two linux checks without any guidance as to what the issue might be. No logs or anything is running.

Same here, this started around 12 hours ago. I'm now randomly getting the following and our environment tag is gone in the repo settings.

Unable to fetch the information for the environment 'staging' targeted by this job.

@maxim-lobanov
Contributor

@lukepighetti @OmgImAlexis, could you please log a separate issue for this problem, since it is not related to the initial issue? For the investigation we need links to the pipelines (links are useful even if the repo is private).

@lukepighetti

lukepighetti commented Mar 26, 2021

Ours was a billing issue, but the big red X didn't inform us of this. There are logs available if you click on the Actions tab that are not available if you view the action status from the PR. I'm considering my particular issue resolved, but I do think my feedback should be considered. Apologies for the noise in this issue.

@trel
Author

trel commented Apr 20, 2021

We've seen increased success lately. Not sure that's actionable here, but we are seeing fewer timeout failures. Not yet zero, though.

@Darleev
Contributor

Darleev commented Aug 10, 2021

Hello @trel,
We are going to close this issue. If you have any questions, feel free to contact us.

@Darleev closed this as completed Aug 10, 2021
@trel
Author

trel commented Aug 10, 2021

Okay. Thanks for the update - we still see this timeout more than once per week.

@trel
Author

trel commented Sep 20, 2021

Follow-up: It's been more than a month since we upgraded the host itself that was sometimes unreachable. It had been an Ubuntu 14.04 VM and is now CentOS 7. We have seen no errors since the upgrade. Current speculation is that the aging/EOL SSL libraries on Ubuntu 14.04 could have been related to the intermittent errors.
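One quick way to sanity-check that theory (not from the thread) is to see which TLS protocol versions a host will negotiate; a sketch using openssl s_client against the current server:

# prints the negotiated protocol/cipher if the server accepts TLS 1.2
openssl s_client -connect unstable.irods.org:443 -tls1_2 </dev/null 2>/dev/null | grep -E 'Protocol|Cipher'
# repeat with -tls1 or -tls1_1 to see which legacy versions are still accepted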
