Sometimes a lot of HTTP status codes 520 when accessing Galaxy API #2429

felixfontein · 2020-07-04T15:57:45Z

Bug Report

SUMMARY

I'm working on the Ansible changelog / porting guide build (ansible-community/antsibull-build#103). Both that build and the ACD build itself query the Galaxy API for all included collections (~60 of them). I often get a lot of HTTP 520 status codes (this seems to be a Cloudflare-specific error code):

WARNING: aio_session.get('https://galaxy.ansible.com/api/v2/collections/community/azure/versions/0.1.0/', params={'format': 'json'}) failed with status code 520, retrying...
WARNING: aio_session.get('https://galaxy.ansible.com/api/v2/collections/community/azure/versions/0.1.0/', params={'format': 'json'}) failed with status code 520, retrying...
WARNING: aio_session.get('https://galaxy.ansible.com/api/v2/collections/cisco/nxos/versions/?format=json&page=2', params={'format': 'json'}) failed with status code 520, retrying...
WARNING: aio_session.get('https://galaxy.ansible.com/api/v2/collections/google/cloud/versions/0.10.1/', params={'format': 'json'}) failed with status code 520, retrying...
WARNING: aio_session.get('https://galaxy.ansible.com/api/v2/collections/cisco/ios/versions/?format=json&format=json&format=json&page=4', params={'format': 'json'}) failed with status code 520, retrying...
WARNING: aio_session.get('https://galaxy.ansible.com/api/v2/collections/junipernetworks/junos/versions/?format=json&format=json&format=json&format=json&page=5', params={'format': 'json'}) failed with status code 520, retrying...
WARNING: aio_session.get('https://galaxy.ansible.com/api/v2/collections/cisco/ios/versions/?format=json&format=json&format=json&page=4', params={'format': 'json'}) failed with status code 520, retrying...
WARNING: aio_session.get('https://galaxy.ansible.com/api/v2/collections/vyos/vyos/versions/?format=json&format=json&format=json&format=json&page=5', params={'format': 'json'}) failed with status code 520, retrying...
WARNING: aio_session.get('https://galaxy.ansible.com/api/v2/collections/junipernetworks/junos/versions/?format=json&format=json&format=json&format=json&page=5', params={'format': 'json'}) failed with status code 520, retrying...
WARNING: aio_session.get('https://galaxy.ansible.com/api/v2/collections/community/vmware/versions/?format=json&format=json&page=3', params={'format': 'json'}) failed with status code 520, retrying...
WARNING: aio_session.get('https://galaxy.ansible.com/api/v2/collections/community/vmware/versions/?format=json&format=json&page=3', params={'format': 'json'}) failed with status code 520, retrying...
WARNING: aio_session.get('https://galaxy.ansible.com/api/v2/collections/arista/eos/versions/?format=json&format=json&format=json&format=json&format=json&format=json&page=7', params={'format': 'json'}) failed with status code 520, retrying...
WARNING: aio_session.get('https://galaxy.ansible.com/api/v2/collections/cisco/ios/versions/?format=json&format=json&format=json&page=4', params={'format': 'json'}) failed with status code 520, retrying...
WARNING: aio_session.get('https://galaxy.ansible.com/api/v2/collections/cisco/nxos/versions/?format=json&format=json&format=json&page=4', params={'format': 'json'}) failed with status code 520, retrying...
WARNING: aio_session.get('https://galaxy.ansible.com/api/v2/collections/cisco/iosxr/versions/?format=json&format=json&format=json&format=json&page=5', params={'format': 'json'}) failed with status code 520, retrying...
WARNING: aio_session.get('https://galaxy.ansible.com/api/v2/collections/community/vmware/versions/?format=json&format=json&page=3', params={'format': 'json'}) failed with status code 520, retrying...
WARNING: aio_session.get('https://galaxy.ansible.com/api/v2/collections/cisco/iosxr/versions/?format=json&format=json&format=json&format=json&page=5', params={'format': 'json'}) failed with status code 520, retrying...
WARNING: aio_session.get('https://galaxy.ansible.com/api/v2/collections/cisco/nxos/versions/?format=json&format=json&format=json&format=json&page=5', params={'format': 'json'}) failed with status code 520, retrying...
WARNING: aio_session.get('https://galaxy.ansible.com/api/v2/collections/community/vmware/versions/?format=json&format=json&page=3', params={'format': 'json'}) failed with status code 520, retrying...

After I added code that retries the requests with an increasing delay, the build almost always completes. Before that, I had to run it 2-10 times until it completed.
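A minimal sketch of the kind of retry loop described above, assuming an asyncio client like the one in the log (`aio_session.get`, which appears to be aiohttp-based). The names `retry_with_backoff`, `RetryableStatus` and `raise_if_retryable` are hypothetical, and treating 429 as retryable alongside 520 is an assumption. The delay doubles after each failed attempt and is randomized ("full jitter") so that many parallel requests do not all retry at the same moment, which would only add to the load.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

# Assumption: 520 (Cloudflare "unknown error") and 429 (rate limited) are
# transient and worth retrying; other statuses are left to the caller.
RETRY_STATUSES = frozenset({429, 520})


class RetryableStatus(Exception):
    """Hypothetical error raised when a response has a retryable status."""

    def __init__(self, status: int) -> None:
        super().__init__(f"retryable HTTP status {status}")
        self.status = status


def raise_if_retryable(status: int) -> None:
    """Turn a retryable HTTP status into a RetryableStatus exception."""
    if status in RETRY_STATUSES:
        raise RetryableStatus(status)


async def retry_with_backoff(
    fetch: Callable[[], Awaitable[T]],
    attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
) -> T:
    """Await fetch() until it succeeds, waiting longer after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return await fetch()
        except RetryableStatus:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Exponential backoff capped at max_delay: 1s, 2s, 4s, ... by default
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: sleep a random fraction of the delay so that many
            # parallel requests do not all retry at the same moment.
            await asyncio.sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

With aiohttp, `fetch` could be a small coroutine that calls `session.get(url)`, passes `resp.status` to `raise_if_retryable`, and returns `await resp.json()`. Statuses not listed in `RETRY_STATUSES` are left for the caller to handle.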


felixfontein · 2020-08-05T18:45:45Z

I now got this in a web browser as well; it's an error reported by Cloudflare:

Error 520 Ray ID: 5be2aa9b0df0be1e • 2020-08-05 18:43:54 UTC
Web server is returning an unknown error

You (Browser): Working
Milan (Cloudflare): Working
galaxy.ansible.com (Host): Error

dmsimard · 2021-01-26T15:05:42Z

When building a release for Ansible, part of the work is querying the API to retrieve the versions of collections we're interested in and then we download them to include in the release tarball.

The part where we query the API often fails with 520 errors. Despite the tooling providing exception handling and retries, it still ends up giving up.

Can we do something about this?

ironfroggy · 2021-01-26T19:19:51Z

There are known performance issues with fetching lots of collection data. There may be plans on the radar to flatten the requests needed to make this more performant for sync purposes, but I don't know if that's slated for community galaxy or only automation hub.

dmsimard · 2021-01-26T19:33:08Z

> There are known performance issues with fetching lots of collection data. There may be plans on the radar to flatten the requests needed to make this more performant for sync purposes, but I don't know if that's slated for community galaxy or only automation hub.

I haven't personally run into performance problems, but I learned that the HTTP 520s returned by Cloudflare are likely due to rate limiting. That would make sense given that we make a number of requests in a short time: there are already over 80 collections included, so it quickly adds up.

Ironically, we end up making even more requests because we retry on exceptions, which further exacerbates the issue.

Edit: my personal experience with regard to performance might not be representative; I'm told it could be much faster :)

felixfontein · 2021-03-12T07:34:07Z

I currently get these all the time in community.general's CI (Azure Pipelines). For example, for the backport ansible-collections/community.general#2002 I had to restart failing CI jobs multiple times before everything finally passed.

felixfontein · 2021-03-12T08:37:13Z

To give some numbers: in the first run of ansible-collections/community.general#2004, 75 CI jobs failed because of this (77 succeeded). When I reran them, 18 failed again. Only on the second rerun did all of them pass.

priteau · 2021-03-18T11:39:40Z

We regularly see failed CI jobs for Kayobe (which is part of Kolla in OpenStack) due to this error:

<role> was NOT installed successfully: None (HTTP Code: 520, Message: Origin Error)

Anecdotally, it seems to have become worse in the past few weeks.

felixfontein · 2021-03-22T06:38:15Z

It got a lot worse ~2 weeks ago and has basically stayed that bad since. In community.general, I still have to restart almost all stable-1 CI runs at least once, and usually at least twice (not only stable-1, although later versions install a lot less from Galaxy).

I'm currently thinking of replacing installs from Galaxy with clones of the corresponding git repos. Galaxy is getting pretty unusable :-(

markgoddard · 2021-03-26T17:15:20Z

This is getting quite painful for our CI environment.


ssbarnea · 2021-03-27T15:09:26Z

New occurrence at https://github.com/ansible-community/ansible-lint/pull/1497/checks?check_run_id=2208777133

We still see flakiness when downloading content from Ansible Galaxy, often HTTP 520. This change increases the retries from 3 to 10, and adds a 5 second delay between attempts. Change-Id: I0c46e5fcc6979027dc6f1bc5cc49e923a205f654 Related: ansible/galaxy#2429

* Update kayobe from branch 'master' to 557f4f1ad3f275a0623b9663c3cc5557ef3559ea - Merge "CI: increase Ansible Galaxy retries & add delay" - CI: increase Ansible Galaxy retries & add delay We still see flakiness when downloading content from Ansible Galaxy, often HTTP 520. This change increases the retries from 3 to 10, and adds a 5 second delay between attempts. Change-Id: I0c46e5fcc6979027dc6f1bc5cc49e923a205f654 Related: ansible/galaxy#2429

We still see flakiness when downloading content from Ansible Galaxy, often HTTP 520. This change increases the retries from 3 to 10, and adds a 5 second delay between attempts. Change-Id: I0c46e5fcc6979027dc6f1bc5cc49e923a205f654 Related: ansible/galaxy#2429 (cherry picked from commit df00ba2)

daviddavis · 2021-04-08T13:18:42Z

We're hitting 520s in our CI as well while trying to install the amazon.aws collection. The ansible-galaxy CLI makes quite a number of requests to Galaxy to find the collection to install (one per page of the version listing, 15 pages here):

$ ansible-galaxy -vvvv collection install amazon.aws
[DEPRECATION WARNING]: Setting verbosity before the arg sub command is deprecated, set the verbosity after the sub command. This feature will be removed from ansible-base in version 2.13. 
Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.
ansible-galaxy 2.10.7
  config file = None
  configured module search path = ['/home/daviddavis/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /home/daviddavis/.local/lib/python3.9/site-packages/ansible
  executable location = /home/daviddavis/.local/bin/ansible-galaxy
  python version = 3.9.2 (default, Feb 20 2021, 00:00:00) [GCC 10.2.1 20201125 (Red Hat 10.2.1-9)]
No config file found; using defaults
Starting galaxy collection install process
Found installed collection amazon.aws:1.4.1 at '/home/daviddavis/.ansible/collections/ansible_collections/amazon/aws'
Process install dependency map
Initial connection to galaxy_server: https://galaxy.ansible.com
Opened /home/daviddavis/.ansible/galaxy_token
Calling Galaxy at https://galaxy.ansible.com/api/
Processing requirement collection 'amazon.aws'
Collection requirement 'amazon.aws' is the name of a collection
Found API version 'v1, v2' with Galaxy server default (https://galaxy.ansible.com/api/)
Calling Galaxy at https://galaxy.ansible.com/api/v2/collections/amazon/aws/versions/
Calling Galaxy at https://galaxy.ansible.com/api/v2/collections/amazon/aws/versions/?page=2
Calling Galaxy at https://galaxy.ansible.com/api/v2/collections/amazon/aws/versions/?page=3
Calling Galaxy at https://galaxy.ansible.com/api/v2/collections/amazon/aws/versions/?page=4
Calling Galaxy at https://galaxy.ansible.com/api/v2/collections/amazon/aws/versions/?page=5
Calling Galaxy at https://galaxy.ansible.com/api/v2/collections/amazon/aws/versions/?page=6
Calling Galaxy at https://galaxy.ansible.com/api/v2/collections/amazon/aws/versions/?page=7
Calling Galaxy at https://galaxy.ansible.com/api/v2/collections/amazon/aws/versions/?page=8
Calling Galaxy at https://galaxy.ansible.com/api/v2/collections/amazon/aws/versions/?page=9
Calling Galaxy at https://galaxy.ansible.com/api/v2/collections/amazon/aws/versions/?page=10
Calling Galaxy at https://galaxy.ansible.com/api/v2/collections/amazon/aws/versions/?page=11
Calling Galaxy at https://galaxy.ansible.com/api/v2/collections/amazon/aws/versions/?page=12
Calling Galaxy at https://galaxy.ansible.com/api/v2/collections/amazon/aws/versions/?page=13
Calling Galaxy at https://galaxy.ansible.com/api/v2/collections/amazon/aws/versions/?page=14
Calling Galaxy at https://galaxy.ansible.com/api/v2/collections/amazon/aws/versions/?page=15
Collection 'amazon.aws' obtained from server default https://galaxy.ansible.com/api/
Starting collection install process

ssbarnea · 2021-04-08T13:43:29Z

Sadly, the galaxy install CLI does not have a retry mechanism built in, which I see as a bug (not a missing feature...). Just yesterday I had to implement a retry mechanism in ansible-lint specifically because it was randomly failing to install collections.

Network operations can and will fail, so we had better have an option in the galaxy CLI to retry at least twice. This would likely avoid most glitches.
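Until the CLI retries on its own, a wrapper along these lines is one way to get "retry at least twice" behavior in CI. This is a hypothetical sketch, not the mechanism ansible-lint actually added, and `run_with_retries` is an invented name. It retries on any nonzero exit status (not specifically on HTTP 520) and waits a fixed delay between attempts, similar in spirit to the Kayobe change above (more retries, 5 seconds apart).

```python
import subprocess
import time
from typing import Sequence


def run_with_retries(
    cmd: Sequence[str], retries: int = 2, delay: float = 5.0
) -> "subprocess.CompletedProcess[str]":
    """Run cmd; if it exits nonzero, re-run it up to `retries` more times."""
    if retries < 0:
        raise ValueError("retries must be >= 0")
    for attempt in range(retries + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode == 0 or attempt == retries:
            # Success, or out of retries: return the last result either way.
            return result
        time.sleep(delay)  # fixed pause before the next attempt
    raise AssertionError("unreachable")
```

For example, `run_with_retries(["ansible-galaxy", "collection", "install", "amazon.aws"], retries=2)` tries up to three times; check `.returncode` afterwards. Output is captured, so print `result.stdout` and `result.stderr` if the CI log needs them.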

felixfontein · 2021-04-08T13:50:55Z

Hmm, I was assuming that ansible-galaxy collection install would use a larger page size. Or is that only implemented in stable-2.11 / devel? But anyway, having retries and a more efficient API would really help a lot...

felixfontein · 2021-04-08T14:15:57Z

Hmm, apparently I'm mistaken, it does not seem to set page_size for collection version enumeration, it only does that for some role-related things.
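A sketch of enumerating all versions with a larger page size, assuming the v2 versions endpoint returns a paginated JSON object with `results` and `next` keys and accepts a `page_size` query parameter (the maximum page size the server honors is not stated here). `fetch_all_versions` and the injected `fetch_json` callable are hypothetical. The number of requests is roughly the number of versions divided by `page_size`, rounded up. The query parameters are added only to the first URL; passing them again when requesting a `next` URL that already carries them would also explain the growing `format=json&format=json&...` query strings in the first log.

```python
from typing import Any, Callable, Dict, List, Optional
from urllib.parse import urlencode


def fetch_all_versions(
    base_url: str,
    fetch_json: Callable[[str], Dict[str, Any]],
    page_size: int = 100,
) -> List[Dict[str, Any]]:
    """Collect every entry of a paginated listing by following `next` links."""
    # Query parameters go on the first request only.
    url: Optional[str] = f"{base_url}?{urlencode({'format': 'json', 'page_size': page_size})}"
    results: List[Dict[str, Any]] = []
    while url:
        page = fetch_json(url)
        results.extend(page.get("results", []))
        # Follow `next` verbatim; re-adding params here would duplicate them.
        url = page.get("next")
    return results
```

`base_url` would be something like `https://galaxy.ansible.com/api/v2/collections/amazon/aws/versions/`, and `fetch_json` can wrap whatever HTTP client and retry logic is already in use.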

newswangerd · 2021-04-08T15:21:37Z

As a temporary fix, we've doubled the rate limit from 10 to 20 requests per second. There is also an issue for ansible-galaxy to correctly handle situations where it gets rate limited: ansible/ansible#74191
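Since the 520s appear to come from rate limiting, a client can also pace itself below the server's limit instead of relying only on retries. The class below is a hypothetical sketch, not part of ansible-galaxy or antsibull: it spaces request start times at least `1 / rate` seconds apart, so `RateLimiter(rate=15)` would stay somewhat below the 20 requests per second mentioned above. It assumes a single asyncio event loop; whether Galaxy counts the limit per client IP, per token, or otherwise is not stated here.

```python
import asyncio
import time


class RateLimiter:
    """Hypothetical client-side limiter: request starts are spaced 1/rate s apart."""

    def __init__(self, rate: float) -> None:
        if rate <= 0:
            raise ValueError("rate must be positive")
        self._interval = 1.0 / rate
        self._next_start = 0.0  # earliest monotonic time the next request may start

    async def acquire(self) -> None:
        """Wait until this caller's reserved slot arrives."""
        now = time.monotonic()
        # Reserve the next free slot. There is no await between reading and
        # updating _next_start, so tasks on one event loop cannot race here.
        start = max(now, self._next_start)
        self._next_start = start + self._interval
        if start > now:
            await asyncio.sleep(start - now)
```

Call `await limiter.acquire()` before each `session.get(...)`. Strict spacing is simpler than a token bucket and never allows bursts; a token bucket would permit short bursts while keeping the same average rate.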

felixfontein mentioned this issue Jan 30, 2021

Increase galaxy page size to reduce number of requests ansible-community/antsibull-build#245

Merged

This was referenced Mar 13, 2021

Make docker_swarm_service option publish.published_port optional ansible-collections/community.docker#101

Merged

Add support for Redfish session create, delete, and authenticate ansible-collections/community.general#2027

Merged

felixfontein mentioned this issue Mar 22, 2021

Install collections in CI directly with git ansible-collections/community.general#2082

Merged

patchback bot mentioned this issue Mar 23, 2021

[PR #2082/7fe9dd7a backport][stable-2] Install collections in CI directly with git ansible-collections/community.general#2086

Merged

felixfontein added a commit to felixfontein/community.routeros that referenced this issue Mar 26, 2021

Stop using ansible-galaxy collection install to install a collection …

1445a4e

…due to ansible/galaxy#2429.

felixfontein added a commit to felixfontein/community.hrobot that referenced this issue Mar 26, 2021

Stop using ansible-galaxy collection install to install a collection …

a3b6e7e

…due to ansible/galaxy#2429.

felixfontein added a commit to felixfontein/community.crypto that referenced this issue Mar 26, 2021

Stop using ansible-galaxy collection install to install a collection …

c1c8a2c

…due to ansible/galaxy#2429.

felixfontein added a commit to felixfontein/community.docker that referenced this issue Mar 26, 2021

Stop using ansible-galaxy collection install to install a collection …

f711e75

…due to ansible/galaxy#2429.

felixfontein added a commit to felixfontein/community.docker that referenced this issue Mar 26, 2021

Stop using ansible-galaxy collection install to install a collection …

5f5932c

…due to ansible/galaxy#2429.

felixfontein added a commit to ansible-collections/community.crypto that referenced this issue Mar 27, 2021

Stop using ansible-galaxy collection install to install a collection …

befa690

…due to ansible/galaxy#2429. (#211)

felixfontein added a commit to ansible-collections/community.hrobot that referenced this issue Mar 27, 2021

Stop using ansible-galaxy collection install to install a collection …

8fab6f5

…due to ansible/galaxy#2429. (#11)

felixfontein added a commit to ansible-collections/community.routeros that referenced this issue Mar 27, 2021

Stop using ansible-galaxy collection install to install a collection …

7bab58e

…due to ansible/galaxy#2429. (#24)

felixfontein added a commit to ansible-collections/community.docker that referenced this issue Mar 27, 2021

Stop using ansible-galaxy collection install to install a collection …

bc096a9

…due to ansible/galaxy#2429. (#113)

This was referenced Apr 1, 2021

Fix CI justin-p/ansible-role-gophish#13

Merged

Fix CI justin-p/ansible-role-evilginx#2

Merged

daviddavis mentioned this issue Apr 8, 2021

Retry installing the amazon.aws collection pulp/plugin_template#365

Merged

newswangerd mentioned this issue Apr 8, 2021

ansible-galaxy doesn't handle rate limiting correctly ansible/ansible#74191

Closed

cutwater closed this as completed Jun 10, 2021
