Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cirrus: Use google mirror for docker.io #7965

Closed
wants to merge 3 commits into from

Conversation

cevich
Copy link
Member

@cevich cevich commented Oct 8, 2020

In Nov. 2020 Docker inc. will be limiting (in some unknown way) the rate of image pulls. In order for this limit to not randomly break automated testing, or require extensive code changes, we need to introduce a transparent mirror. Fortunately Google provides a transparent mirror of the docker hub, and we're already using GCP for all our test-VMs and containers. Unfortunately, measuring the use/non-use of the mirror is very difficult w/o achieve without test-breaking operational changes. However, based on test-failures discovered in the making of this PR, the mirror does appear to be in-use and transparent to (most) tests.

@cevich cevich marked this pull request as draft October 8, 2020 13:14
@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 8, 2020
@cevich cevich force-pushed the use_caching_mirror branch 2 times, most recently from b9d0507 to bc0ff2f Compare October 8, 2020 17:06
@cevich cevich changed the title Cirrus: Use google mirror for docker.io WIP: Cirrus: Use google mirror for docker.io Oct 8, 2020
@cevich
Copy link
Member Author

cevich commented Oct 8, 2020

@vrothberg PTAL, I've succeeded in making podman really, really, really mad 😁

Error: unable to pull docker.io/library/alpine:latest: Error initializing source docker://alpine:latest: error loading registries 
configuration: error loading registries configuration "/var/tmp/go/src/github.com/containers/podman/test/registries.conf": 
mixing sysregistry v1/v2 is not supported

@vrothberg
Copy link
Member

test/registries.conf is in the old v1 format and you append v2 options to it.

[registries.search] registries = ['docker.io', 'quay.io', 'registry.fedoraproject.org']
Should be turned into
unqualified-search-registries = ['docker.io', 'quay.io', 'registry.fedoraproject.org']

But I suggest to configure the mirror there as well. I see beauty in the CI and my local tests to execute the same code.

@cevich
Copy link
Member Author

cevich commented Oct 9, 2020

test/registries.conf is in the old v1 format and you append v2 options to it.

Interesting, that wasn't a line I changed. I wonder if this test/registries.conf was maybe not even used before?

@cevich
Copy link
Member Author

cevich commented Oct 9, 2020

@vrothberg interesting, so I see there are several tests which fail, and the error mentions mirror.gcr.io but otherwise I don't see confirmation that images are being pulled from the mirror. For example:

[+0205s] Caching docker.io/library/alpine:latest at /tmp/alpine-latest.tar...Running: /var/tmp/go/src/github.com/containers/podman/bin/podman --root /tmp/crio --runroot /tmp/crio-run --runtime /usr/bin/crun --conmon /usr/bin/conmon --cni-config-dir /etc/cni/net.d --cgroup-manager systemd --tmpdir /tmp --events-backend file --storage-driver vfs pull docker.io/library/alpine:latest
         Trying to pull docker.io/library/alpine:latest...
         Getting image source signatures
         Copying blob sha256:df20fa9351a15782c64e6dddb2d4a6f50bf6d3688060a34c4014b0d9a752eb4c
         Copying config sha256:a24bb4013296f61e89ba57005a7b3e52274d8edd3ae2077d04395f806b63d83e
         Writing manifest to image destination
         Storing signatures
         a24bb4013296f61e89ba57005a7b3e52274d8edd3ae2077d04395f806b63d83e
         Running: podman [options] save -o /tmp/alpine-latest.tar docker.io/library/alpine:latest

@vrothberg
Copy link
Member

@vrothberg interesting, so I see there are several tests which fail, and the error mentions mirror.gcr.io but otherwise I don't see confirmation that images are being pulled from the mirror. For example:

[+0205s] Caching docker.io/library/alpine:latest at /tmp/alpine-latest.tar...Running: /var/tmp/go/src/github.com/containers/podman/bin/podman --root /tmp/crio --runroot /tmp/crio-run --runtime /usr/bin/crun --conmon /usr/bin/conmon --cni-config-dir /etc/cni/net.d --cgroup-manager systemd --tmpdir /tmp --events-backend file --storage-driver vfs pull docker.io/library/alpine:latest
         Trying to pull docker.io/library/alpine:latest...

That's good news! Using mirrors is transparent to Podman, in the sense that Podman does not know about them. Whatever mirror we end up pulling from, the image will be tagged as the one initially referenced. So it's good that we see mirror.gcr.io in some errors as it indicates it's being used.

Note that the debug logs reveal which mirrors we pull from.

@rhatdan
Copy link
Member

rhatdan commented Oct 12, 2020

/approve

@openshift-ci-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cevich, rhatdan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 12, 2020
@cevich
Copy link
Member Author

cevich commented Oct 12, 2020

Note that the debug logs reveal which mirrors we pull from.

Ahh that is what I was hoping to learn - how to observe it. I want to run at least once manually, to verify the mirror is in use. Not because I don't believe you, but because I don't blindly trust my own work...

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 15, 2020
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 15, 2020
@cevich cevich changed the title WIP: Cirrus: Use google mirror for docker.io Cirrus: Use google mirror for docker.io Oct 15, 2020
@cevich cevich marked this pull request as ready for review October 15, 2020 19:22
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 15, 2020
@cevich
Copy link
Member Author

cevich commented Oct 16, 2020

@baude PTAL, I marked the one problematic test as skipped, referencing the issue I opened.

Copy link
Member

@vrothberg vrothberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd love the system tests to use test/registries.conf as well. Can we export REGISTRIES_CONF_PATH in the Makefile?

Cc @edsantiago

contrib/cirrus/lib.sh Outdated Show resolved Hide resolved
@cevich
Copy link
Member Author

cevich commented Oct 20, 2020

which is a manifest list that isn't on Docker Hub?

Oh, good idea!

@cevich
Copy link
Member Author

cevich commented Oct 20, 2020

Rebased / force-pushed with suggested changes.

@cevich
Copy link
Member Author

cevich commented Oct 26, 2020

We can create a new work item to complete the migration.

Okay, SGTM, and then I can drop the test "fix" from this PR as you suggested as well...

@cevich
Copy link
Member Author

cevich commented Oct 26, 2020

@vrothberg errr, I just read Note that Docker Hub delayed the upcoming changes to mid '21.. At the end, there's a section that says they are going to enforce pull-limits (in less than a week) 😢

Assuming this is true, we need to keep the exclusive mirroring in place. That implies (for this PR) we either skip the failing tests or fix them inline. Is the digest-pulling problem caused by the mirror not retaining them or something?

Signed-off-by: Chris Evich <cevich@redhat.com>
Signed-off-by: Chris Evich <cevich@redhat.com>
See issue containers#8023

Signed-off-by: Chris Evich <cevich@redhat.com>
@mtrmac
Copy link
Collaborator

mtrmac commented Oct 26, 2020

I've spent a lot of time considering the observation of these transparent operations. It's really (REALLY) bugging me that using a mirror is so transparent. In general things that function transparently, can also (necessarily) fail transparently.

I think that's a very fair point. It would definitely be more user friendly to display not only the reference we're pulling but also from where.

It’s a bit awkward in that even the generic copy.Image code doesn’t know about mirrors (in fact, the Trying to pull message comes from Podman, primarily to expose what short name search does).

What we would ideally want to do is to significantly extend the progress API to actually support a wide range of events, and to allow individual transports to contribute events. That’s a fair bit of work, and API change, and all that. So a “short-term” (clearly forever actually) mirror-specific mini-feature would probably be more appropriate, as long as it is easy enough to maintain.


On c/image level, I’m rather unhappy with adding environment variables that impact behavior of a library in a way non-obvious to callers; a types.SystemContext option to log mirrors in some way (increased Logrus level? just an io.Writer? a chan DockerPullEndpointAttempt?) would be fine with me, and if Podman wanted to set that based on an environment variable, fine.

(Alternatively, just running podman --log-level=debug should be cheap enough as far as the pull process is concerned, certainly compared to the network I/O, although just starting Podman does produce a fair bit of output.)

@edsantiago
Copy link
Member

I'm giving up for today, without actually having got anywhere. Current stumbling block is busybox:glibc, which is pulled by shortname in

It("podman pull from docker with tag", func() {
session := podmanTest.PodmanNoCache([]string{"pull", "busybox:glibc"})
. As best I can tell, that particular image is not cached in mirror.gcr.io, and I can't figure out any way to force-cache it (do we have access to that mirror? If we don't, then IMO this exercise in using it is simply postponing a pain point because anything not under our control cannot be relied upon).

I've tried many variations of [[registry.mirror]] trying to point to quay.io but the problem is that something somewhere adds a /library to docker.io, such that pulling busybox:glibc becomes docker.io/library/busybox:glibc, which if I redirect to quay.io/libpod becomes quay.io/libpod/library/busybox:glibc which fails with:

Mirrors also failed: [quay.io/libpod/library/registry:2.6: Requesting bear token: invalid status code from registry 400 (Bad Request)]):

(presumably because quay.io doesn't support multi-level hierarchies).

All for today. I'll see if the morning brings fresh insights.

@vrothberg
Copy link
Member

I've tried many variations of [[registry.mirror]] trying to point to quay.io but the problem is that something somewhere adds a /library to docker.io, such that pulling busybox:glibc becomes docker.io/library/busybox:glibc, which if I redirect to quay.io/libpod becomes quay.io/libpod/library/busybox:glibc which fails with:

[[registry]]
prefix="docker.io"
location="quay.io/libpod"

[[registry]]
prefix="docker.io/library"
location="quay.io/libpod"

@edsantiago, the upper snippet should handle the described scenario. The longest matching prefix will be selected and hence docker.io/library/busybox:glibc would be rewritten to quay.io/libpod/busybox:glibc.

@cevich
Copy link
Member Author

cevich commented Oct 27, 2020

just running podman --log-level=debug should be cheap enough

@mtrmac we have a TON of tests which both display and consume stderr/stdout from podman. Anything that significantly impacts output stands a good chance of needlessly breaking lots of tests, or at best blowing up the size of our test logs (which already require secondary processing for humans to look at).

Re: Add support in c/image and an option in podman. We could potentially face (or will face) the same docker.io problem with buildah testing. Instead of an env. var., what about simply always including the mirror information in the output? That might break a handful of tests here and there, but having this information always available is useful for humans in some contexts (i.e. not just automated testing).

@cevich
Copy link
Member Author

cevich commented Oct 27, 2020

@edsantiago, the upper snippet should handle the described scenario.

@vrothberg want me to set that here (since this PR touches that file already), then I think all we need is to just manually move some images over to quay, no?

@edsantiago
Copy link
Member

I'm on it, see #8165

@cevich
Copy link
Member Author

cevich commented Oct 27, 2020

@edsantiago OIC whatcha doin', okay I won't touch this PR's commits then. In case you didn't know: As long as the sha's (from rebasing) match up in your PR to this one, merging your's (when it's done) will automatically merge this one.

edsantiago added a commit to edsantiago/libpod that referenced this pull request Oct 28, 2020
Followon to containers#7965 (mirror registry). mirror.gcr.io doesn't
cache all the images we need, and I can't find a way to
add to its cache, so let's just use quay.io for those
images that it can't serve.

Tools used:
  skopeo copy --all docker://docker.io/library/alpine:3.10.2 \
                    docker://quay.io/libpod/alpine:3.10.2

...and also:

    docker.io/library/alpine:3.2
    docker.io/library/busybox:latest
    docker.io/library/busybox:glibc
    docker.io/library/busybox:1.30.1
    docker.io/library/redis:alpine
    docker.io/libpod/alpine-with-bogus-seccomp:label
    docker.io/libpod/alpine-with-seccomp:label
    docker.io/libpod/alpine_healthcheck:latest
    docker.io/libpod/badhealthcheck:latest

Since most of those were new quay.io/libpod images, they required
going in through the quay.io GUI, image, settings, Make Public.

Signed-off-by: Ed Santiago <santiago@redhat.com>
@edsantiago edsantiago mentioned this pull request Oct 28, 2020
@rhatdan
Copy link
Member

rhatdan commented Oct 29, 2020

Since #8165 merged, I am closing this PR.

@rhatdan rhatdan closed this Oct 29, 2020
@cevich
Copy link
Member Author

cevich commented Oct 29, 2020

Thanks Dan. Curious...I thought it would have automatically closed this one since the commit SHAs were the same. No matter.

@cevich cevich deleted the use_caching_mirror branch June 30, 2021 18:00
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 23, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants