(Re) Introduce an artifact caching proxy for ci.jenkins.io #2752

Closed
4 tasks done
dduportal opened this issue Jan 20, 2022 · 39 comments · Fixed by jenkins-infra/pipeline-library#577

Comments

@dduportal
Contributor

dduportal commented Jan 20, 2022

Service

ci.jenkins.io

Summary

As part of #2733, the subject of hosting a caching proxy for ci.jenkins.io builds (at least; maybe also for trusted.ci, release.ci and infra.ci) has been raised again in https://groups.google.com/g/jenkins-infra/c/laSsgPOH9qs.

This issue tracks the work related to deploying this service.

Why

  • Protect the CI builds of the Jenkins contributors from external JFrog repository (https://repo.jenkins-ci.org) slowness or outage
    • Note that it would be a partial protection in case of outage: items not cached would not be available at all
  • Decrease the outbound bandwidth of JFrog's repository: caching items would reduce it, which is part of the "fairness" expected of any open source sponsorship like this one

What

We want each build run by ci.jenkins.io (and eventually trusted.ci and release.ci) that involves Maven (and eventually Gradle) to use our caching proxy service instead of directly hitting repo.jenkins-ci.org.

As per https://maven.apache.org/settings.html#mirrors, we should be able to use the User-level settings.xml for Maven.
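As a rough illustration of what such a user-level settings.xml could contain, here is a minimal sketch that renders a single `<mirror>` entry. The mirror id `artifact-caching-proxy` and the `proxy.example.org` URL are placeholders, not the values of any deployed service; only the `<mirrors>`, `<mirror>`, `<id>`, `<url>` and `<mirrorOf>` elements come from the Maven settings reference.

```python
import xml.etree.ElementTree as ET

# Namespace of the Maven settings schema.
SETTINGS_NS = "http://maven.apache.org/SETTINGS/1.0.0"


def render_settings(
    mirror_url: str,
    mirror_id: str = "artifact-caching-proxy",  # hypothetical id
    mirror_of: str = "*",  # "*" asks Maven to mirror every repository
) -> str:
    """Return a settings.xml document that declares one mirror."""
    settings = ET.Element("settings", {"xmlns": SETTINGS_NS})
    mirrors = ET.SubElement(settings, "mirrors")
    mirror = ET.SubElement(mirrors, "mirror")
    ET.SubElement(mirror, "id").text = mirror_id
    ET.SubElement(mirror, "url").text = mirror_url
    ET.SubElement(mirror, "mirrorOf").text = mirror_of
    return ET.tostring(settings, encoding="unicode")


if __name__ == "__main__":
    # Placeholder URL: replace with the real proxy endpoint once it exists.
    print(render_settings("https://proxy.example.org/public/"))
```

Writing the result to `~/.m2/settings.xml` on the agent would make it the user-level settings file; whether that file is delivered by the controller, the agent image, or the pipeline library is the open question discussed below.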

There are different methods to provide this settings.xml to the build:

The main challenge is to provide multiple caching proxies, on each cloud region that we use.
The rationale is that with only a single proxy we would have to pay for cross-cloud and/or cross-region bandwidth, which we do not want. We could either:

  • Use agent template labels to specify which cloud provider and region an agent is running in. Not sure whether the config-file-provider can detect the agent labels. Or could the pipeline library code retrieve the current node's labels?
  • Host a mirror system, like get.jenkins.io, that would redirect requests to the closest proxy, or fall back to repo.jenkins-ci.org otherwise
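The label-based option could be sketched roughly as follows: look at the agent's labels, pick the proxy for the matching cloud, and fall back to repo.jenkins-ci.org when nothing matches. The label names (`aws`, `azure`) and the `*.example.org` proxy URLs are hypothetical placeholders, since no proxy endpoints have been decided in this issue.

```python
from typing import Dict, Iterable

# Hypothetical mapping from an agent label to the proxy hosted in that cloud.
# Neither the labels nor the URLs are real; they only illustrate the lookup.
PROXY_BY_LABEL: Dict[str, str] = {
    "aws": "https://repo.aws.example.org/public/",
    "azure": "https://repo.azure.example.org/public/",
}

# Upstream repository used when no proxy matches the agent.
FALLBACK_URL = "https://repo.jenkins-ci.org/public/"


def pick_repo_url(agent_labels: Iterable[str]) -> str:
    """Return the proxy URL for the first known label, else the upstream URL."""
    for label in agent_labels:
        url = PROXY_BY_LABEL.get(label.lower())
        if url is not None:
            return url
    return FALLBACK_URL


if __name__ == "__main__":
    print(pick_repo_url(["linux", "aws"]))
    print(pick_repo_url(["windows"]))
```

This only covers the selection step; a proxy that is selected but down would still need a separate fallback mechanism.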

Definition of Done

  • Refresh jenkins-infra/docker-repo-proxy to have a versioned and up-to-date Docker image for the caching proxy
  • Add a Helm chart in jenkins-infra/helm-charts to install repo-proxy
  • Add a service in jenkins-infra/kubernetes-management to host the service
  • Communicate about the new service and list what needs to be done around Maven configuration on ci.jenkins.io (ping @timja @jglick @MarkEWaite if you can help by refreshing our memory on what are the "ways" to use such a caching proxy in Maven builds for ci.j: controller config file, agent config, network config, pipeline library update, all of the above, other?)

How

See the associated PRs as they come.

@dduportal dduportal added the triage Incoming issues that need review label Jan 20, 2022
@dduportal dduportal self-assigned this Jan 20, 2022
@dduportal dduportal added ci.jenkins.io site:ci.jenkins.io and removed triage Incoming issues that need review labels Jan 20, 2022
@jglick

jglick commented Jan 20, 2022

First of all read #938 (reverted by #2047); I am not sure offhand which infra repo had the actual proxy configuration that you could use as a starting point. You would need to do a bit of digging. I recall it being nginx configured with a simple LRU cache of 2xx results, i.e., successful retrieval of release or *-SNAPSHOT artifacts or metadata XML files from public URLs. I suppose the K8s equivalent would be a StatefulSet with a cache volume.
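To make the described behavior concrete, here is a toy model of "an LRU cache of 2xx results". It is not the nginx configuration from the old setup, just a sketch of the semantics as I understand them: successful responses are stored, anything else is passed through uncached, and the least recently used entry is evicted once the cache is full. The `fetch` callable and the `max_entries` limit are assumptions made for the example.

```python
from collections import OrderedDict
from typing import Callable, Tuple

Response = Tuple[int, bytes]  # (HTTP status code, response body)


class SuccessOnlyLruCache:
    """Cache 2xx responses by path, evicting the least recently used entry."""

    def __init__(self, fetch: Callable[[str], Response], max_entries: int = 1000) -> None:
        self._fetch = fetch  # stands in for the request to the upstream repository
        self._max_entries = max_entries
        self._entries: "OrderedDict[str, Response]" = OrderedDict()

    def get(self, path: str) -> Response:
        if path in self._entries:
            # Cache hit: mark the entry as most recently used.
            self._entries.move_to_end(path)
            return self._entries[path]
        status, body = self._fetch(path)
        if 200 <= status < 300:
            # Only successful retrievals are stored.
            self._entries[path] = (status, body)
            if len(self._entries) > self._max_entries:
                # Drop the least recently used entry.
                self._entries.popitem(last=False)
        return status, body
```

Note that in this model a failed upstream request is retried on every call, which matches the "partial protection" caveat in the issue description: items that are not already cached are unavailable during an outage.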

what are the "ways" to use such a proxy caching in maven builds

At a first approximation, revert jenkins-infra/pipeline-library#135 + jenkins-infra/pipeline-library#216 + jenkins-infra/pipeline-library#219 (but keeping some positive things from those PRs, such as removal of obsolete JDK 7 support).

@dduportal
Contributor Author

Many thanks for the pointers @jglick !

We've started refreshing https://github.com/jenkins-infra/docker-repo-proxy (jenkins-infra/docker-repo-proxy#5) which has the behavior you describe, so it means we are heading in the right direction! (I'm currently trying this with a local build of a plugin before trying to deploy to production).

Sounds like, with the information you gave, we have enough to get a first version soon.

@jglick

jglick commented Jan 20, 2022

Oh https://github.com/jenkins-infra/docker-repo-proxy, I see.

If you get the service running, I can help draft a pipeline-library PR to use it. Just specify the URL. (Or would we have two URLs, one public via ingress and one cluster-internal for efficiency?) Not sure how we test such PRs prior to use; I guess you can override the version in a @Library annotation in some draft plugin PR.

@timja
Member

timja commented Jan 20, 2022

yeah you can access it via @Library('pipeline-library@refs/pull/number') or just push an origin branch

I was wondering if we would have a mirror per cloud? and then determine which cloud we were running on? to minimise bandwidth use but I guess that can be added on top

@dduportal
Contributor Author

dduportal commented Jan 28, 2022

Putting this on pause (not enough bandwidth in the team for now) + JFrog works again as expected.

@dduportal dduportal removed their assignment Jan 28, 2022
@jglick

jglick commented Mar 7, 2022

Slow again today AFAICT.

@lemeurherve

This comment was marked as resolved.

@jglick

jglick commented Mar 23, 2022

#2849

@MarkEWaite

All the successful plugin bill of materials jobs run over the weekend were run with the artifact caching proxy disabled. When the artifact caching proxy is enabled for plugin bill of materials jobs, there is a high overall failure rate of the job. The failure often does not become visible until 90 minutes or more into the job.

Some examples are visible at:

@basil
Collaborator

basil commented Mar 27, 2023

In particular, search for repo.do.jenkins.io from the bottom of each log upwards. You'll see a bunch of I/O errors, socket read timeouts, "Premature end of Content-Length delimited message body" errors, etc.
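If you have a console log saved locally, a small helper along these lines can do the bottom-up search. This is only a sketch: the function name is made up, and it does a plain substring match on the host name rather than recognizing specific error messages.

```python
from typing import List, Tuple


def find_mentions_bottom_up(
    log_text: str, needle: str = "repo.do.jenkins.io"
) -> List[Tuple[int, str]]:
    """Return (line number, line) pairs containing needle, last match first."""
    lines = log_text.splitlines()
    hits: List[Tuple[int, str]] = []
    # Walk from the last line to the first so the latest failures come first.
    for index in range(len(lines) - 1, -1, -1):
        if needle in lines[index]:
            hits.append((index + 1, lines[index]))  # 1-based line numbers
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python find_mentions.py path/to/console.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for number, line in find_mentions_bottom_up(handle.read()):
            print(f"{number}: {line}")
```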

@jglick

jglick commented Mar 27, 2023

MNG-714 would be helpful. I was hoping to use this trick but it did not seem to work. Created

<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
    <mirrors>
        <mirror>
            <id>proxy</id>
            <url>https://repo.do.jenkins.io/public/</url>
            <mirrorOf>*,!repo.jenkins-ci.org</mirrorOf>
        </mirror>
    </mirrors>
    <profiles>
        <profile>
            <id>fallback</id>
            <activation>
                <activeByDefault>true</activeByDefault>
            </activation>
            <repositories>
                <repository>
                    <id>repo.jenkins-ci.org</id>
                    <url>https://repo.jenkins-ci.org/public/</url>
                </repository>
            </repositories>
            <pluginRepositories>
                <pluginRepository>
                    <id>repo.jenkins-ci.org</id>
                    <url>https://repo.jenkins-ci.org/public/</url>
                </pluginRepository>
            </pluginRepositories>
        </profile>
    </profiles>
</settings>

where the mirror is expected to fail (since I am providing no authentication) and ran with

docker run --rm -ti --entrypoint bash -v /tmp/settings.xml:/usr/share/maven/conf/settings.xml maven:3-eclipse-temurin-17 -c 'git clone --depth 1 https://github.com/jenkinsci/build-token-root-plugin /src && cd /src && mvn -Pquick-build install'

but it fails immediately and does not fall back. additional-identities-plugin which does not use an extension from Central builds OK but does not use the proxy.

@lemeurherve
Member

lemeurherve commented Mar 30, 2023

After clearing the cache of the DigitalOcean provider, a BOM build exclusively on DigitalOcean finished with success: https://ci.jenkins.io/job/Tools/job/bom/job/master/1564/

The fact that the BOM builds failed only on DO, with "Premature end of Content-Length delimited message body" each time, and passed after clearing the cache on this provider makes me think the error came from corrupted cache data.

I'll check to either find a way to clear the cache for a specific artifact, or reduce the cache retention, currently set to one month.

@lemeurherve
Member

@MarkEWaite @basil could you try your next BOM builds without the skip-artifact-caching-proxy label please?

@jglick

jglick commented Mar 30, 2023

Can try jenkinsci/bom#1916

@jglick

jglick commented Mar 30, 2023

or jenkinsci/bom#1907

@jglick

jglick commented Mar 30, 2023

FYI https://issues.apache.org/jira/browse/MNG-7708 (probably not relevant if the cache errors were persistent).

@dduportal
Contributor Author

Closing as the "unreliable" behavior (which is BOM-only) is tracked in #3481
