[INFRA-1633] Stop building PR merges #1355

Closed

jenkins-infra-bot opened this issue May 18, 2018 · 46 comments

Labels

ci imported-jira-issue jira-component:ci jira-type:ci

Milestone

infra-team-sync-2...

jenkins-infra-bot commented May 18, 2018

I complained about this on Apr 26 but was told by R. Tyler Croy and Daniel Beck it was not really an issue. It is an issue. My plugin PR build has now been sitting for 47m waiting for an executor, the build queue has 111 items (https://ci.jenkins.io/load-statistics?type=min says it was up to >210 at one point), and all of our executors are rebuilding core PRs, one of which I see has not been touched since 2012. This is totally unreasonable. We really need to apply this setting at least to core builds, if not everywhere.

Originally reported by jglick, imported from: Stop building PR merges

assignee: dduportal
status: In Progress
priority: Critical
resolution: Unresolved
imported: 2022/01/10

Author

jenkins-infra-bot commented May 18, 2018

I know this doesn't solve the root issue, but why is there a PR still open from 2012? Can't those just be closed?

Author

jenkins-infra-bot commented May 18, 2018

In that case, probably. It gets fuzzier though as you move into PRs which might be OK but just have not been properly reviewed yet, or are partly implemented but need some fixes.

Author

jenkins-infra-bot commented May 31, 2018

R. Tyler Croy were you not just recently complaining about the cost of running ci.jenkins.io and the need to minimize unnecessary CPU-hours? This seems like the obvious way to get a huge reduction in cost with very little effort. The downside is fairly minimal I think: the risk that a PR gets merged whose head rev passes but which has a semantic conflict with something done in master since the last PR change. (GitHub will of course display syntactic conflicts automatically.) Such conflicts are probably rare enough, but if there is any suspicion that there might be one in a particular PR, we can always just merge master into the PR branch to force a fresh build, or close & reopen the PR, or (if authorized) Build Now directly on the job, or at worst just revert the merge after the fact. Daniel Beck care to make the case why we should not do this?

Author

jenkins-infra-bot commented May 31, 2018

AFAIUI, we have effectively infinite cloud money. This is a measure that can be applied once that is no longer the case.

Author

jenkins-infra-bot commented May 31, 2018

As noted in ~~JENKINS-37491~~, protected branches can also be used to only allow PRs to be merged when that would be a fast-forward (and the CI build passed), which would provide additional safety at the expense of introducing a delay before a merge can be done by anyone except an admin.

Author

jenkins-infra-bot commented May 31, 2018

> we have effectively infinite cloud money

That does not jibe with what R. Tyler Croy was telling me recently. And if we do have an unlimited expense account, why do we have such a limited number of executors available that builds are frequently waiting for a long queue to clear (today it again jumped to 175)?

Author

jenkins-infra-bot commented May 31, 2018

There is no such thing as "unlimited"

We do have a finite amount of budget, but that's not the only problem. Allocating too many agents at once hits quotas, and other limits in Azure itself.

The solution I would prefer is to execute our builds faster, which is partially about the app itself, and partially about agent provisioning. That solves the problem without expecting Azure to successfully deploy hundreds of VMs in short order (which is never going to be 100% successful on any cloud).

Author

jenkins-infra-bot commented Jun 1, 2018

Executing builds faster means either dropping test coverage (like ATH), which is not attractive; or doing very labor-intensive changes to speed up individual test suites or make Jenkins startup faster, which is surely possible, but requires significant engineering budget. I am talking about a fifteen-minute infrastructure change which could dramatically reduce peak load, worst-case queue wait time, and total cloud usage.

Author

jenkins-infra-bot commented Oct 5, 2018

The problem applies to PR builds of some plugins, not just core. Today tests of jenkinsci/workflow-multibranch-plugin#82 are blocked because all Windows executors are sucked up by rebuilding dozens of open PRs in git-plugin, going back to 2012.

Author

jenkins-infra-bot commented Nov 14, 2018

Today I am unable to iterate on jenkinsci/plain-credentials-plugin#12 mostly because of core PR rebuilds—ironically, due to my merging my own core PR!

Changes detected: PR-3438 (5cfdcfc6de353a881b27f3247bde584ba37901e7+e3af0f3bb1d2f80a672e85e31d2dd63157033c16 → 5cfdcfc6de353a881b27f3247bde584ba37901e7+0331d47d60da10df85d28d4fa66535e0d145dd0c)

Apparently I should wait until I am about to go to bed before merging anything to core.
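
The "Changes detected" line quoted in this comment shows a revision written as a pair of commit hashes, where only the second hash changes. A minimal sketch of how such a line could be classified follows. It assumes the first hash is the PR head and the second is the target branch (consistent with the comment, but not confirmed by the log itself); `classify_change` and its return labels are hypothetical names, not part of Jenkins.

```python
import re

# Matches one "<sha>+<sha>" revision pair as it appears in the quoted log line.
REVISION = re.compile(r"([0-9a-f]{40})\+([0-9a-f]{40})")


def classify_change(log_line):
    """Classify a 'Changes detected' line (hypothetical helper).

    Assumption: each pair is <PR head>+<target branch head>.
    """
    revisions = REVISION.findall(log_line)
    if len(revisions) != 2:
        raise ValueError("expected exactly two head+target revision pairs")
    (old_head, old_target), (new_head, new_target) = revisions
    if old_head != new_head:
        return "pr-changed"      # the contributor pushed new commits
    if old_target != new_target:
        return "target-only"     # only the base branch moved
    return "unchanged"


line = (
    "Changes detected: PR-3438 ("
    "5cfdcfc6de353a881b27f3247bde584ba37901e7+e3af0f3bb1d2f80a672e85e31d2dd63157033c16"
    " → "
    "5cfdcfc6de353a881b27f3247bde584ba37901e7+0331d47d60da10df85d28d4fa66535e0d145dd0c)"
)
print(classify_change(line))  # target-only
```

Under that assumption, the rebuild of PR-3438 was triggered purely by a change to the target branch, which is the kind of build the proposed setting would skip.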

Author

jenkins-infra-bot commented Nov 14, 2018

Wondering if using the priority-sorter plugin in some way could help here? Ideally, we'd run the core PR rebuilds with lower priority than "other things" like plugins' PR builds.

Probably, though, we'd need a custom sorting strategy, given that normal core PRs should still be built with normal priority, and just that bunch of automated rebuilds should be lower priority.
Or the easy way is to always give *-plugin jobs higher priority and we're done, I think. I've used this plugin in the past (it was slightly more obvious then, since we basically had string parameters on some jobs to override the default priority; in the case here, this value should be dynamic).
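
To illustrate the idea only, here is a minimal sketch of a queue that serves ordinary builds before rebuilds triggered solely by a target-branch change. It is a toy model built on Python's `heapq`, not the Priority Sorter plugin's API; the class name, priority values, and job names are all hypothetical.

```python
import heapq
from itertools import count

# Hypothetical priority values: a lower number is built sooner.
PRIORITY_NORMAL = 0
PRIORITY_TARGET_ONLY_REBUILD = 10


class BuildQueue:
    """Toy build queue that deprioritizes target-only rebuilds."""

    def __init__(self):
        self._heap = []
        self._order = count()  # tie-breaker keeps FIFO order within a priority

    def submit(self, job, target_only_rebuild=False):
        priority = (
            PRIORITY_TARGET_ONLY_REBUILD if target_only_rebuild else PRIORITY_NORMAL
        )
        heapq.heappush(self._heap, (priority, next(self._order), job))

    def next_job(self):
        return heapq.heappop(self._heap)[2]


queue = BuildQueue()
queue.submit("jenkins/PR-1 (rebuild)", target_only_rebuild=True)
queue.submit("jenkins/PR-2 (rebuild)", target_only_rebuild=True)
queue.submit("some-plugin/PR-12")
print(queue.next_job())  # some-plugin/PR-12
```

Note that this only reorders the queue: the low-priority rebuilds still consume executors eventually, so total load is unchanged.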

Author

jenkins-infra-bot commented Nov 14, 2018

Rather than introducing more plugins to ci.jenkins.io to work around organizational dysfunction, I strongly encourage closing many of the 100-ish old or stale pull requests.

It is my opinion that the pain we're feeling here should be addressed at the source, not in infrastructure.

Author

jenkins-infra-bot commented Nov 15, 2018

I would not object to closing old or stale PRs, but this is a contentious long-term endeavor, and would still leave a fair number that are moderately recent yet are still getting gratuitously rebuilt on a frequent basis and at considerable expense. Whereas there is a simple configuration change we could apply today which would immediately and drastically reduce the peak load on the server.

Author

jenkins-infra-bot commented Dec 11, 2018

Mark Waite seems to be using a workaround for one plugin.

Author

jenkins-infra-bot commented Dec 12, 2018

An alternate idea: switch the CI configuration at least for jenkinsci/jenkins to build branch head (not merging with the base branch), and then institute branch protection and required strict status checks, which will ensure that a PR is not mergeable until it has had a stable build on a commit which would be a fast-forward merge. Builds will only be performed if and when developers push new commits to the PR, including manually merging/rebasing or using the GitHub UI to bring the PR up to date with the base branch. We would need to be diligent about making sure master builds are clean, which has been a problem in the past couple of weeks or so (I am helping with some problems).
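
The rule described in this paragraph can be sketched as a small check: a PR is mergeable only if the current tip of the base branch is already an ancestor of the PR head (so the merge would be a fast-forward) and that exact head commit has a passing build. This is a toy model over an in-memory commit graph, not GitHub's implementation; the function names and commit ids are hypothetical.

```python
def is_ancestor(parents, ancestor, commit):
    """Return True if `ancestor` is reachable from `commit` via parent links."""
    stack, seen = [commit], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, ()))
    return False


def is_mergeable(parents, base_tip, pr_head, passed_commits):
    """Strict check: PR is up to date with the base AND its head commit passed CI."""
    return is_ancestor(parents, base_tip, pr_head) and pr_head in passed_commits


# Hypothetical history: m1 -> m2 on master; p1 branched from m1;
# p2 is p1 with master (m2) merged in.
parents = {"m1": [], "m2": ["m1"], "p1": ["m1"], "p2": ["p1", "m2"]}

print(is_mergeable(parents, "m2", "p1", {"p1"}))  # False: PR is behind master
print(is_mergeable(parents, "m2", "p2", {"p2"}))  # True: up to date and green
```

In this model a build is needed only when the PR head itself changes, which is what removes the automatic rebuilds on every base branch update.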

We would need to make sure at least Kohsuke Kawaguchi is marked as an administrator, since mvn release:perform is going to push directly to the master branch without any commit status.

Whether such a mode is suitable for plugin repositories is another question, but the top priority for now is clearly core with its many PRs and multi-hour 4× parallel builds.

Daniel Beck Oleg Nenashev etc. WDYT?

Author

jenkins-infra-bot commented Dec 12, 2018

I have barely any time for core maintenance as is, and this looks like a lot of overhead, including waiting a minimum of 4-5 hours between individual PR merges unless I go to the command line and merge there. I expect that this will make merges grind to a halt entirely.

Author

jenkins-infra-bot commented Jan 11, 2019

Ack.

BTW in serverless JX mode, this issue is handled by batch-testing PRs prior to merge. Comes with its own set of issues of course.

Author

jenkins-infra-bot commented Jan 14, 2019

Some repositories get a button in PRs that maintainers can click to merge the base branch into a PR, thus forcing a retest. (jenkinsci/jenkins at least does not seem to have this; I am not sure what you have to configure to get it. Anyway you can do the same from the command-line if necessary.) That might make for a reasonable middle ground when using the "Ignore rebuilding merge branches when only the target branch changed" option: a maintainer intending to merge one or more approved PRs can request retests against recent trunk changes only if they seem warranted.

Author

jenkins-infra-bot commented Jan 15, 2019

I am also in favor of this approach.
In order to do that, branch protection must be enabled on all repositories, requiring that branches are up to date with master and that status checks are passing.
More information here

Author

jenkins-infra-bot commented Jan 15, 2019

Olivier Vernin sorry, which approach? What you are describing resembles the alternative I floated on 2018-12-12 which Daniel Beck rejected, so I still advocate this mode.

Author

jenkins-infra-bot commented Mar 23, 2019

This issue doesn't help when you are then met by the intermittent connection issues which will fail the pipeline, and now you're back where you started in the beloved build queue!

Author

jenkins-infra-bot commented Mar 23, 2019

Or the fact that you're waiting for the GitHub status to turn green, but it is stuck on deploy waiting for Linux executors.

Author

jenkins-infra-bot commented Mar 23, 2019

My suggestion, fix the build storm ASAP!
It has been an ongoing issue, and it is not very productive that we have to sit around waiting for hours on computing power.
We can always revert any infrastructure change that might be a temporary fix until we have a more permanent solution.

But also consider fixing the stale PRs. I suggest using the stale bot, which is highly configurable: https://github.com/probot/stale

Just because a PR is closed does not mean you cannot reopen it.

Author

jenkins-infra-bot commented Mar 29, 2019

Build storm at it again, merely moving to Travis while this is ongoing!

Author

jenkins-infra-bot commented Mar 29, 2019

I understand the frustration. The easiest way to get past the build queues right now is to help experiment with the newer agents: http://lists.jenkins-ci.org/pipermail/jenkins-infra/2019-March/001641.html

Author

jenkins-infra-bot commented Mar 29, 2019

More than happy to oblige! Thanks for posting R. Tyler Croy and thanks for your hard work!

Author

jenkins-infra-bot commented Mar 29, 2019

The build on ACI is slower than what the current Linux agents provide when they are running stably.

From the Maven Dockerfile, I can see we have lost the ability to run Docker tests. We use Docker to run testcontainers.org's Vault container for integration tests.

Luckily ACI is faster at connecting! So 🤷

Old Linux agent: Total time: 03:58 min

New ACI agent: Total time: 06:29 min
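
For reference, the two totals can be compared directly. The short sketch below converts the `mm:ss` values reported here to seconds and computes the ratio; `to_seconds` is a hypothetical helper, and a single pair of runs is of course not a benchmark.

```python
def to_seconds(mmss):
    """Convert a Maven-style 'mm:ss' total time to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


old_linux = to_seconds("03:58")  # old Linux agent
new_aci = to_seconds("06:29")    # new ACI agent
slowdown = new_aci / old_linux

print(old_linux, new_aci, round(slowdown, 2))  # 238 389 1.63
```

On these two runs the ACI build took roughly 1.6 times as long as the build on the old Linux agent.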

https://ci.jenkins.io/blue/organizations/jenkins/Plugins%2Fconfiguration-as-code-plugin/detail/jenkinsCIPlus/1/pipeline/4

Author

jenkins-infra-bot commented Apr 1, 2019

That does sound like a promising direction, though to be an actual replacement for the current cloud provisioner it would need to support Docker (as Joseph Petersen (old) mentioned) and Windows. In the meantime, disabling automatic builds on base branch commit (or simply switching to building PR heads rather than merge) would take all of five minutes and drastically improve responsiveness for everyone.

Author

jenkins-infra-bot commented Jun 7, 2019

Was just looking at a plugin PR build which was stalled, and saw that the server was loaded down rebuilding acceptance-test-harness, including for example a two-year-old PR. Again, most easily solved by a server-wide policy switch to not rebuild all PRs after every base branch update.

Author

jenkins-infra-bot commented Jun 12, 2019

Looked at the queue today and it had 308 items.

Author

jenkins-infra-bot commented Jun 13, 2019

W.r.t. the idea of only allowing PRs to be merged (by non-admins) when up to date with the base branch:

> waiting a minimum of 4-5 hours between individual PR merges unless I go to the command line and merge there

Or just manually simulate Tide in cases where there is a batch of PRs waiting: file a new PR that just merges all the proposed content PRs, and if its build passes, merge that.
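
A rough back-of-the-envelope comparison of the two modes, as a sketch only: it assumes builds run one after another, uses 4.5 hours as the midpoint of the "4-5 hours" quoted above, and uses a hypothetical batch of 5 approved PRs. The function names are illustrative.

```python
def serial_hours(n_prs, build_hours):
    """Each PR must be rebuilt on top of the previous merge before it can be merged."""
    return n_prs * build_hours


def batch_hours(build_hours, attempts=1):
    """One combined PR is built; `attempts` > 1 models a failed batch being retried."""
    return attempts * build_hours


BUILD_HOURS = 4.5  # assumption: midpoint of the quoted 4-5 hours
N_PRS = 5          # hypothetical number of approved PRs waiting

print(serial_hours(N_PRS, BUILD_HOURS))  # 22.5
print(batch_hours(BUILD_HOURS))          # 4.5
```

The batch approach only wins if the combined build passes; when it fails, someone has to work out which PR is responsible, which is one of the "issues of its own" mentioned earlier in the thread.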

Author

jenkins-infra-bot commented Jun 13, 2019

We have enough difficulty with consistently getting reviews, and maintainers to click Merge, that adding more difficulty on top seems counterproductive.

Author

jenkins-infra-bot commented Aug 14, 2019

oleg_nenashev:

I would like to start from JENKINS-58939 in order to reduce the number of rebuilds when they are not desired

Author

jenkins-infra-bot commented Jul 28, 2020

Daniel Beck you point out that maintainer overhead is an obstacle to using branch protection; would you consider something like Kodiak to be a solution?

Author

jenkins-infra-bot commented Jul 28, 2020

Don't know. Since I posted my comment, a lot of things have changed in core maintenance; I would not currently consider my preferences blocking.

Author

jenkins-infra-bot commented Jul 31, 2020

Changing title to emphasize that core (jenkins) is not the only problem; as of this writing there are a bunch of dead ACI executors and all the working executors are tied up for the foreseeable future building acceptance-test-harness.

Author

jenkins-infra-bot commented Oct 26, 2020

Mark Waite confirms that acceptance-test-harness is a problem in this regard.

Author

jenkins-infra-bot commented Nov 4, 2020

this is probably worse for ATH than others - as the ATH job cancels previous builds of the same job when a new build starts.

this causes an almost complete ATH run to be aborted, which is a complete waste of resources for PRs, especially if the PR itself has not been changed.

Author

jenkins-infra-bot commented Sep 14, 2021

Another case in point is that this often results in PRs not having their incremental version deployed if something was merged to master and the PR is not at the head (which is the case if, say, you committed some PR feedback to fix a typo).
This has happened on a high proportion of my PRs to core (though my sample is not statistically significant).

Author

jenkins-infra-bot commented Oct 28, 2021

TBD whether https://github.blog/changelog/2021-10-27-pull-request-merge-queue-limited-beta/ could be made to work for us.

Author

jenkins-infra-bot commented Nov 24, 2021

For information, the setting "Ignore rebuilding merge branches when only the target branch changed" has been applied to the organization scanning folder named "Tools", which includes the BOM project, as per https://groups.google.com/g/jenkinsci-dev/c/DGKzc2Y6ZSU.

dduportal added this to the Decrease Spendings on AWS below 8k€ (2021/2022) milestone

Contributor

dduportal commented Apr 5, 2022

Closing as it should not be an issue anymore (or at least no one has complained about it in the past 3 months 🧌 ).

Feel free to reopen with details if needed.

dduportal closed this as completed

dduportal removed this from the Decrease Spendings on AWS below 8k€ (2021/2022) milestone

jglick commented Jul 6, 2022

This is still a problem in core. I am surprised to see it closed (jenkinsci/jenkins#6780 (comment)). Please reopen.

jglick commented Jul 6, 2022

https://groups.google.com/g/jenkinsci-dev/c/4fx00lI3DlE/m/djM3cmkOAgAJ

NotMyFault reopened this

Member

NotMyFault commented Jul 6, 2022

Thanks for linking the issue, Jesse. I see that the approach you described has been applied to the Tools folder and therefore affects only bom at the moment.
For the reasons outlined here and on the mailing list thread, I'd vote to enable it on the Core folder, or at least on Core/jenkins.

dduportal added this to the infra-team-sync-2022-07-12 milestone

Member

timja commented Jul 7, 2022

I've switched Tools and Core to pr-head

Let's monitor for a bit and we can easily switch back if it causes a problem

timja closed this as completed

dduportal mentioned this issue

[ci.jenkins.io][Infra-as-code] Define Job Configuration as code with JobDSL #3071

Open

jglick mentioned this issue

Disable PR-merge mode everywhere #3474

Closed
