-
-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[INFRA-1633] Stop building PR merges #1355
Comments
I know this doesn't solve the root issue, but why is there a PR still open from 2012? Can't those just be closed? |
In that case, probably. It gets fuzzier though as you move into PRs which might be OK but just have not been properly reviewed yet, or are partly implemented but need some fixes. |
R. Tyler Croy were you not just recently complaining about the cost of running ci.jenkins.io and the need to minimize unnecessary CPU-hours? This seems like the obvious way to get a huge reduction in cost with very little effort. The downside is fairly minimal I think: the risk that a PR gets merged whose head rev passes but which has a semantic conflict with something done in master since the last PR change. (GitHub will of course display syntactic conflicts automatically.) Such conflicts are probably rare enough, but if there is any suspicion that there might be one in a particular PR, we can always just merge master into the PR branch to force a fresh build, or close & reopen the PR, or (if authorized) Build Now directly on the job, or at worst just revert the merge after the fact. Daniel Beck care to make the case why we should not do this? |
AFAIUI, we have effectively infinite cloud money. This is a measure that can be applied once that is no longer the case. |
As noted in |
That does not jibe with that R. Tyler Croy was telling me recently. And if we do have an unlimited expense account, why do we have such a limited number of executors available that builds are frequently waiting for a long queue to clear (today it again jumped to 175)? |
There is no such thing as "unlimited" We do have a finite amount of budget, but that's not the only problem. Allocating too many agents at once hits quotas, and other limits in Azure itself. The solution I would prefer is to execute our builds faster, which is partially about the app itself, and partially about agent provisioning. That solves the problem without expecting Azure to successfully deploy hundreds of VMs in short order (which is never going to be 100% successful on any cloud) |
Executing builds faster means either dropping test coverage (like ATH), which is not attractive; or doing very labor-intensive changes to speed up individual test suites or make Jenkins startup faster, which is surely possible, but requires significant engineering budget. I am talking about a fifteen-minute infrastructure change which could dramatically reduce peak load, worst-case queue wait time, and total cloud usage. |
The problem applies to PR builds of some plugins, not just core. Today tests of jenkinsci/workflow-multibranch-plugin#82 are blocked because all Windows executors are sucked up by rebuilding dozens of open PRs in git-plugin, going back to 2012. |
Today I am unable to iterate on jenkinsci/plain-credentials-plugin#12 mostly because of core PR rebuilds—ironically, due to my merging my own core PR! Changes detected: PR-3438 (5cfdcfc6de353a881b27f3247bde584ba37901e7+e3af0f3bb1d2f80a672e85e31d2dd63157033c16 → 5cfdcfc6de353a881b27f3247bde584ba37901e7+0331d47d60da10df85d28d4fa66535e0d145dd0c) Apparently I should wait until I am about to go to bed before merging anything to core. |
I would not object to closing old or stale PRs, but this is a contentious long-term endeavor, and would still leave a fair number that are moderately recent yet are still getting gratuitously rebuilt on a frequent basis and at considerable expense. Whereas there is a simple configuration change we could apply today which would immediately and drastically reduce the peak load on the server. |
Mark Waite seems to be using a workaround for one plugin. |
An alternate idea: switch the CI configuration at least for jenkinsci/jenkins to build branch head (not merging with the base branch), and then institute branch protection and required strict status checks, which will ensure that a PR is not mergeable until it has had a stable build on a commit which would be a fast-forward merge. Builds will only be performed if and when developers push new commits to the PR, including manually merging/rebasing or using the GitHub UI to bring the PR up to date with the base branch. We would need to be diligent about making sure master builds are clean, which has been a problem in the past couple of weeks or so (I am helping with some problems). We would need to make sure at least Kohsuke Kawaguchi is marked as an administrator, since mvn release:perform is going to push directly to the master branch without any commit status. Whether such a mode is suitable for plugin repositories is another question, but the top priority for now is clearly core with its many PRs and multi-hour 4× parallel builds. Daniel Beck Oleg Nenashev etc. WDYT? |
I have barely any time for core maintenance as is, and this looks like a lot of overhead, including waiting a minimum of 4-5 hours between individual PR merges unless I go to the command line and merge there. I expect that this will make merges grind to a halt entirely. |
Ack. BTW in serverless JX mode, this issue is handled by batch-testing PRs prior to merge. Comes with its own set of issues of course. |
Some repositories get a button in PRs that maintainers can click to merge the base branch into a PR, thus forcing a retest. (jenkinsci/jenkins at least does not seem to have this; I am not sure what you have to configure to get it. Anyway you can do the same from the command-line if necessary.) That might make for a reasonable middle ground when using the Ignore rebuilding merge branches when only the target branch changed option: a maintainer intending to merge one or more approved PRs can request retests against recent trunk changes only if they seem warranted. |
Olivier Vernin sorry, which approach? What are you describing resembles the alternative I floated on 2018-12-12 which Daniel Beck rejected, so I still advocate this mode. |
casz: This issue doesn't help when you are then met by the intermittent connection issues which will fail the pipeline and now your back where you started in the beloved build queue! |
casz: Or the fact that your waiting GitHub status being turned green but stuck on deploy waiting for Linux executors |
casz: My suggestion, fix the build storm ASAP! But also consider fixing the stale PRs, I suggest using the stale feature which is highly configurable https://github.com/probot/stale Just because a PR is closed does not mean you cannot reopen it.
|
casz: Build storm at it again, merely moving to Travis while this is ongoing! |
I understand the frustration. The easiest way to get past the build queues right now is to help experiment with the newer agents: http://lists.jenkins-ci.org/pipermail/jenkins-infra/2019-March/001641.html |
casz: More than happy to oblige! Thanks for posting R. Tyler Croy and thanks for your hard work! |
casz: The build from ACI is slower than what the current Linux agents provide when they are running stable. From the maven docker file, I can see we have lost the ability to run docker test. We use Docker to run testcontainers.org's vault container to run integration tests. Lucky the ACI is faster at connecting! so 🤷 Old Linux agent: Total time: 03:58 min New ACI agent: Total time: 06:29 min |
That does sound like a promising direction, though to be an actual replacement for the current cloud provisioner it would need to support Docker (as Joseph Petersen (old) mentioned) and Windows. In the meantime, disabling automatic builds on base branch commit (or simply switching to building PR heads rather than merge) would take all of five minutes and drastically improve responsiveness for everyone. |
Was just looking at a plugin PR build which was stalled, and saw that the server was loaded down rebuilding acceptance-test-harness, including for example a two-year-old PR. Again, most easily solved by a server-wide policy switch to not rebuild all PRs after every base branch update. |
Looked at the queue today and it had 308 items. |
W.r.t. the idea of only allowing PRs to be merged (by non-admins) when up to date with the base branch:
Or just manually simulate Tide in cases where there is a batch of PRs waiting: file a new PR that just merges all the proposed content PRs, and if its build passes, merge that. |
We have enough difficulty with consistently getting reviews, and maintainers to click Merge, that adding more difficulty on top seems counterproductive. |
I would like to start from JENKINS-58939 in order to reduce number of rebuilds when they are not desired |
Daniel Beck you point out that maintainer overhead is an obstacle to using branch protection; would you consider something like Kodiak to be a solution? |
Don't know. Since I posted my comment a lot of things changed in core maintenance, I would not currently consider my preferences blocking. |
Changing title to emphasize that core (jenkins) is not the only problem; as of this writing there are a bunch of dead ACI executors and all the working executors are tied up for the foreseeable future building acceptance-test-harness. |
Mark Waite confirms that acceptance-test-harness is a problem in this regard. |
this is probably worse for ATH than others - as the ATH job cancels previous builds for the current job when a new job starts. this causes a almost complete ATH run to be aborted which is a complete waste of resources for PRs especially if a PR itself has not been changed. |
Another case in point is that this often results in PRs not having their incremental version deployed if something was merged to master and the PR is not at the head (which is the case if say you committed some PR feedback to change a typo). |
TBD whether https://github.blog/changelog/2021-10-27-pull-request-merge-queue-limited-beta/ could be made to work for us. |
For information, the setting "Ignore rebuilding merge branches when only the target branch changed" had been applied to the organization scanning folder named "Tools", which include the BOM project, as per https://groups.google.com/g/jenkinsci-dev/c/DGKzc2Y6ZSU. |
Closing as it should not be an issue anymore (or at least no one complained about it in the 3 past months 🧌 ). Feel free to reopen with details if needed. |
This is still a problem in core. I am surprised to see it closed (jenkinsci/jenkins#6780 (comment)). Please reopen. |
Thanks for linking the issue, Jesse. I see, your described way has been applied to the Tools folder and therefore affecting bom only atm. |
I've switched Tools and Core to pr-head Let's monitor for a bit and we can easily switch back if it causes a problem |
I complained about this on Apr 26 but was told by R. Tyler Croy and Daniel Beck it was not really an issue. It is an issue. My plugin PR build has now been sitting for 47m waiting for an executor, the build queue has 111 items (https://ci.jenkins.io/load-statistics?type=min says it was up to >210 at one point), and all of our executors are rebuilding core PRs, one of which I see has not been touched since 2012. This is totally unreasonable. We really need to apply this setting at least to core builds, if not everywhere.
Originally reported by jglick, imported from: Stop building PR merges
The text was updated successfully, but these errors were encountered: