-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reconsider RC Process #7738
Comments
Big fan of the proposal. Minor change to current process which should save significant time for dev team and offer benefits to users testing. |
i'm leaning in favor of this approach. IMO the reason we are dealing with these issues with our cherry picking process is more likely to do with cutting our first RC too early, and/or not being diligent enough about restricting the RC line to strictly bug fixes, docs, and tests. Even with that restriction though, there are a lot of bug fixes coming up, both on core SDK and IBC side... so it seems like this above approach may be better. What would be the designated timeline for this feature freeze / release cycle? I guess this offers a good incentive for us to publish the release as soon as feasible :) |
Btw- I had just finished up drafting backport PR for RC2 here: #7355 It once again is quite a behemoth... And was fraught with a handful of potentially missed PRs due to forgetting to label some PRs for "backport", or due to githubs inability to properly sort PRs by merge date. I ended up having to do a side byside comparison with RC1 commit logs to verify that i wasn't missing anything extra... In the end there were no big merge conflicts during the cherry-pick process, but it does feel a bit anxiety inducing to have our major release drifting this far from master, especially when we are in the middle of a release cycle. |
Thanks @clevinson. I'm not sure I agree with this. We have to release at some point. If we had released rc0 later we would have gotten feedback later. I'm also not sure we needed to be more diligent - I think we were diligent and only put stuff on master and backports that made it easier to support the maintenance process. I think the criteria should be getting to a release that includes only things we feel comfortable maintaining in their current state for some time. I just don't think a separate RC line makes sense for our type of development. Instead of saying we should only do X, Y, Z on the RC branch, I'd rather say we should only do stuff on master which makes stargate more maintainable until it's released.
I don't know. I think we will know when we get feedback that it's ready. Also I didn't say total feature freeze - just keep everything we're not ready to maintain in its current state off of master in a longer lived DNM PR. |
Thanks for the tag @aaronc! So for the Go release process, we make a release two times a year, and to accomplish this, we triage by a couple of guides: The process isn't perfect, but it massively helps reduce complexity, confusion, too many breaking changes, too. The freeze for 3 months massively helps with testing, bug fixing, documentation, catching regressions, reporting noteworthy changes, finding vulnerabilities. However, governing all this is the Go1 compatibility promise which we take very seriously and promise that correct programs that ran in earlier versions won't be mostly broken in newer releases https://golang.org/doc/go1compat. Usually about 2 to 3 release candidates are cut before the finally release, and for each we get a branch with cumulative updates e.g. 1.14beta1->1.14beta2->1.14rc1->1.14rc2, where the betas are released much more often and for those we get lots of feedback. References:
|
Thanks for sharing @odeke-em ! |
Thanks for writing this up @aaronc. I just want to state that the current release process should, but does not necessarily currently reflect, have a long-living branch where PRs are continuously cherry-picked into. The problem now is we cherry-pick in a batch type of way. What should really be happening is as a PR lands on master, the issue or PR should be labelled and immediately cherry-picked into the long-living branch...not at the very end of the RC process. This is the PR author's responsibility. Trust me, I know how tedious and laborious the process is as I was the only one doing it for quite some time. The process of having a long-living release branch where PRs are continuously cherry-picked in is exactly the same process that HashiCorp follows, where all their cloud-based products are completely OSS and they have enterprise offerings as well.
I think we should specifically specify features here. |
The current release management workflow has been working pretty well for the 0.39 Release Series. It must be said that 0.39 is very stable and in fact is in maintenance mode only. I'll share some of my thoughts about the problems that seem be affecting the users:
True that. This problem could be solved by managing users expectations better, for instance we should make clear that breaking changes are to be expected between RCs. Users would be warned at least.
That's not unusual. If PRs bring consistent incremental changes, the order should not matter I suppose, e.g. if a feature branch affected by bug a is cherry-picked into the would-be 0.40.0 final release branch then testers and devs should still be able to use
You have a point. I have a proposal: so far we always fix bugs in master first and then we cherry-pick them into stable branches. This has worked very well for Launchpad so far. It seems it's not working as well for the 0.40 release series. |
I agree, this "cherrypick and backport" approach works very well for stable release maintenance.
It makes most sense to me to actually fix the bugs on the branch they will be released from. The downstream users can immediately test the PR and approve it or request more changes, and then merge it. We must be careful to "forward port" these all into the development branch, but that is no more work than the backport process and you actually fix bugs where they were found. My big question is what is the difference between I would propose:
Especially with production bugs which are not clear unit tests, you need to fix the broken binary quickly and you cannot be sure you fixed the bug until you use the same version, so "trying to reproduce and fix a similar problem on master, get approvals, merge it, then backport it to the older release, then (at the end) test if it actually fixed the user's issue" seems very hard. These are the kinds of issues that are popping up for 0.40.0-rcX... inability to connect relayer, issues with genesis, etc. Much more agile would be "reproduce the issue on the same software version the user reported it on, prepare a fix for the issue, allow the user to test the fix, then get approvals on the code, merge it. And with this confidence, forward port it to master". |
TL;DR: Launchpad-style is great when we are stable. When all development is really focused on release, there is no need for 2 branches.
|
I agree with this as well. The release branches need to be up to date (on a daily basis). And no one is really doing it. But I still hear no reason why the Cosmos SDK team needs 2 branches right now. Who decided that a release branch was made before anyone had actually tested the code on a testnet? I think it just kind of happened. There is a whole ecosystem of tools that need to be tested against 0.40.0 before it can be called stable (and many cannot even start testing as we cannot get a stable enough testnet up yet). |
@ethanfrey the perhaps lazy reason that we have two branches right now, as I see it, is historical. I don't know of any particular PRs int master from the last week that were for a post-Stargate feature, but the flip side of that is- there are a more PRs into master than the Stargate release team can actively track, and I'm not aware if we currently have branch protection rules in place to make those assurances. That being said, I just went through and looked at all the PRs into master since RC0, and indeed it looks like the ones that weren't included in backports were all quite minor improvements, and no big new breaking changes that explicitly don't belong in a release. It sounds like most folks here are in favor with @aaronc's proposal here. Shall we move to decision to accept this proposal as is? Or are there concrete amendments that others would like to see? Main question I have is- do we want to temporarily update branch protection rules on master in order to have better assurance during a "release cycle" that non-release related PRs aren't merged into master? I don't have access to see branch protection rules, so I can't really tell if waht we have at the moment ins sufficient. In the meantime, I can go ahead create a tracking issue to replace my backport PR, which will serve as a checkbox list for RC2 since there are a few specific issues we are waiting on for that. |
Yes, I agree with @aaronc's proposal. There is only one concern that I'd like to bring up:
This risks to slow down the development of new features/ADRs. Teams that are not directly involved in the development/finalization process of a particular major release (I'm assuming here that with major release we are referring to That is mainly the rationale behind my proposal of fixing bugs affecting the major release being finalised directly in its branch first. We could always forward port fixes to master as suggested by @ethanfrey. This is a bit like how Gitflow release branches work, i.e. you branch out a release branch when you want to start finalizing it and you merge bugfixes in it. The main difference with Gitflow here is that out hotfix branches will target the release branch that is currently being finalised first - still hope it sounds like a good compromise? @aaronc @clevinson wdyt? |
Cherry picking makes sense if we already released a new version. In that case we should only cherry-pick critical bug fixes (or as @ethanfrey suggested, do the hotfix in release branch and cherry pick to master). |
According to the semantic versioning, in our case point releases (the
Later, once we have |
@alessio I don't see it as a risk. It's a benefit. During the RC cycle we should focus to tag the release as fast as possible. |
Yes, if work is done on a separate release branch then tagging is still possible I suppose.
Well, I disagree. There are teams that are working on ADRs and feature that quite certainly won't make it in time for 0.40. Why should we block them? |
No need to block them. They will be asked to target their changeset to a different branch, eg
My original idea, when we were discussing release process, was to be able to merge directly into |
I've never seen a branch |
No, we cannot go back to using Can we close this issue @aaronc? I thought we agreed on a process last SDK call? The process is really quite simple and doesn't deviate much from the current process:
It's that simple. |
I'm fine with that 👍 . My understanding was that we need some place to merge all things from community which we are not considering as a part of RC. |
I think we can accomplish that with have a rapid and iterative RC cadence + a feature freeze. This means RCs cannot take weeks upon weeks though. |
I agree we shouldn't create a separate @alessio I hear your concern about blocking other teams from merging into master. I would note that we are ourselves holding off on a bunch of stuff that we want to merge into master. While, I don't think we should create a separate My alternative proposal to this is simply that teams working on features that aren't quite ready for master merge those into long-lived feature PRs, and we can even pre-approve those PRs but put a |
Also if we get to a place where the only thing between an RC and release is say a Tendermint release, i.e. no big bugs, then we definitely shouldn't block master and cherry pick onto |
I am not a big fan of Gitflow - just to be clear :) and I like @alexanderbez's proposal, I'd just add a proposal:
I like the feature freeze concept, but if it is only loosely defined then we need some sort of hard freeze too, i.e. a point in time starting from which we block PRs from being merged because they don't meet some release-critical criteria (these should be very well defined instead) - else we'll risk to keep merging features that would increase the risk of delaying the cut of a release. In other words, at some point we should enforce a hard rule like from now onwards, only bugfixes please, no exception shall be made. Maybe at that stage we can fork out a branch dedicated to release stabilization so that ordinary development can go on on |
To not have a too disruptive of an RC process, we can prohibit any "breaking" changes, instead of a full-fledged feature freeze. This way we don't have PRs standing idle for weeks. |
Good, this is still an improvement and helps all SDK users. I have a feeling that with future releases, it will be challenging to have RC very short. Big stakeholders (Gaia, ...) will like to run "big bang" test nets during a RC. Moreover, (I hope) we may have more contributors developing new things. |
Exactly, that's what I've been saying. IMHO we have a pretty thorough multi-stakeholder review process in the SDK. We don't just merge stuff randomly. So no half-baked features or unintended breaking changes (I say unintended because sometimes we need to "break" something because of RC feedback). But if somebody delivers a complete additive, non-breaking piece, I'm not sure how it's a problem to deploy that to users. It got reviewed, tested, etc. Why not "release early, release often"? A few non-breaking additive "features" that have gotten in were things users felt were missing. |
Sounds like there's enough alignment here to proceed with tagging Stargate RC2 off of Before closing this issue, and to align on any final details, I can summarize what I see as this convo's consensus as a PR into CONTRIBUTING.md's release procedure. Should we also update STABLE_RELEASES.md to reflect some of the pieces discussed here? |
Yes please @clevinson 🙏. Thank you |
What happens to PRs that implement ADRs (e.g. Rosetta API support)? Will they be put on hold until Stargate's final is out? |
This is what I personally think should be the litmus test:
If all of those are yes then I personally see no problem with them getting merged. Also, in protobuf we can version different packages separately some things can be Where it doesn't have negative security implications or break things, IMHO I think we and users benefit from "release early release often". Does that help @alessio ? |
I agree with merging in non breaking changes, especially if they add new functionality that has been planned. Not random new ideas (to keep some limit) and ideally not modifying core functionality (to keep that as stable as possible, converging on a release). Both the cosmovisor and Rosetta api support are very valuable pieces that make 0.40.0 much more useful in practice. And they are built around the system more than tinkering with the internals. Both great things to merge in now in my opinion, even if they do include a fair bit of code. |
Aaron, I'm fine with anything yet I disagree that this is the optimal process as I'm still failing to understand what's the problem with having We used to have a similar loosely defined process as the one you described (i.e. the three questions). It generally works until you discover that allowing developers to merge things at the 11th hour comes with increased 1. regression risk 2. risk of never cutting the release. This process may appear simple because it's easy to handle it. Yet it comes with risks that delayed releases in the past. Again, we should have People will come begging to include things at the very last minute. And no one will be in the position to push back because the process is not enough strict. Effective release management needs rules written in stone and little to no room for exceptions. I neither can nor want to block progress though - I'm outnumbered apparently. So please press ahead. |
This was managed very poorly since rc0. And release and master diverged in unclear ways. I would rather one canonical branch with a limit of random fixes than two branches with certainty that some key fixes are forgotten in release branch (the process one week ago). I agree there could be a better process, especially if there was a very clear release owner, which I think is essential in any case. There were a slew of major issues with rc1. Once there is some vaguely stable rc that doesn't have major hiccups, I would love to have a slower, focused release process on it. Alessio, can you point out some prs pouring into master that are not release related? I agree too many changes to the core will delay release indefinitely |
@alessio I am not opposed to I want to note that we did not delay |
Once you've got a commit in |
So your main tweak to the process @alessio would be to just develop on |
Yes that's correct. But really hold on. I want to delay release as much as I want to block the work on this issue here: not in the slighest :) I'm happy to help either way! |
So you don't just wait for 2 weeks, then try to sort out what belongs where? (The previous process addressed by this issue). The master = 0.40 strategy works as it is simple, and everyone follows it. No one was actually backporting their PRs to the release branch, although all seemed to think that was how it should work. I think the key here is a release owner who works at least half time focusing on this release grooming, so all other devs can keep up high dev speeds, while we keep regressions out of the release. @clevinson was kind of fulfilling that role before, maybe he wants to, but I think it was not a very clear, active role. And I never had the feeling he actually had a say as to what belonged in the release and what not? It would be good to have someone with a clear opinion there. Ideally they can just ensure all the proper PRs are in the release (by backporting them within 24 hours) and provide a clear person to talk to if you think your PR should be in the release. Ideally the two can resolve this alone. Of course, this could be escalated to a larger discussion if there is an unresolvable disagreement, but I think that should be less than 5% of the PRs and the rest can quickly be in "release" or "not release" buckets. One questions, however. If we make the |
Or a couple of release owners/managers, that's fine too. More importantly we need release critical criteria, by which release managers could make justified decisions, e.g. That makes the process predictable, testing effort's outcome reliable and debugging via |
I think this is pointing to the need for greater coordination between teams. We do that in our Friday calls and in general that is where we are establishing release criteria as @alessio is mentioning. I'm not sure that there's a problem with that process. But there are still things that are blocking release. And I would note that from my perspective I do not think it has anything to do with scope creep which this discussion appears to be about. There are still core pieces that need to be fixed as I understand it. (@clevinson maybe you can comment more on what those are as I believe you have the context).
Our team is currently trying to prioritize 0.40 and 0.40.1. We are also working on stuff slated for 0.41, but that stuff is non-breaking ( |
I think that perhaps instead of going cold turkey immediately (which seems
like it is what is happening), let’s announce a freeze beginning in 2
weeks, give errbody due notice, point them to this issue, create a policy
on the wiki page of this repository, and start the process — if we
implement the freeze immediately, we are going to have demoralization, lots
of dead PRs hanging, little buy-in, lots of uncontent members of the
community, but even more it could stretch the goal and delay for a bit.
Just an idea to start a gradual move to code freezes and stable fixes
…On Wed, Nov 4, 2020 at 12:22 PM Aaron Craelius ***@***.***> wrote:
It would be good to have someone with a clear opinion there. Ideally they
can just ensure all the proper PRs are in the release (by backporting them
within 24 hours) and provide a clear person to talk to if you think your PR
should be in the release. Ideally the two can resolve this alone. Of
course, this could be escalated to a larger discussion if there is an
unresolvable disagreement, but I think that should be less than 5% of the
PRs and the rest can quickly be in "release" or "not release" buckets.
I think this is pointing to the need for greater coordination between
teams. We do that in our Friday calls and in general that is where we are
establishing release criteria as @alessio <https://github.com/alessio> is
mentioning. I'm not sure that there's a problem with that process. But
there are still things that are blocking release. And I would note that
from my perspective I *do not* think it has anything to do with scope
creep which this discussion appears to be about. There are still core
pieces that need to be fixed as I understand it.
One questions, however. If we make the release/0.40.x branch again, what
is master targetting? 0.40.1? 0.41? That is an essential question. I
think we have plenty non-breaking improvements that we will want in 0.40.1
and it may make sense to stop pushing those into a release branch to just
get bugfixes in there. However, if we are throwing both 0.40.1 and 0.41
code into master, it will never make it into 0.40.1
Our team is currently trying to prioritize 0.40 and 0.40.1. We are also
working on stuff slated for 0.41, but that stuff is non-breaking (x/authz,
x/fee_grants, and x/group). Either way we are currently holding off
merging until we're done with Stargate.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#7738 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABFL3V7WD5RW25INUBHYNC3SOGZXXANCNFSM4TELGH4Q>
.
|
I'm still trying to understand if that's actually a problem. @alessio is your team feeling blocked from merging into master at this moment? Is that a problem today? (And as I said, I personally don't see a problem if Rosetta got into a Stargate release if it's ready... I don't see how a process of intentionally delaying Rosetta for instance will get to Stargate quicker. It seems orthogonal, nice to have, not blocking IMHO. Other things are blocking Stargate not feature creep IMHO) |
Summary
Consider a lighter weight RC process where we release off master and reserve cherry-picks for maintenance mode releases.
Problem Definition
As I understand it we are attempting to follow a process known as “trunk based development”. I understand there are many different viewpoints on branching strategies (here’s a debate thread between two teams at Microsoft: Stop merging if you need to cherry-pick | The Old New Thing). But without trying to pick sides philosophically, I just want to document what I’m seeing in practice.
Basically, I’m seeing two issues:
Some of the problems this causes for users are:
@jackzampolin @ethanfrey and @clevinson maybe you could comment a bit?
Frankly, I feel that we are cherry picking too much and have likely branched off of master too soon.
Proposal
I propose the following as an alternative:
Looking at the development happening on master, I can say that our team explicitly isn’t trying to merge stuff that we know isn’t ready for stargate. Other stuff we are doing in feature branches and leaving in draft mode. I actually can’t think of any PRs that have been merged that really shouldn’t be in stargate But with this process, many of those PRs will be excluded even if that doesn’t really make sense. If you look at the backport PRs, there are a lot of cherry picks and often it is hard for @clevinson to know what should and shouldn’t be cherry picked and in what order.
I’m not saying we shouldn’t use cherry picking for release maintenance, but I think we’re splitting off too early and it’s actually making this harder.
So I would like us to consider reserving backports for when master really has diverged from a past release. We could probably release all the RCs off of master and even a patch release within a week or two and things would be fine.
This just involves discipline around master when we are in an RC cycle and I think we’ve actually been doing this okay the past month. I’m not saying we should avoid breaking changes in an RC cycle - we’ve been doing it a bit and I think it’s okay if helps us get the release right. We just need to be disciplined and know when to include something and when not to. So this may mean that certain PRs stay in longer lived feature branches for a while until after the RC cycle, but I think that’s an okay trade off.
@odeke-em if you could comment on the golang release process that might be useful context for us.
For Admin Use
The text was updated successfully, but these errors were encountered: