proposal: backports: fix regressions in previous releases for N days #34622

Closed
networkimprov opened this issue Sep 30, 2019 · 24 comments

@networkimprov

networkimprov commented Sep 30, 2019

I was startled to learn via #34536 that regression backports cease on the day the next major release appears. I'd assumed that a previous release would be fully supported for 6 months. (Would anyone expect otherwise?)

Suppose I wait until 1.12.5 to upgrade a deployment from 1.11.x, and then discover a regression on the day 1.13 comes out. Am I out of luck? Either upgrade again to 1.13 or resort to a custom build? Note that many organizations require an evaluation period prior to deploying an upgrade.

Therefore I propose that regression backports cease at a later date, e.g. 90 days following the next major release, and that the policy should document the time frame.

Currently only fixes for breakage due to "external forces" are backported to the previous release.

A distinction between regressions and "external forces" seems arbitrary. If one could fully evaluate a deployed application running on a new release within a known period, that would surface any regressions before the current backport window closes. But that's not achievable in many cases.

I strongly suggest that the Go team seek community input on this issue before making a decision, for instance by adding a question to the annual Go user survey.

@FiloSottile
Contributor

For context, this policy emerged from saying that we "consider upgrading to the latest release a valid workaround". I am also finding it hard to form an opinion on this because I am not aware of many cases in which it has happened. We usually get most of the serious regression reports in the first month or two.

@bcmills
Contributor

bcmills commented Sep 30, 2019

A different way of putting this might be: if you have gone through the trouble to release-qualify the current version of the Go toolchain, then you can reasonably expect that it will continue to receive fixes for a while.

If you intentionally release-qualify an older-than-current version of the toolchain, then you should not expect fixes for the issues you find in that version, because active development has moved to the current release.

And if you're going to release-qualify a version of the Go toolchain immediately before a major release, we would really rather you do the qualification on the beta or release candidate of the upcoming release, rather than a version that has been sitting more-or-less stable for an entire cycle up to that point.

@networkimprov
Author

networkimprov commented Sep 30, 2019

@FiloSottile I'd be surprised if it hasn't happened, but you'll need to actively solicit community input to gather war stories.

@bcmills you haven't addressed the scenario I gave. You could begin qualifying a release shortly after it ships, finish in 90 days, deploy, and then find a regression 90 days later. At which point you have to begin qualification all over because a new release has shipped.

And if you don't expect to find many regressions after the first couple months, where's the harm in saying you'll backport these fixes for longer than six months?

@bcmills
Contributor

bcmills commented Sep 30, 2019

@networkimprov, if you qualify the release exhaustively enough that the process takes 90 days end-to-end, and there is a non-security-impacting regression that has a small enough effect that another 90 days elapse before you discover it, I think you'd be hard-pressed to make the case that that regression has a serious enough impact to warrant a backport.

@networkimprov
Author

Please seek community input on the matter. The current policy was a surprise to me; I doubt I'm alone.

"90 days" isn't necessarily exhaustive; a staging server may not have a representative workload. And after deployment, conditions on the public Internet are quite unpredictable.

@stevenh
Contributor

stevenh commented Oct 2, 2019

Using valuable resources to backport every bug fix when backwards compatibility is already guaranteed is just a waste, IMO.

If that's what your company requires, the burden can easily be taken up in-house; Go is open source, so it's trivial to create a custom build.

@vdobler
Contributor

vdobler commented Oct 2, 2019

Without at least some little pressure to use the current version, people will stick with old releases longer than is healthy (either because they are lazy or because they are afraid). So please: do not backport too much, and especially not for too long.

It sometimes takes us two or three weeks to switch to the latest release. But this time frame is determined by holidays, people being sick, planned downtime, and fixing production issues, not by having to get the new release working with our software.

@velovix

velovix commented Oct 2, 2019

Please seek community input on the matter. The current policy was a surprise to me; I doubt I'm alone.

Respectfully, I think the onus is on you to seek community input as the proposer of the change.

This policy isn't surprising to me personally because in the Go community, the latest release has always been the recommended one. This isn't the case for every language, but I think Go has "earned it" by taking backwards compatibility seriously. We don't have an extensive qualification process at my company, so perhaps I'm less likely to understand this need, but it seems reasonable to me for a company to do some backporting itself if they have a policy that isn't compatible with Go's release schedule. This seems especially reasonable considering that releases don't happen very often and are done on a consistent schedule.

@andybons
Member

andybons commented Oct 2, 2019

I strongly suggest that the Go team seek community input on this issue before making a decision, for instance by posting a request for input on the mailing lists.

@networkimprov, you are capable of posting to the same forums to solicit community feedback. Please feel free to do so.

@networkimprov
Author

networkimprov commented Oct 2, 2019

I posted to golang-nuts when I filed the issue! I suggested that the Go team do so because your posts are far more widely read and shared. The response to my note has been minimal.

https://groups.google.com/d/topic/golang-nuts/4E5MYyoiMIg/discussion

EDIT: I changed the issue text to suggest adding a Q to the annual user survey.

@velovix

velovix commented Oct 3, 2019

@networkimprov

I suggested that the Go team do so because your posts are far more widely read and shared. The response to my note has been minimal.

I remember a similar sentiment being expressed by you when discussing the try proposal. On one hand, I sympathize. When a core Go team member posts on the mailing list or files a proposal, it will generally garner more attention than if someone else did. On the other hand, I don't think it's reasonable to expect anybody, the Go team included, to speak on your behalf. If a Go team member agrees that there's cause for confusion regarding this policy, I'm sure that they'll reach out to the community. Otherwise, the onus is on you to prove the value of your proposal yourself.

@networkimprov
Author

networkimprov commented Oct 3, 2019

My intention is to speak for the collection of orgs that are conservative about upgrading deployments. They would see upgrading to a brand new release for a regression fix as a likely leap out of the frying pan and into the fire.

I can't make the case for them by myself, and I don't know them personally.

@stevenh
Contributor

stevenh commented Oct 3, 2019

You might want to look internally at why that is.

Very successful companies such as Netflix have demonstrated that such a conservative approach is actually counterproductive.

Search for "Netflix uses FreeBSD HEAD", if you're interested.

@networkimprov
Author

Here is a case where the current policy fails: #34713

@FiloSottile
Contributor

I don't see #34713 as a failure. If a user tested Go 1.12 and is now successfully running it, it sounds like they don't need the fix right away, and like all improvements they can get it when they update the major version next. If a user tried to upgrade to Go 1.12 but was blocked by this, they might as well upgrade to Go 1.13 with the fix. If a user hasn't started upgrading yet, they should upgrade to Go 1.13.

This is what the Go 1 Compatibility Promise is for. It costs us a lot, but it saves us from having to do long-term support releases, and makes "just roll forward" a valid response for most users.

@networkimprov
Author

networkimprov commented Oct 9, 2019

Go 1.12 was the first release where os.File.Sync() worked correctly on macOS, so any filesystem-based database app was forced to upgrade, regardless of the above regression. #26650

Go 1.13.3 will be the first release in many years that works correctly on Windows laptops, so some apps will be forced to upgrade, regardless of possible regressions. #31528

Since users have to accept tradeoffs like this at times, it would be helpful to extend the period for regression backports by 90 days, or thereabouts.

@stevenh
Contributor

stevenh commented Oct 9, 2019

Why do you believe builds with important fixes should mean extra backports?

Surely a build with an important fix is just more incentive to stop lagging behind and upgrade?

@bcmills
Contributor

bcmills commented Oct 11, 2019

@networkimprov, how does that follow? If the application was ok with being glitchy on macOS, or Windows laptops, for 11 or 12 previous Go releases, why would it suddenly become urgent for that application to become non-glitchy as soon as the fix is published in any release?

Or, going in the opposite direction: if an application that never worked before suddenly starts working with Go 1.13.2, why would its author care whether it also starts working with 1.12.11? (Why wouldn't they just use 1.13.2?)

@networkimprov
Author

networkimprov commented Oct 11, 2019

Note: I amended the issue text to suggest adding a question to the annual user survey.

@bcmills in the File.Sync() case the failure was silent unless you crash-tested your app, which is somewhat involved (an app abort is not a crash test). Apparently none of the Go database-library projects did so on macOS. Any users hurt by this probably wouldn't suspect the app -- "Oh, system crash, and data fubar, nothing new here." I discovered the Go bug after reading up on fsync() for my own work, and learning that macOS fsync() is intentionally broken.
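
For illustration, here is a minimal sketch (not from the original thread; the file name and data are made up) of the write-then-Sync pattern whose durability was silently weaker on macOS before the #26650 fix:

```go
package main

import (
	"log"
	"os"
)

func main() {
	// Open (or create) a hypothetical data file.
	f, err := os.OpenFile("app.db", os.O_CREATE|os.O_WRONLY, 0644)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	if _, err := f.Write([]byte("record\n")); err != nil {
		log.Fatal(err)
	}

	// Sync is meant to flush the write to stable storage. On macOS before
	// Go 1.12 it used fsync(2), which Apple documents as not forcing data
	// to the disk; Go 1.12 switched to F_FULLFSYNC (see #26650), so a
	// crash right after Sync no longer risks losing the write.
	if err := f.Sync(); err != nil {
		log.Fatal(err)
	}
}
```

Because this code compiles and appears to work either way, only a genuine power loss or kernel panic between Sync and the OS flush would reveal the difference, which is why the problem could stay hidden.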

I don't have direct experience of the Windows bug. It seems like Go hasn't been widely deployed on Windows laptops, so maybe glitches didn't yield enough user complaints to prompt developer intervention.

And rather few PC apps have used Go for "11 or 12" releases ;-)

Re the belief that "upgrading is a valid workaround" for previous-release regressions: there are sound reasons why many won't deploy the latest release early on, e.g. stack corruption in 1.13 #34802.

@Gobd

Gobd commented Oct 17, 2019

I don't see #34713 as a failure. If a user tested Go 1.12 and is now successfully running it, it sounds like they don't need the fix right away, and like all improvements they can get it when they update the major version next. If a user tried to upgrade to Go 1.12 but was blocked by this, they might as well upgrade to Go 1.13 with the fix. If a user hasn't started upgrading yet, they should upgrade to Go 1.13.

This is what the Go 1 Compatibility Promise is for. It costs us a lot, but it saves us from having to do long-term support releases, and makes "just roll forward" a valid response for most users.

I don't have an opinion here, but "just roll forward" doesn't work all the time. #27044 was bad enough that we had to either stay back until a fix was released or implement a workaround. If #34472 gets fixed, everything will still compile and run but will throw errors for things that previously worked.

@rsc
Contributor

rsc commented Nov 14, 2019

Closing as a duplicate of #34536. Both issues are about "decide and clearly state what the policy is". We should have one conversation, not two.

rsc closed this as completed Nov 14, 2019
@networkimprov
Author

It turns out that my original expectation, "that a previous release would be fully supported for 6 months" is in fact current policy, as of 2017... see #19069.

I'd like a little credit for my accurate read of user community needs :-)

@rsc
Contributor

rsc commented Nov 21, 2019

@networkimprov, saying I told you so is not exactly the way to win friends.

@networkimprov
Author

I'm sorry, that's not what I meant to imply.

I find it distressing to make an argument which I know to be true, and not hear any agreement. To top it off, now I've been admonished. I'm mystified on both counts :-/

golang locked and limited conversation to collaborators Nov 21, 2020