Added version policy #28415

Merged: 10 commits merged into pandas-dev:master on Sep 20, 2019

TomAugspurger
Contributor

This tries to codify the discussion about versioning from yesterday. I tried to summarize the group's thoughts. Let me know if any of my biases slipped through.

cc @pandas-dev/pandas-core

Review thread on doc/source/development/policies.rst (outdated, resolved)
Review thread on doc/source/development/policies.rst (resolved)

We will not introduce new deprecations in patch releases.

Deprecations will only be enforced in **major** releases.
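Both rules in this excerpt hinge on telling patch, minor and major releases apart. The following is a minimal sketch, not pandas code: it assumes plain `MAJOR.MINOR.PATCH` version strings without suffixes such as `rc1`, and the names `parse_version`, `release_kind` and `DEPRECATION_RULES` are hypothetical.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a plain 'MAJOR.MINOR.PATCH' string into integers (no rc/dev suffixes)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def release_kind(previous: str, new: str) -> str:
    """Classify the step from `previous` to `new` as 'major', 'minor' or 'patch'."""
    old, cur = parse_version(previous), parse_version(new)
    if cur <= old:
        raise ValueError(f"{new} does not come after {previous}")
    if cur[0] != old[0]:
        return "major"
    if cur[1] != old[1]:
        return "minor"
    return "patch"


# What each kind of release may do with deprecations under the draft policy.
DEPRECATION_RULES = {
    "patch": "no new deprecations",
    "minor": "may introduce deprecations (old behavior kept, warning emitted)",
    "major": "may enforce deprecations from earlier releases",
}
```

For example, `release_kind("1.0.0", "1.0.1")` returns `"patch"`, so no new deprecation may be added in that release, while `release_kind("1.3.2", "2.0.0")` returns `"major"`.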
Member

Would be good to mention that major releases will happen at the team's discretion, unless we talked yesterday about a cadence for major releases.

Member

If we merge a breaking change to master, is the next release then a major release? Or do we have to maintain two branches?

2 branches would avoid cramming the work involved in actually executing deprecations between the final minor release and the next major release.

Member

If we merge a breaking change to master, then the next release is a major release? or do we have to maintain two branches.

In practice it will go the other way around: we discuss if we want to make a major release, and then merge or not ;)

But it's certainly something we should think about: how feasible is this? If we want to make bigger breaking changes (e.g. enabling nullable integers by default), even if they might already be available as an option, those changes will need some time to sit in master. The question will then be whether we still want to develop a stable branch further, or only do bug fixes there.
(I am not sure we can discuss this in detail before having tried it in practice, but in general it is certainly a point to take into consideration.)

@TomAugspurger
Contributor Author

@mroeschke brings up a good point in #28415 (comment). What do we want to say about

  1. Release cadence
  2. Backports

Django is very explicit here: https://www.djangoproject.com/download/#supported-versions. I'm not sure if we're ready to do something like that. But perhaps we can provide some guidance?

Member

@jorisvandenbossche jorisvandenbossche left a comment

Thanks for starting this document!

Something I forgot to bring up yesterday: I think it might be good to be stricter about maintaining "experimental" labels on certain features, and to clarify that those experimental features are excluded from the "no API breakages in minor releases" rule.
For example, we should label the nullable integers as experimental, so we can develop them further (with breaking changes if needed) and make them stable towards 2.0. We should of course still be cautious with breaking changes, so as not to disrupt too much the people who are trying them out and giving feedback.
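The proposed exemption can be phrased as a single check. A rough sketch under stated assumptions: the `experimental` flag is hypothetical (the thread only proposes labelling features, not tracking them in code), and treating patch releases as never carrying breaking changes is an assumption, since the draft only says patch releases add no new deprecations.

```python
def breaking_change_allowed(release_kind: str, experimental: bool) -> bool:
    """May a breaking change to a feature ship in a release of this kind?

    Stable features only break in major releases; features labelled
    experimental may also change in minor releases (the proposal above).
    Assumption: patch releases never carry breaking changes.
    """
    if release_kind == "major":
        return True
    if release_kind == "minor":
        return experimental
    if release_kind == "patch":
        return False
    raise ValueError(f"unknown release kind: {release_kind!r}")
```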


Review thread on doc/source/development/policies.rst (outdated, resolved)
@WillAyd
Member

WillAyd commented Sep 12, 2019

Currently we are including some deprecations in 1.0. Ideally, we should try to keep this number limited I think (we should try to deprecate as much as we want in the release before it), as we need to keep them until 2.0.

I was thinking we should remove these. Some friction in the interim but I don't see the long term advantage of introducing deprecated behavior into a major release

@jorisvandenbossche
Member

I was thinking we should remove these. Some friction in the interim but I don't see the long term advantage of introducing deprecated behavior into a major release

Sorry, we actually have none yet: https://dev.pandas.io/whatsnew/v1.0.0.html#deprecations. I was mixing them up with API breakages (which are mostly very minor so far).

@simonjayhawkins
Member

Should Python version support be linked to major releases? I.e. if 1.0.0 is released with py3.5 support, it won't be dropped until 2.0.0.

Or, if we adopt #27557, then Python versions can be dropped in minor releases.

Do we need to be clear and include Python support in this policy?

@mroeschke
Member

mroeschke commented Sep 13, 2019

I wouldn't mind adopting a release schedule similar to Django's (e.g. release minor versions every 6 months, release a major version every year). It keeps things simple, predictable, and accountable no matter where we are in the development cycle. At minimum, a year's worth of deprecations would probably be enough to constitute a major version bump.

Backport and LTS support probably needs more discussion.

Member

@datapythonista datapythonista left a comment

Happy with what is proposed here. I think it's a very simple policy, and that makes a lot of sense as a starting point (if in the future we identify things that can be improved, we can add more complex rules then).

Review thread on doc/source/development/policies.rst (outdated, resolved)
@TomAugspurger
Contributor Author

Do we need to be clear and include Python support in this policy?

I think a statement on that is appropriate for this document. Are people on board with following NumPy's policy? If so, let's add that as a section here, and link to the NEP (once it's merged).

@jreback jreback added the Admin Administrative tasks related to the pandas project label Sep 13, 2019
Contributor

@jreback jreback left a comment

lgtm.

@jorisvandenbossche
Member

Do we need to be clear and include Python support in this policy?

I think a statement on that is appropriate for this document. Are people on board with following NumPy's policy? If so, let's add that as a section here, and link to the NEP (once it's merged).

Meaning: not linking it to our major/minor policy, right? I think Simon's question (at least the rest of his comment) was explicitly about that: can we drop Python version support in minor releases?

(Personally I don't think we should link them; it's difficult to predict exactly how things will work out.)

@phamvantuong

phamvantuong commented Sep 13, 2019 via email

@TomAugspurger
Contributor Author

Outstanding discussion items:

  1. Do we introduce deprecations in X.0.0?:
    #28415 (comment)
  • Right now the preference seems to be no deprecations in X.0.0.
  • May be a bit hard though, depending on the time between releases.
  2. Guidance on release cadence? How long will we do backports for? Will we have LTS branches?
  • Hard to say without trying it.
  • I suspect the pace of releases will be similar to what it is now.
  3. When might we drop support for a Python version? Major only? Major or minor?
  • Link to the NEP?

@jreback
Contributor

jreback commented Sep 13, 2019

Outstanding discussion items:

  1. Do we introduce deprecations in X.0.0?:
    #28415 (comment)
  • Right now the preference seems to be no deprecations in X.0.0.
  • May be a bit hard though, depending on the time between releases.

I don't see why we wouldn't include new deprecations in a major release, so -1 on saying we don't.

  2. Guidance on release cadence? How long will we do backports for? Will we have LTS branches?
  • Hard to say without trying it.
  • I suspect the pace of releases will be similar to what they are now.

I would provide no guarantees.

No LTS at all (without separately guaranteed funding).

  3. When might we drop support for a Python version? Major only? Major or minor?
  • Link to the NEP?

@datapythonista
Member

datapythonista commented Sep 13, 2019

Agree with Jeff.

For dropping Python support, to me it makes more sense to drop it only in major releases, since that's easier to communicate (and easier for users to remember and be aware of). But it's not a big deal if it makes our lives more difficult. And I'd like the NEP (we can say we will follow it when it makes sense, if we anticipate that we may need to be flexible).

@TomAugspurger
Contributor Author

Roughly speaking, how often do people think we'll do major releases? In my head, I've been thinking approximately yearly.

@jorisvandenbossche
Member

jorisvandenbossche commented Sep 13, 2019

Roughly speaking, how often do people think we'll do major releases? In my head, I've been thinking approximately yearly.

In my head, I think it will rather be something like every 2 (or 3) years.

If you look at recent years, we had the following minor releases:

  • 0.25: July 2019
  • 0.24: January 2019
  • 0.23: May 2018
  • 0.21/0.22: October / December 2017 (counting as one with the empty sum change)
  • 0.20: May 2017
  • 0.19: October 2016
  • 0.18: March 2016

so we had at most 2 minor releases a year. And personally, I think it is certainly reasonable to do 3 to 4 minor releases before another major release (which gives easily 2 years with the current release cycle).
We might of course also want to try to speed up the release cycle, though.
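The arithmetic behind this estimate can be checked against the dates listed above. The sketch below only has month precision (the first of each month is a stand-in date), counts 0.21/0.22 as one release as the list does, and assumes major releases would follow the same cadence as minor ones; the variable names are illustrative.

```python
from datetime import date

# Minor releases from the list above; day-of-month is a placeholder (month precision).
releases = {
    "0.18": date(2016, 3, 1),
    "0.19": date(2016, 10, 1),
    "0.20": date(2017, 5, 1),
    "0.21/0.22": date(2017, 10, 1),
    "0.23": date(2018, 5, 1),
    "0.24": date(2019, 1, 1),
    "0.25": date(2019, 7, 1),
}


def months_between(start: date, end: date) -> int:
    """Whole months from `start` to `end`, ignoring the day of the month."""
    return (end.year - start.year) * 12 + (end.month - start.month)


dates = list(releases.values())
gaps = [months_between(a, b) for a, b in zip(dates, dates[1:])]
mean_gap = sum(gaps) / len(gaps)  # average months between minor releases

# n minor releases between two majors means n + 1 release intervals.
span_months = {n: (n + 1) * mean_gap for n in (3, 4)}
```

With these dates the gaps are 7, 7, 5, 7, 8 and 6 months (about 6.7 on average), so 3 or 4 minor releases between majors span roughly 27 or 33 months, in line with the "easily 2 years" estimate.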

Member

@WillAyd WillAyd left a comment

lgtm - thanks for spearheading this


.. versionchanged:: 1.0.0

Pandas uses a version of `SemVer`_ to govern deprecations, API compatibility, and version numbering.
Member

"version" -> "variant"? "version of SemVer" seems awkward

@TomAugspurger
Contributor Author

OK, I think we're pretty close here. We don't have

  1. a schedule / guidance on how often we'll release major / minor versions.
  2. Guidance on how long (if at all) we'll make backports to the previous major branch.

It's a bit hard to give guidance on those right now.

Member

@simonjayhawkins simonjayhawkins left a comment

lgtm.


.. versionchanged:: 1.0.0

Pandas uses a variant of `SemVer`_ to govern deprecations, API compatibility, and version numbering.
Member

Maybe in a future iteration we could be explicit about the differences from SemVer and explain the reasoning for those differences in that context.

Member

@datapythonista datapythonista left a comment

lgtm. For guidance on when we plan to release, maybe a comment saying pandas is developed by volunteers so we don't know, plus some information on how often we released in the past? Or just don't include it.

Whenever possible, a deprecation path will be provided rather than an outright breaking change.

Pandas will introduce deprecations in **minor** releases. These deprecations will
preserve the existing behavior while emitting a warning that provide guidance
Member

Maybe a colon at the end?
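The policy text quoted in this thread says deprecations introduced in minor releases preserve the existing behavior while emitting a warning with guidance. A minimal sketch of that pattern in plain Python: `old_function`, `new_function` and the message are hypothetical, and while `FutureWarning` with `stacklevel=2` is a common choice for user-facing deprecations, this thread does not specify which warning class pandas uses.

```python
import warnings


def new_function(values):
    """Replacement API (hypothetical): sum the values."""
    return sum(values)


def old_function(values):
    """Deprecated alias (hypothetical): keeps the existing behavior but warns."""
    warnings.warn(
        "old_function is deprecated and will be removed in the next major "
        "release. Use new_function instead.",
        FutureWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return new_function(values)
```

Calling `old_function([1, 2, 3])` still returns 6 but emits a `FutureWarning`; switching to `new_function` gives the same result without the warning, which is the adjustment users are expected to make during the deprecation period.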

@jorisvandenbossche
Member

Let's leave this open for a few more days? (I don't think there is any hurry to get this merged quickly)
I just notified the mailing list that the discussion continued here.

@jorisvandenbossche
Member

a schedule / guidance on how often we'll release major / minor versions.

I think it is fine not to have this for now, as it is very difficult to predict how it will go, certainly for major releases. For minor releases, I suppose it will be somewhat similar to how we are doing now? (Although we might want to try to increase the pace a bit, e.g. towards 3 instead of 2 minor releases a year?)

I am fine with Marc's suggestion to give a rough idea of how it was in the past (2 per year), with a clear remark that development is volunteer-based, there is no commitment to an exact schedule, etc. But I'm also fine with leaving that out for now.

Guidance on how long (if at all) we'll make backports to the previous major branch.

IMO, we will need to do some form of LTS if we go with this proposal.
Right now we normally don't do any more bug-fix releases for x.y once x.y+1 is out. For minor releases we can certainly continue doing that. But for major release branches, I don't think it would be that uncommon to still do some bug fixes even after the next major release (even if they are just compatibility fixes for a new Python or NumPy version released shortly after the next major release).
But again, I don't think it is needed to already codify this now in the text. We can see how it goes in practice.

@Dr-Irv
Contributor

Dr-Irv commented Sep 16, 2019

A few comments, having been on the side of delivering commercial software and having had these version-number discussions:

  • I do think it is important to have a commitment to at least one major version a year, because if you end up taking 2 years to do a major version, you have taken a long time to actually deprecate something
  • Having said the above, if you decide to do a major release once a year, and there was a minor release 3 months before where there was a deprecation, then when the major release comes, the feature was only deprecated for 3 months. So you might want to consider that a feature is officially removed when one of the following events occurs last:
    • A major release
    • A combination of 3 minor or major releases
    • One year
  • You have to decide what constitutes a major vs. minor release. Is it based on time (e.g., once a year for a major release) or is it based on having a certain set of new features?
  • How do you want to handle dependencies on other libraries (numpy, matplotlib, etc.) where they introduce a change that causes something in pandas to break? So now you have to deprecate something in pandas because of that dependency.
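The "whichever comes last" rule in the list above can be made precise. The following sketch is one possible reading, not an agreed policy: `releases` is assumed to list only minor and major releases in chronological order, the "3 releases" are counted after the deprecating release and include the removal release itself, "one year" is taken as 365 days, and the versions and dates in `history` are hypothetical.

```python
from datetime import date, timedelta
from typing import List, NamedTuple, Optional


class Release(NamedTuple):
    version: str  # plain "X.Y.Z"
    released: date


def is_major(version: str) -> bool:
    """A major release is X.0.0."""
    _, minor, patch = version.split(".")
    return minor == "0" and patch == "0"


def removal_release(
    releases: List[Release],
    deprecated_in: str,
    min_releases: int = 3,
    min_age: timedelta = timedelta(days=365),
) -> Optional[Release]:
    """First release that is major, at least `min_releases` releases after the
    deprecation, and at least `min_age` after it; None if no such release yet."""
    start = [r.version for r in releases].index(deprecated_in)
    deprecated_on = releases[start].released
    for offset, candidate in enumerate(releases[start + 1:], start=1):
        if (
            is_major(candidate.version)
            and offset >= min_releases
            and candidate.released - deprecated_on >= min_age
        ):
            return candidate
    return None


# Hypothetical release history (minor and major releases only).
history = [
    Release("1.2.0", date(2020, 9, 1)),
    Release("2.0.0", date(2020, 12, 1)),
    Release("2.1.0", date(2021, 3, 1)),
    Release("2.2.0", date(2021, 9, 1)),
    Release("3.0.0", date(2022, 3, 1)),
]
```

With this history, a feature deprecated in 1.2.0 is not removed in 2.0.0 (only three months and one release later) but in 3.0.0, while a deprecation from 2.2.0 has no removal release yet.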

@jorisvandenbossche
Member

@Dr-Irv thanks for the feedback! Some answers / questions

I do think it is important to have a commitment to at least one major version a year, because if you end up taking 2 years to do a major version, you have taken a long time to actually deprecate something

Currently, we keep a deprecated feature for 3 minor releases (e.g. deprecated in 0.22, with 0.23 and 0.24 also raising the warning, removed in 0.25). With our current release schedule, this means that deprecations are already kept for a minimum of about 1.5 years. So extending this to e.g. 2 years does not seem like much of a problem to me.
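The current rolling rule described here reduces to a small version calculation. A minimal sketch, assuming `MAJOR.MINOR` style numbering where each minor release counts once; the function name is hypothetical.

```python
def rolling_removal_version(deprecated_in: str, releases_kept: int = 3) -> str:
    """Minor release in which a deprecation is removed under the rolling rule.

    The deprecating release and the next `releases_kept - 1` minor releases
    warn; the one after that removes the feature (0.22 -> 0.25).
    """
    major, minor = (int(part) for part in deprecated_in.split(".")[:2])
    return f"{major}.{minor + releases_kept}"
```

For example, `rolling_removal_version("0.22")` gives `"0.25"`. Using the release dates listed earlier in the thread (0.22 in December 2017, 0.25 in July 2019), that window is about 19 months, consistent with the roughly 1.5-year estimate.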

Also, for "ideal" deprecations, you can already act upon them to get the new (or keep the old) behaviour, and not see the warning any more. For those cases, I also don't think that it is a problem that a deprecation takes a long time.
Or can you explain why taking a long time to deprecate something might be a bad thing? (except that it can be a burden to pandas development)

there was a minor release 3 months before where there was a deprecation, then when the major release comes, the feature was only deprecated for 3 months. So you might want to consider that a feature is officially removed when one of the following events occurs last:

(Note: this also happens if you would only do a major release every 2 or 3 years.)
What you propose is similar to what django does (if something is deprecated in eg 2.3 and the next release is 3.0, then the feature is only removed in 3.1 instead of 3.0).
I personally like this, and it was very briefly discussed at the hangout, but for now seemed to add more complexity than needed (if I remember this correctly)
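The Django-like variant described here can be sketched as follows. It follows the description in this comment (a deprecation from the last 2.x release is removed in 3.1 rather than 3.0); whether a release is the last of its series has to be passed in explicitly, because that is only known once the next major release is planned. The function name is hypothetical.

```python
def django_style_removal(deprecated_in: str, last_minor_of_series: bool) -> str:
    """Version in which a deprecated feature is removed under a Django-like rule.

    Deprecations introduced in X.y are removed in (X+1).0, unless X.y was the
    final minor release before (X+1).0; then removal waits until (X+1).1, so
    every deprecation spans at least two feature releases.
    """
    major = int(deprecated_in.split(".")[0])
    return f"{major + 1}.1" if last_minor_of_series else f"{major + 1}.0"
```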

You have to decide what constitutes a major vs. minor release. Is it based on time (e.g., once a year for a major release) or is it based on having a certain set of new features?

For now, the only thing that we decided is: it is based on having breaking changes (meaning: direct behavioural changes, or removals of deprecations).
That said, it is my hope that those breaking changes also come with new features (eg switching the default integer dtype to the nullable integer dtype).

With our current resources, I don't think we can make more concrete commitments ..

How do you want to handle dependencies on other libraries (numpy, matplotlib, etc.) where they introduce a change that causes something in pandas to break? So now you have to deprecate something in pandas because of that dependency.

Do you have a more concrete example in mind?
In general, I would say that if there is a change in pandas behaviour due to a change in e.g. NumPy or matplotlib that is outside of our control (in practice, quite a few changes in dependencies can be worked around in pandas itself), that should not influence our version policy.

@Dr-Irv
Contributor

Dr-Irv commented Sep 16, 2019

@jorisvandenbossche

Or can you explain why taking a long time to deprecate something might be a bad thing? (except that it can be a burden to pandas development)

This is just a personal feeling as a user, but with the current policy of waiting 3 minor releases, and with the period of time over 3 minor releases being close to 2 years, there is sometimes a feeling of "you said it would be deprecated, so when is that happening already?" when you run into the same deprecated feature again and again. Now, I have to admit I was involved in commercial software that did releases every 4 months or so, so 3 releases would take about a year, so you didn't get that feeling with the commercial software.

You have to decide what constitutes a major vs. minor release. Is it based on time (e.g., once a year for a major release) or is it based on having a certain set of new features?

For now, the only thing that we decided is: it is based on having breaking changes (meaning: direct behavioural changes, or removals of deprecations).
That said, it is my hope that those breaking changes also come with new features (eg switching the default integer dtype to the nullable integer dtype).

If that's the case, I would suggest that you deprecate a feature over 3 releases, whether they are minor or major. You could get to a point where things are stable enough that you have 5 or 6 minor releases between major ones, and then you would be deciding to have a major release only because of deprecations, and calling a release "major" without some really new features doesn't feel right. Again, I have some bias here, having been in the commercial world.

How do you want to handle dependencies on other libraries (numpy, matplotlib, etc.) where they introduce a change that causes something in pandas to break? So now you have to deprecate something in pandas because of that dependency.

Do you have a more concrete example in mind?

Not really, but in the back of my mind, I recall seeing some changes made in matplotlib that forced a change in the pandas API that calls matplotlib. So now because of the matplotlib change, you have to deprecate a pandas API. Or a new numpy version came out and pandas didn't work with it, so the main developers had to do work to make things work with numpy.

One other point - is there a systematic way that the team is tracking when a feature was deprecated, so that you know when it is safe to remove?

@TomAugspurger
Contributor Author

TomAugspurger commented Sep 16, 2019 via email

@Dr-Irv
Contributor

Dr-Irv commented Sep 16, 2019

Each deprecation should be associated with a new issue for that
deprecation's removal. That issue will be assigned to the milestone it's
being removed in.

Has that been done in the past with consistency?

I know I did some PRs that deprecated behavior, and certainly didn't create an associated issue for the deprecated behavior to be removed in the future!

If that hasn't been consistent, you might want to update the "Contributing to pandas" document to indicate that.

@TomAugspurger
Contributor Author

That's a new (proposed) policy.

@jorisvandenbossche
Member

Each deprecation should be associated with a new issue for that
deprecation's removal. That issue will be assigned to the milestone it's
being removed in.

@TomAugspurger I am personally not sure this is necessarily better than one issue per major release gathering all things that should be removed for that major release (a single issue keeps the number of issues down, and gives an easy overview of what still needs to be done in this area for a given major release).
(but anyway, this is not a fundamental discussion for the actual policy ;))
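The alternative suggested here (one issue per major release listing everything to remove) amounts to grouping deprecations by the release that enforces them. A minimal sketch, assuming each deprecation is recorded as a `(name, version)` pair and is enforced in the next major release; the feature names and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def removal_milestone(deprecated_in: str) -> str:
    """Next major release, where the deprecation would be enforced."""
    major = int(deprecated_in.split(".")[0])
    return f"{major + 1}.0.0"


def group_by_milestone(deprecations: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Collect (name, deprecated_in) pairs into one checklist per major release."""
    checklists: Dict[str, List[str]] = defaultdict(list)
    for name, deprecated_in in deprecations:
        checklists[removal_milestone(deprecated_in)].append(name)
    return dict(checklists)


# Hypothetical entries, for illustration only.
example = [
    ("feature_a", "1.1.0"),
    ("feature_b", "1.3.0"),
    ("feature_c", "2.0.0"),
]
```

Each key of `group_by_milestone(example)` would correspond to one tracking issue, with its list as the checklist for that major release.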

One other point - is there a systematic way that the team is tracking when a feature was deprecated, so that you know when it is safe to remove?

@Dr-Irv the current way is #6581

there is sometimes a feeling of "you said it would be deprecated, so when is that happening already?" when you run into the same deprecated feature again and again.

My answer here is that in most cases, you should adjust your code to not run into the warning anymore. That's the main point of those warnings, to give you time to adjust instead of directly having the change (of course, there will always be some cases where this is not straightforward)
If you do that, a long deprecation period is not that much of a problem (from a user's perspective), and actually makes it easier for users that do not upgrade every release.

If that's the case, I would suggest that you deprecate a feature over 3 releases, whether they are minor or major.

That brings us back to the rolling deprecations, the alternative that has been discussed (which is certainly a valid alternative). But personally, I don't find the argument that "a release with just deprecations is not worth a major release" that important. Of course it would be nice to also have fancy new features (but that is hard to commit to with the current resources), and also, most new features can be introduced in a backwards-compatible manner anyway (e.g. new methods like explode), so we will continue to do them in minor releases.

@TomAugspurger
Contributor Author

TomAugspurger commented Sep 17, 2019 via email

@jorisvandenbossche
Member

I don't remember if we already settled on one (I was just stating my preference ;))

@TomAugspurger
Contributor Author

That brings us back to the rolling deprecations,

Yep. And on the last dev call there was consensus on this form of SemVer.

I don't remember if we already settled on one (I was just stating my preference ;))

:)


Planning to merge in 24 hours if there aren't any objections.

Member

@jorisvandenbossche jorisvandenbossche left a comment

Some small textual comments


.. versionchanged:: 1.0.0

Pandas uses a variant of `SemVer`_ to govern deprecations, API compatibility, and version numbering.
Member

Suggested change
Pandas uses a variant of `SemVer`_ to govern deprecations, API compatibility, and version numbering.
Pandas uses a loose variant of semantic versioning (`SemVer`_) to govern deprecations, API compatibility, and version numbering.

Whenever possible, a deprecation path will be provided rather than an outright breaking change.

Pandas will introduce deprecations in **minor** releases. These deprecations will
preserve the existing behavior while emitting a warning that provide guidance
Member

Suggested change
preserve the existing behavior while emitting a warning that provide guidance
preserve the existing behavior while emitting a warning that provide guidance on:


We will not introduce new deprecations in patch releases.

Deprecations will only be enforced in **major** releases.
Member

This can maybe use a bit more explanation (I am not sure everybody reads "enforced" the same way; I am myself sometimes confused by this terminology). Attempt:

... This means that a deprecated behaviour which is introduced in a minor release (eg 1.2) will continue to work for other minor releases in that series (other 1.x releases), but raise a warning. The deprecation will then only be removed and the behaviour changed in the next major release.
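The behaviour described in this suggestion can be expressed as a small lookup from the running version to the state of a deprecated feature. A sketch only, assuming plain numeric version strings; the function names and status strings are hypothetical.

```python
from typing import Tuple


def parse(version: str) -> Tuple[int, ...]:
    """'1.2.0' -> (1, 2, 0); assumes purely numeric components."""
    return tuple(int(part) for part in version.split("."))


def deprecated_feature_status(deprecated_in: str, running: str) -> str:
    """State of a feature deprecated in `deprecated_in` when running version `running`."""
    dep, cur = parse(deprecated_in), parse(running)
    if cur[0] > dep[0]:
        return "removed"  # enforced from the next major release onwards
    if cur >= dep:
        return "works, with warning"  # remaining releases of the same major series
    return "works"  # releases before the deprecation was introduced
```

For a feature deprecated in 1.2.0, `deprecated_feature_status("1.2.0", "1.5.1")` returns `"works, with warning"` and `deprecated_feature_status("1.2.0", "2.0.0")` returns `"removed"`.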

1.0.0, 1.1.0, ...). Those deprecations will be *enforced* in the next major
release.

Note that *behavior changes* and *API breaking changes* are not identical. If we
Member

Should we first note here that API-breaking changes will only happen in major releases? (That part of the policy is not actually mentioned explicitly here.)

Contributor Author

"here" being this section, or this document?

Member

I think it could go literally here (before the sentence I commented on), but in the end the main thing is that it is mentioned in this document, if you see a better order.

So the comment is basically that the part of "breaking changes will only happen in major releases" (which is explained in the policies.rst document) is not explicitly mentioned in the whatsnew note.

Contributor Author

Whoops, I was confused by the diff. Thought we were still in policies.rst.

@TomAugspurger TomAugspurger added this to the 1.0 milestone Sep 19, 2019
@TomAugspurger
Contributor Author

Good to go here?

@WillAyd WillAyd merged commit 6acfc75 into pandas-dev:master Sep 20, 2019
@WillAyd
Member

WillAyd commented Sep 20, 2019

Thanks Tom!

proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
@simonjayhawkins simonjayhawkins mentioned this pull request Jul 10, 2020
Labels
Admin Administrative tasks related to the pandas project
10 participants