Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Release 0.22 tracking #2484

Closed
20 tasks done
bkamins opened this issue Oct 13, 2020 · 12 comments
Closed
20 tasks done

Release 0.22 tracking #2484

bkamins opened this issue Oct 13, 2020 · 12 comments
Labels
Milestone

Comments

@bkamins
Copy link
Member

bkamins commented Oct 13, 2020

Here I list things to do before 0.22 can be released.

A must (breaking):

@nalimilan, @pdeffebach - if you see something that is required more for 0.22 then please comment.

The objective is that after 0.22 release we should aim at 1.0 release soon without any breaking changes (0.22 should serve as testing period only + new functionality).

@bkamins bkamins added this to the 1.0 milestone Oct 13, 2020
@nalimilan
Copy link
Member

Just a note that we should decide whether we change the way missing values are handled by default. Not saying that we should, but we must be aware that if we release 0.22/1.0 without it then it won't happen until 2.0 (if at all).

@bkamins
Copy link
Member Author

bkamins commented Oct 15, 2020

Just a note that we should decide whether we change the way missing values are handled by default.

My understanding was the conclusion was to add where as I described above and this is the only change we need. What additional changes do you see required? (in particular I understand that we concluded that adding a functionality that conditionally updates/creates a column only on a subset of rows in a data frame is something that we decided is not crucial to have)

@bkamins
Copy link
Member Author

bkamins commented Oct 18, 2020

What additional changes do you see required?

@nalimilan + @pdeffebach : do you see any more changes required in 0.22 (except for the described above) in the area of handling missing values?

@nalimilan
Copy link
Member

My understanding was the conclusion was to add where as I described above and this is the only change we need. What additional changes do you see required? (in particular I understand that we concluded that adding a functionality that conditionally updates/creates a column only on a subset of rows in a data frame is something that we decided is not crucial to have)

One potential issue is that having where skip missing values but other functions like combine, select and transform not skip them would be a bit inconsistent. And while we can add where without breaking anything, changing the behavior of other functions would be breaking. So at least we should decide whether that inconsistency would be a problem before 0.22. Probably better keep discussing this at #2314.

I'm not sure the conclusion regarding what we should do was so clear (you're referring to #2314, right?). One simple and consistent solution that wouldn't break anything would be to say that by default missing values should never be skipped in DataFrames (not even with where), but that they should be skipped by all functions in DataFramesMeta.

@bkamins
Copy link
Member Author

bkamins commented Oct 22, 2020

OK - all required PRs for 0.22 release are now open

@bkamins
Copy link
Member Author

bkamins commented Oct 24, 2020

We should also decide on #2499 before 0.22 release.

@bkamins bkamins pinned this issue Nov 1, 2020
@bkamins
Copy link
Member Author

bkamins commented Nov 7, 2020

Today we made a major step towards 0.22. I assume @nalimilan will now work on #2459 and I rewrite manual and docstrings using the new backend. After these two PRs are merged I will open a PR bumping DataFrames.jl to 0.22 to check again if all works as expected.

From this moment, except #2459 we do feature freeze for 0.22 release so testing is welcome!

@bkamins
Copy link
Member Author

bkamins commented Nov 13, 2020

There is a proposal (which I like) that for 0.22 release all deprecated functionality should have force=true set (so that we make sure users see deprecation warnings). @deprecate does not support force=true, but I can make a substitute for this case. The question is if we agree with this approach.

@oxinabox if I recall correctly, you recommended to drop printing depwarns by default in Julia Base, so your opinion here would be welcome.

@oxinabox
Copy link
Contributor

oxinabox commented Nov 13, 2020

I don't love deprecation warnings being turned off.
It's not my preferred solution.
But the problem was that deprecation warnings can cause immense slow down.
It was effectively breaking for users of DataStructures.jl (LightGraphs and JuMP) because it was so much slower.
I actually ended up having to undeprecate it for them.

If you do force there is no way for the user to save themselves from it. Even if they know it isn't a problem.

They also are transitive to the user.
It makes sense to see the deprecation warnings for your direct dependencies.
It doesn't make sense for your indirect dependencies.

(I would really like to fix that using a LoggingExtras filter. Though I think Base needs to be changed to make sure to set the group on deprecation warnings so they can be identified. But it's not something I have had time to do yet).
A particularly bad case is any kind of long running production system with logs that people need to read.
Deprecation warnings are turned off for a production system, since it's a frozen Manifest in a docker container -- no chance of anything being updated by mistake.
And being flooded by deprecation warnings when you are trying to deal with a production crash is noone's idea of a good time.
(Not too hard to work around since such a logging system has its own filters hopefully. I know SumoLogic does)

Also: it's not a problem to use deprecated methods.
They still work.
We follow semver.
Users should check for deprecations before updating to a new breaking change.
Plus we have depwarn turn on in tests -- which is the place you go to check.

Also allows users to turn off deprecations for tests if they don't want to be spammed so they focus on their actual errors not on the deprecation spam.
Using force is going to be really confusing for folks at Invenia as we run two seperate nightly CI jobs for everything.
One with deprecations turned off, so we can focus on the package itself, and one with it set to error so we can know if it's to update anything.

@bkamins
Copy link
Member Author

bkamins commented Nov 13, 2020

@oxinabox - this is a valid point (and much appreciated insight from folks using DataFrames.jl in production). I agree that the ideal solution would be to show deprecation for direct dependencies and not show them for indirect ones.

Hopefully we will ship out 1.0 release rather soon (3 to 6 months) after 0.22 release, and all deprecations will be pruned in 1.0.

Having thought of this more note that the original problem with push! was old (actually very old) bad design that we fixed some time ago already. The deprecations in 0.22 are:

  • DataFrame! (I do not think people are going to be confused - actually users were confused that it exists)
  • categorical and categorical! (rarely used in practice, will not lead to errors - we simply stop depending on CategoricalArrays.jl in 1.0 release)
  • bunch of DataFrame constructors (again - not a "bad design", but we will simply stop supporting some, so this should be clean)
  • convert method from Matrix to DataFrame (same as above)

So all in all:

  1. for 0.22 release it is not a big deal to leave things as is given this evidence
  2. in general a solution in Julia Base for more fine-grained deprecation handling would be nice, but it is an issue for Julia Base

@nalimilan
Copy link
Member

Yes I agree it's not a big deal if people don't see the deprecations in 0.22. What we can do if we want is to keep some deprecations as plain errors in 1.0 to point users to the new syntax (that won't work for categorical though since we'd need to keep the dependency on CategoricalArrays).

@bkamins
Copy link
Member Author

bkamins commented Nov 14, 2020

What we can do if we want is to keep some deprecations as plain errors in 1.0

This is what we do in 0.22 with some of the deprecations where it made sense. I think we will do it also for 1.0 release (actually one of the first PRs after 0.22 release will be probably removing the deprecations to have a clean master).

@bkamins bkamins closed this as completed Nov 14, 2020
@bkamins bkamins unpinned this issue Nov 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants