[BREAKING] remove median and nunique from describe by default #2339
(in particular, note that even if we start adding support for threading in DataFrames.jl, which is planned according to the responses in https://discourse.julialang.org/t/dataframes-jl-development-survey/44022) still
I have also switched
Dropping the number of unique values sounds fine, but I'm a bit more reluctant to drop the median. It's a more robust indicator than the mean and it would be too bad not to report it by default just because in some cases it will be slow. I assume in most cases it should be fast enough. FWIW, the carefully designed skimr R package has an interesting approach:
Minimum and maximum are reported as being the 0% and 100% percentiles, so the median is between these two. Additionally, the 25% and 75% quantiles are reported. (The rate of complete observations is somewhat redundant IMO.) What do you think?
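To make the robustness point concrete, here is a minimal sketch using only the `Statistics` standard library. The vector `x` is made-up illustration data, not output from skimr or from this PR. It shows a single outlier pulling the mean far from the bulk of the data while the median stays put, and the five quantiles (0%, 25%, 50%, 75%, 100%) that a skimr-style summary would report.

```julia
using Statistics

# Hypothetical data: four small values and one large outlier
x = [1.0, 2.0, 3.0, 4.0, 100.0]

m   = mean(x)     # 22.0, dominated by the outlier
med = median(x)   # 3.0, unaffected by the outlier

# 0% and 100% quantiles coincide with minimum(x) and maximum(x)
qs = quantile(x, [0.0, 0.25, 0.5, 0.75, 1.0])   # [1.0, 2.0, 3.0, 4.0, 100.0]
```

Note that `quantile` has to partially sort the data, which is where the extra cost relative to `minimum` and `maximum` comes from.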
The problem is (for the same size of data):
so I would not include it for performance reasons. I will revert the
0% and 100% quantiles can be computed using minimum and maximum as currently, so I wouldn't include them in the timing. Also, it turns out that
Actually I thought that computing Anyway in this PR I would focus on dropping things (and I understand we agree to drop
re-introduced
Thank you!
Fixes #2269
As decided there, we just drop computing `:median` and `:nunique` by default. This is the simplest thing to change, and if someone wants them it is easy to opt in.
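As a hedged sketch of what opting in could look like, assuming `describe` accepts statistic names as positional `Symbol` arguments (as in the DataFrames.jl documentation); the data frame `df` below is a made-up example:

```julia
using DataFrames

# Hypothetical data: one numeric column and one string column
df = DataFrame(a = [1, 2, 3, 4], b = ["x", "y", "x", "z"])

# Request the statistics explicitly instead of relying on the defaults
stats = describe(df, :mean, :median, :nunique)

# `stats` has one row per column of `df`; statistics that do not apply
# to a column (e.g. the median of strings) are reported as `nothing`
```

Listing the statistics explicitly also keeps the call independent of whatever the defaults turn out to be.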
Just to get a relative impact on performance of dropping this consider:
which starts to become prohibitive once you have even a few dozen variables.
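For readers who want to measure the effect themselves, the following is a rough sketch, not the benchmark used in this PR. The column names, sizes, and the choice of statistics are assumptions made for illustration, and no timings are claimed here since they depend on the machine and the data.

```julia
using DataFrames

# Hypothetical data: 10 Float64 columns and 10 String columns, 10_000 rows each
num = [Symbol("n", i) => rand(10_000) for i in 1:10]
str = [Symbol("s", i) => string.(rand(1:1_000, 10_000)) for i in 1:10]
df = DataFrame(num..., str...)

# Warm up so that compilation time is not included in the measurements
describe(df, :min, :max, :nmissing, :eltype, :median, :nunique)

# Cheap statistics only
t_cheap = @elapsed describe(df, :min, :max, :nmissing, :eltype)

# Same statistics plus the two that this PR discusses dropping
t_extra = @elapsed describe(df, :min, :max, :nmissing, :eltype, :median, :nunique)

println("without median/nunique: ", t_cheap, " s")
println("with median/nunique:    ", t_extra, " s")
```

For more reliable numbers, a dedicated benchmarking package that repeats the measurement would be preferable to a single `@elapsed` call.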