ENH/API: accept list-like percentiles in describe (WIP) #7088
I'm close on this... a quick question though. There's also a describe for object types (strs or datetime). Notice the different index order for the last one (this is all current behavior):
# strs only
In [31]: df2 = pd.DataFrame({"C2": ['a', 'a', 'b', 'c']})
In [32]: df2.describe()
Out[32]:
C2
count 4
unique 3
top a
freq 2
[4 rows x 1 columns]
# datetime only
In [28]: df = DataFrame({"C1": pd.date_range('2010-01-01', periods=4, freq='D')})
In [29]: df
Out[29]:
C1
0 2010-01-01
1 2010-01-02
2 2010-01-03
3 2010-01-04
[4 rows x 1 columns]
In [30]: df.describe()
Out[30]:
C1
count 4
unique 4
first 2010-01-01 00:00:00
last 2010-01-04 00:00:00
top 2010-01-01 00:00:00
freq 1
[6 rows x 1 columns]
# mix of timestamp and strs
In [33]: df = pd.concat([df, df2], axis=1)
In [35]: df.describe()
Out[35]:
C1 C2
count 4 4
first 2010-01-01 00:00:00 NaN
freq 1 2
last 2010-01-04 00:00:00 NaN
top 2010-01-01 00:00:00 a
unique 4 3
[6 rows x 2 columns]
So the index gets sorted. Is it worth breaking backwards compat to keep the index in a sensible order? I'm not sure.
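One way to get a stable, readable order no matter how the mixed-dtype result is assembled is to rank the row labels against a fixed list. The sketch below is plain Python and only illustrates the idea. `PREFERRED_ORDER` and `order_labels` are hypothetical names, not part of pandas, and the order itself is an assumption taken from the datetime-only output above.

```python
# Hypothetical preferred order, copied from the datetime-only output above.
PREFERRED_ORDER = ["count", "unique", "first", "last", "top", "freq"]


def order_labels(labels, preferred=PREFERRED_ORDER):
    """Return labels in the preferred order; unknown labels go last."""
    rank = {name: i for i, name in enumerate(preferred)}
    # sorted() is stable, so unknown labels keep their original relative order.
    return sorted(labels, key=lambda name: rank.get(name, len(preferred)))


# The alphabetically sorted index from the mixed timestamp/str example.
sorted_labels = ["count", "first", "freq", "last", "top", "unique"]
ordered = order_labels(sorted_labels)
print(ordered)
```

With a DataFrame, a list like `ordered` could then be passed to `reindex` on the describe result.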
yeh these should be in a sensible order I think
Moved to generic (I'm not sure it was worth it; the code got pretty messy with a bunch of if / else.), updated docs. Should be good when travis says so.
ok, in theory you can put tests in test_generic.py (you can do specific tests or have it create them generically)
# dtypes: numeric only, numeric mixed, objects only
data = self._get_numeric_data()
if self.ndim > 1:
    if len(data.columns) == 0:
do this as len(data._info_axis)
@jreback was there anything else you saw here? I think it's ready.
didn't realize there is an argument percentile_width. I think u should just rename it to percentiles and make it what u have for percentiles (and if it's a scalar then the meaning is unchanged). I think it's too confusing with that argument (which is prob not used much at all) - yours is much more useful
Should we do any warning / deprecation? I should be able to handle that very easily. (Sent by email on May 11, 2014, at 3:46 PM, in reply to jreback's comment above.)
sure, why don't I deprecate percentile_width and replace it with percentiles; otherwise functionality is the same
Added a note about this deprecation to #6581. Anything else?
@jorisvandenbossche Does my deprecation note here look ok? That's how the numpy guide said to do it for objects. I assumed it was similar for keyword arguments.
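For reference, deprecating a keyword argument usually comes down to emitting a `FutureWarning` when the old name is passed. The following is a minimal stdlib sketch, not the code from this PR. `warn_deprecated_kwarg` and `describe_stub` are hypothetical names, and the message wording is made up.

```python
import warnings


def warn_deprecated_kwarg(old_name, new_name):
    """Emit a FutureWarning pointing users from old_name to new_name."""
    warnings.warn(
        f"{old_name} is deprecated; use {new_name} instead",
        FutureWarning,
        stacklevel=3,  # attribute the warning to the caller of describe_stub
    )


def describe_stub(percentile_width=None, percentiles=None):
    # Hypothetical stand-in for describe; only the keyword handling is shown.
    if percentile_width is not None:
        warn_deprecated_kwarg("percentile_width", "percentiles")
    return percentile_width, percentiles


# Capture warnings so the behaviour can be checked, as a test would.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    describe_stub(percentile_width=50)

categories = [w.category for w in caught]
print(categories)
```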
@@ -3478,6 +3478,152 @@ def _convert_timedeltas(x):
return np.abs(self)

_shared_docs['describe'] = """
Generate various summary statistics of self, excluding
I think the `of self` is not very clear for people not knowing the self-concept, maybe just leave it out?
Some comments:
@jorisvandenbossche thanks for the comments. I was hoping that accepting both percentages and raw decimals would be less confusing, since I can never remember which we expect. I actually had a longer reply written and then I realized why it was confusing. I'll switch it back to just expecting decimals between |
@TomAugspurger I think |
BTW, nice and informative FutureWarning! +1 |
@TomAugspurger why are you using
On windows
Ahh I missed that one. I'll switch it over to use value counts and fix the test so that it isn't ambiguous.
awesome just put up a pr and I can test
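A test on `top` and `freq` is only deterministic when one value is strictly more frequent than the rest. When several values tie, as in the datetime example where every `freq` is 1, which value is reported as `top` depends on implementation details. A minimal stdlib sketch of that check, with `top_and_freq` as a hypothetical helper rather than the pandas implementation:

```python
from collections import Counter


def top_and_freq(values):
    """Return (top, freq, tied) for a non-empty sequence of hashable values."""
    counts = Counter(values)
    top, freq = counts.most_common(1)[0]
    # The result is ambiguous if more than one value reaches the maximum count.
    tied = sum(1 for c in counts.values() if c == freq) > 1
    return top, freq, tied


print(top_and_freq(["a", "a", "b", "c"]))  # one clear winner
print(top_and_freq(["x", "y"])[2])         # every value ties
```

Test data with a single clear winner, like the `C2` column above, avoids the ambiguity.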
Closes #4196
This is for frames. I'm going to refactor this into generic to cover series / frames.
A couple questions:
- `percentiles`. For backwards compat, we keep the `percentile_width` kwarg. I changed the default `percentile_width` from 50 to `None` (but the default output is the same). Cases:
  - both `percentile_width` and `percentiles` -> `ValueError`
  - neither `percentile_width` nor `percentiles` -> `percentile_width` set to 50 and same as before.
- `quantile` to be more consistent?