DataFrame.describe(include=['O']) should include categorical columns #16722
Comments
Why? You would be mixing two distinct data types. Better to be much more explicit.
The types are very closely related and they produce the same output (count, unique, top and freq). Perhaps the bigger issue is the wording of the docstring, which literally uses the word 'categorical'. Maybe a new term such as 'category-like' data could be introduced, covering object, category, bool and datetime.
Yeah, a rewording of the docstring with an example (similar to yours) would be helpful. A PR would be great.
There is still some inconsistency. Also, if the DataFrame consists of only categoricals and objects then |
Agreed that we should clarify the docs here, and not change the behavior. It is inconsistent that |
Agreed, behavior should stay the same. Still wish there was something analogous to |
Code Sample, a copy-pastable example if possible
Problem description
For me it makes sense that when using describe on object data types, categorical data types should be included in the output as well. The docstring even uses the word categorical: "To limit it instead to categorical objects submit the numpy.object data type." It might make sense to add booleans and datetimes as well.
Expected Output
The result should mimic the output of df.describe(include=['O', 'category'])
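The difference between the two calls can be sketched with a small, hypothetical frame. The column names and values below are illustrative assumptions, not taken from the original report. The object column is built with an explicit dtype=object so the example does not depend on how a given pandas version infers string dtypes.

```python
import pandas as pd

# Hypothetical data: one object column, one categorical column, one numeric column.
df = pd.DataFrame({
    "obj": pd.Series(["a", "b", "a"], dtype=object),
    "cat": pd.Categorical(["x", "y", "x"]),
    "num": [1, 2, 3],
})

# Requesting only object dtype leaves the categorical column out.
only_obj = df.describe(include=["O"])

# Listing both dtypes explicitly summarizes the two columns together.
obj_and_cat = df.describe(include=["O", "category"])

print(only_obj)
print(obj_and_cat)
```

Both results use the same summary rows (count, unique, top, freq), which is the similarity the report points to. Only the set of selected columns differs.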
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.20.2
pytest: 3.0.7
pip: 9.0.1
setuptools: 35.0.2
Cython: 0.25.2
numpy: 1.13.0
scipy: 0.19.0
xarray: None
IPython: 6.0.0
sphinx: 1.5.5
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.0
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: 0.3.0.post