Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPR: DataFrame.get_dtype_counts #27145

Merged
merged 27 commits into from
Jul 3, 2019

Conversation

mroeschke
Copy link
Member

@mroeschke mroeschke commented Jun 30, 2019

Unfortunately, .dtype.value_counts() isn't a perfect replacement for get_value_counts() (IMO it's better), so the test replacements calls the internal representation:

In [2]: df = pd.DataFrame([[1, 1.5, pd.Timestamp.now(), True]])

In [3]: df
Out[3]:
   0    1                          2     3
0  1  1.5 2019-06-30 11:55:24.380126  True

In [7]: df.get_dtype_counts().index
Out[7]: Index(['float64', 'int64', 'datetime64[ns]', 'bool'], dtype='object')

In [8]: df.dtypes.value_counts().index
Out[8]: Index([datetime64[ns], bool, float64, int64], dtype='object')

@mroeschke mroeschke added the Deprecate Functionality to remove in pandas label Jun 30, 2019
@pep8speaks
Copy link

pep8speaks commented Jun 30, 2019

Hello @mroeschke! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-07-03 02:19:59 UTC

@@ -1693,7 +1693,7 @@ def test_constructor_datetimes_with_nulls(self):
for arr in [np.array([None, None, None, None,
datetime.now(), None]),
np.array([None, None, datetime.now(), None])]:
result = DataFrame(arr).get_dtype_counts()
result = Series(DataFrame(arr)._data.get_dtype_counts())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don’t move it to a private method

use .dtypes.value_counts()

@@ -79,11 +79,11 @@ def _can_use_numexpr(op, op_str, a, b, dtype_check):
# check for dtype compatibility
dtypes = set()
for o in [a, b]:
if hasattr(o, 'get_dtype_counts'):
s = o.get_dtype_counts()
if hasattr(o, '_data'):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can prob just do
o.dtypes.value_counts() (may need a hasattr)

@@ -433,9 +433,10 @@ def test_with_datetimelikes(self):
'B': timedelta_range('1 day', periods=10)})
t = df.T

result = t.get_dtype_counts()
#result = Series(t._data.get_dtype_counts())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment here

@@ -5290,6 +5290,9 @@ def get_dtype_counts(self):
object 1
dtype: int64
"""
warnings.warn("`get_dtype_counts` has been deprecated and will be "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you update the docstring and add deprecated

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we recommend .dtypes.value_counts() here instead? Or... we're in generic.py so that may be too hard?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah unfortunately that solution does not work for Series, but I could add for DataFrames use .dtypes.value_counts()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, just need something as a replacement (may also want to add in the doc-string itself)

@jreback jreback added this to the 0.25.0 milestone Jul 1, 2019
@jreback jreback mentioned this pull request Jul 1, 2019
34 tasks
@mroeschke
Copy link
Member Author

All green.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 small change & doc-string updates, otherwise lgtm.

@@ -2326,7 +2326,7 @@ def _sizeof_fmt(num, size_qualifier):
else:
_verbose_repr()

counts = self.get_dtype_counts()
counts = self._data.get_dtype_counts()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is okay. It's internal usage and slightly more performant I would think than dtype.value_counts() (left as a dictionary as opposed to constructing the Series)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you remove get_dtype_counts() from blocks its unecessary as well

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks to be needed to get the dtypes later on for info?

@@ -5290,6 +5290,9 @@ def get_dtype_counts(self):
object 1
dtype: int64
"""
warnings.warn("`get_dtype_counts` has been deprecated and will be "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, just need something as a replacement (may also want to add in the doc-string itself)

@jreback jreback merged commit 54c0d5a into pandas-dev:master Jul 3, 2019
@jreback
Copy link
Contributor

jreback commented Jul 3, 2019

thanks @mroeschke

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Deprecate Functionality to remove in pandas
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants