DEPR: DataFrame.get_dtype_counts #27145

Merged on Jul 3, 2019 (27 commits)
Changes from 5 commits
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.25.0.rst
@@ -616,6 +616,7 @@ Other deprecations
- :attr:`Series.imag` and :attr:`Series.real` are deprecated. (:issue:`18262`)
- :meth:`Series.put` is deprecated. (:issue:`18262`)
- :meth:`Index.item` and :meth:`Series.item` is deprecated. (:issue:`18262`)
- :meth:`DataFrame.get_dtype_counts` is deprecated. (:issue:`18262`)

.. _whatsnew_0250.prior_deprecations:

6 changes: 3 additions & 3 deletions pandas/core/computation/expressions.py
@@ -79,11 +79,11 @@ def _can_use_numexpr(op, op_str, a, b, dtype_check):
# check for dtype compatibility
dtypes = set()
for o in [a, b]:
if hasattr(o, 'get_dtype_counts'):
s = o.get_dtype_counts()
if hasattr(o, '_data'):

Contributor:
can prob just do o.dtypes.value_counts() (may need a hasattr)

s = o._data.get_dtype_counts()
if len(s) > 1:
return False
dtypes |= set(s.index)
dtypes |= set(s.keys())
elif isinstance(o, np.ndarray):
dtypes |= {o.dtype.name}
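
The reviewer's suggestion of going through o.dtypes.value_counts() with a hasattr guard can be sketched roughly as below. This is only an illustration, not the code that was merged: the helper name collect_dtype_names is hypothetical, and returning None for a mixed-dtype frame is an assumption that stands in for the early `return False` in the hunk.

```python
import numpy as np
import pandas as pd


def collect_dtype_names(operands):
    """Hypothetical helper: collect dtype names through public attributes only.

    Returns None when a DataFrame operand mixes dtypes (an assumption that
    mirrors the early `return False` in the hunk above).
    """
    names = set()
    for o in operands:
        dtypes = getattr(o, "dtypes", None)
        if hasattr(dtypes, "value_counts"):
            # DataFrame: .dtypes is a Series of dtype objects.
            counts = dtypes.value_counts()
            if len(counts) > 1:
                return None
            names |= {str(dt) for dt in counts.index}
        elif hasattr(o, "dtype"):
            # Series or ndarray: a single dtype, no value_counts needed.
            names.add(str(o.dtype))
    return names
```

The hasattr check is what the "(may need a hasattr)" remark points at: Series.dtypes returns a single dtype object, which has no value_counts method.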

2 changes: 1 addition & 1 deletion pandas/core/frame.py
@@ -2325,7 +2325,7 @@ def _sizeof_fmt(num, size_qualifier):
else:
_verbose_repr()

counts = self.get_dtype_counts()
counts = self._data.get_dtype_counts()

Contributor:
this one?

Member Author:
I think this is okay. It's internal usage and, I would think, slightly more performant than dtypes.value_counts() (the counts stay a dictionary rather than being used to construct a Series).

Contributor:
can you remove get_dtype_counts() from blocks? It's unnecessary as well.

Member Author:
Looks to be needed to get the dtypes later on for info?

dtypes = ['{k}({kk:d})'.format(k=k[0], kk=k[1]) for k
in sorted(counts.items())]
lines.append('dtypes: {types}'.format(types=', '.join(dtypes)))
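
To make the "left as a dictionary" point concrete, here is a minimal sketch of building the same 'dtypes: ...' summary line from a plain mapping without constructing a Series. It does not use the internal self._data.get_dtype_counts() path from the hunk; a collections.Counter over the public .dtypes attribute is swapped in as a stand-in, and the helper name dtype_summary_line is hypothetical.

```python
from collections import Counter

import numpy as np
import pandas as pd


def dtype_summary_line(frame):
    """Hypothetical helper: format dtype counts held in a plain dict-like."""
    # Count columns per dtype name; no intermediate Series is built.
    counts = Counter(str(dt) for dt in frame.dtypes)
    parts = ["{k}({v:d})".format(k=k, v=v) for k, v in sorted(counts.items())]
    return "dtypes: {types}".format(types=", ".join(parts))


df = pd.DataFrame({
    "a": np.array([1.0, 2.0], dtype="float64"),
    "b": np.array([1, 2], dtype="int64"),
    "c": np.array([3, 4], dtype="int64"),
})
print(dtype_summary_line(df))  # dtypes: float64(1), int64(2)
```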
3 changes: 3 additions & 0 deletions pandas/core/generic.py
@@ -5290,6 +5290,9 @@ def get_dtype_counts(self):
object 1
dtype: int64
"""
warnings.warn("`get_dtype_counts` has been deprecated and will be "
"removed in a future version.", FutureWarning,
stacklevel=2)
from pandas import Series
return Series(self._data.get_dtype_counts())

Contributor (on the warnings.warn line):
can you update the docstring and add deprecated

Contributor:
Can we recommend .dtypes.value_counts() here instead? Or... we're in generic.py so that may be too hard?

Member Author:
Yeah, unfortunately that solution does not work for Series, but for DataFrames I could add "use .dtypes.value_counts()".

Contributor:
sure, just need something as a replacement (may also want to add it in the docstring itself)
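
The thread above asks for the warning to name a replacement. A minimal sketch of that pattern follows, assuming a free function rather than the real method: legacy_dtype_counts is a hypothetical stand-in, and the message wording is illustrative, not the text that ended up in pandas.

```python
import warnings

import pandas as pd


def legacy_dtype_counts(frame):
    """Hypothetical stand-in for a deprecated method that names its replacement."""
    warnings.warn(
        "`legacy_dtype_counts` has been deprecated and will be removed in a "
        "future version. For DataFrames use `.dtypes.value_counts()` instead.",
        FutureWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return frame.dtypes.value_counts()
```

With stacklevel=2 the warning is reported against the calling code, which is where the user needs to make the change.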

5 changes: 3 additions & 2 deletions pandas/tests/frame/test_api.py
@@ -433,9 +433,10 @@ def test_with_datetimelikes(self):
'B': timedelta_range('1 day', periods=10)})
t = df.T

result = t.get_dtype_counts()
#result = Series(t._data.get_dtype_counts())

Contributor:
comment here

result = t.dtypes.value_counts()
if self.klass is DataFrame:
expected = Series({'object': 10})
expected = Series({np.dtype('object'): 10})
else:
expected = Series({'Sparse[object, nan]': 10})
tm.assert_series_equal(result, expected)
8 changes: 4 additions & 4 deletions pandas/tests/frame/test_arithmetic.py
@@ -273,8 +273,8 @@ def test_df_flex_cmp_constant_return_types(self, opname):
df = pd.DataFrame({'x': [1, 2, 3], 'y': [1., 2., 3.]})
const = 2

result = getattr(df, opname)(const).get_dtype_counts()
tm.assert_series_equal(result, pd.Series([2], ['bool']))
result = getattr(df, opname)(const).dtypes.value_counts()
tm.assert_series_equal(result, pd.Series([2], index=[np.dtype(bool)]))

@pytest.mark.parametrize('opname', ['eq', 'ne', 'gt', 'lt', 'ge', 'le'])
def test_df_flex_cmp_constant_return_types_empty(self, opname):
@@ -283,8 +283,8 @@ def test_df_flex_cmp_constant_return_types_empty(self, opname):
const = 2

empty = df.iloc[:0]
result = getattr(empty, opname)(const).get_dtype_counts()
tm.assert_series_equal(result, pd.Series([2], ['bool']))
result = getattr(empty, opname)(const).dtypes.value_counts()
tm.assert_series_equal(result, pd.Series([2], index=[np.dtype(bool)]))


# -------------------------------------------------------------------
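
The expected values in the hunks above change from a string key ('bool') to np.dtype(bool) because .dtypes.value_counts() is indexed by dtype objects rather than dtype names. A small sketch of that difference, and of one way to get back string keys if a caller depended on them (the variable names here are illustrative only):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [1.0, 2.0, 3.0]})
counts = df.gt(2).dtypes.value_counts()

# The single index entry is a dtype object, not the string 'bool'.
key = counts.index[0]

# Converting each key with str() recovers name-keyed counts.
by_name = {str(k): int(v) for k, v in counts.items()}
```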
25 changes: 14 additions & 11 deletions pandas/tests/frame/test_block_internals.py
@@ -217,19 +217,21 @@ def test_construction_with_mixed(self, float_string_frame):
df = DataFrame(data)

# check dtypes
result = df.get_dtype_counts().sort_values()
result = df.dtypes
expected = Series({'datetime64[ns]': 3})

# mixed-type frames
float_string_frame['datetime'] = datetime.now()
float_string_frame['timedelta'] = timedelta(days=1, seconds=1)
assert float_string_frame['datetime'].dtype == 'M8[ns]'
assert float_string_frame['timedelta'].dtype == 'm8[ns]'
result = float_string_frame.get_dtype_counts().sort_values()
expected = Series({'float64': 4,
'object': 1,
'datetime64[ns]': 1,
'timedelta64[ns]': 1}).sort_values()
result = float_string_frame.dtypes
expected = Series([np.dtype('float64')] * 4 +
[np.dtype('object'),
np.dtype('datetime64[ns]'),
np.dtype('timedelta64[ns]')],
index=list('ABCD') + ['foo', 'datetime',
'timedelta'])
assert_series_equal(result, expected)

def test_construction_with_conversions(self):
@@ -409,11 +411,12 @@ def test_get_numeric_data(self):
df = DataFrame({'a': 1., 'b': 2, 'c': 'foo',
'f': Timestamp('20010102')},
index=np.arange(10))
result = df.get_dtype_counts()
expected = Series({'int64': 1, 'float64': 1,
datetime64name: 1, objectname: 1})
result = result.sort_index()
expected = expected.sort_index()
result = df.dtypes
expected = Series([np.dtype('float64'),
np.dtype('int64'),
np.dtype(objectname),
np.dtype(datetime64name)],
index=['a', 'b', 'c', 'f'])
assert_series_equal(result, expected)

df = DataFrame({'a': 1., 'b': 2, 'c': 'foo',
2 changes: 1 addition & 1 deletion pandas/tests/frame/test_combine_concat.py
@@ -17,7 +17,7 @@ def test_concat_multiple_frames_dtypes(self):
A = DataFrame(data=np.ones((10, 2)), columns=[
'foo', 'bar'], dtype=np.float64)
B = DataFrame(data=np.ones((10, 2)), dtype=np.float32)
results = pd.concat((A, B), axis=1).get_dtype_counts()
results = Series(pd.concat((A, B), axis=1)._data.get_dtype_counts())
expected = Series(dict(float64=2, float32=2))
assert_series_equal(results, expected)

36 changes: 18 additions & 18 deletions pandas/tests/frame/test_constructors.py
@@ -1579,7 +1579,7 @@ def test_constructor_with_datetimes(self):
'D': Timestamp("20010101"),
'E': datetime(2001, 1, 2, 0, 0)},
index=np.arange(10))
result = df.get_dtype_counts()
result = Series(df._data.get_dtype_counts())
expected = Series({'int64': 1, datetime64name: 2, objectname: 2})
result.sort_index()
expected.sort_index()
@@ -1591,7 +1591,7 @@ def test_constructor_with_datetimes(self):
floatname: np.array(1., dtype=floatname),
intname: np.array(1, dtype=intname)},
index=np.arange(10))
result = df.get_dtype_counts()
result = Series(df._data.get_dtype_counts())
expected = {objectname: 1}
if intname == 'int64':
expected['int64'] = 2
@@ -1613,7 +1613,7 @@ def test_constructor_with_datetimes(self):
floatname: np.array([1.] * 10, dtype=floatname),
intname: np.array([1] * 10, dtype=intname)},
index=np.arange(10))
result = df.get_dtype_counts()
result = Series(df._data.get_dtype_counts())
result = result.sort_index()
tm.assert_series_equal(result, expected)

@@ -1623,7 +1623,7 @@ def test_constructor_with_datetimes(self):
datetime_s = Series(datetimes)
assert datetime_s.dtype == 'M8[ns]'
df = DataFrame({'datetime_s': datetime_s})
result = df.get_dtype_counts()
result = Series(df._data.get_dtype_counts())
expected = Series({datetime64name: 1})
result = result.sort_index()
expected = expected.sort_index()
@@ -1634,7 +1634,7 @@ def test_constructor_with_datetimes(self):
datetimes = [ts.to_pydatetime() for ts in ind]
dates = [ts.date() for ts in ind]
df = DataFrame({'datetimes': datetimes, 'dates': dates})
result = df.get_dtype_counts()
result = Series(df._data.get_dtype_counts())
expected = Series({datetime64name: 1, objectname: 1})
result = result.sort_index()
expected = expected.sort_index()
@@ -1693,7 +1693,7 @@ def test_constructor_datetimes_with_nulls(self):
for arr in [np.array([None, None, None, None,
datetime.now(), None]),
np.array([None, None, datetime.now(), None])]:
result = DataFrame(arr).get_dtype_counts()
result = Series(DataFrame(arr)._data.get_dtype_counts())

Contributor:
don't move it to a private method; use .dtypes.value_counts()

expected = Series({'datetime64[ns]': 1})
tm.assert_series_equal(result, expected)

@@ -1706,49 +1706,49 @@ def test_constructor_for_list_with_dtypes(self):

# test list of lists/ndarrays
df = DataFrame([np.arange(5) for x in range(5)])
result = df.get_dtype_counts()
result = Series(df._data.get_dtype_counts())
expected = Series({'int64': 5})

df = DataFrame([np.array(np.arange(5), dtype='int32')
for x in range(5)])
result = df.get_dtype_counts()
result = Series(df._data.get_dtype_counts())
expected = Series({'int32': 5})

# overflow issue? (we always expecte int64 upcasting here)
df = DataFrame({'a': [2 ** 31, 2 ** 31 + 1]})
result = df.get_dtype_counts()
result = Series(df._data.get_dtype_counts())
expected = Series({'int64': 1})
tm.assert_series_equal(result, expected)

# GH #2751 (construction with no index specified), make sure we cast to
# platform values
df = DataFrame([1, 2])
result = df.get_dtype_counts()
result = Series(df._data.get_dtype_counts())
expected = Series({'int64': 1})
tm.assert_series_equal(result, expected)

df = DataFrame([1., 2.])
result = df.get_dtype_counts()
result = Series(df._data.get_dtype_counts())
expected = Series({'float64': 1})
tm.assert_series_equal(result, expected)

df = DataFrame({'a': [1, 2]})
result = df.get_dtype_counts()
result = Series(df._data.get_dtype_counts())
expected = Series({'int64': 1})
tm.assert_series_equal(result, expected)

df = DataFrame({'a': [1., 2.]})
result = df.get_dtype_counts()
result = Series(df._data.get_dtype_counts())
expected = Series({'float64': 1})
tm.assert_series_equal(result, expected)

df = DataFrame({'a': 1}, index=range(3))
result = df.get_dtype_counts()
result = Series(df._data.get_dtype_counts())
expected = Series({'int64': 1})
tm.assert_series_equal(result, expected)

df = DataFrame({'a': 1.}, index=range(3))
result = df.get_dtype_counts()
result = Series(df._data.get_dtype_counts())
expected = Series({'float64': 1})
tm.assert_series_equal(result, expected)

@@ -1757,7 +1757,7 @@ def test_constructor_for_list_with_dtypes(self):
'c': list('abcd'),
'd': [datetime(2000, 1, 1) for i in range(4)],
'e': [1., 2, 4., 7]})
result = df.get_dtype_counts()
result = Series(df._data.get_dtype_counts())
expected = Series(
{'int64': 1, 'float64': 2, datetime64name: 1, objectname: 1})
result = result.sort_index()
@@ -2077,14 +2077,14 @@ def test_from_records_misc_brokenness(self):
rows.append([datetime(2010, 1, 1), 1])
rows.append([datetime(2010, 1, 2), 'hi']) # test col upconverts to obj
df2_obj = DataFrame.from_records(rows, columns=['date', 'test'])
results = df2_obj.get_dtype_counts()
results = Series(df2_obj._data.get_dtype_counts())
expected = Series({'datetime64[ns]': 1, 'object': 1})

rows = []
rows.append([datetime(2010, 1, 1), 1])
rows.append([datetime(2010, 1, 2), 1])
df2_obj = DataFrame.from_records(rows, columns=['date', 'test'])
results = df2_obj.get_dtype_counts().sort_index()
results = Series(df2_obj._data.get_dtype_counts()).sort_index()
expected = Series({'datetime64[ns]': 1, 'int64': 1})
tm.assert_series_equal(results, expected)

6 changes: 3 additions & 3 deletions pandas/tests/frame/test_dtypes.py
@@ -836,23 +836,23 @@ def test_timedeltas(self):
df = DataFrame(dict(A=Series(date_range('2012-1-1', periods=3,
freq='D')),
B=Series([timedelta(days=i) for i in range(3)])))
result = df.get_dtype_counts().sort_index()
result = Series(df._data.get_dtype_counts()).sort_index()
expected = Series(
{'datetime64[ns]': 1, 'timedelta64[ns]': 1}).sort_index()
assert_series_equal(result, expected)

df['C'] = df['A'] + df['B']
expected = Series(
{'datetime64[ns]': 2, 'timedelta64[ns]': 1}).sort_values()
result = df.get_dtype_counts().sort_values()
result = Series(df._data.get_dtype_counts()).sort_values()
assert_series_equal(result, expected)

# mixed int types
df['D'] = 1
expected = Series({'datetime64[ns]': 2,
'timedelta64[ns]': 1,
'int64': 1}).sort_values()
result = df.get_dtype_counts().sort_values()
result = Series(df._data.get_dtype_counts()).sort_values()
assert_series_equal(result, expected)

def test_arg_for_errors_in_astype(self):
10 changes: 5 additions & 5 deletions pandas/tests/frame/test_indexing.py
@@ -300,14 +300,14 @@ def test_getitem_boolean_casting(self, datetime_frame):
df['F1'] = df['F'].copy()

casted = df[df > 0]
result = casted.get_dtype_counts()
result = Series(casted._data.get_dtype_counts())
expected = Series({'float64': 4, 'int32': 2, 'int64': 2})
assert_series_equal(result, expected)

# int block splitting
df.loc[df.index[1:3], ['E1', 'F1']] = 0
casted = df[df > 0]
result = casted.get_dtype_counts()
result = Series(casted._data.get_dtype_counts())
expected = Series({'float64': 6, 'int32': 1, 'int64': 1})
assert_series_equal(result, expected)

@@ -615,7 +615,7 @@ def test_setitem_cast(self, float_frame):
df = DataFrame(np.random.rand(30, 3), columns=tuple('ABC'))
df['event'] = np.nan
df.loc[10, 'event'] = 'foo'
result = df.get_dtype_counts().sort_values()
result = Series(df._data.get_dtype_counts()).sort_values()
expected = Series({'float64': 3, 'object': 1}).sort_values()
assert_series_equal(result, expected)

@@ -1614,7 +1614,7 @@ def test_setitem_single_column_mixed_datetime(self):
df['timestamp'] = Timestamp('20010102')

# check our dtypes
result = df.get_dtype_counts()
result = Series(df._data.get_dtype_counts())
expected = Series({'float64': 3, 'datetime64[ns]': 1})
assert_series_equal(result, expected)

@@ -2637,7 +2637,7 @@ def _check_get(df, cond, check_dtypes=True):
for c in ['float32', 'float64',
'int32', 'int64']})
df.iloc[1, :] = 0
result = df.where(df >= 0).get_dtype_counts()
result = Series(df.where(df >= 0)._data.get_dtype_counts())

# when we don't preserve boolean casts
#
4 changes: 2 additions & 2 deletions pandas/tests/frame/test_missing.py
@@ -407,13 +407,13 @@ def test_fillna_downcast(self):
def test_fillna_dtype_conversion(self):
# make sure that fillna on an empty frame works
df = DataFrame(index=["A", "B", "C"], columns=[1, 2, 3, 4, 5])
result = df.get_dtype_counts().sort_values()
result = Series(df._data.get_dtype_counts()).sort_values()
expected = Series({'object': 5})
assert_series_equal(result, expected)

result = df.fillna(1)
expected = DataFrame(1, index=["A", "B", "C"], columns=[1, 2, 3, 4, 5])
result = result.get_dtype_counts().sort_values()
result = Series(result._data.get_dtype_counts()).sort_values()
expected = Series({'int64': 5})
assert_series_equal(result, expected)

12 changes: 9 additions & 3 deletions pandas/tests/frame/test_mutate_columns.py
@@ -159,16 +159,22 @@ def test_insert(self):
# new item
df['x'] = df['a'].astype('float32')
result = Series(dict(float32=1, float64=5))
assert (df.get_dtype_counts().sort_index() == result).all()
assert (Series(
df._data.get_dtype_counts()
).sort_index() == result).all()

# replacing current (in different block)
df['a'] = df['a'].astype('float32')
result = Series(dict(float32=2, float64=4))
assert (df.get_dtype_counts().sort_index() == result).all()
assert (Series(
df._data.get_dtype_counts()
).sort_index() == result).all()

df['y'] = df['a'].astype('int32')
result = Series(dict(float32=2, float64=4, int32=1))
assert (df.get_dtype_counts().sort_index() == result).all()
assert (Series(
df._data.get_dtype_counts()
).sort_index() == result).all()

with pytest.raises(ValueError, match='already exists'):
df.insert(1, 'a', df['b'])