Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPR: deprecate .add_prefix and .add_suffix #18347

Closed

Conversation

topper-123
Copy link
Contributor

@topper-123 topper-123 commented Nov 17, 2017

  • [x ] xref DEPR: let's deprecate #18262
  • [ x] tests added / passed
  • [ x] passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • [ x] whatsnew entry

Deprecate .add_prefix and .add_suffix as per #18262.

An issue is that .add_prefix and .add_suffix call the like-named methods in internals.py::BlockManager. Should BlockManager.add_prefix/add_suffix be changed somehow, e.g. have a warning as well?

EDIT: This is my first deprecation PR, so I'll be more than appreciative of criticism, so I understand these better in the future.

EDIT 2: With .add_prefix/add_postfix deprecated, I think it would be good to show in the doc strings for .rename how to do similar things, so I've changed its doc string a bit.

@codecov
Copy link

codecov bot commented Nov 17, 2017

Codecov Report

Merging #18347 into master will decrease coverage by 0.02%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18347      +/-   ##
==========================================
- Coverage   91.59%   91.57%   -0.03%     
==========================================
  Files         150      150              
  Lines       48959    48961       +2     
==========================================
- Hits        44843    44835       -8     
- Misses       4116     4126      +10
Flag Coverage Δ
#multiple 89.93% <100%> (-0.03%) ⬇️
#single 41.13% <33.33%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/generic.py 95.92% <100%> (ø) ⬆️
pandas/plotting/_converter.py 65.22% <0%> (-1.74%) ⬇️
pandas/util/testing.py 84.9% <0%> (+0.21%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cdebcf3...3a5785a. Read the comment docs.

@topper-123 topper-123 force-pushed the deprecate_prefix_postfix branch 3 times, most recently from dbc445e to b125c7b Compare November 17, 2017 20:12
@topper-123 topper-123 changed the title WIP: deprecate add_prefix and add_postfix DEPR: deprecate .add_prefix and .add_sufffix Nov 17, 2017
@topper-123 topper-123 force-pushed the deprecate_prefix_postfix branch 3 times, most recently from 59ff13b to eff7b11 Compare November 17, 2017 21:31
@gfyoung gfyoung added the Deprecate Functionality to remove in pandas label Nov 17, 2017
with_pct_suffix = self.frame.add_suffix('%')
expected = pd.Index(['{}%'.format(c) for c in self.frame.columns])
tm.assert_index_equal(with_pct_suffix.columns, expected)
with pytest.warns(FutureWarning):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback @jorisvandenbossche : Are we still going with tm.assert_produces_warning, or is using the pytest construct okay to use?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes I prefer to use our tm.assert_produces_warning for now (if we change then we should simply globally change, but in a separate set of PRs)

@topper-123 topper-123 mentioned this pull request Nov 18, 2017
34 tasks
with_pct_suffix = self.frame.add_suffix('%')
expected = pd.Index(['{}%'.format(c) for c in self.frame.columns])
tm.assert_index_equal(with_pct_suffix.columns, expected)
with pytest.warns(FutureWarning):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes I prefer to use our tm.assert_produces_warning for now (if we change then we should simply globally change, but in a separate set of PRs)

0 1 4
1 2 5
2 3 6
>>> df.rename(index="index_{}".format, columns={"A": "a", "B": "c"})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any use of .add_prefix/suffix in the main docs? if so can you update

Copy link
Contributor Author

@topper-123 topper-123 Nov 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's not used in the docs. It's shown in 10min_to_pandas as a result of tab completion, but that should stay until removal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no please change the docs now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I miswrote: it's not in the docs, expect as a result of that tab comletion example.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh these need to be added to
_deprecations in series and dataframe

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, added.

@jreback
Copy link
Contributor

jreback commented Nov 19, 2017

@jorisvandenbossche you had some concerns here?

@topper-123 topper-123 force-pushed the deprecate_prefix_postfix branch 3 times, most recently from 79ab0aa to db08605 Compare November 19, 2017 20:03
@@ -304,7 +304,8 @@ def _constructor(self):

_constructor_sliced = Series
_deprecations = NDFrame._deprecations | frozenset(
['sortlevel', 'get_value', 'set_value', 'from_csv'])
['sortlevel', 'get_value', 'set_value', 'from_csv', 'add_prefix',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead add to NDFrame._deprecations (which are both used in Series/Frame)

@jorisvandenbossche
Copy link
Member

@jorisvandenbossche you had some concerns here?

Yes, as I mentioned in the other issue, not sure if we should deprecate this one. Based on stackoverflow questions (and doc page views) this is actually used a bit (in contrast to eg asobject)

@jreback
Copy link
Contributor

jreback commented Nov 19, 2017

Yes, as I mentioned in the other issue, not sure if we should deprecate this one. Based on stackoverflow questions (and doc page views) this is actually used a bit (in contrast to eg asobject)

where do you see doc page views?

@topper-123 topper-123 force-pushed the deprecate_prefix_postfix branch from db08605 to 3997642 Compare November 19, 2017 20:33
@tdpetrou
Copy link
Contributor

tdpetrou commented Nov 22, 2017

They are used fairly often on stack overflow. I use them. You can see about 20 posts with them this month on stack overflow.

That said, personally, I'd like to shrink pandas as much as possible and wouldn't mind seeing them gone.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to give feedback on how you did the deprecation: added two small comments, but it's certainly good!
(so go ahead with some less controversial deprecations :-))

That said, I am still in doubt we should do this one.

"""
warnings.warn("'add_prefix' is deprecated and will be removed in a "
"future version.", FutureWarning, stacklevel=2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mention the alternative in the message

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -2590,6 +2595,8 @@ def _update_inplace(self, result, verify_is_copy=True):

def add_prefix(self, prefix):
"""
DEPRECATED: Use ``obj.rename('prefix_{}'.format)`` or similar instead.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit in doubt whether I prefer this alternative or rather obj.rename(lambda x: "prefix_" + x) or obj.rename(lambda x: 'prefix_{}'.format(x)) (it's just that I think such a bare format is not that easy to understand for beginners that it is a function, but of course lambda's are also a strange beast)

BTW, you need to add columns= to have the correct alternative I think

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree here, using the lambda is a bit simpler for beginners.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, done.

@topper-123
Copy link
Contributor Author

@jorisvandenbossche: Ok, I can see that pople use them, but they're just so useless. There was at one point a discussion on adding a .format method to pandas objects, which would solve the same problem as add_prefix/-suffix but be more general. So:

>>> df.columns = df.columns.format("prefix_{}_suffix")
>>> df.columns.format("prefix_{}_suffix", inplace=True)  # equivalent, but inplace

This would be much more generally useful than add_prefix/-suffix, and would operate on the relevant object, which IMO is often more readable than methods on the dataframe.

Just an idea, as I could often use such methods and add_prefix/-suffix are too limiting and apply feels 'dirty', for a lack of better words.

@topper-123 topper-123 force-pushed the deprecate_prefix_postfix branch from 3997642 to 1e83d89 Compare November 24, 2017 21:51
@jreback
Copy link
Contributor

jreback commented Nov 24, 2017

We actually have this almost already

In [13]: i = pd.Index(list('abc'))

In [15]: i.str.cat(['foo'] * 3, sep='_')
Out[15]: Index(['a_foo', 'b_foo', 'c_foo'], dtype='object')

wouldn't be a stretch to allow a single element here, e.g.
i.str.cat('foo'), or maybe i.str.suffix() and i.str.prefix()

these are much more appealing that a method on Series/DataFrame

@topper-123
Copy link
Contributor Author

these are much more appealing that a method on Series/DataFrame.

I agree, though two points:

  • .str accessor only works on indices of type object (prob. specifically str, haven't checked), while add_prefix/suffix work for all data types. So for e.g. a float Index, you'd still have to do i.apply('prefix_{:+.1f}_suffix'.format), which would mean different operations for str vs. non-str data types, which isn't good.
  • both str.cat and potential str.prefix/suffix are/would be quite specific. A .format method would be more general and would map very well over to people's general Python knowledge, i.e. their knowledge of the Python string formatting language would transfer well over to Pandas, which would be nice.

So, I'd be the most positive to adding a str.format method, but not str.add_prefix/-suffix. But I'd also like some sanctioned but still flexible way to cast non-string types to string dtype, as astype(str) is too crude (lacks string formatting options) and using .apply feels unsanctioned and dirty.

@jreback jreback changed the title DEPR: deprecate .add_prefix and .add_sufffix DEPR: deprecate .add_prefix and .add_suffix Nov 25, 2017
@jreback
Copy link
Contributor

jreback commented Nov 25, 2017

.str accessor only works on indices of type object (prob. specifically str, haven't checked), while add_prefix/suffix work for all data types. So for e.g. a float Index, you'd still have to do i.apply('prefix_{:+.1f}_suffix'.format), which would mean different operations for str vs. non-str data types, which isn't good.

This is by-definition and on-purpose. string methods should not work on non-strings, full-stop.

We also want composability. So and I think there is an issue about this somewhere, having a generic .format() method that is basically an .astype(str), or maybe .astype(str, format=....) is prob ok (IIRC we had an issue about this somewhere, @jorisvandenbossche ?)

@topper-123
Copy link
Contributor Author

topper-123 commented Nov 26, 2017

(IIRC we had an issue about this somewhere, @jorisvandenbossche ?)

I posted an issue #17211 on this.

@jorisvandenbossche
Copy link
Member

As I said in the issue some months ago, I am +1 on adding a .format method. However, I think that is more generally usefu and somewhat orthogonal to this one. Eg it does not allow chaining method calls (for that rename would still be the alternative)

@topper-123 topper-123 force-pushed the deprecate_prefix_postfix branch from 1e83d89 to e797713 Compare December 24, 2017 09:58
@pep8speaks
Copy link

pep8speaks commented Dec 24, 2017

Hello @topper-123! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on December 24, 2017 at 10:45 Hours UTC

@topper-123 topper-123 force-pushed the deprecate_prefix_postfix branch from e797713 to 3a5785a Compare December 24, 2017 10:45
@jreback
Copy link
Contributor

jreback commented Feb 11, 2018

ok, so I think we need to add a .format() method before we can deprecate these, @topper-123 interested?

@jreback
Copy link
Contributor

jreback commented Jul 7, 2018

closing, we had a discussion about this, it is a bit premature to do this as these are useful. so alternative is to add a suffixes arg to pd.concat (#21791)

@jreback jreback closed this Jul 7, 2018
@topper-123 topper-123 deleted the deprecate_prefix_postfix branch October 27, 2018 08:16
@jbrockmendel
Copy link
Member

Should this be removed from the checklist in #6581?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Deprecate Functionality to remove in pandas
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants