Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: Deprecate skip_footer in read_csv #13386

Closed

Conversation

gfyoung
Copy link
Member

@gfyoung gfyoung commented Jun 7, 2016

Title is self-explanatory.

Closes gh-13349 and partially undoes this commit back in v0.9.0. With such a massive API now, having duplicate arguments makes managing it way less practical.

@gfyoung
Copy link
Member Author

gfyoung commented Jun 7, 2016

In light of removing duplicate arguments and stripping down the API a bit, here are a couple of others I would like to propose (perhaps not in this PR but for discussion nonetheless):

  1. Deprecate skipfooter in read_excel (it's allowed as an alternative to skip_footer, though it is not surfaced in the signature but rather encompassed in a **kwargs argument)?

  2. Should we choose between sep and delimiter in read_csv?

@jreback
Copy link
Contributor

jreback commented Jun 7, 2016

the problem is that we also have skiprows. I would actually deprecate skip_footer instead.

@gfyoung
Copy link
Member Author

gfyoung commented Jun 7, 2016

Hmmm...that's a flip-flop AFAICT from what you said earlier here. However, deprecating skip_footer seems a little strange since we use skip_footer all over the place in the code (internally especially). That's why I chose to deprecate skipfooter, since the code impact is not very significant as you can tell. In addition, we use the _ in a lot of the other arguments, so I'd rather side with the majority.

@jreback
Copy link
Contributor

jreback commented Jun 7, 2016

just change it internally to the correct kw. I think they diverged at some point. It looks (just a quick skim), that skipfooter is more consistent (as we have skiprows, usecols) etc. excel should be fixed to be the same as well.

@gfyoung
Copy link
Member Author

gfyoung commented Jun 7, 2016

@jreback : let's actually do the count of multi-word argument names (excluding the skip(_)footer ones) using the documentation as of 0.18.1 here:

Arguments that use _:
filepath_or_buffer
index_col
mangle_dupe_cols
true_values
false_values
na_values
keep_default_na
na_filter
skip_blank_lines
error_bad_lines
warn_bad_lines
delim_whitespace
as_recarray
compact_ints
use_unsigned
low_memory
buffer_lines
memory_map
float_precision

Arguments that don't use _:
usecols
skipinitialspace
skiprows
chunksize
lineterminator
doublequote
quotechar
escapechar

Even excluding ones that have been recently deprecated (compact_ints, buffer_lines, use_unsigned, as_recarray), arguments that use _ vastly outnumber those that don't.

@jreback
Copy link
Contributor

jreback commented Jun 7, 2016

@gfyoung I am trying to minimize back-compat pain.

usecols and skiprows are prob some of the most used options. I agree we should just deprecate and move to _ separated, but that would be in 0.19.0 if we go whole hog.

@codecov-io
Copy link

codecov-io commented Jun 7, 2016

Current coverage is 85.23% (diff: 100%)

Merging #13386 into master will increase coverage by <.01%

@@             master     #13386   diff @@
==========================================
  Files           140        140          
  Lines         50420      50419     -1   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
  Hits          42975      42975          
+ Misses         7445       7444     -1   
  Partials          0          0          

Powered by Codecov. Last update cc216ad...d21345f

@jreback jreback added API Design IO CSV read_csv, to_csv labels Jun 7, 2016
@gfyoung
Copy link
Member Author

gfyoung commented Jun 7, 2016

@jreback : I have no issues waiting until 0.19.0 and going full hog. This duplicate argument is not breaking, and it will then give us the time to properly deprecate and rename arguments with _.

@jreback jreback added this to the 0.19.0 milestone Jun 7, 2016
@gfyoung gfyoung changed the title DEPR: deprecate skipfooter in read_csv API: Use underscore in read_* arguments Jun 8, 2016
@gfyoung gfyoung force-pushed the deprecate-dup-skipfooter branch 8 times, most recently from fdb6f97 to 85a58f5 Compare June 9, 2016 00:55
@gfyoung gfyoung changed the title API: Use underscore in read_* arguments API: Use underscore in read_csv arguments Jun 9, 2016
@gfyoung gfyoung force-pushed the deprecate-dup-skipfooter branch 7 times, most recently from ec04105 to 5ae47e9 Compare June 15, 2016 09:29
@shoyer
Copy link
Member

shoyer commented Jul 14, 2016

I would lean against this change -- I don't think deprecating or even adding aliases with underscores for these keywords is worth it. Yes, consistency is nice, but these are keywords that are mostly either (a) copied straight from the API docs or an example or (b) auto-completed. Omitting underscores between lower case words is also common enough in PEP8 compliant code that it doesn't hurt my eyes in the way that camelCase would, for example. I would save changes like this for next time somebody writes a DataFrame library ;). There's just very little upside to the change at this point.

@chris-b1
Copy link
Contributor

I'd also be inclined to not do this.

Consistency is great, but to me, big breaking (or warning triggering) changes for consistency are only worth it to the extent they actually improve the user experience. E.g. resample, the new sort_... api, or a closer example, when the inconsistency between the rows and index keyword was resolved in pivot_table / pivot - #5505

This doesn't seem to meet that hurdle. Like @shoyer said, these argument would be typically tab-completed or looked up, or even if not, the inconsistency doesn't cause any confusion.

@gfyoung
Copy link
Member Author

gfyoung commented Jul 14, 2016

To put things in perspective, this whole discussion was generated because I wanted to remove the duplicate 'skipfooter/skip_footer' argument in the signature. It really should only be one of the two IMO.

The consistency part came up because we weren't sure which way to go i.e do we take the underscore version or the non-underscore?

I perfectly understand if we don't want to make such a change as drastic as it has become, though some input on the initial goal would be good then.

@chris-b1
Copy link
Contributor

I don't have a strong opinion on that, but given that all the docs / SO answers / etc in the wild use skipfooter, I'd think deprecating skip_footer makes more sense?

@gfyoung gfyoung changed the title API: Use underscore in read_csv arguments API: Deprecate skip_footer in read_csv Jul 15, 2016
@gfyoung
Copy link
Member Author

gfyoung commented Jul 15, 2016

In light of the push back, I will rollback my changes and just deprecate skip_footer as @chris-b1 indicated. If skipfooter is being used in the wild, let's stick with that one then.

@gfyoung
Copy link
Member Author

gfyoung commented Jul 15, 2016

@jreback , @jorisvandenbossche : Rolled back my changes successfully (i.e. Travis is passing) to just deprecating skip_footer for the reason that @chris-b1 outlined above. Ready to merge if there are no other concerns.

@jreback
Copy link
Contributor

jreback commented Jul 15, 2016

lgtm. @jorisvandenbossche

@gfyoung
Copy link
Member Author

gfyoung commented Jul 19, 2016

@jorisvandenbossche : any updates on this?

@jorisvandenbossche
Copy link
Member

What do we do other read_ functions for the skipfooter case?

@gfyoung
Copy link
Member Author

gfyoung commented Jul 19, 2016

Same as here pre-PR (i.e. it just accepts both of them). However, I will handle all other cases similarly if this one is merged.

@gfyoung
Copy link
Member Author

gfyoung commented Jul 26, 2016

@jorisvandenbossche : Any updates on this?

@@ -1351,7 +1351,7 @@ back to python if C-unsupported options are specified. Currently, C-unsupported
options include:

- ``sep`` other than a single character (e.g. regex separators)
- ``skip_footer``
- ``skipfooter``
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did the doc-string get changed (to add DEPRECATED)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm...not sure where those changes went. Added them back.

@gfyoung
Copy link
Member Author

gfyoung commented Jul 28, 2016

@jreback : added the documentation back about the deprecation, and Travis is still happy after rebase. Ready to merge if there are no other concerns.

@jreback
Copy link
Contributor

jreback commented Jul 29, 2016

thanks @gfyoung

@gfyoung gfyoung deleted the deprecate-dup-skipfooter branch July 29, 2016 02:29
gfyoung added a commit to forking-repos/pandas that referenced this pull request Dec 11, 2017
gfyoung added a commit to forking-repos/pandas that referenced this pull request Dec 11, 2017
gfyoung added a commit that referenced this pull request Dec 11, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design IO CSV read_csv, to_csv Needs Discussion Requires discussion from core team before further action
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants