Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Accept range for list-requiring kwargs in pd.read_csv #17083

Closed
wants to merge 2 commits into from

Conversation

jbrockmendel
Copy link
Member

pd.read_csv(path, index_col=range(1)) currently breaks in py3. This PR fixes that.

There are likely other functions and kwargs for which the same fix would be useful. I haven't tracked them all down.

Edits to series and frame are unrelated, but not big enough to merit their own PR. This is just removing import of nan, ndarray that are used in a small number of places and instead using np.nan, np.ndarray.

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@gfyoung
Copy link
Member

gfyoung commented Jul 26, 2017

Edits to series and frame are unrelated, but not big enough to merit their own PR. This is just removing import of nan, ndarray that are used in a small number of places and instead using np.nan, np.ndarray.

Actually, they do because that's a refactoring, and it obfuscates the actual change you're making. You should separate that out as a separate PR.

@gfyoung gfyoung added the IO CSV read_csv, to_csv label Jul 26, 2017
@gfyoung
Copy link
Member

gfyoung commented Jul 26, 2017

@jbrockmendel : Can you explain the motivation for this PR? range, being a generator for Python 3.x, is not a valid index_col parameter. Why can't you just pass in list(range(...)) yourself?

Unless you plan to iterate through them (so that you don't have to load the entire sequence into memory), I'm have some difficulty understanding the benefit.

@jbrockmendel
Copy link
Member Author

You should separate that out as a separate PR.

OK. I've been a bit concerned about making multiple small PRs because they each run travis et al. I'll separate these out.

Why can't you just pass in list(range(...)) yourself?

You can. The motivation is that user code that works in py2 should ideally work in py3. Is there some ambiguity as to what a user who passes index_col=range(3) intends? If not, I contend this should Just Work.

@gfyoung
Copy link
Member

gfyoung commented Jul 26, 2017

The motivation is that user code that works in py2 should ideally work in py3...Is there some ambiguity as to what a user who passes index_col=range(3) intends?

range(3) means different things in Python 2 and Python 3, so it's unclear as to why you're passing in a generator as a parameter when we don't allow that for index_col.

While I understand the intention, I'm not sure that justifies us loosening our specifications, especially since it's really not inconvenient to use list(range(...)) instead of range(...). Mind you that trying to accommodate old Python 2 conventions is probably not ideal in the long-run because you're enforcing old behavior when we should be practicing Python 3 habits. 😄

@jbrockmendel
Copy link
Member Author

you're enforcing old behavior when we should be practicing Python 3 habits

Sure the pandas devs should be practicing py3 habits, but when users have things that work in py2 and break in py3 for seemingly nitpicky reasons, that's just one more little bit of friction in the py3 transition process. It's counter-productive.

I'm not interested in bike-shedding user-friendliness vs purity. I'll put the ndarray/nan bit in a separate PR.

@jreback
Copy link
Contributor

jreback commented Jul 26, 2017

@jbrockmendel what I would accept here is an error message that raises on a non-scalar and non-list-like (IOW range).

we are actually somewhat picky on array-like; these generally *mustY be fully materialized objects (though we do accept generaters generally). In this instance an exception is fine.

@gfyoung
Copy link
Member

gfyoung commented Jul 26, 2017

I might also point out that you can use pandas' lrange object from pandas.compat instead of list(range(...)), which gives you the same Python 2.x range behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO CSV read_csv, to_csv
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants