Accept range for list-requiring kwargs in pd.read_csv #17083

jbrockmendel · 2017-07-26T17:00:24Z

pd.read_csv(path, index_col=range(1)) currently breaks in py3. This PR fixes that.

There are likely other functions and kwargs for which the same fix would be useful. I haven't tracked them all down.
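The PR's diff is not reproduced here. The snippet below is a minimal sketch of the idea. It assumes the fix amounts to turning a range into a list before the existing list checks run. The helper name `normalize_range_kwargs` and the keyword set are hypothetical and were chosen only for illustration.

```python
from typing import Any, Dict

# Hypothetical set of read_csv keywords that expect a list when given
# multiple values; chosen for illustration, not taken from the PR's diff.
LIST_KWARGS = ("index_col", "usecols", "header")


def normalize_range_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of kwargs in which any range value for a
    list-requiring keyword is materialized as a list."""
    out = dict(kwargs)  # shallow copy so the caller's dict is not mutated
    for key in LIST_KWARGS:
        if isinstance(out.get(key), range):
            out[key] = list(out[key])
    return out


# The range becomes [0, 1]; unrelated keywords pass through untouched.
print(normalize_range_kwargs({"index_col": range(2), "sep": ","}))
# {'index_col': [0, 1], 'sep': ','}
```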

Edits to series and frame are unrelated, but not big enough to merit their own PR. This is just removing import of nan, ndarray that are used in a small number of places and instead using np.nan, np.ndarray.

closes #xxxx
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

gfyoung · 2017-07-26T17:18:58Z

Edits to series and frame are unrelated, but not big enough to merit their own PR. This is just removing import of nan, ndarray that are used in a small number of places and instead using np.nan, np.ndarray.

Actually, they do because that's a refactoring, and it obfuscates the actual change you're making. You should separate that out as a separate PR.

gfyoung · 2017-07-26T17:21:54Z

@jbrockmendel : Can you explain the motivation for this PR? range, being a generator for Python 3.x, is not a valid index_col parameter. Why can't you just pass in list(range(...)) yourself?

Unless you plan to iterate through them (so that you don't have to load the entire sequence into memory), I'm having some difficulty understanding the benefit.
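Some background on the Python 2/3 difference discussed here. In Python 2, `range(3)` returns the list `[0, 1, 2]`. In Python 3 it returns a lazy `range` object instead. Strictly speaking, that object is a lazy sequence rather than a generator: it supports `len()` and indexing, and it can be iterated more than once. It is still not a `list`, though. The snippet shows why code that checks specifically for `list` rejects it, and what the `list(range(...))` workaround produces.

```python
r = range(3)

# In Python 3, range(3) returns a lazy range object, not a list.
print(type(r).__name__)     # range
print(isinstance(r, list))  # False

# It is still a sequence (len and indexing work), unlike a generator...
print(len(r), r[1])         # 3 1

# ...but code that insists on a list will not accept it, so the caller
# materializes it explicitly:
print(list(r))              # [0, 1, 2]
```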

jbrockmendel · 2017-07-26T17:28:38Z

You should separate that out as a separate PR.

OK. I've been a bit concerned about making multiple small PRs because they each run travis et al. I'll separate these out.

Why can't you just pass in list(range(...)) yourself?

You can. The motivation is that user code that works in py2 should ideally work in py3. Is there some ambiguity as to what a user who passes index_col=range(3) intends? If not, I contend this should Just Work.

gfyoung · 2017-07-26T18:00:14Z

The motivation is that user code that works in py2 should ideally work in py3...Is there some ambiguity as to what a user who passes index_col=range(3) intends?

range(3) means different things in Python 2 and Python 3, so it's unclear as to why you're passing in a generator as a parameter when we don't allow that for index_col.

While I understand the intention, I'm not sure that justifies us loosening our specifications, especially since it's really not inconvenient to use list(range(...)) instead of range(...). Mind you that trying to accommodate old Python 2 conventions is probably not ideal in the long-run because you're enforcing old behavior when we should be practicing Python 3 habits. 😄

jbrockmendel · 2017-07-26T18:39:09Z

you're enforcing old behavior when we should be practicing Python 3 habits

Sure the pandas devs should be practicing py3 habits, but when users have things that work in py2 and break in py3 for seemingly nitpicky reasons, that's just one more little bit of friction in the py3 transition process. It's counter-productive.

I'm not interested in bike-shedding user-friendliness vs purity. I'll put the ndarray/nan bit in a separate PR.

jreback · 2017-07-26T18:52:05Z

@jbrockmendel what I would accept here is an error message that raises on a non-scalar and non-list-like (IOW range).

we are actually somewhat picky on array-like; these generally *must* be fully materialized objects (though we do accept generators generally). In this instance an exception is fine.
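jreback's alternative is to reject a range up front with an informative error rather than accept it. That could look roughly like the sketch below. The function name `validate_index_col` and the exact set of accepted types are assumptions. They are not pandas' actual validation logic.

```python
from typing import Any


def validate_index_col(index_col: Any) -> None:
    """Hypothetical validator: accept None/False, a single int or str
    label, or a list/tuple of labels; raise TypeError for anything else,
    including range objects, with a hint about the fix."""
    if index_col is None or index_col is False:
        return  # no index column requested
    if isinstance(index_col, bool):
        # bool is a subclass of int, so reject True explicitly
        raise TypeError("index_col=True is not a valid column selector")
    if isinstance(index_col, (int, str, list, tuple)):
        return
    raise TypeError(
        "index_col must be an int, str, or list of them, got "
        f"{type(index_col).__name__}; wrap range objects in list(...)"
    )


validate_index_col([0, 1])  # accepted
try:
    validate_index_col(range(2))
except TypeError as exc:
    print(exc)  # explains the problem and suggests list(...)
```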

gfyoung · 2017-07-26T19:07:23Z

I might also point out that you can use pandas' lrange object from pandas.compat instead of list(range(...)), which gives you the same Python 2.x range behavior.
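`pandas.compat` held internal Python 2/3 compatibility helpers and was never public API, so `lrange` may no longer be available in pandas releases that have dropped Python 2 support. To get the same convenience without depending on pandas internals, a one-line local equivalent is enough. This is a sketch, not the pandas source.

```python
from typing import List


def lrange(*args: int) -> List[int]:
    """Local stand-in for a py2-style lrange: range(*args) as a list."""
    return list(range(*args))


print(lrange(3))        # [0, 1, 2]
print(lrange(1, 7, 2))  # [1, 3, 5]
```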

Accept range for list-requiring args in py3

b3e2438

gfyoung added the IO CSV label (read_csv, to_csv) Jul 26, 2017

Fix missing range--> list conversion

1eea06e

jbrockmendel force-pushed the accept_range branch from ff38aad to 1eea06e July 26, 2017 18:28

jbrockmendel closed this Jul 26, 2017

jbrockmendel mentioned this pull request Jul 26, 2017

MAINT: Remove non-standard and inconsistently-used imports #17085

Merged

jbrockmendel deleted the accept_range branch August 2, 2017 23:53
