Latest numpy and pandas #1339

ssanderson · 2016-07-21T00:49:22Z

Bump us up to the latest major versions of pandas and numpy.

Notable breakages in the latest:

Pandas:
- rolling, expanding, resample, and ewm* all changed to behave more like groupby. This PR adds backwards-compat shims to support both the new and old syntax. This change is the only one that required a material amount of work to preserve compat with pandas 0.17.
- .loc with an integer argument on an index of Asset objects no longer works in pandas 0.18. This is probably the change I'm most worried about from a user breakage perspective.
- Timezone information is now preserved in Series and DataFrame columns. This means that some fields that were previously tz-naive may now be tz-aware, leading to breakages. This is the second most worrisome change for user breakage.
- Group-label conventions for DataFrame/Series.nth() changed in pandas 0.18. The only affected usage has been re-written in a way that's 2x faster, so not much cost here. See Also: groupby.nth() labelling conventions changed from 0.17 -> 0.18 pandas-dev/pandas#13666.
- Passing null-ish values to pd.Categorical is deprecated in pandas 0.18. This means, in particular, that the default missing value of None for string-dtype Pipeline columns can no longer be preserved in pipeline outputs if we want to avoid pandas warnings and/or future breakages. This PR currently deprecates support for custom string-dtype missing values, and makes string-dtyped categorical output use the pandas-recommended value of NaN. A future change should likely remove support for custom missing values entirely in favor of using categoricals with NaN missing values for both strings and ints. This is the code change I'm most conflicted about in this PR. I think a better change might be to just silence the warning for now, and remove support for missing values in one consistent change. As-is, the semantics for column missing values are inconsistent between strings and every other dtype. @llllllllll I'd be interested in your thoughts on this. See Also: Categorical.from_codes warns if None is in categories pandas-dev/pandas#13648.
- DataFrame.sort was deprecated in favor of sort_values. This is trivial to fix.
- DataFrame.convert_objects was deprecated in favor of type-specific functions. We only had one, unnecessary invocation of convert_objects.
- DataFrame deprecated indexing with a float on .iloc. We only did this in one place, and it was almost certainly a bug.
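
As a rough illustration of the rolling-window compat shim mentioned in the first bullet, something like the following would support both syntaxes. This is only a sketch: `rolling_mean_compat` is a hypothetical name, not the helper added in this PR, and it assumes the old top-level `pd.rolling_mean` function (pandas 0.17) and the newer `.rolling()` method (pandas 0.18+).

```python
import pandas as pd


def rolling_mean_compat(series, window):
    """Hypothetical shim: rolling mean under both the old and new pandas APIs."""
    if hasattr(series, "rolling"):
        # pandas >= 0.18: groupby-like syntax.
        return series.rolling(window).mean()
    # pandas 0.17 and earlier: top-level function.
    return pd.rolling_mean(series, window)


prices = pd.Series([1.0, 2.0, 3.0, 4.0])
smoothed = rolling_mean_compat(prices, 2)
```

Feature detection with `hasattr` is used here instead of a version check so the sketch does not depend on how the pandas version string is parsed.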

Numpy:
- np.full started warning that passing an integer value would produce an integer array in the future (it currently produces a float array). This PR fixes most of those warnings by passing explicit float values, or passing an explicit dtype. This is probably the largest change in LoC, but there's no cost to users.
- Comparing np.NaT with itself started warning that NaT != NaT will be True in the future. An isnat function has been added to numpy_utils, and it's been used anywhere that we were previously checking for NaT. See Also: Add isfinite support for datetime64 and timedelta64 numpy/numpy#5610.
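
A minimal sketch of the two numpy fixes described above. `isnat_sketch` is a hypothetical stand-in and not necessarily how the `isnat` function in numpy_utils is implemented; it assumes NaT is stored as the minimum int64 value, which is how numpy represents it for datetime64 and timedelta64.

```python
import numpy as np

# numpy stores NaT as the smallest representable int64.
NAT_AS_INT = np.iinfo(np.int64).min


def isnat_sketch(values):
    """Hypothetical NaT check that avoids comparing NaT with itself."""
    values = np.asarray(values)
    if values.dtype.kind not in "mM":
        raise TypeError("expected a datetime64 or timedelta64 array")
    return values.view("int64") == NAT_AS_INT


# Be explicit about the fill value's type (or pass a dtype) so the result
# does not depend on numpy's inference from an integer fill value.
float_zeros = np.full(3, 0.0)
int_zeros = np.full(3, 0, dtype="int64")

dates = np.array(["2016-07-21", "NaT"], dtype="datetime64[ns]")
nat_mask = isnat_sketch(dates)
```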

coveralls · 2016-07-29T11:45:37Z

Coverage decreased (-0.08%) to 85.012% when pulling e8fc7ac on latest-numpy-pandas into a937d6e on master.

coveralls · 2016-08-01T16:39:53Z

Coverage decreased (-0.05%) to 85.477% when pulling 245fae2 on latest-numpy-pandas into 9103516 on master.

coveralls · 2016-08-01T17:15:09Z

Coverage increased (+0.1%) to 85.66% when pulling 03efb1c on latest-numpy-pandas into 9103516 on master.

coveralls · 2016-08-01T17:26:16Z

Coverage decreased (-0.02%) to 85.508% when pulling 03efb1c on latest-numpy-pandas into 9103516 on master.

coveralls · 2016-08-02T15:53:10Z

Coverage decreased (-0.02%) to 85.524% when pulling 2a2e92c on latest-numpy-pandas into 129d16f on master.

coveralls · 2016-08-02T16:07:03Z

Coverage decreased (-0.02%) to 85.524% when pulling 2a2e92c on latest-numpy-pandas into 129d16f on master.

coveralls · 2016-08-02T17:52:47Z

Coverage decreased (-0.02%) to 85.494% when pulling 30ff125 on latest-numpy-pandas into f244dea on master.

coveralls · 2016-08-08T14:41:15Z

Coverage decreased (-0.02%) to 85.519% when pulling 829284b on latest-numpy-pandas into a260fb1 on master.

coveralls · 2016-08-09T17:02:25Z

Coverage decreased (-0.02%) to 85.519% when pulling 1d6ec4a on latest-numpy-pandas into 24f2ef8 on master.

coveralls · 2016-08-16T19:34:07Z

Coverage decreased (-0.02%) to 85.698% when pulling 15d7105 on latest-numpy-pandas into 4642fd2 on master.

coveralls · 2016-08-16T22:52:15Z

Coverage decreased (-0.02%) to 85.698% when pulling df0e748 on latest-numpy-pandas into 4642fd2 on master.

coveralls · 2016-09-20T22:11:46Z

Coverage decreased (-0.08%) to 86.589% when pulling c23dd5b on latest-numpy-pandas into 3fff659 on master.

ssanderson force-pushed the latest-numpy-pandas branch 4 times, most recently from 2308b71 to 25efea2 Compare July 28, 2016 19:38

ssanderson force-pushed the latest-numpy-pandas branch from f3b15a5 to e8fc7ac Compare July 29, 2016 11:27

ssanderson force-pushed the latest-numpy-pandas branch from e8fc7ac to 245fae2 Compare August 1, 2016 16:20

ssanderson force-pushed the latest-numpy-pandas branch 2 times, most recently from 6542b0d to 2a2e92c Compare August 2, 2016 15:20

ssanderson force-pushed the latest-numpy-pandas branch from 9be8d4e to 30ff125 Compare August 2, 2016 17:35

ssanderson force-pushed the latest-numpy-pandas branch 4 times, most recently from bdd6c8c to 829284b Compare August 8, 2016 14:20

ssanderson force-pushed the latest-numpy-pandas branch from b21699e to 1d6ec4a Compare August 9, 2016 16:35

ssanderson force-pushed the latest-numpy-pandas branch from 1d6ec4a to 15d7105 Compare August 16, 2016 18:48

ssanderson force-pushed the latest-numpy-pandas branch from ed367f6 to 1bb6f7a Compare August 18, 2016 17:23

ssanderson force-pushed the latest-numpy-pandas branch 3 times, most recently from d2ce509 to 9770f1c Compare August 31, 2016 20:42

Scott Sanderson added 23 commits September 20, 2016 17:12

Revert "MAINT: Remove support for custom string Column missing values."

0ff13e7

This reverts commit 1b1e842.

MAINT: Temporarily ignore pandas warnings in categoricals.

53eb196

Pandas 0.18 doesn't like having null-ish values in categoricals. Fixing this properly requires re-thinking the semantics for missing_value on pipeline terms, so we're punting on that until after we've upgraded to 0.18.

MAINT: Use errors='coerce'.

ac256f3

coerce=True is deprecated.
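
For illustration only, and assuming the affected call was `pd.to_datetime` (the commit message does not say which function was changed), the replacement spelling looks roughly like this:

```python
import pandas as pd

raw = ["2016-07-21", "not a date"]

# Old spelling (deprecated): pd.to_datetime(raw, coerce=True)
# New spelling: unparseable entries become NaT instead of raising.
parsed = pd.to_datetime(raw, errors="coerce")
```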

DOC: Typo in comment.

df76086

MAINT: Don't make datetime64 from tz-aware Timestamp.

aa3e2fe

It's slow and deprecated.

DOC: Note where cleanup happens.

7280662

BLD: Remove old numpy/pandas versions from travis.

659c8ae

MAINT: Put scipy back in travis reqs.

78dd69c

STY: Fix flake8 failures.

7e2230a

BLD: Update appveyor.yml for new pandas/numpy.

2e238bf

MAINT: Remove outdated compat code.

966c0ce

BLD: Downgrade to scipy 0.17.

76f8eaf

Anaconda doesn't have windows builds for scipy 0.18 (nor does conda-forge.)

MAINT: Use specific versions in appveyor.yml.

48e12a2

MAINT: Use explicit floats in np.full.

30a1eb6

MAINT: Fix PerformanceWarning import.

f3eeaa2

MAINT: Use df.resample().apply().

500f706

MAINT: Use sort_values instead of sort().

99a5957

MAINT: Bump blaze.

ae4efff

BUG: Don't fail on integral floats in event rules.

d9282ef

Coerce and warn instead.
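
A sketch of the "coerce and warn" behaviour, not the code from this commit. The function name, the warning text, and the choice to still reject non-integral floats are all assumptions made for the example.

```python
import warnings


def coerce_integral_float(value, name="offset"):
    """Hypothetical helper: accept floats such as 5.0 where an int is expected."""
    if isinstance(value, float):
        if not value.is_integer():
            # Assumption: non-integral floats are still treated as an error.
            raise TypeError("%s must be an integer, got %r" % (name, value))
        warnings.warn(
            "%s was passed as a float; coercing %r to int" % (name, value),
            UserWarning,
        )
        return int(value)
    return value
```

With this sketch, `coerce_integral_float(5.0)` returns `5` and emits a warning, while `coerce_integral_float(5)` passes through silently.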

MAINT: Rebuild example data.

ccd94e6

MAINT: Use randint instead of random_integers.

94e51cf
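
One detail worth keeping in mind for this kind of swap: `random_integers(low, high)` included `high`, while `randint(low, high)` excludes it, so the upper bound has to be bumped by one. A small sketch (the seed and bounds are arbitrary, chosen for the example):

```python
import numpy as np

rng = np.random.RandomState(0)

# Old spelling (deprecated): rng.random_integers(1, 6, size=100)
# randint's upper bound is exclusive, so 7 here still yields values in 1..6.
rolls = rng.randint(1, 7, size=100)
```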

MAINT: Silence bad perf warning from pandas.

70755c5

MAINT: Bump blaze.

15b5cbf

ssanderson force-pushed the latest-numpy-pandas branch from f5f3384 to 15b5cbf Compare September 20, 2016 21:19

BUG: Remove set_trace and add test coverage.

c23dd5b

ssanderson merged commit 7441369 into master Sep 21, 2016

ssanderson deleted the latest-numpy-pandas branch September 21, 2016 02:03

This was referenced Sep 21, 2016

Dramatic Memory Usage Increases After Pandas 18 Merge #1503

Closed

Large Monotonic Index Objects Always Allocate Hash Tables on get_loc pandas-dev/pandas#14266

Closed
