Latest numpy and pandas #1339

Merged
merged 51 commits into from
Sep 21, 2016

ssanderson

Bump us up to the latest major versions of pandas and numpy.

Notable breakages in the latest:

  • Pandas:
    • rolling, expanding, resample, and ewm* all changed to behave more like groupby. This PR adds backwards-compat shims to support both the new and old syntax. This change is the only one that required a material amount of work to preserve compat with pandas 0.17.
    • .loc with an integer argument on an index of Asset objects no longer works in pandas 0.18. This is probably the change I'm most worried about from a user breakage perspective.
    • Timezone information is now preserved in Series and DataFrame columns. This means that some fields that were previously tz-naive may now be tz-aware, leading to breakages. This is the second most worrisome change for user breakage.
    • Group-label conventions for groupby.nth() on DataFrame and Series changed in pandas 0.18. The only affected usage has been rewritten in a way that's 2x faster, so not much cost here. See also: "groupby.nth() labelling conventions changed from 0.17 -> 0.18" (pandas-dev/pandas#13666).
    • Passing null-ish values to pd.Categorical is deprecated in pandas 0.18. In particular, the default missing value of None for string-dtype Pipeline columns is no longer appropriate if we want to avoid pandas warnings and/or future breakages, so it cannot be preserved in pipeline outputs. This PR currently deprecates support for custom string-dtype missing values, and makes string-dtype categorical outputs use the pandas-recommended value of NaN. A future change should likely remove support for custom missing values entirely in favor of using categoricals with NaN missing values for both strings and ints. This is the code change I'm most conflicted about in this PR. I think a better change might be to just silence the warning for now, and remove support for custom missing values in one consistent change. As-is, the semantics for column missing values are inconsistent between strings and every other dtype. @llllllllll I'd be interested in your thoughts on this. See also: "Categorical.from_codes warns if None is in categories" (pandas-dev/pandas#13648).
    • DataFrame.sort was deprecated in favor of sort_values. This is trivial to fix.
    • DataFrame.convert_objects was deprecated in favor of type-specific functions. We had only one invocation of convert_objects, and it was unnecessary.
    • DataFrame deprecated indexing with a float on .iloc. We only did this in one place, and it was almost certainly a bug.
  • Numpy:
    • np.full started warning that passing an integer value would produce an integer array in the future (it currently produces a float array). This PR fixes most of those warnings by passing explicit float values, or passing an explicit dtype. This is probably the largest change in LoC, but there's no cost to users.
    • Comparing NaT with itself started warning that NaT != NaT will be true in the future. An isnat function has been added to numpy_utils, and it is now used everywhere that we were previously checking for NaT. See also: "Add isfinite support for datetime64 and timedelta64" (numpy/numpy#5610).
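
The rolling-window compatibility shims mentioned above could look roughly like the following. This is a minimal sketch, not the code in this PR: the name `rolling_mean_compat` is hypothetical, and it assumes the pandas 0.17 top-level function `pd.rolling_mean` and the pandas 0.18 method `Series.rolling`.

```python
import pandas as pd


def rolling_mean_compat(series, window, min_periods=None):
    """Hypothetical shim: rolling mean on either side of the 0.18 API change."""
    if hasattr(series, 'rolling'):
        # pandas >= 0.18: groupby-like syntax, a window object plus an aggregation.
        return series.rolling(window, min_periods=min_periods).mean()
    # pandas <= 0.17: top-level function that takes the data as its first argument.
    return pd.rolling_mean(series, window, min_periods=min_periods)


prices = pd.Series([1.0, 2.0, 3.0, 4.0])
smoothed = rolling_mean_compat(prices, window=2)
```

Feature detection with `hasattr` is used here instead of parsing `pd.__version__`; either approach should work, and this one avoids version-string handling.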
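
For the two NumPy warnings, the fixes described above might be sketched as follows. The `isnat` below is an illustrative stand-in, not the helper added to numpy_utils in this PR; it relies on NumPy storing NaT as the minimum int64 value for datetime64 and timedelta64 data.

```python
import numpy as np

# Pass a float fill value or an explicit dtype, so the result dtype does not
# depend on how a given NumPy version treats integer fill values.
zeros_from_float = np.full(3, 0.0)
zeros_from_dtype = np.full(3, 0, dtype=np.float64)

# NaT is represented internally as the smallest int64 value.
NAT_AS_INT64 = np.iinfo(np.int64).min


def isnat(values):
    """Illustrative NaT check that avoids comparing NaT with itself."""
    values = np.asarray(values)
    if values.dtype.kind not in ('m', 'M'):
        raise TypeError("isnat expects datetime64 or timedelta64 data")
    # Reinterpret the 8-byte values as integers and compare against the sentinel.
    return values.view('int64') == NAT_AS_INT64


dates = np.array(['2016-09-21', 'NaT'], dtype='datetime64[ns]')
nat_mask = isnat(dates)
```

On NumPy 1.13 and later, the built-in `np.isnat` covers the same check.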

@ssanderson ssanderson force-pushed the latest-numpy-pandas branch 4 times, most recently from 2308b71 to 25efea2 Compare July 28, 2016 19:38

coveralls commented Jul 29, 2016

Coverage Status

Coverage decreased (-0.08%) to 85.012% when pulling e8fc7ac on latest-numpy-pandas into a937d6e on master.

coveralls commented Aug 1, 2016

Coverage Status

Coverage decreased (-0.05%) to 85.477% when pulling 245fae2 on latest-numpy-pandas into 9103516 on master.

coveralls commented Aug 1, 2016

Coverage Status

Coverage increased (+0.1%) to 85.66% when pulling 03efb1c on latest-numpy-pandas into 9103516 on master.

coveralls commented Aug 1, 2016

Coverage Status

Coverage decreased (-0.02%) to 85.508% when pulling 03efb1c on latest-numpy-pandas into 9103516 on master.

@ssanderson ssanderson force-pushed the latest-numpy-pandas branch 2 times, most recently from 6542b0d to 2a2e92c Compare August 2, 2016 15:20

coveralls commented Aug 2, 2016

Coverage Status

Coverage decreased (-0.02%) to 85.524% when pulling 2a2e92c on latest-numpy-pandas into 129d16f on master.

coveralls commented Aug 2, 2016

Coverage Status

Coverage decreased (-0.02%) to 85.494% when pulling 30ff125 on latest-numpy-pandas into f244dea on master.

@ssanderson ssanderson force-pushed the latest-numpy-pandas branch 4 times, most recently from bdd6c8c to 829284b Compare August 8, 2016 14:20

coveralls commented Aug 8, 2016

Coverage Status

Coverage decreased (-0.02%) to 85.519% when pulling 829284b on latest-numpy-pandas into a260fb1 on master.

coveralls commented Aug 9, 2016

Coverage Status

Coverage decreased (-0.02%) to 85.519% when pulling 1d6ec4a on latest-numpy-pandas into 24f2ef8 on master.

coveralls commented Aug 16, 2016

Coverage Status

Coverage decreased (-0.02%) to 85.698% when pulling 15d7105 on latest-numpy-pandas into 4642fd2 on master.

coveralls commented Aug 16, 2016

Coverage Status

Coverage decreased (-0.02%) to 85.698% when pulling df0e748 on latest-numpy-pandas into 4642fd2 on master.

@ssanderson ssanderson force-pushed the latest-numpy-pandas branch 3 times, most recently from d2ce509 to 9770f1c Compare August 31, 2016 20:42
Scott Sanderson added 23 commits September 20, 2016 17:12

Pandas 0.18 doesn't like having null-ish values in categoricals.  Fixing
this properly requires re-thinking the semantics for missing_value on
pipeline terms, so we're punting on that until after we've upgraded to
0.18.

coerce=True is deprecated.

Anaconda doesn't have windows builds for scipy 0.18 (nor does
conda-forge.)


coveralls commented Sep 20, 2016

Coverage Status

Coverage decreased (-0.08%) to 86.589% when pulling c23dd5b on latest-numpy-pandas into 3fff659 on master.
