Latest numpy and pandas #1339
Merged
ssanderson force-pushed the latest-numpy-pandas branch 4 times, most recently from 2308b71 to 25efea2 (July 28, 2016 19:38)
ssanderson force-pushed the latest-numpy-pandas branch from f3b15a5 to e8fc7ac (July 29, 2016 11:27)
ssanderson force-pushed the latest-numpy-pandas branch from e8fc7ac to 245fae2 (August 1, 2016 16:20)
ssanderson force-pushed the latest-numpy-pandas branch 2 times, most recently from 6542b0d to 2a2e92c (August 2, 2016 15:20)
ssanderson force-pushed the latest-numpy-pandas branch from 9be8d4e to 30ff125 (August 2, 2016 17:35)
ssanderson force-pushed the latest-numpy-pandas branch 4 times, most recently from bdd6c8c to 829284b (August 8, 2016 14:20)
ssanderson force-pushed the latest-numpy-pandas branch from b21699e to 1d6ec4a (August 9, 2016 16:35)
ssanderson force-pushed the latest-numpy-pandas branch from 1d6ec4a to 15d7105 (August 16, 2016 18:48)
ssanderson force-pushed the latest-numpy-pandas branch from ed367f6 to 1bb6f7a (August 18, 2016 17:23)
ssanderson force-pushed the latest-numpy-pandas branch 3 times, most recently from d2ce509 to 9770f1c (August 31, 2016 20:42)
This reverts commit 1b1e842.
Pandas 0.18 doesn't like having null-ish values in categoricals. Fixing this properly requires re-thinking the semantics for missing_value on pipeline terms, so we're punting on that until after we've upgraded to 0.18.
coerce=True is deprecated.
It's slow and deprecated.
Anaconda doesn't have Windows builds for scipy 0.18 (nor does conda-forge).
Coerce and warn instead.
ssanderson force-pushed the latest-numpy-pandas branch from f5f3384 to 15b5cbf (September 20, 2016 21:19)
This was referenced Sep 21, 2016
Bump us up to the latest major versions of pandas and numpy.

Notable breakages in the latest:

- `rolling`, `expanding`, `resample`, and `ewm*` all changed to behave more like `groupby`. This PR adds backwards-compat shims to support both the new and old syntax. This change is the only one that required a material amount of work to preserve compat with pandas 0.17.
- `.loc` with an integer argument on an index of `Asset` objects no longer works in pandas 0.18. This is probably the change I'm most worried about from a user breakage perspective.
- `DataFrame/Series.nth()` changed in pandas 0.18. The only affected usage has been re-written in a way that's 2x faster, so not much cost here. See Also: "groupby.nth() labelling conventions changed from 0.17 -> 0.18" (pandas-dev/pandas#13666).
- Null-ish values in `pd.Categorical` categories are deprecated in pandas 0.18. This means, in particular, that the default missing value of `None` for string-dtype Pipeline columns is no longer appropriate if we want to avoid pandas warnings and/or future breakages. This PR currently deprecates support for custom string-dtype missing values, and makes string-dtyped categorical output provide the pandas-recommended value of `NaN`. A future change should likely remove support for custom missing values entirely in favor of using categoricals with NaN missing values for both strings and ints. This is the code change I'm most conflicted about in this PR. I think a better change might be to just silence the warning for now, and remove support for missing values in one consistent change. As-is, the semantics for column missing values are inconsistent between strings and every other dtype. @llllllllll I'd be interested in your thoughts on this. See Also: "Categorical.from_codes warns if `None` is in categories" (pandas-dev/pandas#13648).
- `DataFrame.sort` was deprecated in favor of `sort_values`. This is trivial to fix.
- `DataFrame.convert_objects` was deprecated in favor of type-specific functions. We only had one, unnecessary invocation of `convert_objects`.
- `DataFrame` deprecated indexing with a `float` on `.iloc`. We only did this in one place, and it was almost certainly a bug.
- `np.full` started warning that passing an integer value would produce an integer array in the future (it currently produces a float array). This PR fixes most of those warnings by passing explicit float values, or passing an explicit dtype. This is probably the largest change in LoC, but there's no cost to users.
- `np.NaT` started warning on comparisons with itself that `NaT != NaT` will be true in the future. An `isnat` function has been added to `numpy_utils`, and it's been used anywhere that we were previously checking for NaT. See Also: "Add `isfinite` support for `datetime64` and `timedelta64`" (numpy/numpy#5610).
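
For readers who want a concrete picture of the two numpy-related fixes, here is a minimal sketch. It is not the code from this PR: the function name `isnat` matches the description, but the body, the `NAT_AS_INT64` constant, and the sample arrays are assumptions made for illustration. The idea is to detect NaT by its underlying int64 bit pattern rather than by comparing with `==`, and to make the dtype of `np.full` explicit instead of relying on the default.

```python
import numpy as np

# Assumption for this sketch: NaT is stored as the minimum int64 value
# in datetime64/timedelta64 arrays.
NAT_AS_INT64 = np.iinfo(np.int64).min


def isnat(values):
    """Return a boolean mask that is True where `values` holds NaT.

    Hypothetical helper: avoids `values == NaT`, whose result was
    slated to change.
    """
    values = np.asarray(values)
    if values.dtype.kind not in ("M", "m"):
        raise TypeError(
            "expected datetime64 or timedelta64, got %s" % values.dtype
        )
    # Reinterpret the 8-byte date values as integers and compare bit patterns.
    return values.view(np.int64) == NAT_AS_INT64


dates = np.array(["2016-08-01", "NaT"], dtype="datetime64[ns]")
nat_mask = isnat(dates)

# Two ways to keep np.full from depending on its default dtype:
prices = np.full(3, 0.0)                  # pass an explicit float fill value
counts = np.full(3, 0, dtype=np.float64)  # or pass an explicit dtype
```

Newer NumPy releases provide a built-in `np.isnat`, so a helper like this is mainly useful when older versions must be supported.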