Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPR: Deprecate generic timestamp dtypes #15987

Merged
merged 2 commits into from
Apr 14, 2017

Conversation

gfyoung
Copy link
Member

@gfyoung gfyoung commented Apr 12, 2017

We only use the nanosecond frequency, and numpy doesn't even handle generic timestamp dtype well internally with its ndarray object.

xref #15524 (comment).

dtype = np.timedelta64
s = Series([], dtype=dtype)

self.assertTrue(s.empty)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use new pytest when you can, e.g.

assert s.empty
assert s.dtype == 'm8[ns]'

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing. Done.

Copy link
Member Author

@gfyoung gfyoung Apr 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a side note, not really a fan of having to rewrite every single test comparison just to be more "idiomatic" with the existing test standard.

Why not write our own in pandas.util.testing that we can implement according to idiom and call from everywhere else?

e.g.

def assert_true(val):
    assert val

def assert_equal(a, b):
    assert a == b

The idiom is thus abstracted away from our tests, and we don't have to worry about it when writing test cases.

Copy link
Member Author

@gfyoung gfyoung Apr 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, as I write this, I see that we do have one of these functions already:

assert_equal(a, b, msg="")

I think from now on, we should advocate that people use our static util.testing functions (and avoid using self.assert... or assert keyword).

def test_astype_empty_constructor_equality(self):
# see gh-15524

for dtype in np.typecodes['All']:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

paramerize when you can on new tests (you may need to create a new class that inherits from object and/or write as bare functions)

Copy link
Member Author

@gfyoung gfyoung Apr 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unittest classes and pytest decorators do not mesh well together based on experience. I can't find anything so far in documentation on the pytest to do so. Unless you know of a straightforward way to do this, I'm leaving as is.

Copy link
Member Author

@gfyoung gfyoung Apr 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Barring some native pytest way to implement, a test decorator is the only way to do this (similar to what was done with capturing stdout and stderr). This is probably preferable since it would be independent of testing platform.

However, I would prefer to leave that as a follow-up so I can experiment separately.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

@gfyoung gfyoung Apr 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those docs don't answer my question. Does not explain how to parameterize a test method within tm.TestCase. As I already mentioned, unittest and pytest don't mesh well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not saying that I don't disagree about parameterizing. However, I think it is preferable and easier to implement our own parameterizaton decorator (just as we did with capture), that is all.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am trying to shrink the codebase
implementing parameteize is not trivial and would need a whole test suite - it diesnt add any value

Copy link
Member Author

@gfyoung gfyoung Apr 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but neither is incorporating pytest's builtin paramterize without adding new code (i.e. an entirely new class). The only way to avoid that is to write this is as a bare function (not a test method of tm.TestCase), but I prefer the test class organization if possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can just make another class inherit from object AS i show in the example

Copy link
Member Author

@gfyoung gfyoung Apr 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There actually is no example where you create a class that inherits from object. However, that being said, having been able to sit down and re-examine the class suggestion, I do see what you mean now.

Several things can come from this discussion:

  1. Document how you can instantiate test classes that inherit from object (and not tm.TestCase). I still prefer writing test classes instead of raw functions for organizational reasons, though it will still follow pytest idiom.

  2. Make a new PR that modifies the changes in TST: Add test decorators for redirecting stdout and stderr #15952 to use pytest fixtures for redirecting stdout and stdin (that will take substantial work though).

  3. Modify the existing test class in this file to inherit only from object (new commit).

# numpy arrays don't handle generic timestamp dtypes well. Since
# we only use the nanosecond frequency, interpret generic timestamp
# dtypes as nanosecond frequency.
if dtype.name == "datetime64":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is way too hacky to put this here, this should be trapped in maybe_cast_to_datetime, or if not, then let's create a generic way (via function there to do this).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would agree, except that when we call .astype, maybe_cast_to_datetime does not get called. But certainly, there are other places that casting can go.

@codecov
Copy link

codecov bot commented Apr 12, 2017

Codecov Report

Merging #15987 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #15987      +/-   ##
==========================================
- Coverage   91.03%   91.01%   -0.02%     
==========================================
  Files         145      145              
  Lines       49587    49585       -2     
==========================================
- Hits        45141    45132       -9     
- Misses       4446     4453       +7
Flag Coverage Δ
#multiple 88.78% <100%> (-0.02%) ⬇️
#single 40.57% <33.33%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/internals.py 93.55% <100%> (ø) ⬆️
pandas/types/cast.py 86.51% <100%> (+0.77%) ⬆️
pandas/tools/plotting.py 71.75% <0%> (-0.72%) ⬇️
pandas/core/common.py 91.03% <0%> (+0.34%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7b8a6b1...d581e3e. Read the comment docs.

@codecov
Copy link

codecov bot commented Apr 12, 2017

Codecov Report

Merging #15987 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #15987      +/-   ##
==========================================
+ Coverage      91%   91.01%   +<.01%     
==========================================
  Files         145      145              
  Lines       49639    49649      +10     
==========================================
+ Hits        45175    45188      +13     
+ Misses       4464     4461       -3
Flag Coverage Δ
#multiple 88.78% <100%> (ø) ⬆️
#single 40.53% <25%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/types/cast.py 86.14% <100%> (+1.02%) ⬆️
pandas/core/common.py 90.68% <0%> (-0.35%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cd35d22...eb4df87. Read the comment docs.

@gfyoung
Copy link
Member Author

gfyoung commented Apr 13, 2017

@jreback : All comments have been addressed, and everything is green.

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Apr 13, 2017

In the issue, the discussion was about construction of empty series objects, while here you also added constructing/astyping actual data with non-parameterized np.datetime64. As I said in the issue (#15524 (comment), so my comment was relevant after all :-)), I don't see a strong reason to allow this.

@jorisvandenbossche
Copy link
Member

@gfyoung I see now you replied in the issue:

Those examples you bring up above, should we follow in numpy 's footsteps? We only use one frequency, so I don't see why those two examples wouldn't work.

There is maybe not a strong reason that they shouldn't work, but I also don't see a good reason that they should work. It just expands what users can do in an unnecessary way. Why is it needed that people can do .astype(np.datetime) while they already can do .astype('datetime64[ns]') which is just more explicit (and does not 'misuse' the numpy dtype by interpreting it differently as numpy intended).

@gfyoung
Copy link
Member Author

gfyoung commented Apr 13, 2017

@jorisvandenbossche : Well, it has to go one way or the other. Either you can astype and initialize with the dtype (empty or non-empty), or you can't do either. The consensus with @jreback was that you should be able to do both. The point of this PR was to make behavior consistent.

I don't have any major issues going the other way and preventing anyone from casting to the generic dtype (i.e. np.datetime64 or np.timedelta64) via astype or the constructor (though personally, having this convenience is nice because I sometimes forget how to format / exactly specify the frequency). However, some agreement on how to proceed with this would be good.

@jreback
Copy link
Contributor

jreback commented Apr 13, 2017

I am ok with this (still have to review the change again) as we already support this.

In [4]: Series([]).astype(np.datetime64)
Out[4]: Series([], dtype: datetime64[ns])

this should be consistent with direct construction (via dtype).

@jreback
Copy link
Contributor

jreback commented Apr 13, 2017

On second though, we could deprecate accepting direct np.datetime64 as a passed dtype (this doesn't affect accept an array already in this format, just the dtype= kw and .astype())

So the principal of construction / conversion should work with the same type will be preserved.

I think I do like being more explicit here (IOW doing the deprecation). @gfyoung can you see what this breaks?

IIRC this was originally an issue to fix something in csv parsing. But this was an orthogonal change (I don't even really understand how this change was related, maybe you can show that?)

@jorisvandenbossche
Copy link
Member

(this doesn't affect accept an array already in this format, just the dtype= kw and .astype())

And for dtype= it already fails:

In [5]: Series([], dtype=np.datetime64)
...
TypeError: cannot convert datetimelike to dtype [datetime64]

so it is only astype that would change.

If possible, I would be in favor of not allowing bare np.datetime64 (and thus deprecating the astype case)

@jreback
Copy link
Contributor

jreback commented Apr 13, 2017

@jorisvandenbossche yep its the .astype that works now, not the direct construction. So let's investigate deprecating directly specifying np.datetime64, np.timedelta64 in .astype and see what happens.

@gfyoung
Copy link
Member Author

gfyoung commented Apr 13, 2017

@jreback , @jorisvandenbossche : That sounds like a good plan. Shouldn't be too bad I imagine.

@jorisvandenbossche BTW, the reason for that weird error message that you saw in the issue here is because at some point, we call arr.view(dtype), which when dtype=np.datetime64 causes numpy to get angry (internal representation breaks), and it can't display a proper error message.

@gfyoung gfyoung changed the title ENH: Handle generic timestamp dtypes with Series DEPR: Deprecate generic timestamp dtypes Apr 13, 2017
@gfyoung
Copy link
Member Author

gfyoung commented Apr 13, 2017

Deprecation worked out reasonably well, but I can't really figure out why AppVeyor and Circle are failing my tests. The Travis logs are clean (and passing), so I don't think that there's any hidden warning being issued that's causing AppVeyor and Circle to miss them.

Suggestions on how to fix this?

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't directly see what could be going on with circle ci and appveyor.

"deprecated and will raise in a future version. "
"Please pass in '{dtype}[ns]' instead.")
warnings.warn(msg.format(dtype=dtype.name),
FutureWarning, stacklevel=2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this stacklevel should be 8

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you meant 5? (same as the one beneath). Done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I think this should be 8 (at least, that's what I counted). But you didn't remove the stacklevel=False, so this does not fail the tests currently

if dtype.name in ('datetime64', 'datetime64[ns]'):
if dtype.name == 'datetime64':
warnings.warn(msg.format(dtype=dtype.name),
FutureWarning, stacklevel=2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and this one 5

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

if dtype.name in ('timedelta64', 'timedelta64[ns]'):
if dtype.name == 'timedelta64':
warnings.warn(msg.format(dtype=dtype.name),
FutureWarning, stacklevel=2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and this one also 5

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

data = [1]

with tm.assert_produces_warning(FutureWarning,
check_stacklevel=False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you try to remove the check_stacklevel here? (btw, easiest to try out what it should be is the set warnings to raise an error, and then you can count the number of stacks in the traceback)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that's fair. Done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't look like you removed this?

warning_type = FutureWarning

with tm.assert_produces_warning(warning_type,
check_stacklevel=False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it may help, I don't think it is necessarily needed to assert the warning here (as you already have an explicit test for that above), but you can also just catch warnings so they do not appear in the log (with warnings.catch_warnings(record=True):)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Done.

@gfyoung gfyoung force-pushed the timestamp64-no-freq branch 2 times, most recently from 282fd39 to 6445606 Compare April 13, 2017 21:55
@jreback jreback added Deprecate Functionality to remove in pandas Dtype Conversions Unexpected or buggy dtype conversions labels Apr 13, 2017
@jreback
Copy link
Contributor

jreback commented Apr 13, 2017

lgtm. I see you 'fixed' the passing np.datetime64 to the constructor (and then deprecated it), but that's fine.

ping on green.

@gfyoung
Copy link
Member Author

gfyoung commented Apr 13, 2017

Sounds good...once I can figure out these warnings 😁

We only use the nanosecond frequency, and numpy
doesn't even handle generic timestamp dtypes well.

xref pandas-devgh-15524 (comment).
@gfyoung
Copy link
Member Author

gfyoung commented Apr 14, 2017

@jreback : Everything is green again!

@jreback jreback added this to the 0.20.0 milestone Apr 14, 2017
@jreback jreback merged commit ebc0c09 into pandas-dev:master Apr 14, 2017
@jreback
Copy link
Contributor

jreback commented Apr 14, 2017

thanks!

@gfyoung gfyoung deleted the timestamp64-no-freq branch April 14, 2017 13:50
gfyoung added a commit to forking-repos/pandas that referenced this pull request Oct 28, 2018
Previously deprecated for Series constructor
and the `.astype` method. Now being enforced.

xref pandas-devgh-15987.
gfyoung added a commit to forking-repos/pandas that referenced this pull request Oct 28, 2018
Previously deprecated for Series constructor
and the `.astype` method. Now being enforced.

xref pandas-devgh-15987.
gfyoung added a commit to forking-repos/pandas that referenced this pull request Oct 28, 2018
Previously deprecated for Series constructor
and the `.astype` method. Now being enforced.

xref pandas-devgh-15987.
jreback pushed a commit that referenced this pull request Oct 28, 2018
Previously deprecated for Series constructor
and the `.astype` method. Now being enforced.

xref gh-15987.
tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request Nov 19, 2018
Previously deprecated for Series constructor
and the `.astype` method. Now being enforced.

xref pandas-devgh-15987.
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Previously deprecated for Series constructor
and the `.astype` method. Now being enforced.

xref pandas-devgh-15987.
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Previously deprecated for Series constructor
and the `.astype` method. Now being enforced.

xref pandas-devgh-15987.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Deprecate Functionality to remove in pandas Dtype Conversions Unexpected or buggy dtype conversions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants