-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH Warn about panel.to_frame() discarding NaN GH7879 #7970
ENH Warn about panel.to_frame() discarding NaN GH7879 #7970
Conversation
@@ -858,6 +859,10 @@ def to_frame(self, filter_observations=True): | |||
mask = com.notnull(self.values).all(axis=0) | |||
# size = mask.sum() | |||
selector = mask.ravel() | |||
if not all(selector): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use not selector.all()
. there's a lot of overhead with the python builtin all
for ndarray
s because the builtin all
is very generic and works on anything with a next
method. For an ndarray
of size 10,000,000 ndarray.all()
is about 60x faster:
In [1]: n = 1e7
In [2]: x = randn(n)
In [3]: all(x)
Out[3]: True
In [4]: x.all()
Out[4]: True
In [5]: timeit all(x)
1 loops, best of 3: 658 ms per loop
In [6]: timeit x.all()
100 loops, best of 3: 11.3 ms per loop
In [7]: 658/11.3
Out[7]: 58.23008849557522
@m-novikov couple of minor comments here, otherwise looks good |
Fixed according to your comments, also added suppressing warning statement to pandas/io/tests/test_pytables.py as it's shows up on test run. |
result = wp.to_frame() | ||
|
||
with tm.assert_produces_warning(UserWarning): | ||
setattr(panelm, '__warningregistry__', {}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does this setattr line do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resets warnings in case it's already been raised somewhere. So I can for sure catch this warning.
if u take those extra lines out ( the filtering in setup and the setattr) what happens? I am puzzled why they are in here in the first place that is the point of assert_produces_warning it does the filtering necessary to assert the warning |
more to the point is the rationale behind this pr that seems a little odd no ? |
i think this warning is ok, it's a bit of a surprise to throw away data by default. |
why not just change the default then? |
well that would be nice but would break existing code |
If I lose |
I say just a change the default as an API change or do nothing |
i'd be ok with that, maybe a nice warning in the docs that says "you have to call dropna now" |
Removed warnings suppression |
@m-novikov if u change the result does anything break in the test suite (aside from the actual test for this) |
@jreback if I leave it with warnings nothing breaks, just occasional message in some test will be displayed and so without it but as was stated it's not expected behaviour. |
no I mean if u change the default to False |
@jreback There is SparsePanel.to_frame() which doesn't support filter_observations=False, if change signature only for Panel class then it will be inconsistent. |
@m-novikov can you put in a short doc note. Also Let's change this to a |
6ba5539
to
7d7a805
Compare
@jreback updated |
I would like to repurpose this and simply change the default here (no need to warn then). Can you update? |
Yes, so this means all related test to be fixed too, I will get to it |
@m-novikov gr8! lmk |
I think we can simply change this for 0.17.0. going to close this, but pls reopen if you want address that. |
Also there was unimported module warnings with calls to it. In pandas/core/panel.py lines 718, 748 for example.
closes #7879