Add pickle test #27
Conversation
Codecov Report
```diff
@@            Coverage Diff             @@
##             main      #27      +/-   ##
==========================================
- Coverage   87.29%   87.23%   -0.07%
==========================================
  Files          11       11
  Lines        1149     1159      +10
==========================================
+ Hits         1003     1011       +8
- Misses        146      148       +2
```
Continue to review full report at Codecov.
tests/test_reader.py (outdated)

```python
d = f.to_dask()
pd = pickle.dumps(d)
assert isinstance(pd, bytes)
# FIXME: this line currently leaks a file handle...
```
Is this `FIXME` due to the `__setstate__` method when unpickling? `__setstate__` will open the file, but I don't see the mechanism (context manager?) that would close the file again for an unpickled array. Maybe through garbage collection?
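For context, here is a minimal sketch of the pickle round-trip being discussed, using a hypothetical `FileBackedReader` (not the actual nd2 API): `__getstate__` drops the open handle, `__setstate__` reopens the file, and only garbage collection (via `weakref.finalize`) releases the reopened handle unless something closes it explicitly. That is exactly how an unpickled copy can leak a handle.

```python
import os
import pickle
import tempfile
import weakref


class FileBackedReader:
    """Hypothetical reader holding an open file handle (not the real nd2 API)."""

    def __init__(self, path):
        self.path = path
        self._fh = open(path, "rb")
        # close the handle when this object is garbage-collected
        self._finalizer = weakref.finalize(self, self._fh.close)

    def close(self):
        self._finalizer()

    def __getstate__(self):
        # the open handle (and its finalizer) cannot be pickled: drop them
        state = self.__dict__.copy()
        del state["_fh"], state["_finalizer"]
        return state

    def __setstate__(self, state):
        # re-open the file on unpickle; if nothing ever calls close(),
        # only garbage collection will release this second handle
        self.__dict__.update(state)
        self._fh = open(self.path, "rb")
        self._finalizer = weakref.finalize(self, self._fh.close)


# demo: round-trip through pickle, with a temp file standing in for an .nd2
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
    tmp.write(b"data")

reader = FileBackedReader(tmp.name)
copy = pickle.loads(pickle.dumps(reader))
leaked = not copy._fh.closed  # the copy silently opened a second handle
reader.close()
copy.close()
os.unlink(tmp.name)
```

The sketch only illustrates the mechanism in the question above; whether the real reader closes via GC, an explicit `close()`, or a context manager is what this thread is working out.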
At least that was the case for my initial PR, but I guess the `OpeningDaskArray` might fix that as well.
Hi @tlambert03,
Thanks for this. I am awed.
I pulled this PR branch and it seems to work for my use cases, I think everything I mentioned in issues #19 and #25.
Moreover, I can now pip install locally and the shared library is found, so it seems to address #24 as well.
I left a couple of comments, nothing I feel strongly about. Would be great to have this merged and rolled out.
Thanks again!
src/nd2/opening_dask_array.py (outdated)

```python
    return method


class OpeningDaskArray(da.Array):
```
I like the approach with the subclass better than the `ObjectProxy`. Not sure about the class name, it sounds a bit like a morphological operation :-) Throwing in `ResourceHandlingDaskArray` as a suggestion, although that is quite a mouthful.
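For anyone following along, the trade-off between the two approaches can be sketched in plain Python (no dask required; the class names here are illustrative, not the actual implementation). A `__getattr__`-based proxy in the spirit of `wrapt.ObjectProxy` forwards attribute access but is not an instance of the wrapped type, while a subclass *is* the wrapped type, so `isinstance` checks (and therefore type-based dispatching) keep working:

```python
class ResourceProxy:
    """Minimal ObjectProxy-style wrapper: forwards attribute lookups."""

    def __init__(self, wrapped, on_close):
        self._wrapped = wrapped
        self._on_close = on_close

    def __getattr__(self, name):
        # called only for attributes not found on the proxy itself
        return getattr(self._wrapped, name)

    def close(self):
        self._on_close()


class ClosingList(list):
    """Subclass approach: the object IS the wrapped type (list here)."""

    def __init__(self, items, on_close):
        super().__init__(items)
        self._on_close = on_close

    def close(self):
        self._on_close()


closed = []
proxy = ResourceProxy([1, 2, 3], on_close=lambda: closed.append("proxy"))
sub = ClosingList([1, 2, 3], on_close=lambda: closed.append("subclass"))

proxy_is_list = isinstance(proxy, list)  # False: proxies break isinstance
sub_is_list = isinstance(sub, list)      # True: subclasses keep it
```

Subclassing `da.Array` itself involves more machinery than shown here; this only illustrates the `isinstance` behavior that makes the subclass route attractive.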
Ooops ....
So this is what I see in the context of a larger application (shortened traceback) when returning the dask array from an ND2File context manager. When I leave the ND2File open and return the array, I don't observe the crashes. When I try to create a smaller example consisting of the following two steps, it doesn't crash, although these are exactly the steps that happen in the context of the larger application.
Now I can reproduce it with a small example, my test_nd2.py:

```python
import dask.array as da
import numpy as np

from nd2 import ND2File

# TODO: create a small test .nd2 on the Nikon that we can add to resources
# or add something to git lfs
dataset_nd2 = "/home/hilsenst/Documents/Luisa_Reference_HT/PreMaldi/Seq0000.nd2"

# removed a number of tests here for brevity
...

def _load_nd2(filepath):
    with ND2File(filepath) as f:
        return f.to_dask()

def test_nd2_from_context_mgr():
    arr = _load_nd2(dataset_nd2)
    np_slice = np.array(arr[0, 0, :, :])
```

Then running the test:
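The failure mode under discussion, a deferred read happening after the context manager has already closed the handle, can be reproduced with a stdlib-only analogue (no nd2 or dask involved; the names are made up for illustration):

```python
import os
import tempfile


def load_lazy(path):
    # analogue of `with ND2File(path) as f: return f.to_dask()`:
    # the returned callable still references the handle, but the
    # `with` block closes that handle before anything reads from it
    with open(path, "rb") as f:
        return lambda: f.read()


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"pixels")

read = load_lazy(tmp.name)
try:
    read()  # deferred read: the handle was closed on `with`-block exit
    error = None
except ValueError as exc:  # "I/O operation on closed file"
    error = str(exc)
os.unlink(tmp.name)
```

In pure Python this fails with a clean `ValueError`; in the nd2 case the deferred read goes through the C extension, which is presumably why it surfaces as a segfault rather than a Python exception.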
Thank you @VolkerH! I really appreciate the help. I think we're narrowing in on something. I will note that I found a few edge cases where I had to specify the scheduler to avoid a segfault (but I usually was able to figure out why), so let me try your example here. Not that this is a nice solution, but perhaps try the following?

```python
def test_nd2_from_context_mgr():
    arr = _load_nd2(dataset_nd2)
    np_slice = np.array(arr[0, 0, :, :].compute(scheduler='threads'))
```
I'm inclined to merge what we have so far, and keep working at the edge cases... that ok with you?
Hmmm, this test actually passes fine for me (on Mac), both in pytest, as well as in ipython or as a script. You're on Ubuntu, yes?
Yes, Ubuntu 20.04.
I initially had some trouble reproducing this error with very similar code.
I'm not sure how much this is affected by some sort of external state
(number of cores being used, race conditions between threads, etc)
…On Fri, 12 Nov 2021 at 14:55, Talley Lambert ***@***.***> wrote:
Yes, I'm all for it.
Got the leaked file figured out! One of the key details was that the dask array wasn't being "unpickled" back to an `OpeningDaskArray`.
Just saw this. Great that you could track it down.
this includes #26 ... and modifies the get/set state accordingly to deal with the new lock object.
@VolkerH ... after #26, have a look here