Slow creation of ND2File object from large ND2 files #50
Comments
I have also run into this issue. @tlambert03 I think the issue is that there needs to be a way to specify …
Yep, thanks both, I ran into this recently as well. We can definitely speed this up by not double-checking the chunkmap.
(Actually, @jni... this is the reason for that big delay we observed last week: it was greedily performing the chunkmap validation that I originally added to "rescue" corrupt data.)
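For context on why this validation pass is expensive: a chunkmap check has to seek to every recorded chunk offset and verify its header, so its cost scales with the number of chunks rather than the bytes actually read. The sketch below is illustrative only, not nd2's actual implementation; the magic constant and header layout are assumptions.

```python
import struct
from io import BytesIO

CHUNK_MAGIC = 0x0ABECEDA  # assumed per-chunk magic number (illustrative)

def validate_chunkmap(fh, offsets):
    """Seek to each recorded chunk offset and confirm the header magic.

    One small read per chunk: cheap on a local disk, but on a
    high-latency network drive each seek costs a round trip, so this
    can dominate file-open time even with negligible bandwidth use.
    """
    bad = []
    for name, offset in offsets.items():
        fh.seek(offset)
        (magic,) = struct.unpack("<I", fh.read(4))
        if magic != CHUNK_MAGIC:
            bad.append(name)
    return bad

# Tiny in-memory "file": two valid chunks plus one misaligned offset.
buf = BytesIO()
offsets = {}
for name in (b"frame0", b"frame1"):
    offsets[name] = buf.tell()
    buf.write(struct.pack("<I", CHUNK_MAGIC) + name)
offsets[b"frame2"] = buf.tell() + 3  # deliberately wrong offset
buf.write(struct.pack("<I", CHUNK_MAGIC) + b"frame2")

print(validate_chunkmap(buf, offsets))  # only the misaligned chunk is reported
```

This also illustrates why skipping validation is risky: a single wrong offset silently decodes garbage, which matches the "offset/overlapped" frames described below.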
Awesome, thanks! (Also, happy to submit a quick PR if you're busy.) (@tlambert03: for context, this week I'm hoping to finally migrate the Paulsson lab codebase over to this reader from the hacked-together pickle/memmap-enabled nd2reader fork that I've been using for the last many years...)
I'll take any chance I can get to hook you into the "contributors" column here 😂 I agree, we should make fixup default to False, but then also create a path for the …
Thank you for the rapid responses and fixes! In my hands, v0.2.3 both does and does not work. With validate_frames=False, loading is indeed very fast, but with my data this unfortunately results in every single frame being "corrupted", i.e. frames being offset/overlapped incorrectly. So I now appreciate why you implemented the validation in the first place. With validate_frames=True the frames are read correctly.

I also discovered that the time it takes to validate the frames varies by an order of magnitude depending on the source (different network drives vs. local), even though very little bandwidth is actually used, which suggests a latency issue. A workaround for me is to make sure the files are on a fast connection; then the overhead is not too severe.

P.S.: I realize the above also means that with the changed default behavior, nd2.imread(file, dask=True) exclusively returns corrupted frames (with my data) and offers no option to validate them either, which could cause some confusion for users encountering this.
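The order-of-magnitude difference between drives is consistent with validation being seek-bound rather than bandwidth-bound: each per-frame check is a tiny read, so total time scales roughly as frames × round-trip latency. A back-of-the-envelope sketch (frame count and latencies are illustrative assumptions, not measurements from this dataset):

```python
def validation_time(n_frames: int, latency_s: float) -> float:
    """Rough lower bound: one storage round trip per frame-chunk check."""
    return n_frames * latency_s

n_frames = 10_000  # hypothetical frame count for a large multi-GB file
for label, latency in [("local SSD", 1e-4), ("fast network", 1e-3), ("slow network", 1e-2)]:
    print(f"{label}: ~{validation_time(n_frames, latency):.1f} s")
```

Under these assumptions the same validation pass goes from ~1 s locally to ~100 s on a high-latency mount, without the bandwidth ever being saturated.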
Ohhh, you're also reading over the network. That helps explain why I haven't seen this: I have some local 30 GB files (some of which are partially corrupted) and I have never seen long delays. Yeah, it's quite unfortunate that ND2 uses these massive single files; it makes it extremely easy for a tiny byte-offset error to propagate garbage frames throughout the dataset. Would you be willing to share that big dataset via Dropbox or something, so I can have a look?
Description
Thank you for creating this useful package. However, in my use case, creating an ND2File object from a ~30 GB ND2 file is quite slow, taking about 30 s, whereas creating an ND2Reader object with nd2reader takes ~1 s. Reading subsets of the data once the object exists is similarly fast in both. I would much prefer the ND2File version for its slicing interface and dask compatibility, but the initial overhead of creating a series of ND2File/dask objects is surprisingly large. nd2.imread(filepath, dask=True) has similar overhead, while calling to_dask() on an already-created ND2File takes less than a second. File details are below. Both CPU and drive usage are very moderate while the object is created, so there does not appear to be a hardware bottleneck.
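A minimal way to time the two steps separately and confirm the overhead is in object creation rather than in to_dask(). The helper below is self-contained; the commented-out usage assumes the `nd2` package and a real file path ("experiment.nd2" is a placeholder):

```python
import time

def timed(label, fn, *args, **kwargs):
    """Run fn, print its wall-clock time, and return its result."""
    t0 = time.perf_counter()
    result = fn(*args, **kwargs)
    print(f"{label}: {time.perf_counter() - t0:.2f} s")
    return result

# Hypothetical usage (uncomment with a real ND2 file available):
# import nd2
# f = timed("ND2File open", nd2.ND2File, "experiment.nd2")  # the slow step
# arr = timed("to_dask", f.to_dask)                          # fast once open
```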
What I Did