Display Python code for loading objects #100
Comments
I think this will be pretty useful for exploring the data in more detail. When you have a neurodata object open, you can click to view a Python code snippet for accessing that object remotely from a Python environment. I have found that the recommended fsspec method for loading the remote file is very inefficient compared with direct HTTP requests using range headers (I don't know the inner workings of that library). This prompted me to create a simple package called remfile. You can look at the README for that project for a more detailed discussion. The implementation is lightweight (a single Python file). Also, I have found pynwb to be inefficient for quick access to remote files; I think it's loading a lot of metadata up front. For that reason, I am using h5py directly for now. Happy to discuss further.
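To make the "direct HTTP requests using range headers" idea concrete, here is a minimal standard-library sketch of a read-only, seekable file-like object that turns each read into one byte-range fetch. It is not remfile's actual code: `RangeFile`, `http_range_fetch`, and the `fetch(start, end)` callback signature are names assumed for illustration, and the total file size is assumed to be known up front (for example from a `Content-Length` header).

```python
import io
import urllib.request
from typing import Callable


def http_range_fetch(url: str, start: int, end: int) -> bytes:
    """Fetch bytes [start, end] (inclusive) with a single HTTP Range request."""
    req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(req) as resp:
        return resp.read()


class RangeFile(io.RawIOBase):
    """Hypothetical read-only file object backed by a byte-range fetch callback."""

    def __init__(self, size: int, fetch: Callable[[int, int], bytes]):
        super().__init__()
        self._size = size    # total size of the remote file in bytes
        self._fetch = fetch  # fetch(start, end) -> bytes, end inclusive
        self._pos = 0

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        return True

    def tell(self) -> int:
        return self._pos

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        if whence == io.SEEK_SET:
            pos = offset
        elif whence == io.SEEK_CUR:
            pos = self._pos + offset
        elif whence == io.SEEK_END:
            pos = self._size + offset
        else:
            raise ValueError(f"invalid whence: {whence}")
        if pos < 0:
            raise ValueError("negative seek position")
        self._pos = pos
        return self._pos

    def readinto(self, b) -> int:
        # Clamp the request to the end of the file; 0 signals EOF.
        n = min(len(b), self._size - self._pos)
        if n <= 0:
            return 0
        # Truncate in case the server returns more than was asked for.
        data = self._fetch(self._pos, self._pos + n - 1)[:n]
        b[: len(data)] = data
        self._pos += len(data)
        return len(data)
```

For a real URL, the callback could be `lambda s, e: http_range_fetch(url, s, e)`. A practical reader would also coalesce small reads into larger requests, which this sketch deliberately omits.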
There are still 2 big advantages to using a) See https://github.com/NeurodataWithoutBorders/nwbinspector/blob/6e4771f3008233a3a9e79ac919b2d4a0ae3d2f6c/src/nwbinspector/utils.py#L160-L174 for our implementation at the time, but we've intended to switch to b) The As I understand, the I admit this is only useful in some cases where revisiting data or metadata is common, not for the case of quickly scanning through an entire dataset once and only once, but still
@magland would it be possible to add caching to disk and retries?
Thanks @CodyCBakerPhD @bendichter. Yes, I have already added retries and included a test for it (see below). Caching to disk is a bit more tricky... I will take a crack at it and you can tell me what you think.
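As a rough illustration of what retrying a remote read can look like, the sketch below wraps any zero-argument callable in a retry loop with exponential backoff. It is an assumption-laden example, not the retry logic used in remfile: `with_retries` and its parameters (`attempts`, `base_delay`, `retry_on`, `sleep`) are hypothetical names.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the given exceptions with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            # Out of attempts: let the last error propagate to the caller.
            if attempt == attempts - 1:
                raise
            # Wait base_delay, 2*base_delay, 4*base_delay, ... between tries.
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

The default `retry_on=(OSError,)` covers `urllib.error.URLError` and socket timeouts, since both are `OSError` subclasses. Passing `sleep` as a parameter keeps the function easy to test without real delays.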
You can try joblib.Memory for caching
Looks like a downside to i.e., if I request a slice of I'm not 100% sure w.r.t. a paginated file, or how the chunking equates to byte ranges under the hood, but I don't think it would re-use the data already downloaded from the first slice/request since it might equate to a different combination of range arguments I'd reference the But yes, it does seem more than a bit tricky to get it working comparably well. I'd be fine with just recommending
Yeah, I think this is not straightforward to solve. For now let's say remfile does not have disk caching capability.
@CodyCBakerPhD Looking at this some more, the difficult part of an LRU disk cache is the LRU. So I decided to implement a non-LRU disk cache, and it was pretty straightforward. See:
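Independent of the remfile implementation referenced above, the idea of a non-LRU disk cache can be sketched as follows: the remote file is split into fixed-size aligned chunks, each chunk is stored as its own file on disk, and nothing is ever evicted. Because requests are rounded to chunk boundaries, two overlapping slices map to the same chunk files, which addresses the concern raised earlier about caching on exact range arguments. `ChunkDiskCache`, the `fetch(start, end)` callback, and the on-disk layout are all assumptions made for this example.

```python
import hashlib
import os
import tempfile
from typing import Callable


class ChunkDiskCache:
    """Hypothetical chunk-aligned disk cache with no eviction (non-LRU)."""

    def __init__(
        self,
        cache_dir: str,
        key: str,
        fetch: Callable[[int, int], bytes],
        chunk_size: int = 1024 * 1024,
    ):
        # One subdirectory per (key, chunk_size) so different chunk sizes never mix.
        digest = hashlib.sha256(f"{key}|{chunk_size}".encode()).hexdigest()
        self._dir = os.path.join(cache_dir, digest)
        os.makedirs(self._dir, exist_ok=True)
        self._fetch = fetch  # fetch(start, end) -> bytes, end inclusive
        self._chunk_size = chunk_size

    def _chunk(self, index: int) -> bytes:
        path = os.path.join(self._dir, f"{index}.bin")
        if os.path.exists(path):
            with open(path, "rb") as f:
                return f.read()
        start = index * self._chunk_size
        data = self._fetch(start, start + self._chunk_size - 1)
        # Write to a temporary file, then rename, so readers never see partial chunks.
        fd, tmp = tempfile.mkstemp(dir=self._dir)
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, path)
        return data

    def read(self, start: int, length: int) -> bytes:
        """Return `length` bytes starting at `start`, fetching only missing chunks."""
        if length <= 0:
            return b""
        first = start // self._chunk_size
        last = (start + length - 1) // self._chunk_size
        buf = b"".join(self._chunk(i) for i in range(first, last + 1))
        offset = start - first * self._chunk_size
        return buf[offset : offset + length]
```

The trade-off is that the cache directory grows without bound, so cleanup is left to the user. Bounding its size would bring back the bookkeeping that makes an LRU cache harder.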
@magland I think this discussion can be closed for now; we're working on higher-level benchmarking and enhanced documentation/instructions for streaming recommendations, and will open a new issue if/when we reach new recommendations for code snippets to follow.