Thinning down zarr-python
#1274
Comments
Yeah we have had similar discussions before |
Thanks for linking the previous discussions, @jakirkham. Some interesting points there. Also, we have PyData Global 2022 Sprints coming up next week so we're thinking to kick this off over there. |
The main thing to consider is that we currently test these Stores pretty regularly (particularly with the general code here and dependency updates). It would be good to handle this somehow when refactoring (cron jobs on CI, running with all Store extensions installed here, etc.) |
Updates to the description:
To #1274 (comment), @jakirkham, I'd say with all the work on easing releases, we should really push for faster iterations, and then have dependabot test each of the refactored stores (daily?) |
If the interface between the Zarr core and the store layer is clearly and cleanly defined, it should in theory be sufficient to test each store thoroughly. If there are edge cases that only arise when testing a store against the full Zarr test suite, that might indicate leaky abstractions. (Partial chunk decompression comes to mind as a sketchy area.) In any case, we should probably be setting up more integration testing of the entire stack, including testing against real cloud storage. I would be happy to work on that. |
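The idea of testing each store against a clearly defined interface could be sketched roughly as below. This is only an illustration, assuming v2-style stores expose the MutableMapping interface with string keys and bytes values; `check_store_contract` is a hypothetical helper, not part of the Zarr-Python test suite.

```python
from collections.abc import MutableMapping


def check_store_contract(store: MutableMapping) -> None:
    """Exercise basic key/value behaviour that a v2-style store should satisfy.

    Hypothetical helper for illustration; expects an empty store.
    """
    # A fresh store should start empty.
    assert len(store) == 0

    # Keys are "/"-separated strings; values are raw bytes.
    store["group/.zgroup"] = b'{"zarr_format": 2}'
    store["group/array/0.0"] = b"\x00\x01"

    assert "group/.zgroup" in store
    assert store["group/array/0.0"] == b"\x00\x01"
    assert sorted(store) == ["group/.zgroup", "group/array/0.0"]

    # Overwriting replaces the value without adding a key.
    store["group/array/0.0"] = b"\x02"
    assert store["group/array/0.0"] == b"\x02"
    assert len(store) == 2

    # Deleting removes the key; reading a missing key raises KeyError.
    del store["group/array/0.0"]
    assert "group/array/0.0" not in store
    try:
        store["group/array/0.0"]
    except KeyError:
        pass
    else:
        raise AssertionError("expected KeyError for a missing key")


# A plain dict is the simplest object that satisfies the contract.
check_store_contract({})
```

A store package living in its own repo could run a shared check like this in its own CI, leaving only full-stack integration tests (for example against real cloud storage) to the core project.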
I think this is far more important than trimming the zarr-python codebase! Would it be simpler to move the not-so-used storage classes into a contrib/ directory, as a mark that they are not maintained by the core team? I definitely think we need to keep in all the stores that may be instantiated directly in the main open_* functions (DirStore, FSStore, ConsolidatedStore, ...?). |
Yeah, we had a similar discussion about refactoring contents within Zarr-Python into different modules and adding code owners ( #764 ). Honestly, I agree this may be an easier first step to address the issue identified. Once complete, this change may be useful for further library refactoring. It may also be easier for a new contributor to try than setting up a new repo. On a different note, perhaps we should also think about an entry point mechanism like what was done in Numcodecs ( zarr-developers/numcodecs#300 ). This may make it easier for developers wanting to add their own Store without needing changes in Zarr-Python. |
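For readers unfamiliar with entry points, a minimal sketch of how such a discovery mechanism might look is shown here. It uses only the standard library's `importlib.metadata`; the group name `zarr.stores` and both helper functions are assumptions made for illustration, not names that Zarr-Python or Numcodecs define.

```python
from importlib.metadata import entry_points

# Hypothetical entry point group; not a name that Zarr-Python defines.
STORE_GROUP = "zarr.stores"


def discover_stores(group: str = STORE_GROUP) -> dict:
    """Return {name: entry point} for every plugin registered under `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ returns an object with a select() method.
        selected = eps.select(group=group)
    else:
        # Older versions return a plain dict keyed by group name.
        selected = eps.get(group, [])
    return {ep.name: ep for ep in selected}


def load_store_class(name: str, group: str = STORE_GROUP):
    """Import and return the object registered under `name`."""
    stores = discover_stores(group)
    if name not in stores:
        raise KeyError(f"no store plugin named {name!r} in group {group!r}")
    # load() imports the target module and returns the referenced object.
    return stores[name].load()
```

Under this assumption, a third-party package would declare its store class under the same group in its packaging metadata, and the core library would look it up by name at runtime without importing the package up front.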
I'd be happy seeing the n5 functionality pulled out into its own package. |
More info here: https://github.com/zarr-developers/cookiecutter-zarr-store |
see https://github.com/zarr-developers/n5py (private atm) for a start |
I've started taking steps to deprecate many of the more exotic stores. See #1756. |
Hi, I bumped into this when getting a deprecation warning. How can we use |
@JoOkuma - in v3, the nested directory store layout will be supported through the |
As an end user of this library, this warning bothers me: |
Do you need mongodb support for the zarr v2 format, or the zarr v3 format? |
We need v2 support. We have the choice between filesystem zarr or mongo zarr, and my dev team would prefer to use mongo storage, but it feels strange to see deprecation logs every time I run the application, if it is a valid feature. |
I agree that the warnings are annoying. We added those warnings because it's fairly common for developers to use unpinned dependencies; without the deprecation warnings in place, we might face a large number of broken projects on the day we release v3. Is there a problem with pinning to a version of zarr-python from before the addition of the deprecation warnings? I think the mongodb backend has been pretty stable for a while. |
Ok, I did not realize that you want to ensure backward compatibility with zarr-python v2 code. |
We do not want to ensure backwards compatibility -- we are removing the mongodb support from zarr-python v3, after all. The deprecation warnings are a signal to zarr-python 2 users (and were specifically requested by zarr-python 2 users), to indicate which aspects of their workflow will break in zarr-python v3, should they use that library. |
@nmoreaud - if you are interested in porting MongoDB support to Zarr-Python 3, you may be interested in checking out the new Store ABC. This should work great with MongoDB-Motor now that the Store API is built around AsyncIO. Happy to set up a time to discuss how this could be done as a stand alone project as well (contact details in my GitHub profile). |
Thank you for all your work on Zarr! It's really great! I keep getting warnings about LMDB store getting deprecated in v3. Is there any suggested migration for LMDB stores? Is there any memory mapped store alternative present in v3? I use Zarr / LMDB extensively in my radio drone tracking project ( https://github.com/misko/spf/blob/main/spf/dataset/spf_dataset.py ). |
@misko - Thanks for piping up! Glad to hear you like zarr and that LMDB store has been serving you well. Zarr had some serious store bloat going on, so we decided to trim down the core stores to something we could reasonably support. The good news, though, is that in 3.0 the store API will provide an ABC that third parties can build on. You (or someone else) could easily implement an LMDB store following the ABC and use it in your project. |
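To give a feel for what building on an abstract store class might involve, here is a rough sketch. `MinimalStore` is a hypothetical three-method interface invented for this example and is not Zarr-Python's actual Store ABC, whose methods and signatures should be taken from its documentation. The async style follows the AsyncIO-based design mentioned earlier in the thread, and the dict stands in for a real LMDB or MongoDB backend.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Optional


class MinimalStore(ABC):
    """Hypothetical stand-in for a store ABC; not Zarr-Python's real interface."""

    @abstractmethod
    async def get(self, key: str) -> Optional[bytes]: ...

    @abstractmethod
    async def set(self, key: str, value: bytes) -> None: ...

    @abstractmethod
    async def delete(self, key: str) -> None: ...


class InMemoryStore(MinimalStore):
    """Dict-backed implementation; a real backend would replace the dict."""

    def __init__(self) -> None:
        self._data: dict = {}

    async def get(self, key: str) -> Optional[bytes]:
        # Missing keys return None rather than raising.
        return self._data.get(key)

    async def set(self, key: str, value: bytes) -> None:
        self._data[key] = bytes(value)

    async def delete(self, key: str) -> None:
        # Deleting a missing key is a no-op.
        self._data.pop(key, None)


async def demo() -> Optional[bytes]:
    store = InMemoryStore()
    await store.set("array/0.0", b"\x00\x01")
    value = await store.get("array/0.0")
    await store.delete("array/0.0")
    assert await store.get("array/0.0") is None
    return value


result = asyncio.run(demo())
```

The point of the abstract base class is that a subclass missing any of the required methods cannot be instantiated, so an incomplete third-party store fails early instead of at first use.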
This came up during one of the conversations with @joshmoore. The idea is to extract functionalities not in V2 Spec and their respective tests from zarr-python. E.g. Zarr Specification V2 has the following stores:
But in zarr-python, we have additional stores like SQLiteStore, MongoDBStore, etc. If we could remove these additional stores from the zarr-python codebase and host them in a separate repo under zarr-developers, it'll help us:
If a user/developer wishes to use the removed functionality, they can run pip install zarr-sqlitestore or similar.
FYI: A similar repo is in the works for Numcodecs, which is taken care of by @joshmoore and @jakirkham.
Please let me know your thoughts on this. Thanks!
CC: @zarr-developers/python-core-devs