Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thinning down zarr-python #1274

Open
MSanKeys963 opened this issue Nov 22, 2022 · 23 comments
Open

Thinning down zarr-python #1274

MSanKeys963 opened this issue Nov 22, 2022 · 23 comments
Labels
help wanted Issue could use help from someone with familiarity on the topic V3 Affects the v3 branch
Milestone

Comments

@MSanKeys963
Copy link
Member

MSanKeys963 commented Nov 22, 2022

This came up during one of the conversations with @joshmoore. The idea is to extract functionalities not in V2 Spec and their respective tests from zarr-python. E.g. Zarr Specification V2 has the following stores:

  • Directory Store
  • Nested Directory Store
  • Zip file Store
  • Object Store
  • HTTP Store

But in zarr-python, we have additional stores like SQLiteStore, MongoDBStore etc. If we could remove these additional stores from the zarr-python codebase and host them in a separate repo under zarr-developers, it'll help us:

  • Making the code run faster and more efficiently
  • Run fewer tests and faster completion of them
  • Reduce the noise from dependabot
  • Add maintainers for specific stores
  • Focus on a more concise codebase, thus helping us to add additional features when migrating from V2 to V3

If a user/developer wishes to use the removed functionality, they can shoot pip install zarr-sqlitestore or similar.

FYI: A similar repo is in the works for Numcodecs, which is taken care of by @joshmoore and @jakirkham.

Please let me know your thoughts on this. Thanks!

CC: @zarr-developers/python-core-devs

@MSanKeys963 MSanKeys963 added the help wanted Issue could use help from someone with familiarity on the topic label Nov 22, 2022
@jakirkham
Copy link
Member

Yeah we have had similar discussions before

@MSanKeys963
Copy link
Member Author

Thanks for linking the previous discussions, @jakirkham. Some interesting points there.

Also, we have PyData Global 2022 Sprints coming up next week so we're thinking to kick this off over there.

@jakirkham
Copy link
Member

The main thing to consider is currently we are testing these Stores pretty regularly (particularly with the general code here and dependency updates). Would be good to handle this somehow when refactoring (cron jobs on CI, running with all Store extensions installed here, etc.)

@joshmoore
Copy link
Member

Updates to the description:

  • s/eliminate/extract/
  • dropped FSStore from the list for the moment
  • Added benefits:
    • Reduce the noise from dependabot
    • Add maintainers for specific stores

To #1274 (comment), @jakirkham, I'd say with all the work on easing releases, we should really push for faster iterations, and then have dependabot test each of the refactor stores (daily?)

@rabernat
Copy link
Contributor

The main thing to consider is currently we are testing these Stores pretty regularly (particularly with the general code here and dependency updates)

If the interface between the Zarr core and the store layer is clearly and cleanly defined, it should in theory be sufficient to test each store thoroughly. If there are edge cases that only arise when testing a store against the full Zarr test suite, that might indicate leaky abstractions. (Partial chunk decompression comes to mind as a sketchy area.)

In any case, we should probably be setting up more integration testing of the entire stack, including testing against real cloud storage. I would be happy to work on that.

@martindurant
Copy link
Member

we should probably be setting up more integration testing of the entire stack

I think this is far more important than trimming the zarr-python codebase!

Would it be simpler to move the not-so-used storage classes into a contrib/ directory, as a mark that they are not maintained by the core team?

I definitely think we need to keep in all the stores that may be instantiated directly in the main open_* functions (DirStore, FSStore, ConsolidatedStore, ...?).

@jakirkham
Copy link
Member

Yeah we had a similar discussion about refactor contents out within Zarr-Python as different modules and adding code owners ( #764 )

Honestly agree this may be an easier first step to address the issue identified. Once complete this change may be useful for further library refactoring. It may also be easier for a new contributor to try than setting up a new repo

On a different note perhaps we should also think about an entry point mechanism like what was done in Numcodecs ( zarr-developers/numcodecs#300 ). This may make it easier for developers wanting to add their own Store without needing changes in Zarr-Python

@d-v-b
Copy link
Contributor

d-v-b commented Nov 29, 2022

I'd be happy seeing the n5 functionality pulled out into its own package.

@MSanKeys963
Copy link
Member Author

@MSanKeys963
Copy link
Member Author

@jhamman - do you think this should be a part of V3 design doc (#1583) goals?

@d-v-b
Copy link
Contributor

d-v-b commented Mar 4, 2024

see https://github.com/zarr-developers/n5py (private atm) for a start

@jhamman
Copy link
Member

jhamman commented Apr 19, 2024

I've started taking steps to deprecate many of the more exotic stores. See #1756.

@jhamman jhamman moved this from Todo to In Progress in Zarr-Python - 3.0 May 7, 2024
@JoOkuma
Copy link

JoOkuma commented May 15, 2024

Hi, I bumped into this when getting a deprecation warning.

How can we use NestedDirectoryStore from V3 forward?

@jhamman
Copy link
Member

jhamman commented May 15, 2024

@JoOkuma - in v3, the nested directory store layout will be supported through the chunk_key_encoding setting.

@nmoreaud
Copy link

As an end user of this library, this warning bothers me: FutureWarning: The MongoDBStore is deprecated and will be removed in a Zarr-Python version 3, see https://github.com/zarr-developers/zarr-python/issues/1274 for more information/
If the MongoDBStore is going to be moved to a new python package, why is it described as deprecated?
What are you planning to do about mongodb support?

@d-v-b
Copy link
Contributor

d-v-b commented Sep 12, 2024

What are you planning to do about mongodb support?

Do you need mongodb support for the zarr v2 format, or the zarr v3 format?

  • If you need it for zarr v2 (e.g., you depend on what's offered in zarr python today), then the short-term solution is to pin your zarr-python dependency to < 3, because there is no mongodb support coming to zarr-python 3.
  • If you need mongodb for zarr v3, then we need to write new code for that, and it would live in a standalone repo, e.g. zarr-mongodb.

@nmoreaud
Copy link

What are you planning to do about mongodb support?

Do you need mongodb support for the zarr v2 format, or the zarr v3 format?

  • If you need it for zarr v2 (e.g., you depend on what's offered in zarr python today), then the short-term solution is to pin your zarr-python dependency to < 3, because there is no mongodb support coming to zarr-python 3.
  • If you need mongodb for zarr v3, then we need to write new code for that, and it would live in a standalone repo, e.g. zarr-mongodb.

We need v2 support. We have the choice between filesystem zarr or mongo zarr, and my dev team would prefer to use mongo storage, but it feels strange to see deprecation logs every time I run the application, if it is a valid feature.

@d-v-b
Copy link
Contributor

d-v-b commented Sep 12, 2024

but it feels strange to see deprecation logs every time I run the application, if it is a valid feature.

I agree that the warnings are annoying. We added those warnings because it's fairly common for developers to use unpinned dependencies; without the deprecation warnings in place, then we might face a large number of broken projects on the day we release v3. Is there a problem with pinning to a version of zarr-python from before the addition of the deprecation warnings? I think the mongodb backend has been pretty stable for a while.

@nmoreaud
Copy link

but it feels strange to see deprecation logs every time I run the application, if it is a valid feature.

I agree that the warnings are annoying. We added those warnings because it's fairly common for developers to use unpinned dependencies; without the deprecation warnings in place, then we might face a large number of broken projects on the day we release v3. Is there a problem with pinning to a version of zarr-python from before the addition of the deprecation warnings? I think the mongodb backend has been pretty stable for a while.

Ok, I did not realize that you want to ensure backward compatibility with zarr-python v2 code.

@d-v-b
Copy link
Contributor

d-v-b commented Sep 12, 2024

Ok, I did not realize that you want to ensure backward compatibility with zarr-python v2 code.

We do not want to ensure backwards compatibility -- we are removing the mongodb support from zarr-python v3, after all. The deprecation warnings are a signal to zarr-python 2 users (and were specifically requested by zarr-python 2 users), to indicate which aspects of their workflow will break in zarr-python v3, should they use that library.

@jhamman
Copy link
Member

jhamman commented Sep 17, 2024

@nmoreaud - if you are interested in porting MongoDB support to Zarr-Python 3, you may be interested in checking out the new Store ABC. This should work great with MongoDB-Motor now that the Store API is built around AsyncIO. Happy to set up a time to discuss how this could be done as a stand alone project as well (contact details in my GitHub profile).

@misko
Copy link

misko commented Oct 23, 2024

Thank you for all your work on Zarr! It's really great! I keep getting warnings about LMDB store getting deprecated in v3. Is there any suggested migration for LMDB stores? Is there any memory mapped store alternative present in v3?

I use Zarr / LMDB extensively in my radio drone tracking project ( https://github.com/misko/spf/blob/main/spf/dataset/spf_dataset.py ).

@jhamman
Copy link
Member

jhamman commented Oct 23, 2024

@misko - Thanks for piping up! Glad to hear you like zarr and that LMDB store has been serving you well.

Zarr had some serious store bloat going on so we decided to trim down the core stores to something we could reasonably support. The good news though is that in 3.0, the store API will provide a ABC that 3rd parties could build off of. You (or someone else) could easily implement an LMDB store following the ABC and use it in your project.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Issue could use help from someone with familiarity on the topic V3 Affects the v3 branch
Projects
Status: In Progress
Development

No branches or pull requests

10 participants