document storage classes and some developer apis #2279

Merged
merged 37 commits into from
Oct 13, 2024
Commits
fe49f5f
fix: zarr v2 compatability fixes
jhamman Sep 14, 2024
9a1580b
move zarr.store to zarr.storage
jhamman Sep 16, 2024
0d89912
make chunks a tuple
jhamman Sep 17, 2024
d78e384
Merge branch 'v3' of https://github.com/zarr-developers/zarr-python i…
jhamman Sep 17, 2024
e534279
Apply suggestions from code review
jhamman Sep 17, 2024
dea4a3d
Merge branch 'v3' of https://github.com/zarr-developers/zarr-python i…
jhamman Sep 18, 2024
7800f38
Merge branch 'v3' of https://github.com/zarr-developers/zarr-python i…
jhamman Sep 22, 2024
93b61fc
more merge conflict resolution
jhamman Sep 23, 2024
88afe52
Merge branch 'v3' of https://github.com/zarr-developers/zarr-python i…
jhamman Sep 29, 2024
fb6752d
fixups
jhamman Sep 29, 2024
0b1dedc
fixup zipstore
jhamman Sep 29, 2024
322918a
Apply suggestions from code review
jhamman Sep 29, 2024
a95d54a
Apply suggestions from code review
jhamman Sep 30, 2024
128eb53
add test
jhamman Sep 30, 2024
3c170ef
Merge branch 'v3' of https://github.com/zarr-developers/zarr-python i…
jhamman Sep 30, 2024
54ab9ef
extend test
jhamman Sep 30, 2024
77f2938
clean up parents
jhamman Sep 30, 2024
2295d76
debug race condition
jhamman Sep 30, 2024
5879d67
more debug
jhamman Sep 30, 2024
f54f6f2
document storage classes and some developer apis
jhamman Oct 1, 2024
3940d22
Update src/zarr/core/array.py
jhamman Oct 1, 2024
c9d1c50
Merge branch 'fix/dask-compat' into doc/storage
jhamman Oct 2, 2024
83fa3ac
Merge branch 'v3' of https://github.com/zarr-developers/zarr-python i…
jhamman Oct 7, 2024
f2137fb
Merge branch 'doc/storage' of github.com:jhamman/zarr-python into doc…
jhamman Oct 7, 2024
f44601b
Merge branch 'v3' into doc/storage
jhamman Oct 9, 2024
731c22a
Merge branch 'v3' of https://github.com/zarr-developers/zarr-python i…
jhamman Oct 10, 2024
4e93aa0
inherit docstrings from baseclass
jhamman Oct 10, 2024
dad94e2
Merge branch 'doc/storage' of github.com:jhamman/zarr-python into doc…
jhamman Oct 10, 2024
5477d32
fix sphinx warning
jhamman Oct 11, 2024
728d4f7
# docstring inherited
jhamman Oct 11, 2024
e293b47
Merge branch 'v3' of https://github.com/zarr-developers/zarr-python i…
jhamman Oct 11, 2024
cee730c
Merge branch 'v3' of https://github.com/zarr-developers/zarr-python i…
jhamman Oct 11, 2024
3f7290f
add storage guide
jhamman Oct 11, 2024
35a6428
add missing file
jhamman Oct 11, 2024
a78461e
update links
jhamman Oct 11, 2024
eff01ce
Merge branch 'v3' into doc/storage
jhamman Oct 11, 2024
c1f0923
Merge branch 'v3' into doc/storage
dstansby Oct 12, 2024
6 changes: 3 additions & 3 deletions docs/guide/consolidated_metadata.rst
@@ -12,8 +12,8 @@ Usage

If consolidated metadata is present in a Zarr Group's metadata then it is used
by default. The initial read to open the group will need to communicate with
-the store (reading from a file for a :class:`zarr.store.LocalStore`, making a
-network request for a :class:`zarr.store.RemoteStore`). After that, any subsequent
+the store (reading from a file for a :class:`zarr.storage.LocalStore`, making a
+network request for a :class:`zarr.storage.RemoteStore`). After that, any subsequent
metadata reads to get child Group or Array nodes will *not* require reads from the store.

In Python, the consolidated metadata is available on the ``.consolidated_metadata``
@@ -22,7 +22,7 @@ attribute of the ``GroupMetadata`` object.
.. code-block:: python

   >>> import zarr
-   >>> store = zarr.store.MemoryStore({}, mode="w")
+   >>> store = zarr.storage.MemoryStore({}, mode="w")
   >>> group = zarr.open_group(store=store)
   >>> group.create_array(shape=(1,), name="a")
   >>> group.create_array(shape=(2, 2), name="b")
1 change: 1 addition & 0 deletions docs/guide/index.rst
@@ -4,4 +4,5 @@ Guide
.. toctree::
   :maxdepth: 1

   storage
   consolidated_metadata
101 changes: 101 additions & 0 deletions docs/guide/storage.rst
@@ -0,0 +1,101 @@
Storage
=======

Zarr-Python supports multiple storage backends, including: local file systems,
Zip files, remote stores via ``fsspec`` (S3, HTTP, etc.), and in-memory stores. In
Zarr-Python 3, stores must implement the abstract store API from
:class:`zarr.abc.store.Store`.

.. note::
   Unlike Zarr-Python 2, where the store interface was built around a generic ``MutableMapping``
   API, Zarr-Python 3 uses a custom store API built on Python's AsyncIO library.

Implicit Store Creation
-----------------------

In most cases, it is not required to create a ``Store`` object explicitly. Passing a string
to Zarr's top level API will result in the store being created automatically.

.. code-block:: python

   >>> import zarr
   >>> zarr.open("data/foo/bar", mode="r") # implicitly creates a LocalStore
   <Group file://data/foo/bar>
   >>> zarr.open("s3://foo/bar", mode="r") # implicitly creates a RemoteStore
   <Group s3://foo/bar>
   >>> data = {}
   >>> zarr.open(data, mode="w") # implicitly creates a MemoryStore
   <Group memory://4791444288>
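
The mapping from argument type to store class shown above can be pictured with a small, self-contained sketch. This is not Zarr-Python's actual resolution logic: `guess_store_kind` is a hypothetical helper, and treating any string that contains `://` as a remote location is an assumption made only to mirror the three examples in the guide.

```python
from collections.abc import MutableMapping
from typing import Any


def guess_store_kind(store_like: Any) -> str:
    """Return the name of the store class a store-like value would map to.

    Hypothetical helper: the rules below only mirror the guide's examples.
    """
    if isinstance(store_like, MutableMapping):
        # A dict (or any mutable mapping) stands for an in-memory store.
        return "MemoryStore"
    if isinstance(store_like, str):
        # Assumption: a URL scheme such as "s3://" marks a remote location.
        if "://" in store_like:
            return "RemoteStore"
        return "LocalStore"
    raise TypeError(f"unsupported store-like value: {type(store_like).__name__}")


for candidate in ("data/foo/bar", "s3://foo/bar", {}):
    print(repr(candidate), "->", guess_store_kind(candidate))
```

Anything that is neither a string nor a mapping is rejected here; the real top-level API may accept further types (for example existing store instances), which this sketch leaves out.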

Explicit Store Creation
-----------------------

In some cases, it may be helpful to create a store instance directly. Zarr-Python offers four
built-in stores: :class:`zarr.storage.LocalStore`, :class:`zarr.storage.RemoteStore`,
:class:`zarr.storage.ZipStore`, and :class:`zarr.storage.MemoryStore`.

Local Store
~~~~~~~~~~~

The :class:`zarr.storage.LocalStore` stores data in a nested set of directories on a local
filesystem.

.. code-block:: python

   >>> import zarr
   >>> store = zarr.storage.LocalStore("data/foo/bar", mode="r")
   >>> zarr.open(store=store)
   <Group file://data/foo/bar>

Zip Store
~~~~~~~~~

The :class:`zarr.storage.ZipStore` stores the contents of a Zarr hierarchy in a single
Zip file. The `Zip Store Specification`_ is currently in draft form.

.. code-block:: python

   >>> import zarr
   >>> store = zarr.storage.ZipStore("data.zip", mode="w")
   >>> zarr.open(store=store, shape=(2,))
   <Array zip://data.zip shape=(2,) dtype=float64>

Remote Store
~~~~~~~~~~~~

The :class:`zarr.storage.RemoteStore` stores the contents of a Zarr hierarchy following the same
logical layout as the ``LocalStore``, except the store is assumed to be on a remote storage system
such as cloud object storage (e.g. AWS S3, Google Cloud Storage, Azure Blob Store). The
:class:`zarr.storage.RemoteStore` is backed by `Fsspec`_ and can support any Fsspec backend
that implements the ``AbstractFileSystem`` API.

.. code-block:: python

   >>> import zarr
   >>> store = zarr.storage.RemoteStore("gs://foo/bar", mode="r")
   >>> zarr.open(store=store)
   <Array <RemoteStore(GCSFileSystem, foo/bar)> shape=(10, 20) dtype=float32>

Memory Store
~~~~~~~~~~~~

The :class:`zarr.storage.MemoryStore` is an in-memory store that allows for serialization of
Zarr data (metadata and chunks) to a dictionary.

.. code-block:: python

   >>> import zarr
   >>> data = {}
   >>> store = zarr.storage.MemoryStore(data, mode="w")
   >>> zarr.open(store=store, shape=(2,))
   <Array memory://4943638848 shape=(2,) dtype=float64>

Developing custom stores
------------------------

Zarr-Python's :class:`zarr.abc.store.Store` API is meant to be extended. The Store Abstract Base
Class includes all of the methods needed to be a fully operational store in Zarr Python.
Zarr also provides a test harness for custom stores: :class:`zarr.testing.store.StoreTests`.

.. _Zip Store Specification: https://github.com/zarr-developers/zarr-specs/pull/311
.. _Fsspec: https://filesystem-spec.readthedocs.io
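
To illustrate the "Developing custom stores" section, here is a minimal, standard-library-only sketch of the pattern: an abstract async interface plus a dictionary-backed subclass. `SketchStore` and `DictStore` are hypothetical names and deliberately not `zarr.abc.store.Store`; only `empty` and `clear` are taken from the abstract methods visible in this pull request, while `get` and `set` use simplified, assumed signatures. The real base class defines a larger interface.

```python
from __future__ import annotations

import asyncio
from abc import ABC, abstractmethod


class SketchStore(ABC):
    """Simplified stand-in for an async store ABC (not zarr.abc.store.Store)."""

    @abstractmethod
    async def empty(self) -> bool: ...

    @abstractmethod
    async def clear(self) -> None: ...

    @abstractmethod
    async def get(self, key: str) -> bytes | None: ...  # assumed signature

    @abstractmethod
    async def set(self, key: str, value: bytes) -> None: ...  # assumed signature


class DictStore(SketchStore):
    """Keeps every key/value pair in a plain dictionary."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    async def empty(self) -> bool:
        return not self._data

    async def clear(self) -> None:
        self._data.clear()

    async def get(self, key: str) -> bytes | None:
        return self._data.get(key)

    async def set(self, key: str, value: bytes) -> None:
        self._data[key] = value


async def demo() -> tuple[bool, bytes | None, bool]:
    store = DictStore()
    was_empty = await store.empty()
    await store.set("zarr.json", b"{}")
    value = await store.get("zarr.json")
    await store.clear()
    return was_empty, value, await store.empty()


result = asyncio.run(demo())
print(result)
```

Because every method is a coroutine, callers drive the store from an event loop (`asyncio.run` here), which is the main difference from the ``MutableMapping`` style of Zarr-Python 2 mentioned in the note at the top of the guide.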
87 changes: 84 additions & 3 deletions src/zarr/abc/store.py
@@ -20,6 +20,8 @@


class AccessMode(NamedTuple):
    """Access mode flags."""

    str: AccessModeLiteral
    readonly: bool
    overwrite: bool
@@ -28,6 +30,24 @@ class AccessMode(NamedTuple):

    @classmethod
    def from_literal(cls, mode: AccessModeLiteral) -> Self:
        """
        Create an AccessMode instance from a literal.

        Parameters
        ----------
        mode : AccessModeLiteral
            One of 'r', 'r+', 'w', 'w-', 'a'.
[Review comment, Contributor] Could we define these as a string somewhere, and then re-use it here and lower down by making the docstring a format string? I worry about lists like this getting out of sync if they're duplicated across docstrings.

[Reply, Member Author] I decided not to do that here because it's just this one method. If we have a docstring template tool in the future, I think it would be great to bring that to bear here (and even more so on the Group/Array classes).

        Returns
        -------
        AccessMode
            The created instance.

        Raises
        ------
        ValueError
            If mode is not one of 'r', 'r+', 'w', 'w-', 'a'.
        """
        if mode in ("r", "r+", "a", "w", "w-"):
            return cls(
                str=mode,
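
For reference, one way the review suggestion in this hunk (keeping the list of modes in a single place) could look is sketched below. This is not what the pull request does; the author chose to keep the literal lists. `AccessModeSketch`, `VALID_MODES` and `_MODES_TEXT` are hypothetical names, only the three fields visible in the diff are reproduced, and the rules used to derive `readonly` and `overwrite` are assumptions.

```python
from typing import Literal, NamedTuple, get_args

ModeLiteral = Literal["r", "r+", "a", "w", "w-"]
# Single source of truth for the accepted modes (hypothetical names).
VALID_MODES = get_args(ModeLiteral)
_MODES_TEXT = ", ".join(repr(m) for m in VALID_MODES)


class AccessModeSketch(NamedTuple):
    """Reduced copy of the access-mode flags; only three fields are shown."""

    str: ModeLiteral
    readonly: bool
    overwrite: bool

    @classmethod
    def from_literal(cls, mode: ModeLiteral) -> "AccessModeSketch":
        if mode in VALID_MODES:
            return cls(
                str=mode,
                readonly=mode == "r",  # assumption: only 'r' is read-only
                overwrite=mode == "w",  # assumption: only 'w' clears existing data
            )
        raise ValueError(f"mode must be one of {_MODES_TEXT}")


# Build the docstring from the shared constant so the two lists cannot drift.
AccessModeSketch.from_literal.__func__.__doc__ = (
    f"Create an instance from a literal mode, one of {_MODES_TEXT}."
)

print(AccessModeSketch.from_literal("w"))
print(AccessModeSketch.from_literal.__doc__)
```

The trade-off raised in the reply still applies: a generated docstring is harder to read in the source file, which may not be worth it for a single method.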
@@ -40,6 +60,10 @@ def from_literal(cls, mode: AccessModeLiteral) -> Self:


class Store(ABC):
    """
    Abstract base class for Zarr stores.
    """

    _mode: AccessMode
    _is_open: bool

@@ -49,6 +73,21 @@ def __init__(self, *args: Any, mode: AccessModeLiteral = "r", **kwargs: Any) ->

    @classmethod
    async def open(cls, *args: Any, **kwargs: Any) -> Self:
        """
        Create and open the store.

        Parameters
        ----------
        *args : Any
            Positional arguments to pass to the store constructor.
        **kwargs : Any
            Keyword arguments to pass to the store constructor.

        Returns
        -------
        Store
            The opened store instance.
        """
        store = cls(*args, **kwargs)
        await store._open()
        return store
@@ -67,6 +106,20 @@ def __exit__(
        self.close()

    async def _open(self) -> None:
        """
        Open the store.

        Raises
        ------
        ValueError
            If the store is already open.
        FileExistsError
            If ``mode='w-'`` and the store already exists.

        Notes
        -----
        * When ``mode='w'`` and the store already exists, it will be cleared.
        """
        if self._is_open:
            raise ValueError("store is already open")
        if self.mode.str == "w":
@@ -76,14 +129,30 @@ async def _open(self) -> None:
        self._is_open = True

    async def _ensure_open(self) -> None:
        """Open the store if it is not already open."""
        if not self._is_open:
            await self._open()
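
The lifecycle documented in the `open`, `_open` and `_ensure_open` docstrings above can be tried out with a toy class. This is a sketch under stated assumptions, not the `Store` implementation: `LifecycleSketch` is a hypothetical, dictionary-backed class, the mode is kept as a plain string, and "the store already exists" is interpreted here as "the backing dictionary is non-empty".

```python
import asyncio


class LifecycleSketch:
    """Toy store that mimics the open/clear rules described in the docstrings."""

    def __init__(self, data: dict, mode: str = "r") -> None:
        self._data = data
        self._mode = mode
        self._is_open = False

    @classmethod
    async def open(cls, *args, **kwargs) -> "LifecycleSketch":
        # Construct first, then run the asynchronous open step.
        store = cls(*args, **kwargs)
        await store._open()
        return store

    async def _open(self) -> None:
        if self._is_open:
            raise ValueError("store is already open")
        if self._mode == "w":
            self._data.clear()  # 'w' discards existing content
        elif self._mode == "w-" and self._data:
            # Assumption: "exists" means the backing dict is non-empty.
            raise FileExistsError("store already exists")
        self._is_open = True

    async def _ensure_open(self) -> None:
        if not self._is_open:
            await self._open()


async def demo() -> tuple:
    existing = {"a": b"1"}
    store = await LifecycleSketch.open(existing, mode="w")
    cleared = existing == {}
    await store._ensure_open()  # already open, so nothing happens
    try:
        await LifecycleSketch.open({"a": b"1"}, mode="w-")
        refused = False
    except FileExistsError:
        refused = True
    try:
        await store._open()  # a second explicit open is an error
        reopened = True
    except ValueError:
        reopened = False
    return cleared, refused, reopened


cleared, refused, reopened = asyncio.run(demo())
print(cleared, refused, reopened)
```

The `_ensure_open` guard is what lets other methods open the store lazily without tripping the "already open" check.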

    @abstractmethod
-    async def empty(self) -> bool: ...
+    async def empty(self) -> bool:
        """
        Check if the store is empty.

        Returns
        -------
        bool
            True if the store is empty, False otherwise.
        """
        ...

    @abstractmethod
-    async def clear(self) -> None: ...
+    async def clear(self) -> None:
        """
        Clear the store.

        Remove all keys and values from the store.
        """
        ...

    @abstractmethod
    def with_mode(self, mode: AccessModeLiteral) -> Self:
@@ -116,6 +185,7 @@ def mode(self) -> AccessMode:
        return self._mode

    def _check_writable(self) -> None:
        """Raise an exception if the store is not writable."""
        if self.mode.readonly:
            raise ValueError("store mode does not support writing")

@@ -199,7 +269,7 @@ async def set_if_not_exists(self, key: str, value: Buffer) -> None:
        Store a key to ``value`` if the key is not already present.

        Parameters
-        -----------
+        ----------
        key : str
        value : Buffer
        """
@@ -339,6 +409,17 @@ async def set_if_not_exists(self, default: Buffer) -> None: ...


async def set_or_delete(byte_setter: ByteSetter, value: Buffer | None) -> None:
    """Set or delete a value in a byte setter

    Parameters
    ----------
    byte_setter : ByteSetter
    value : Buffer | None

    Notes
    -----
    If value is None, the key will be deleted.
    """
    if value is None:
        await byte_setter.delete()
    else:
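
The behaviour described in the `set_or_delete` docstring can be exercised in isolation with the sketch below. The names `ByteSetterSketch`, `DictSlot` and `set_or_delete_sketch` are hypothetical, plain `bytes` stands in for `Buffer`, and the `else` branch (calling `set` with the value) is an assumption, since the hunk above is cut off before it.

```python
import asyncio
from typing import Optional, Protocol


class ByteSetterSketch(Protocol):
    """Minimal protocol: something that can store or drop one value."""

    async def set(self, value: bytes) -> None: ...

    async def delete(self) -> None: ...


class DictSlot:
    """Hypothetical byte setter bound to one key of a dictionary."""

    def __init__(self, data: dict, key: str) -> None:
        self._data = data
        self._key = key

    async def set(self, value: bytes) -> None:
        self._data[self._key] = value

    async def delete(self) -> None:
        self._data.pop(self._key, None)


async def set_or_delete_sketch(
    byte_setter: ByteSetterSketch, value: Optional[bytes]
) -> None:
    if value is None:
        # None means "remove the key", as stated in the docstring's Notes.
        await byte_setter.delete()
    else:
        # Assumed branch: store the value when one is given.
        await byte_setter.set(value)


async def demo() -> tuple:
    data: dict = {}
    slot = DictSlot(data, "c/0")
    await set_or_delete_sketch(slot, b"abc")
    after_set = dict(data)  # snapshot before the delete
    await set_or_delete_sketch(slot, None)
    return after_set, data


after_set, after_delete = asyncio.run(demo())
print(after_set, after_delete)
```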
2 changes: 2 additions & 0 deletions src/zarr/storage/__init__.py
@@ -1,11 +1,13 @@
from zarr.storage.common import StoreLike, StorePath, make_store_path
from zarr.storage.local import LocalStore
from zarr.storage.logging import LoggingStore
from zarr.storage.memory import MemoryStore
from zarr.storage.remote import RemoteStore
from zarr.storage.zip import ZipStore

__all__ = [
    "LocalStore",
    "LoggingStore",
    "MemoryStore",
    "RemoteStore",
    "StoreLike",