Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Working with ProxySource: examples. #259

Closed
wants to merge 19 commits into from
Closed

Conversation

omaech
Copy link
Collaborator

@omaech omaech commented Sep 18, 2024

No description provided.

Copy link
Member

@FrancescAlted FrancescAlted left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several comments. I think we need a team discussion about some of them.

>>> ndarray = blosc2.asarray(data)
>>> proxy = blosc2.Proxy(ndarray)
>>> full_data = proxy.fetch()
>>> f"Full data cache: {full_data[:]}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be more graphical if you fetch just an slice (say [slice(0, 3), slice(0,2), so that when data is printed, one can see that only that slice has been fetched (i.e. other values are uninitialized).

[12 13][14 15][16 17]
[18 19]]
>>> slice_data = proxy[0:2, :]
>>> f"Slice data cache: {slice_data}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here it makes little sense to show the slice, as the full array has been fetched already. Better put this above, as already suggested.

>>> self.chunks = [data.shape[i] for i in range(data.ndim)]
>>> self.blocks = [data.shape[i] for i in range(data.ndim)]
>>> self.dtype = data.dtype
>>> f"Data shape: {self.shape}, Chunks: {self.chunks}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too many indentation here

>>> async def aget_chunk(self, nchunk):
>>> await asyncio.sleep(0.1)
>>> return self.data[nchunk]
>>> # Class that inherits from blosc2.Proxy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is trivial, it can go away.

>>> import asyncio
>>> class MyProxySource:
>>> def __init__(self, data):
>>> # If the source is multidimensional, it must have the attributes:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"the" -> "the next"

>>> ndarray = blosc2.asarray(data)
>>> proxy = blosc2.Proxy(ndarray)
>>> slice_1 = proxy[0:3, 0:3]
>>> f"Slice 1: {slice_1}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, print the entire proxy values to see that only slice [:3, :3] has been filled. If (10, 10) is too large for the dump, use e.g. (5, 5).

"""The dtype of :paramref:`self` or None if the data is unidimensional"""
"""The dtype of :paramref:`self` or None if the data is unidimensional

Examples
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these are trivial. I'd remove examples for properties like this.

"""The shape of :paramref:`self`"""
"""The shape of :paramref:`self`

Examples
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

@@ -224,6 +329,16 @@ def vlmeta(self):
See Also
--------
:ref:`SChunk.vlmeta`

Examples
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

@@ -240,6 +355,26 @@ def fields(self):
See Also
--------
:ref:`NDField`

Examples
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unsure about this one. It brings some insight about how to use the fields property, but on the other hand, this has already been documented in the original NDArray.fields. Better a See also section?

Copy link
Member

@FrancescAlted FrancescAlted left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the new amendments, this looks good to me.

Copy link
Collaborator

@martaiborra martaiborra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some type annotations should be restored. Other than a comment on the Proxy example, LGTM.

Comment on lines 10 to 13
import blosc2
import numpy as np

import blosc2
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to repeat import blosc2

@@ -50,7 +51,7 @@ def dtype(self) -> np.dtype:
pass

@abstractmethod
def get_chunk(self, nchunk: int) -> bytes:
def get_chunk(self, nchunk):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Restore type annotations def get_chunk(self, nchunk: int) -> bytes:

@@ -226,7 +227,7 @@ def __init__(self, src: ProxySource or ProxyNDSource, urlpath: str = None, **kwa
for key in vlmeta:
self._schunk_cache.vlmeta[key] = vlmeta[key]

def fetch(self, item: slice | list[slice] = None) -> blosc2.NDArray | blosc2.schunk.SChunk:
def fetch(self, item=None):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Restore type annotations

@@ -276,7 +272,7 @@ def fetch(self, item: slice | list[slice] = None) -> blosc2.NDArray | blosc2.sch

return self._cache

async def afetch(self, item: slice | list[slice] = None) -> blosc2.NDArray | blosc2.schunk.SChunk:
async def afetch(self, item=None):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dito

Comment on lines 317 to 323
>>> class MyProxy(blosc2.Proxy):
>>> def __init__(self, source):
>>> super().__init__(source)
>>> self._cache = source.data
>>> # Asynchronous method to get the cache data
>>> async def afetch(self, slice_=None):
>>> return self._cache if slice_ is None else self._cache[slice_]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You do not need to create a class MyProxy which inherits from blosc2.Proxy. You can directly create the proxy with proxy = blosc2.Proxy(source)

"""
# Populate the cache
self.fetch(item)
return self._cache[item]

@property
def dtype(self) -> np.dtype:
"""The dtype of :paramref:`self` or None if the data is unidimensional"""
def dtype(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dito

return self._cache.dtype if isinstance(self._cache, blosc2.NDArray) else None

@property
def shape(self) -> tuple[int]:
"""The shape of :paramref:`self`"""
def shape(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dito

return self._cache.shape if isinstance(self._cache, blosc2.NDArray) else len(self._cache)

def __str__(self):
return f"Proxy({self.src}, urlpath={self.urlpath})"

@property
def vlmeta(self) -> blosc2.schunk.vlmeta:
def vlmeta(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dito

@@ -378,7 +428,7 @@ def vlmeta(self) -> blosc2.schunk.vlmeta:
return self._schunk_cache.vlmeta

@property
def fields(self) -> dict:
def fields(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dito

@@ -404,7 +474,7 @@ def __init__(self, proxy: Proxy, field: str):
self.shape = proxy.shape
self.dtype = proxy.dtype

def __getitem__(self, item: slice | list[slice]) -> np.ndarray:
def __getitem__(self, item: slice):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dito

Copy link
Member

@FrancescAlted FrancescAlted left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some comments.

>>> host = "https://demo.caterva2.net/"
>>> root = "b2tests"
>>> dir = "expr/"
>>> name = "ds-0-10-linspace-float64-(True, True)-a1-(60, 60)d.b2nd"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd use regular examples in example root.

>>> f"Shape of the remote array: {remote_array.shape}"
>>> f"Chunks of the remote array: {remote_array.chunks}"
>>> f"Blocks of the remote array: {remote_array.blocks}"
>>> f"Dtype of the remote array: {remote_array.dtype}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd replace this by just:

>>> remote_array.shape
(60, 60)
...

@omaech omaech closed this Sep 25, 2024
@omaech
Copy link
Collaborator Author

omaech commented Sep 25, 2024

This pull request has been replaced by #267

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants