Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New accessor for a biomechanics open source library #3582

Closed
romainmartinez opened this issue Nov 28, 2019 · 6 comments
Closed

New accessor for a biomechanics open source library #3582

romainmartinez opened this issue Nov 28, 2019 · 6 comments

Comments

@romainmartinez
Copy link
Contributor

Hello and first of all, thank you for xarray.

With a colleague, we created pyomeca which is a Python library specialized in biomechanical data processing. Most of the data is multidimensional and so we have reimplemented by hand some functionality to access labels, etc. Except Xarray does it much better.

I am rewriting the entire library using xarray. motion reimplements the pyomeca functions with an accessor for xarray.

I have some questions about the architecture you recommend for a xarray-based library.

In pyomeca, I have three classes (RotoTrans, Markers3d and Analogs3d) with each having very specific functions.

I have two possible solutions to implement them with xarray: (1) write an accessor for each class or (2) inherit from DataArray, e.g:

import xarray

Rototrans class (xarray.DataArray):
    pass

According to your documentation, the accesor is the recommended method. However, for most functions, I need a three-dimensional table with specific dimensions ([axis, channels, time_frame]) and the accessor does not allow to perform these checks when creating the DataArray. At the same time, the second solution forces me to inherit a basic function that allows me to access the functions common to all classes

import xarray

class BaseArray(xarray.DataArray):
    def fct():
        pass
    
class Rototrans(Base):
    pass

Do you have any design recommendations?

@crusaderky
Copy link
Contributor

I would not advise subclassing. In most cases it should work but it's thoroughly untested and I would be unsurprised if you were to find a use case that accidentally reverts your subclass to the base class.

I don't think I fully understand your problem with accessors. Is it because you would need a hook that is triggered by DataArray. __init__? Why can't you initialise whatever you need to upon first access?

@crusaderky
Copy link
Contributor

Give a careful read to #3268 if statefulness (beyond what can be stored in xarray variables) is important for you.

@romainmartinez
Copy link
Contributor Author

Is it because you would need a hook that is triggered by DataArray. init

Yes, I need to test some properties when creating the DataArray such as shape and dimensions properties.

Why can't you initialise whatever you need to upon first access?

How do you do that?

@shoyer
Copy link
Member

shoyer commented Nov 29, 2019

What about simply wrapping xarray objects internally in your classes, along with utilities for converting to/from xarray objects? When users want to do domain specific stuff they can use your objects, and when they want to do something generic with xarray they can convert by calling a method like to_xarray.

@crusaderky
Copy link
Contributor

+1 to shoyer - encapsulation is by far the easiest approach.

Why can't you initialise whatever you need to upon first access?

How do you do that?

@xarray.register_dataset_accessor('foo')
class FooAccessor:
    def __init__(self, xarray_obj):
        self.obj = xarray_obj
        # <insert whatever health checks on xarray_obj>
        self.x = something(xarray_obj)

    @property
    def bar(self):
        # snip

    def baz(self):
        # snip

The accessor __init__ method will be invoked by xarray the first time you invoke array.foo.

Note, however, that it is not recommended to put expensive calculations in it, because the object will possibly be destroyed and reinitialised every time the array is transformed - or in other words, anything that is not a read-only method/property or a cell update.

I emphasized possibly because some transforms may not destroy and recreate your accessor instance, thus potentially causing it to be in a state that is incoherent with the attached xarray object. Every time you invoke a method, you should verify that whatever assumptions you relied upon to generate the state are still valid.

@romainmartinez
Copy link
Contributor Author

Ok thanks for the recommandations.

I'm thinking of using this kind of class to handle object creation:

import xarray as xr
import motion


class Analogs:
    def __new__(cls, ...):
        array = xr.DataArray(...)
        cls.verify(array)
        return array

    @staticmethod
    def verify(array):
        print("verification...")

    @classmethod
    def from_csv(cls, filename:str, ...):
        array = xr.DataArray(filename, ...)
        cls.verify(array)
        return array

Let's say I define another Markers class for another type of data.
Is it possible that my accessor exposes different functions according to class e, g:

@xr.register_dataarray_accessor("foo")
class DataArrayAccessor(object):
    def __init__(self, xarray_obj: xr.DataArray):
        self._obj = xarray_obj

    def to_csv(self):
        """
        This function is exposed to all objects
        but behave differently if it is a Markers or an Analogs
        """

    def marker_only_function(self):
        """
        This function is only exposed to 
        array created with the Marker class
        """

    def analog_only_function(self):
        """
        This function is only exposed to
        array created with the Analog class
        """

My ultimate goal is to have this kind of API:

analogs = (
    Analogs.from_csv("filename.csv")
    .foo.center()    # foo accessor generic method
    .foo.rectify()   # foo accessor specific method for Analogs object
    .mean()          # xarray method
)

@pydata pydata locked and limited conversation to collaborators Aug 2, 2023
@andersy005 andersy005 converted this issue into discussion #8044 Aug 2, 2023

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Projects
None yet
Development

No branches or pull requests

4 participants