Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DatasetView class breaks Liskov's rule. #8855

Open
owenlittlejohns opened this issue Mar 19, 2024 · 5 comments
Open

DatasetView class breaks Liskov's rule. #8855

owenlittlejohns opened this issue Mar 19, 2024 · 5 comments
Labels
topic-DataTree Related to the implementation of a DataTree class topic-typing

Comments

@owenlittlejohns
Copy link
Contributor

What is your issue?

Working on migrating the datatree.py module into xarray/core revealed that the DatasetView class, which implements Dataset while disabling methods to mutate the object, breaks Liskov's substitution principle. The type for one of the overloads of DatasetView.__getitem__ is more general than the corresponding Dataset.__getitem__ signature (due to the use of Self in the Dataset signature).

# In Dataset:
class Dataset(...):
    ...
    @overload
    def __getitem__(self, key: Iterable[Hashable]) -> Self: ...

The use of Self means that signature inherited from the superclass has a return type of DatasetView, but the DatasetView signature is overridden to have a return type of Dataset (the more generalised parent).

To avoid this, a couple of implementations were attempted:

  • A class that tries to intercept the methods that mutate the Dataset using getattr. This does not catch the __setitem__ method, as it is a Magic Method, and those aren't affected by getattr.
  • A Metaclass that can intercept Magic Methods, too. Implementation was inspired from here. I didn't get it to fully work, and eventually realised this was getting too complicated given the scope of the original problem.
  • A mix-in for the mutating methods. I couldn't get this to work in the timescale agreed upon.
  • Resorting back to ignoring the mypy errors for now, so we can proceed with the migration (given that there isn't a significant implementation concern identified from these type issues).

Also note, there is a tangentially-related known mypy error when a property setter accepts an argument of a different type to the property itself (python/mypy#3004). This affects the assignment of Dataset objects to the DataTree.ds property. (Separate issue, but related)

@owenlittlejohns owenlittlejohns added the needs triage Issue that has not been reviewed by xarray team member label Mar 19, 2024
Copy link

welcome bot commented Mar 19, 2024

Thanks for opening your first issue here at xarray! Be sure to follow the issue template!
If you have an idea for a solution, we would really welcome a Pull Request with proposed changes.
See the Contributing Guide for more.
It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better.
Thank you!

@TomNicholas TomNicholas added topic-DataTree Related to the implementation of a DataTree class and removed needs triage Issue that has not been reviewed by xarray team member labels Mar 19, 2024
@TomNicholas
Copy link
Member

A class that tries to intercept the methods that mutate the Dataset using getattr. This does not catch the __setitem__ method, as it is a Magic Method, and those aren't affected by getattr.

This seems like it should be resolvable... But if we can satisfy mypy with some ignores we should definitely do that in the short term, we have much more interesting parts of datatree to get to, and we can always come back to this without making breaking changes.

@shoyer
Copy link
Member

shoyer commented Sep 8, 2024

I think the right way to fix this from a typing perspective would be to make Dataset inherit from DatasetView instead of the other way around. Naming could be a little tricky, though!

@TomNicholas
Copy link
Member

TomNicholas commented Sep 9, 2024

Would that work though? It would mean that isinstance(dt.ds, Dataset) would return False, which I wanted to avoid.

EDIT: But maybe that just isn't something we should be trying to make work.

@shoyer
Copy link
Member

shoyer commented Sep 9, 2024

Would that work though? It would mean that isinstance(dt.ds, Dataset) would return False, which I wanted to avoid.

EDIT: But maybe that just isn't something we should be trying to make work.

Indeed, I think the situation is similar to Python's dict vs the general collections.abc.Mapping protocol. Notice how MutableMapping inherits from Mapping.

Another option for us could be to add an immutable or frozen attribute to xarray.Dataset itself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-DataTree Related to the implementation of a DataTree class topic-typing
Projects
Development

No branches or pull requests

3 participants