Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inherit pandas classes from abc.collections classes? #12056

Closed
max-sixty opened this issue Jan 15, 2016 · 12 comments
Closed

Inherit pandas classes from abc.collections classes? #12056

max-sixty opened this issue Jan 15, 2016 · 12 comments
Labels
Internals Related to non-user accessible pandas implementation Refactor Internal refactoring of code

Comments

@max-sixty
Copy link
Contributor

What do people think about inheriting pandas classes from abc.collections classes?

I think the main ones are DataFrame & Series inheriting from Mapping and Index from Sequence.

This would enable other libraries to infer classes' behavior in a tighter way, for example to work out whether an object is list-like but not dict-like, they can check the types, rather than hasattr(obj, '__iter__') and not hasattr(obj, 'values'). And it doesn't cost anything.

But as far as I'm aware, other libraries in the pydata ecosystem don't do this, so this would be out of the norm.

https://docs.python.org/3/library/collections.abc.html

@kawochen
Copy link
Contributor

I think you have misunderstood abc. These classes have special __subclasshook__ so that you don't have to inherit from them to be a subclass of them.
e.g.

In [172]: issubclass(pandas.Series, collections.Container)
Out[172]: True

So perhaps we can think about whether we want to make pandas objects satisfy the necessary contracts.

@max-sixty
Copy link
Contributor Author

Cheers @kawochen. It looks like Mapping doesn't implement that (its __subclasshook__ falls back to Sized).
I'm not sure why that is. It could be because it has some Mixin methods on top of its abstract methods? Whatever the reason, I'm fairly confident DataFrame satisfies the contracts

In [21]: df = pd.DataFrame()

In [22]: isinstance(df, collections.Mapping)
Out[22]: False

In [23]: isinstance(df, collections.Sized)
Out[23]: True

@kawochen
Copy link
Contributor

@MaximilianR nope it doesn't have items
edit: OK actually I don't know why.

@kawochen
Copy link
Contributor

I agree hasattr isn't pretty. We can think about defining a few custom abstract base classes for pandas using __subclasshook__ to replace those and functions like is_dict_like. I don't know about performance, but it's probably just an extra indirection.

@max-sixty
Copy link
Contributor Author

@kawochen have a look at the Mapping.__subclasshook__ - it can never return True:

In [27]: collections.Mapping.__subclasshook__??
Signature: collections.Mapping.__subclasshook__(C)
Source:
    @classmethod
    def __subclasshook__(cls, C):
        if cls is Sized:
            if any("__len__" in B.__dict__ for B in C.__mro__):
                return True
        return NotImplemented
File:      /Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/_collections_abc.py
Type:      method

So I think for these, you would need to inherit from Mapping. Still open to being wrong though

@kawochen
Copy link
Contributor

oh you are right. It looks like you'd need to call collections.Mapping.register(DataFrame)

@max-sixty
Copy link
Contributor Author

Or add that as one of the classes DataFrame inherits from. That's fairly canonical, there's an example on the collections page: https://docs.python.org/3/library/collections.abc.html (although the benefits described are slightly orthogonal)

@kawochen
Copy link
Contributor

Hmm DataFrame.values is not callable though.

@max-sixty
Copy link
Contributor Author

That's true. It does break the contract a bit. I still think it's better than what exists now, albeit not perfect.

And .values -> .data in pandas 1.0??

@mitar
Copy link
Contributor

mitar commented Sep 20, 2017

I also think that pandas.DataFrame should register with abstract classes.

I think the following would work:

Sequence.register(pandas.DataFrame)

To my understanding it exposes everything necessary.

@keijak
Copy link

keijak commented Apr 16, 2019

Making DataFrame a subclass of Mapping would be problematic due to the semantics of DataFrame__len__.
len(df) returns the number of rows (=len(df.index)). It needs to be the number of columns to match the dict behavior.

@mroeschke mroeschke added Internals Related to non-user accessible pandas implementation Refactor Internal refactoring of code labels May 27, 2019
@jreback jreback added this to the No action milestone Jan 1, 2020
@jreback
Copy link
Contributor

jreback commented Jan 1, 2020

I don't think this is actually possible as we have some named properties e.g. index which have confliciing meaning with .index() iterables.

@jreback jreback closed this as completed Jan 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Internals Related to non-user accessible pandas implementation Refactor Internal refactoring of code
Projects
None yet
Development

No branches or pull requests

6 participants