-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
General repr format for our internal ExtensionArrays #22846
Comments
So a similar random thought as I mentioned in #22511 , one possible alternative:
|
That seems fine. Would we update Categorical to include |
This and some other common-but-not-necessary methods could go in a mixin in something like |
PeriodArray is using a repr like
I'd be happy with something similar for IntegerArray. SparseArray can't share it, because it needs to indicate the sparse points. But I think it can be a bit more uniform, something like >>> pd.SparseArray([1, 0, 0, 4])
<pandas.SparseArray>
[1, 0, 0, 4]
Indicies: array([0, 3]), dtype: Sparse[int64, 0] Categorical could maybe be updated, but not a big deal. |
I personally like the fact that we have 'pandas' in the repr, like I did in the IntegerArray example above, or how you did the SparsArray (what PeriodArray doesn't have), to make it clear from it's repr it is a pandas array, without needing to check |
That's why I left it out of PeriodArray. I don't have a strong opinion there though. It's clearly not code since it's in brackets. We could even have |
I am puzzled why you are arbitrarily changing formats here from what we already have longstanding for all Index types. The repr for Index looks much nicer, doesn't have angle brackets, fits on a single line if its short, and properly separates commas. Creating a new format is non-trivial, which is ok to do generally, but only if we change everything. This is a huge, undesired breaking change. So big -1 on doing this. This impacts #23601 and is blocking for me. |
already have longstanding for all Index types
This is for arrays, not index. It's similar to Categorical.
doesn't have angle brackets
We don't know if the repr is code, so angle brackets are appropriate.
properly separates commas
Do you have an example of #23617
where commas aren't handled correctly?
This is a huge, undesired breaking change
#23617 is backwards compatible.
…On Mon, Nov 12, 2018 at 8:25 AM Jeff Reback ***@***.***> wrote:
I am puzzled why you are arbitrarily changing formats here from what we
already have longstanding for all Index types. The repr for Index looks
much nicer, doesn't have angle brackets, fits on a single line if its
short, and properly separates commas.
Creating a new format is non-trivial, which is ok to do generally, but
only if we change everything. This is a huge, undesired breaking change. So
big -1 on doing this. This impacts #23601
<#23601> and is blocking for me.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#22846 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABQHInrCa3XWPN4zXpdwPPdR0iwJsA0Dks5uuYTdgaJpZM4W7ipV>
.
|
You may not like the proposed format, but it is not "breaking". Nothing will be broken as this only affects new API (the internal EAs), not the reprs of Index or Series.
It is not arbitrarily. There are reasons to do this, mentioned somewhat in the original issue. Trying to put the original reasoning a bit clearer:
Since the Array class is a new API, we have the opportunity to discuss more freely what repr we would like to have. So let's discuss that (that's the reason I originally opened this issue). |
The more things are different the more inconsistencies across the code base in code, testing, and user experience. This is such a jaring huge change you now done't know what the relationship between Index and Array is. This is a bad thing.
Sure, this is the case, but it is not a reason to deviate.
This is a problem for MultiIndex as well. Again I am all for changing everything, just not a part.
This is also not true for Index generally, e.g. Period, not sure why this is an issue here. I thinking breaking this change (and adding associated hacky things like removing commas), is just bad code smell. I would be ok with adding the repr the same as Index. Then if you want to change the repr generally that is ok, but half and half does not fly. |
I might be more accepting if the repr is not multi-lined by default, I think this is also the cause of the 'need to remove the commas'. |
@jreback can you try to explain why you don't like the fact it is multi-line? Once you have a bit longer index, our current index repr is also multi-line (IntervalIndex is even multi-line by default, but that one is a bit inconsistent with the others) |
for a short repr this is very awkward. Sure when its get longer it is multi-line. The point is the separation of the class from the data. I personally think the Index repr is just fine. You need a much stronger argument to arbitrarily change it. and that is what you are proposing. Consistency is the king here. You are simply breaking this. |
So In [2]: pd.core.arrays.period_array(['2000', '2001'], 'D')
Out[2]: <PeriodArray>['2000-01-01', '2001-01-01'], Length: 2, dtype: period[D] rather than In [2]: pd.core.arrays.period_array(['2000', '2001'], 'D')
Out[2]:
<PeriodArray>
['2000-01-01', '2001-01-01']
Length: 2, dtype: period[D] would move you to a +1 on |
Currently
so i see some departures here which just don't make sense. quoting on the dtype, the length (in the current EA); this only shows up if its too long for the display normally. its 1 line, and I just don't like the angle brackets. |
One thing at a time. Do we agree that angle brackets are appropriate for the base repr, since we don't know if the repr is code? |
again no i don't agree, we have not done that elsewhere and therefore this is a completely breaking change. from a user perspective |
We haven't defined a repr for objects with arbitrary constructors though. This is different. |
sure we already have this repr for example, so i would assert this is NOT different.
|
We defined |
Triggered by the discussion I am having in #22511, I was also thinking we should look at our EAs reprs.
Currently we have for IntegerArray:
which looks like code, but actually is not valid code (because the constructor needs an array, has no dtype argument, and nan is actually no defined).
So also here (since here we still have the freedom to choose something without the concern of changing something), I think we should have some discussion about what we ideally want.
The text was updated successfully, but these errors were encountered: