Hello, I personally feel the ways of selecting elements in a Series are a bit of a mess :-)
The general idea is that .iloc and .loc behave consistently, selecting by position and by index label respectively, but are a bit slower than .ix and plain [], whose behaviour is not always consistent.
But I found all of these methods a bit inconsistent, including in what they return when the labels are not found or the requested positions are out of range in the Series being looked up.
I compiled the following tables, which summarise the behaviour of these 4 lookup methods depending on (a) whether the Series has an integer or a string index (I am not considering date indexes for the moment), (b) whether the requested data is a single element, a slice or a list (yes, the behaviour changes!) and (c) whether the index is found in the data or not.
The following tables were produced with pandas 0.17.1, NumPy 1.10.4, Python 3.4.3.
Case 1: Series with Integer index
import numpy as np
import pandas as pd
s = pd.Series(np.arange(100, 105), index=np.arange(10, 15))
s
10 100
11 101
12 102
13 103
14 104
Case 2: Series with string index
As you can see, there are several inconsistencies, some of them even within .iloc and .loc.
Elements that are not found, or positions that are out of range, are handled in three different ways: an exception is raised, an empty Series is returned, or a Series with the requested keys mapped to NaN values is returned. For example, s.loc['f':'h'] returns an empty Series, whereas s.loc[['f','h']] raises a KeyError. There should be a single way to handle missing elements, possibly with an optional parameter saying what to do when missing elements are encountered.
When using slices, if the lookup is by position the end element is excluded, but if the lookup is by label the end element is included!
.ix is redundant. There should be .iloc[] and .loc[] for a guaranteed lookup by position and by label respectively, plus one faster way with more complicated (but still well documented) logic for when performance is a priority. s[] is just quicker to type than s.ix[], so for me the latter is redundant.
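The points above about slice endpoints and missing keys can be reproduced with a small sketch like the one below. It is written against a recent pandas, so individual results may differ from the 0.17.1 tables, and it leaves out .ix, which later pandas releases removed. The string-indexed Series with labels 'a' to 'e' and the outcome helper are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Same integer-indexed Series as in Case 1.
s_int = pd.Series(np.arange(100, 105), index=np.arange(10, 15))

# Assumed string-indexed Series (labels 'a'..'e' are illustrative).
s_str = pd.Series(np.arange(100, 105), index=list("abcde"))

# Positional slice: the end position is excluded.
by_position = s_int.iloc[1:3]   # positions 1 and 2

# Label slice: the end label is included.
by_label = s_int.loc[11:13]     # labels 11, 12 and 13


def outcome(lookup):
    """Hypothetical helper: run a lookup and name how it handled missing keys."""
    try:
        result = lookup()
    except (KeyError, IndexError) as exc:
        return type(exc).__name__
    if isinstance(result, pd.Series):
        if result.empty:
            return "empty Series"
        if result.isna().all():
            return "NaN Series"
    return "value(s)"


# Every key or position requested here is absent from the Series.
missing = {
    "loc slice": outcome(lambda: s_str.loc["f":"h"]),
    "loc list": outcome(lambda: s_str.loc[["f", "h"]]),
    "iloc slice": outcome(lambda: s_int.iloc[10:12]),
    "iloc list": outcome(lambda: s_int.iloc[[10, 11]]),
    "reindex": outcome(lambda: s_str.reindex(["f", "h"])),
}

print(by_position)
print(by_label)
for name, result in missing.items():
    print(name, "->", result)
```

Printing the dictionary gives a compact view of the three outcomes (exception, empty Series, NaN-filled Series) side by side.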
Haven't had a chance to read yours closely yet. Is there anything different from #9595? Actually, do you mind closing this issue and moving your post there to avoid fragmenting the discussion? I'll follow up there later.
Hi, thanks for the quick response. #9595 deals specifically with the []/.ix issue, while my post is a bit more general, but yes, I guess they are closely related, so feel free to merge them. Thank you.
@sylvaticus virtually everything you showed is well-documented and expected, IOW:
.iloc does not include the right bound, as it's a positional indexer
.loc DOES include the right bound, as it's a label-based indexer
.ix remains (and is not deprecated) mainly because it offers some slight syntactic convenience on multi-axis indexing (IOW you can do combined label and positional indexing on different axes)
There is an issue somewhere where we discuss the handling of missing indexers when using a list-like (and whether you should raise or behave reindex-like). At some point I think we need an option for this, e.g.
.loc(errors='raise' or 'ignore')[....], to handle both of these cases, which are both valid and used.
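To make the two cases concrete, here is a minimal sketch of what such an option could do. Note that .loc(errors=...) is only a proposal in this thread, not an existing pandas API; the select_labels function and its errors parameter below are hypothetical stand-ins built from Series.reindex and a plain membership check.

```python
import numpy as np
import pandas as pd


def select_labels(series, labels, errors="raise"):
    """Hypothetical helper (not a pandas API): list-like label lookup.

    errors="raise"  -> KeyError if any requested label is missing.
    errors="ignore" -> reindex-like: missing labels map to NaN.
    """
    if errors not in ("raise", "ignore"):
        raise ValueError("errors must be 'raise' or 'ignore'")
    if errors == "raise":
        # Explicit check so the behaviour does not depend on the pandas version.
        absent = pd.Index(labels).difference(series.index)
        if len(absent) > 0:
            raise KeyError(list(absent))
    # reindex requires a Series with unique index labels.
    return series.reindex(labels)


s = pd.Series(np.arange(100, 105), index=list("abcde"))

# Reindex-like behaviour: 'f' is missing and comes back as NaN.
lenient = select_labels(s, ["a", "f"], errors="ignore")
print(lenient)

# Strict behaviour: the same request raises.
try:
    select_labels(s, ["a", "f"], errors="raise")
except KeyError as exc:
    print("KeyError:", exc)
```

Keeping the choice in one explicit argument means the caller decides between the two valid behaviours, rather than the form of the key deciding it.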
As far as performance goes: correctness matters first. You should not be using these indexers repeatedly in a loop. If you are, then it's a user error.
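As a rough illustration of the loop point, the sketch below contrasts one indexer call per element with a single list-based lookup. The Series is the one from Case 1; the labels chosen are arbitrary, and no timings are claimed here.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(100, 105), index=np.arange(10, 15))
labels = [10, 12, 14]

# Pattern to avoid: one .loc call per element inside a Python loop.
looped = [s.loc[label] for label in labels]

# Preferred: a single list-based lookup resolves all labels at once.
selected = s.loc[labels]

# For a full pass over the data, operate on the whole Series
# instead of indexing element by element.
total = s.sum()

print(looped)
print(selected)
print(total)
```

Both lookups return the same values; the difference is that the second pays the indexer overhead once rather than once per element.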