-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
2x~5x speed up for isel() in most cases #3533
Conversation
@shoyer I struggle to understand the meaning of {
k: v.to_index()
for k, v in self._variables.items()
if isinstance(v, IndexVariable)
} ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing! Great idea!
Dataset.indexes is currently equivalent to just calling to_index() on each appropriate variable. But eventually we want to separate the notion of an index from variables with names matching dimensions, and this is a prerequisite for that. |
I would very much appreciate if everybody could play around with this PR in their projects and confirm that nothing breaks! |
Merging soon if nobody speaks up! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't tested this but the logic here looks sound.
Implemented @shoyer 's suggestions and aligned to master. |
In it goes *ducks for cover* |
=) thanks @crusaderky |
* upstream/master: Fix map_blocks HLG layering (pydata#3598) Silence sphinx warnings: Round 2 (pydata#3592) 2x~5x speed up for isel() in most cases (pydata#3533) remove xarray again (pydata#3591) fix plotting with transposed nondim coords. (pydata#3441) make coarsen reductions consistent with reductions on other classes (pydata#3500) Resolve the version issues on RTD (pydata#3589) Add bottleneck & rasterio git tip to upstream-dev CI (pydata#3585)
…equiv * 'master' of github.com:pydata/xarray: (28 commits) Add nanmedian for dask arrays (pydata#3604) added pyinterp to related projects (pydata#3655) Allow incomplete hypercubes in combine_by_coords (pydata#3649) concat keeps attrs from first variable. (pydata#3637) Extend DatetimeAccessor properties and support `.dt` accessor for Timedelta (pydata#3612) update readthedocs.yml (pydata#3639) silence sphinx warnings round 3 (pydata#3602) Fix/quantile wrong errmsg (pydata#3635) Provide shape info in shape mismatch error. (pydata#3619) Minor doc fixes (pydata#3615) Respect user-specified coordinates attribute. (pydata#3487) Add Facetgrid.row_labels & Facetgrid.col_labels (pydata#3597) Fix pint integration tests (pydata#3600) Minor fix to combine_by_coords to allow for the combination of CFTimeIndexes separated by large time intervals (pydata#3543) Fix map_blocks HLG layering (pydata#3598) Silence sphinx warnings: Round 2 (pydata#3592) 2x~5x speed up for isel() in most cases (pydata#3533) remove xarray again (pydata#3591) fix plotting with transposed nondim coords. (pydata#3441) make coarsen reductions consistent with reductions on other classes (pydata#3500) ...
Yet another major improvement for #2799.
Achieve a 2x to 5x boost in isel performance when slicing small arrays by int, slice, list of int, scalar ndarray, or 1-dimensional ndarray.
Marked as WIP because this commands running the asv suite to verify there are no regressions for large arrays.
(on a separate note, we really need to add the small size cases to asv - as discussed in #3382).
This profoundly alters one of the most important methods in xarray and I must confess it makes me nervous, particularly as I am unsure if the test coverage of DataArray.isel() is as through as that for Dataset.isel().