[WIP] Imprecise indexer #22043

WeatherGod · 2018-07-24T20:47:46Z

This is a work-in-progress to add a tolerance attribute to the Index class, and to plumb its use throughout the Index machinery. My immediate goal at this point is to not break anything. There is still a lot more work to do before this is ready for prime time, but hopefully I can get some input on best practices for mucking about in such deep internals of pandas.

closes #xxxx
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

pep8speaks · 2018-07-24T20:47:55Z

Hello @WeatherGod! Thanks for updating the PR.

In the file pandas/tests/indexes/test_base.py, the following are the PEP8 issues:

Line 1267:9: E265 block comment should start with '# '
Line 1411:5: E265 block comment should start with '# '

Comment last updated on July 25, 2018 at 21:19 Hours UTC

WillAyd · 2018-07-24T22:12:50Z

Is this in reference to any particular issue or discussion?

jreback · 2018-07-24T22:35:11Z

You can already do this with `reindex`.

What is the use case?
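For context on the existing route mentioned here: `reindex` accepts `method="nearest"` together with `tolerance`, so near-equal labels can be matched when asked for explicitly. A minimal sketch with made-up data; note that the tolerance has to be supplied on every call and is not used by pandas' automatic alignment, which is the gap the rest of the thread discusses.

```python
import pandas as pd

# A float64-indexed Series with "clean" labels (illustrative data).
s = pd.Series([1.0, 2.0, 3.0], index=[0.0, 0.5, 1.0])

# Target labels that differ from the existing ones only by round-off.
target = [0.0 + 1e-12, 0.5 - 1e-12, 1.0 + 1e-12]

# Exact-label reindexing finds no matches, so every value is NaN.
exact = s.reindex(target)

# Nearest-label reindexing, limited to matches within 1e-9, recovers the values.
near = s.reindex(target, method="nearest", tolerance=1e-9)

print(exact.tolist())  # [nan, nan, nan]
print(near.tolist())   # [1.0, 2.0, 3.0]
```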

shoyer · 2018-07-24T23:24:20Z

The most relevant issues are probably #9530 and #9817, as well as pydata/xarray#2217 downstream.

The use case here is the ability to make indexes that always do alignment using a tolerance. Pandas' current automatic alignment is not very useful for floating-point indexes, because it matches labels only by exact equality and gives no consideration to near matches.
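A small illustration of the alignment problem described here, using made-up values: arithmetic between two Series aligns on the union of their indexes by exact label equality, so labels that differ only by round-off end up as separate rows.

```python
import pandas as pd

# Two Series whose first labels differ only by round-off error.
a = pd.Series([1.0, 2.0], index=[0.1, 0.2])
b = pd.Series([10.0, 20.0], index=[0.1 + 1e-12, 0.2])

# Alignment uses exact equality: 0.1 and 0.1 + 1e-12 become two rows,
# each with a NaN result; only the 0.2 label lines up.
total = a + b
print(total)
```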

WeatherGod · 2018-07-25T00:18:17Z

Right, to summarize a bit: quite often with float64 indexes, you have two indexes that logically share the same keys, but because they were computed slightly differently, or came from different sources, they aren't bitwise identical.
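The classic example of "logically equal but not bitwise identical" keys, and how an exact-equality set operation treats them. The values are illustrative; `np.isclose` is used only to show that the two keys are near-equal.

```python
import numpy as np
import pandas as pd

# The same logical key, computed two different ways.
computed = 0.1 + 0.2
literal = 0.3

print(computed == literal)            # False
print(np.isclose(computed, literal))  # True

# Index set operations use exact equality, so only the 0.5 label survives.
left = pd.Index([computed, 0.5])
right = pd.Index([literal, 0.5])
common = left.intersection(right)
print(common.tolist())  # [0.5]
```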

I first tried implementing this just within Float64Index, but quickly ran into issues where I needed support implemented in the base class. Of course, once that happened, a lot of it had to be implemented in the other Index subclasses as well.

The basic premise of the design is that explicit always overrides implicit: a tolerance passed as an argument takes precedence over the index's tolerance attribute, which is why tolerance was added as an argument to many of the set operations. Also, any index resulting from these operations has its own tolerance attribute set to the tolerance that was used.
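The precedence rule described above can be sketched outside pandas. `TolerantIndex` below is a hypothetical class, not a pandas API and not the code in this PR; it assumes a simple greedy merge (a label is dropped if it lies within the tolerance of the last label kept) and uses only the left operand's attribute, since the thread does not say how two differing attributes would be reconciled.

```python
import numpy as np


class TolerantIndex:
    """Hypothetical float index carrying an implicit ``tolerance`` attribute.

    Illustration only; not a pandas class and not the PR's implementation.
    """

    def __init__(self, values, tolerance=None):
        self.values = np.sort(np.asarray(values, dtype="float64"))
        self.tolerance = tolerance

    def _resolve_tolerance(self, tolerance):
        # An explicitly passed tolerance overrides the implicit attribute.
        return self.tolerance if tolerance is None else tolerance

    def union(self, other, tolerance=None):
        tol = self._resolve_tolerance(tolerance)
        merged = np.sort(np.concatenate([self.values, other.values]))
        if tol is None or merged.size == 0:
            # No tolerance anywhere: fall back to exact-equality union.
            kept = np.unique(merged)
        else:
            # Greedy pass: drop any label within ``tol`` of the last kept one.
            kept = [merged[0]]
            for value in merged[1:]:
                if value - kept[-1] > tol:
                    kept.append(value)
        # The result records the tolerance that was actually used.
        return TolerantIndex(kept, tolerance=tol)


a = TolerantIndex([0.1, 0.2], tolerance=1e-9)
b = TolerantIndex([0.1 + 1e-12, 0.3])

implicit = a.union(b)                 # uses a.tolerance == 1e-9
explicit = a.union(b, tolerance=0.0)  # explicit 0.0 wins over 1e-9

print(len(implicit.values), implicit.tolerance)  # 3 1e-09
print(len(explicit.values), explicit.tolerance)  # 4 0.0
```

The greedy merge means a chain of labels each within tolerance of its neighbour collapses to its first member; a real implementation would need to pick and document such a rule.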

* took care of wrappers in datetimes and interval
* fix tolerance handling in extended dtype index construction
* fix unpickling of old pickles and a bug in numeric index unpickling
* fix tolerance for constructor delegation in `__new__`
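On "fix unpickling of old pickles": objects pickled before a new attribute existed carry no value for it. One generic way to handle this is a `__setstate__` that fills in the default. The class below is hypothetical and this is not necessarily how pandas or this commit does it (pandas Index pickling goes through its own reduce machinery); the old state is emulated by calling `__setstate__` directly, which is what pickle does after creating the instance without `__init__`.

```python
class IndexWithTolerance:
    """Hypothetical stand-in for an index class that gained ``tolerance``."""

    def __init__(self, values, tolerance=None):
        self.values = list(values)
        self.tolerance = tolerance

    def __setstate__(self, state):
        # State saved before ``tolerance`` existed lacks that key, so fall
        # back to the constructor default instead of leaving it unset.
        state = dict(state)
        state.setdefault("tolerance", None)
        self.__dict__.update(state)


# pickle restores an instance by creating it without __init__ and then
# calling __setstate__ with the saved state; emulate that for an old state.
restored = IndexWithTolerance.__new__(IndexWithTolerance)
restored.__setstate__({"values": [1.0, 2.0]})
print(restored.values, restored.tolerance)  # [1.0, 2.0] None
```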

WeatherGod · 2018-08-28T01:41:49Z

My employer has changed priorities for me, so I have been unable to pursue this work any further, and I don't foresee any free time to spend on this. I hope someone else can take this work further, even if it is just going through and adding documentation.

The other major work still needed in this PR is adding tolerance support to the Cython helpers and writing unit tests.
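The public `Index.get_indexer(target, method="nearest", tolerance=...)` already performs a tolerance-limited nearest lookup. As a rough picture of the logic a tolerance-aware lookup helper needs, here is a hypothetical NumPy stand-in (not the Cython code referred to above): find each target's insertion point in the sorted labels, compare the neighbours on either side, and reject the closest one if it is farther away than the tolerance.

```python
import numpy as np


def get_indexer_with_tolerance(index_values, targets, tolerance):
    """Return, for each target, the position of the nearest label, or -1.

    Hypothetical helper. ``index_values`` must be sorted ascending; a match
    is accepted only if it lies within ``tolerance`` of the target.
    """
    index_values = np.asarray(index_values, dtype="float64")
    targets = np.asarray(targets, dtype="float64")
    if index_values.size == 0:
        return np.full(targets.shape, -1, dtype=np.intp)

    last = index_values.size - 1
    # Candidate neighbours on either side of each target's insertion point.
    right = np.clip(np.searchsorted(index_values, targets, side="left"), 0, last)
    left = np.clip(right - 1, 0, last)

    dist_left = np.abs(targets - index_values[left])
    dist_right = np.abs(index_values[right] - targets)

    # Prefer the left neighbour on ties, then reject matches beyond tolerance.
    best = np.where(dist_left <= dist_right, left, right)
    best_dist = np.minimum(dist_left, dist_right)
    return np.where(best_dist <= tolerance, best, -1)


positions = get_indexer_with_tolerance(
    [0.0, 0.5, 1.0], [0.5 - 1e-12, 0.25, 2.0, 0.0], tolerance=1e-9
)
print(positions.tolist())  # [1, -1, -1, 0]
```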

jreback · 2018-11-23T03:22:27Z

Nice idea. This PR is stale; if you'd like to continue, please ping.

WeatherGod mentioned this pull request Jul 24, 2018

tolerance for alignment pydata/xarray#2217 (Open)

WeatherGod added 2 commits July 24, 2018 21:18

Add 'tolerance' attribute to much of Index internals

18d89a3

Fixing several problems revealed by CI

9701987

WeatherGod force-pushed the imprecise_indexer branch from a48b7d0 to 9701987 July 25, 2018 02:14

WeatherGod added 3 commits July 25, 2018 10:41

Fix a typo and did some rearranging

67e476f

More plumbing

623038e

More plumbing and fixing

c3e583c

WeatherGod force-pushed the imprecise_indexer branch from 91fdf83 to c3e583c July 25, 2018 21:19

gfyoung added the Enhancement and Indexing labels Jul 25, 2018

jreback closed this Nov 23, 2018

davidbrochart mentioned this pull request Jan 3, 2019

DataArray concat with tolerance pydata/xarray#2644 (Closed)