ENH: merge_asof() has left_index/right_index and left_by/right_by (#14253) #14531

chrisaycock · 2016-10-28T17:43:02Z

closes #14253 (Add left_by/right_by and left_index/right_index to merge_asof())
tests added / passed
passes git diff upstream/master | flake8 --diff
whatsnew entry

chrisaycock · 2016-10-28T17:44:38Z

I'll be the first to admit that there is some kludge in here, identified by comments in the source code. There are some code paths that can only be hit during a pd.merge_asof() when left_index or right_index is set, so no existing functionality will be affected.

I think for pandas 2.0 we'll need to take a more holistic view of how to do merges.

codecov-io · 2016-10-28T21:58:43Z

Current coverage is 85.20% (diff: 87.23%)

No coverage report found for master at 3552dc0.

Powered by Codecov. Last update 3552dc0...acad843

jreback · 2016-10-31T12:50:38Z

pandas/tools/merge.py

+                            join_names.append(rk)
+                        else:
+                            # kludge for merge_asof(right_index=True)
+                            right_keys.append(right.index.values)


this is not safe as it can cause dtype conversions

chrisaycock · 2016-10-31

Which part? The .index.values or the .append()?

jreback · 2016-10-31T12:51:22Z

pandas/tools/merge.py

@@ -1008,7 +1072,7 @@ def _get_merge_keys(self):
        # validate tolerance; must be a Timedelta if we have a DTI
        if self.tolerance is not None:

-            lt = left_join_keys[self.left_on.index(self._asof_key)]


chrisaycock · 2016-10-31

The _asof_key is defined to be self.left_on[-1]. But since left_on is empty when left_index is set, this line is invalid. Really we only need the time key, which is always at the end of left_join_keys.
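From the user's side, this tolerance check is reached when a tolerance is combined with index keys. A minimal sketch, assuming made-up trade and quote data (the frame and column names are illustrative and not taken from the PR's tests):

```python
import pandas as pd

# Hypothetical data: both frames are keyed by a sorted DatetimeIndex.
trades = pd.DataFrame(
    {"price": [51.95, 51.97]},
    index=pd.to_datetime(
        ["2016-05-25 13:30:00.023", "2016-05-25 13:30:00.048"]
    ),
)
quotes = pd.DataFrame(
    {"bid": [51.90, 51.93, 51.96]},
    index=pd.to_datetime(
        [
            "2016-05-25 13:30:00.000",
            "2016-05-25 13:30:00.030",
            "2016-05-25 13:30:00.041",
        ]
    ),
)

# The asof key is the index on both sides, so no on/left_on/right_on is given.
# Because the key is datetime-like, the tolerance has to be a Timedelta.
result = pd.merge_asof(
    trades,
    quotes,
    left_index=True,
    right_index=True,
    tolerance=pd.Timedelta("10ms"),
)
print(result)
```

With this data the first trade gets no bid, because the most recent quote is 23 ms old, while the second trade picks up the quote from 7 ms earlier.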

jreback · 2016-10-31T12:51:38Z

pandas/tools/tests/test_merge_asof.py

@@ -117,6 +117,75 @@ def test_basic_categorical(self):
                            by='ticker')
        assert_frame_equal(result, expected)

+    def test_basic_left_index(self):
+


can you add the GitHub issue as a comment?

jreback · 2016-10-31T12:51:56Z

pandas/tools/merge.py

            raise MergeError("can only asof on a key for right")

+        if self.left_index and isinstance(self.left.index, MultiIndex):


can you add tests for these error conditions?

chrisaycock · 2016-11-07T16:35:44Z

Hmm, AppVeyor got a syntax error for a line that isn't mine:

from pip._vendor. import string_types

Seems like everyone is failing with this. Filed issue #14603.

chrisaycock · 2016-11-15T18:36:34Z

@jreback Would this change be more appropriate for 0.19.2 or 0.20.0?

jreback · 2016-11-15 (review: changes requested)

looks pretty good. needs some more test coverage, and see if you can remove the need to say kludge :) (IOW see if you can remove the kludge)

jreback · 2016-11-15T18:39:09Z

doc/source/whatsnew/v0.19.2.txt

+^^^^^^^^^^^^^^^^^^
+
+- ``pd.merge_asof()`` can take ``left_index``/``right_index`` and ``left_by``/``right_by`` (:issue:`14253`)
+


change "can take" to "gained", and say "arguments" at the end

jreback · 2016-11-15T18:41:14Z

pandas/tools/merge.py

+                            right_keys.append(right[rk]._values)
+                        else:
+                            # kludge for merge_asof(right_index=True)
+                            right_keys.append(right.index.values)


this here: we don't like doing things like right.index.values, as that can cause dtype conversions. simply appending right.index should work.
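One well-known instance of such a conversion is a timezone-aware DatetimeIndex: .values returns a NumPy array, which cannot carry the timezone. A small illustration with a made-up index (this is not code from the PR):

```python
from datetime import timedelta, timezone

import numpy as np
import pandas as pd

# Hypothetical tz-aware index with a fixed UTC-05:00 offset.
tz = timezone(timedelta(hours=-5))
idx = pd.date_range("2016-10-28 09:30", periods=3, freq="min", tz=tz)

as_values = idx.values  # plain NumPy array: the timezone is dropped
print(idx.dtype)        # tz-aware pandas dtype
print(as_values.dtype)  # tz-naive NumPy datetime64 dtype

# The same instant is now a naive UTC timestamp (14:30 rather than 09:30).
print(as_values[0] == np.datetime64("2016-10-28T14:30"))
```

Passing the Index object itself along, as suggested above, keeps the original dtype attached to the keys.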

jreback · 2016-11-15T18:42:05Z

pandas/tools/merge.py

+                        left_keys.append(left[lk]._values)
+                        join_names.append(lk)
+                    else:
+                        # kludge for merge_asof(left_index=True)


why is this a kludge? any way to make this less kludgy?

chrisaycock · 2016-11-17

This is unique to pd.merge_asof() because on and by are separate parameters. So a user could, for example, request left_index and left_by.

In a regular pd.merge(), users cannot specify both left_index and left_on. (Instead, users would use a MultiIndex.) That means the left_on in _get_merge_keys() is always empty in a pd.merge(), but a pd.merge_asof(left_index=True, left_by=...) will result in a left_on array with a None in the middle of it. See _validate_specification() for where this happens.

I put the kludge in there to handle this case. So perhaps it isn't a kludge at all and I should just put the above explanation at the top of the function. What do you think?

jreback · 2016-11-18

ok, so it's simpler to handle all of the cases here (in basic merge) then?

chrisaycock · 2016-11-18

I think for now it's simpler. It would take a massive overhaul of how generic merge works to fix this.

I'll add my notes as a comment to the function and resubmit.
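The combination discussed in this thread, an index key on the left together with a grouping column, can be sketched from the caller's side as follows. The data is invented for illustration, and by="ticker" is used here as shorthand for giving the same column as left_by and right_by:

```python
import pandas as pd

# Hypothetical trades keyed by a DatetimeIndex, with the group in a column.
trades = pd.DataFrame(
    {"ticker": ["MSFT", "GOOG"], "price": [51.95, 720.77]},
    index=pd.to_datetime(
        ["2016-05-25 13:30:00.023", "2016-05-25 13:30:00.038"]
    ),
)
# Hypothetical quotes with the time key as an ordinary, sorted column.
quotes = pd.DataFrame(
    {
        "time": pd.to_datetime(
            [
                "2016-05-25 13:30:00.010",
                "2016-05-25 13:30:00.020",
                "2016-05-25 13:30:00.030",
                "2016-05-25 13:30:00.035",
            ]
        ),
        "ticker": ["GOOG", "MSFT", "MSFT", "GOOG"],
        "bid": [720.50, 51.95, 51.97, 720.60],
    }
)

# Left time key comes from the index, right time key from a column,
# and rows are matched within each ticker.
result = pd.merge_asof(
    trades, quotes, left_index=True, right_on="time", by="ticker"
)
print(result[["ticker", "price", "bid"]])
```

Each trade is paired with the latest quote for the same ticker at or before the trade's index timestamp.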

jreback · 2016-11-15T18:43:12Z

pandas/tools/merge.py

+
+        if self.right_index and isinstance(self.right.index, MultiIndex):
+            raise MergeError("right can only have one index")
+


need to add tests for each of these error conditions

chrisaycock · 2016-11-17

These tests already exist:

https://github.com/pandas-dev/pandas/pull/14531/files#diff-e00646757b932a2684fda4588f99009cR163
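For readers without the diff at hand, the error condition under review can be reproduced with a short sketch like the one below. The frames are hypothetical and this is not the test from the PR; it only assumes that pd.errors.MergeError is the exception raised:

```python
import pandas as pd

# Hypothetical left frame whose index is a MultiIndex.
left = pd.DataFrame(
    {"price": [1.0, 2.0]},
    index=pd.MultiIndex.from_tuples([(1, "a"), (2, "b")], names=["k1", "k2"]),
)
right = pd.DataFrame({"bid": [0.5, 1.5]}, index=pd.Index([1, 2], name="k1"))

try:
    # An asof merge needs a single key per side, so a MultiIndex is rejected.
    pd.merge_asof(left, right, left_index=True, right_index=True)
except pd.errors.MergeError as exc:
    message = str(exc)
    print(message)
else:
    message = None
```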

jreback · 2016-11-15T18:43:27Z

pandas/tools/merge.py

+        Field name to group by in the right DataFrame.
+
+        .. versionadded:: 0.19.2
+
    suffixes : 2-length sequence (tuple, list, ...)
        Suffix to apply to overlapping column names in the left and right


can you add some examples of using these parameters?
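An example of the kind requested here might look like the following sketch. It assumes invented data in which the grouping column is called ticker on the left and symbol on the right; it is not the example that ended up in the docstring:

```python
import pandas as pd

# Hypothetical trades: the group column is named "ticker".
trades = pd.DataFrame(
    {
        "time": pd.to_datetime(
            ["2016-05-25 13:30:00.010", "2016-05-25 13:30:00.040"]
        ),
        "ticker": ["AAPL", "MSFT"],
        "price": [98.56, 51.95],
    }
)
# Hypothetical quotes: the same information lives in a column named "symbol".
quotes = pd.DataFrame(
    {
        "time": pd.to_datetime(
            [
                "2016-05-25 13:30:00.005",
                "2016-05-25 13:30:00.008",
                "2016-05-25 13:30:00.020",
                "2016-05-25 13:30:00.035",
            ]
        ),
        "symbol": ["MSFT", "AAPL", "AAPL", "MSFT"],
        "bid": [51.90, 98.55, 98.60, 51.93],
    }
)

# left_by/right_by let the group columns have different names on each side.
result = pd.merge_asof(
    trades, quotes, on="time", left_by="ticker", right_by="symbol"
)
print(result)
```

The AAPL trade matches the AAPL quote from 2 ms earlier, and the MSFT trade matches the MSFT quote from 5 ms earlier.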

jreback · 2016-11-15T19:38:21Z

Would this change be more appropriate for 0.19.2 or 0.20.0?

There is a tiny API change (some of the positional args to pd.merge_asof changed because new ones were added), but I think that is ok as we introduced this in 0.19.0. To be conservative, though, we can defer to 0.20.0.

jreback · 2016-11-18T11:51:20Z

@jorisvandenbossche

chrisaycock · 2016-11-21T18:08:54Z

@jreback @jorisvandenbossche It's all green finally.

jorisvandenbossche · 2016-12-14T16:12:37Z

@chrisaycock Thanks!

think that is ok as we introduced this in 0.19.0, though to be conservative we can defer to 0.20.0

I personally have a slight preference to keep this for 0.20.0, but as this was only introduced in 0.19.0, it's not a deal-breaker for me.

Version 0.19.2 * tag 'v0.19.2': (78 commits) RLS: v0.19.2 DOC: update release notes for 0.19.2 TST: skip gbq upload test as flakey DOC: clean-up v0.19.2 whatsnew DOC: update Pandas Cheat Sheet (GH13202) DOC: Pandas Cheat Sheet TST: matplotlib 2.0 fix in log limits for barplot (GH14808) (pandas-dev#14957) flake8 fix import Remove test - from 0.20.0 PR slipped in PERF: fix getitem unique_check / initialization issue cache and remove boxing (pandas-dev#14931) CLN: Resubmit of GH14700. Fixes GH14554. Errors other than Indexing… Clean up construction of Series with dictionary and datetime index BUG: .fillna() for datetime64 with tz is passing thru floats BUG: Patch read_csv NA values behaviour ENH: merge_asof() has type specializations and can take multiple 'by' parameters (pandas-dev#13936) [Backport pandas-dev#14886] BUG: regression in DataFrame.combine_first with integer columns (GH14687) (pandas-dev#14886) Fixed KDE Plot to drop the missing values (pandas-dev#14820) ENH: merge_asof() has left_index/right_index and left_by/right_by (pandas-dev#14253) (pandas-dev#14531) TST: correct url for test file on s3 (xref pandas-dev#14587) ...

ENH: merge_asof() has left_index/right_index and left_by/right_by (#14253)

f6f87b6

chrisaycock mentioned this pull request Oct 28, 2016

ENH: merge_asof() has left_index/right_index and left_by/right_by (#14253) #14426

Closed

4 tasks

jreback added the Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) and API Design labels Oct 31, 2016

jreback reviewed Oct 31, 2016

Christopher C. Aycock added 3 commits October 31, 2016 15:53

Added tests for invalid left_index and left_on

9c4f718

Merge branch 'master' into GH14253

f3bed99

Assumes release as v0.20.0

56fe4c4

Christopher C. Aycock added 2 commits November 12, 2016 12:33

Merged master branch

e8416b1

Now assumes release as v0.19.2

243d8ed

jreback requested changes Nov 15, 2016

A few more changes in response to code review

2cd3f41

jreback approved these changes Nov 18, 2016

Christopher C. Aycock added 2 commits November 18, 2016 09:53

Explanation of required work-around for left_index with left_by

26cca27

Fixed lint error

acad843

jreback added this to the 0.19.2 milestone Nov 22, 2016

jorisvandenbossche merged commit 84cad61 into pandas-dev:master Dec 14, 2016

chrisaycock deleted the GH14253 branch December 14, 2016 16:48

ischurov pushed a commit to ischurov/pandas that referenced this pull request Dec 19, 2016

ENH: merge_asof() has left_index/right_index and left_by/right_by (pandas-dev#14253) (pandas-dev#14531)

82000f7

jorisvandenbossche pushed a commit to jorisvandenbossche/pandas that referenced this pull request Dec 24, 2016

ENH: merge_asof() has left_index/right_index and left_by/right_by (pandas-dev#14253) (pandas-dev#14531) (cherry picked from commit 84cad61)

9a6a78f
