ENH: correlation function accepts method being a callable #22684
Hello @shadiakiki1986! Thanks for updating the PR.
Comment last updated on September 26, 2018 at 02:15 Hours UTC
I like the idea. Does this close an issue?
We'll need a release note in 0.24.0.txt
ci/requirements-optional-pip.txt
Outdated
@@ -14,7 +14,7 @@ lxml
matplotlib
nbsphinx
numexpr
-openpyxl=2.5.5
+openpyxl==2.5.5
This file is autogenerated. #22689
ok should I roll this edit back?
Yes, you should.
done
doc/source/computation.rst
Outdated
.. ipython:: python

   # histogram intersection
   histogram_intersection = lambda a, b: np.minimum( np.true_divide(a, a.sum()), np.true_divide(b, b.sum())).sum()
Make this pep8 compliant
fixed in new commit
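One PEP 8 friendly way to write the same computation is a named function instead of a long lambda. The sketch below is only one possible rewrite, not necessarily what the follow-up commit contains; the sample frame is the one from the docstring example elsewhere in this thread.

```python
import numpy as np
import pandas as pd


def histogram_intersection(a, b):
    """Sum of the element-wise minima of the two normalized histograms."""
    a_norm = np.true_divide(a, a.sum())
    b_norm = np.true_divide(b, b.sum())
    return np.minimum(a_norm, b_norm).sum()


df = pd.DataFrame([(1, 2), (0, 3), (2, 0), (1, 1)],
                  columns=['dogs', 'cats'])

# Pass the function itself as ``method``; the result is a
# column-by-column DataFrame, like for the built-in string methods.
result = df.corr(method=histogram_intersection)
print(result)
```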
pandas/core/frame.py
Outdated
@@ -6652,10 +6652,12 @@ def corr(self, method='pearson', min_periods=1):

Parameters
----------
-method : {'pearson', 'kendall', 'spearman'}
+method : {'pearson', 'kendall', 'spearman', callable}
I think it'll be `{...}, or callable`, with `callable` on the outside of the options.
moved it to outside of the options
pandas/core/frame.py
Outdated
* pearson : standard correlation coefficient
* kendall : Kendall Tau correlation coefficient
* spearman : Spearman rank correlation
* callable: callable with input two numpy 1d-array
"callable expecting two 1d ndarrays and returning a float" (does the callable get ndarrays or series?)
ndarrays
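A quick way to check this for yourself is to pass a callable that records the types of its arguments. This is a minimal sketch; `recording_corr` and `seen_types` are hypothetical names used only for illustration.

```python
import numpy as np
import pandas as pd

seen_types = []


def recording_corr(a, b):
    # Hypothetical helper: remember what kind of objects pandas hands over.
    seen_types.append((type(a), type(b)))
    return 0.0


s1 = pd.Series([1, 0, 2, 1])
s2 = pd.Series([2, 3, 0, 1])
s1.corr(s2, method=recording_corr)

print(seen_types)
```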
@@ -789,6 +789,41 @@ def test_corr_invalid_method(self):
        with tm.assert_raises_regex(ValueError, msg):
            s1.corr(s2, method="____")

    def test_corr_callable_method(self):
Hmm, I care less about testing this exact way of computing the correlation, and more about ensuring that the method is dispatched to.
Would it be possible to define a very simple "correlation" function that just returns something like the index of the columns? So the correlation of the nth and mth columns would be something like (n + m). Not sure if that's possible.
ok will re-write the test tomorrow
The test now includes a simpler correlation function. It is not possible to identify the nth/mth column as in your example, because the correlation function does not know about the dataframe as a whole; it only sees each series on its own. The correlation function I chose simply returns 1 if the two inputs are exactly equal, else 0.
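As a rough illustration of why such a function is enough to show dispatch (this is a sketch, not the test from the PR): for two perfectly anti-correlated series the default Pearson method gives -1, so getting 0 back indicates the callable was used instead.

```python
import numpy as np
import pandas as pd


def my_corr(a, b):
    # 1.0 if the two arrays are exactly equal, 0.0 otherwise
    return 1.0 if (a == b).all() else 0.0


s1 = pd.Series([1, 2, 3, 4, 5])
s2 = pd.Series([5, 4, 3, 2, 1])

same = s1.corr(s1, method=my_corr)       # identical inputs
different = s1.corr(s2, method=my_corr)  # reversed inputs
print(same, different)
```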
This doesn't fix any issue, but I added a note in the 0.24.0 release notes
doc/source/whatsnew/v0.24.0.txt
Outdated
@@ -17,6 +17,10 @@ New features

- ``ExcelWriter`` now accepts ``mode`` as a keyword argument, enabling append to existing workbooks when using the ``openpyxl`` engine (:issue:`3441`)


- :meth:`DataFrame.corr` and :meth:`Series.corr` now accept a callable for generic calculation methods of correlation, e.g. histogram intersection
Use your PR as the issue number. Also, no new line above this sentence.
done
10dcb68 to 2e22403
small comments.
* pearson : standard correlation coefficient
* kendall : Kendall Tau correlation coefficient
* spearman : Spearman rank correlation
* callable: callable with input two 1d ndarrays
can you add an example in Examples
done
@@ -764,6 +764,9 @@ def nancorr(a, b, method='pearson', min_periods=None):


def get_corr_func(method):
    if callable(method):
        return method

    if method in ['kendall', 'spearman']:
elif
done
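The dispatch pattern under discussion can be sketched in isolation as below. This is a simplified stand-in, not the pandas implementation: only the name `get_corr_func` and the `callable(method)` check come from the diff, while `_pearson`, `_NAMED_METHODS`, and the error message are assumptions made for the example.

```python
import numpy as np


def _pearson(a, b):
    # Plain Pearson coefficient via NumPy, used here as the only named method.
    return np.corrcoef(a, b)[0, 1]


# Hypothetical registry standing in for the built-in string methods.
_NAMED_METHODS = {'pearson': _pearson}


def get_corr_func(method):
    if callable(method):
        # A user-supplied function is returned unchanged.
        return method
    elif method in _NAMED_METHODS:
        return _NAMED_METHODS[method]
    raise ValueError(f"unknown correlation method: {method!r}")


def custom(a, b):
    return 0.5


assert get_corr_func(custom) is custom
print(get_corr_func('pearson')(np.array([1., 2., 3.]), np.array([2., 4., 6.])))
```

Checking `callable(method)` first means a user function never has to be compared against the list of string names.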
* pearson : standard correlation coefficient
* kendall : Kendall Tau correlation coefficient
* spearman : Spearman rank correlation
* callable: callable with input two 1d ndarray
I am not sure how to doc-string this signature here
Should I just leave it as is?
Probably fine as is. The type would be Callable[[ndarray, ndarray], float], but I'm not sure how familiar people are with typing yet.
@TomAugspurger yep agreed
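For readers less familiar with typing, the signature mentioned above could be spelled out as follows. This is only an illustration; the alias name `CorrFunc` and the function `equality_corr` are hypothetical and not part of the PR.

```python
from typing import Callable

import numpy as np

# Hypothetical alias name; the signature is the one quoted in the review.
CorrFunc = Callable[[np.ndarray, np.ndarray], float]


def equality_corr(a: np.ndarray, b: np.ndarray) -> float:
    # 1.0 if both arrays have the same shape and elements, 0.0 otherwise
    return 1.0 if np.array_equal(a, b) else 0.0


# The annotation documents what kind of callable is expected.
corr_func: CorrFunc = equality_corr
print(corr_func(np.array([1, 2]), np.array([1, 2])))
```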
2e22403 to c75bc10
Codecov Report
@@            Coverage Diff            @@
##           master   #22684    +/-   ##
========================================
+ Coverage   92.18%   92.19%   +<.01%
========================================
  Files         169      169
  Lines       50819    50821      +2
========================================
+ Hits        46850    46852      +2
  Misses       3969     3969
Continue to review full report at Codecov.
c75bc10 to 3ca092a
pandas/core/frame.py
Outdated
>>> df = pd.DataFrame([(1, 2), (0, 3), (2, 0), (1, 1)],
...                   columns=['dogs', 'cats'])
>>> df.corr(method = histogram_intersection)
This is failing our doctests. Is there an issue with the output?
Also, pep8: no spaces around the =
fixed
pandas/core/series.py
Outdated
>>> s1 = pd.Series([1, 0, 2, 1])
>>> s2 = pd.Series([2, 3, 0, 1])
>>> s1.corr(s2, method = histogram_intersection)
0.416667
Will need to round this, or write it out at full precision.
fixed
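Because doctests compare the printed output as text, a truncated value such as 0.416667 will not match the full float representation. One way to get a stable value is to round explicitly, as sketched below; this assumes the normalized histogram intersection function from earlier in the thread and is not necessarily how the docstring was finally fixed.

```python
import numpy as np
import pandas as pd


def histogram_intersection(a, b):
    # Normalize each input to sum to 1, then add up the element-wise minima.
    a_norm = np.true_divide(a, a.sum())
    b_norm = np.true_divide(b, b.sum())
    return np.minimum(a_norm, b_norm).sum()


s1 = pd.Series([1, 0, 2, 1])
s2 = pd.Series([2, 3, 0, 1])

value = s1.corr(s2, method=histogram_intersection)
# Round to a fixed number of decimals so the printed output is stable.
rounded = round(float(value), 6)
print(rounded)
```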
df06de9 to dc87331
# simple correlation example
# returns 1 if exact equality, 0 otherwise
my_corr = lambda a, b: 1. if (a == b).all() else 0.
Can you use result= and expected= here, rather than expected_1 and such? It's much easier to follow.
done
60b3498 to 4a69a70
- other than the listed strings for the `method` argument, accept a callable for generic correlation calculations
4a69a70 to dbfd95f
thanks @shadiakiki1986 nice change!
git diff upstream/master -u -- "*.py" | flake8 --diff