ENH: correlation function accepts method being a callable #22684
@@ -766,6 +766,8 @@ def nancorr(a, b, method='pearson', min_periods=None):
 def get_corr_func(method):
     if method in ['kendall', 'spearman']:
Review comment: elif
Reply: done
         from scipy.stats import kendalltau, spearmanr
+    elif callable(method):
+        return method

     def _pearson(a, b):
         return np.corrcoef(a, b)[0, 1]
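The dispatch change in this hunk can be illustrated with a small, self-contained sketch. This is not the pandas implementation: `resolve_corr_func`, `_BUILTIN`, and the plain-Python `_pearson` are hypothetical names made up for illustration, and only the "a callable is returned unchanged" branch mirrors the diff.

```python
import math


def _pearson(a, b):
    """Plain-Python Pearson correlation for two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical registry of named methods (only Pearson in this sketch).
_BUILTIN = {"pearson": _pearson}


def resolve_corr_func(method):
    """Return a function f(a, b) -> float for a method name or a callable."""
    if callable(method):
        # User-supplied function: hand it back unchanged.
        return method
    try:
        return _BUILTIN[method]
    except (KeyError, TypeError):
        raise ValueError(f"unknown correlation method: {method!r}") from None


def always_half(a, b):
    """Toy 'correlation' used only to show that callables pass through."""
    return 0.5
```

With this sketch, `resolve_corr_func(always_half)` returns `always_half` itself, while `resolve_corr_func("pearson")` looks the name up in the registry and an unknown name raises `ValueError`.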
@@ -1910,23 +1910,37 @@ def corr(self, other, method='pearson', min_periods=None):
         Parameters
         ----------
         other : Series
-        method : {'pearson', 'kendall', 'spearman'}
+        method : {'pearson', 'kendall', 'spearman'} or callable
             * pearson : standard correlation coefficient
             * kendall : Kendall Tau correlation coefficient
             * spearman : Spearman rank correlation
+            * callable: callable with input two 1d ndarray
Review comment: I am not sure how to doc-string this signature here
Reply: Should I just leave it as is?
Reply: Probably fine as is. The type would be
+                and returning a float
+                .. versionadded:: 0.24.0
         min_periods : int, optional
             Minimum number of observations needed to have a valid result

         Returns
         -------
         correlation : float
+
+        Examples
+        --------
+        >>> import numpy as np
+        >>> histogram_intersection = lambda a, b: np.minimum(a, b
+        ... ).sum().round(decimals=1)
+        >>> s1 = pd.Series([.2, .0, .6, .2])
+        >>> s2 = pd.Series([.3, .6, .0, .1])
+        >>> s1.corr(s2, method=histogram_intersection)
+        0.3
         """
         this, other = self.align(other, join='inner', copy=False)
         if len(this) == 0:
             return np.nan

-        if method in ['pearson', 'spearman', 'kendall']:
+        if method in ['pearson', 'spearman', 'kendall'] or callable(method):
             return nanops.nancorr(this.values, other.values, method=method,
                                   min_periods=min_periods)
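As the hunk shows, `Series.corr` aligns both series with an inner join before dispatching, and returns NaN early when nothing overlaps. The sketch below illustrates what that means for a user-supplied callable, assuming a pandas version that includes this change (0.24.0 or later, per the `versionadded` note). The names `exact_match` and `calls` are illustrative only.

```python
import numpy as np
import pandas as pd

# Records the lengths of the arrays the callable actually receives.
calls = []


def exact_match(a, b):
    """Toy 'correlation': 1.0 if the two arrays are identical, else 0.0."""
    calls.append((len(a), len(b)))
    return 1.0 if (a == b).all() else 0.0


s = pd.Series(np.arange(10, dtype=float))

# Labels 0..5 and 4..9 share only labels 4 and 5, so after the inner-join
# alignment the callable sees two arrays of length 2.
partial = s.iloc[:6].corr(s.iloc[4:], method=exact_match)

# Even and odd labels never overlap, so the aligned series is empty and
# NaN is returned without calling exact_match at all.
disjoint = s.iloc[::2].corr(s.iloc[1::2], method=exact_match)
```

The practical consequence is that a custom function never has to handle label alignment itself; it only receives the overlapping observations as equal-length 1d arrays.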
@@ -789,6 +789,38 @@ def test_corr_invalid_method(self):
         with tm.assert_raises_regex(ValueError, msg):
             s1.corr(s2, method="____")

+    def test_corr_callable_method(self):
Review comment: Hmm, I care less about testing this exact way of computing the correlation and more about ensuring that the method is dispatched to. Would it be possible to define a very simple "correlation" function that just returns something like the index of the columns? So the correlation of the nth and mth column would be like
Reply: OK, will rewrite the test tomorrow.
Reply: The test now includes a simpler correlation function. It is not possible to identify the nth/mth column as in your example, because the correlation function itself does not know about the dataframe as a whole; it only sees each series on its own. The correlation function I chose is a simple
+        # simple correlation example
+        # returns 1 if exact equality, 0 otherwise
+        my_corr = lambda a, b: 1. if (a == b).all() else 0.
+
Review comment: Can you use result= and expected= here, rather than expected_1 and such? It's much easier to follow.
Reply: done
+        # simple example
+        s1 = Series([1, 2, 3, 4, 5])
+        s2 = Series([5, 4, 3, 2, 1])
+        expected = 0
+        tm.assert_almost_equal(
+            s1.corr(s2, method=my_corr),
+            expected)
+
+        # full overlap
+        tm.assert_almost_equal(
+            self.ts.corr(self.ts, method=my_corr), 1.)
+
+        # partial overlap
+        tm.assert_almost_equal(
+            self.ts[:15].corr(self.ts[5:], method=my_corr), 1.)
+
+        # No overlap
+        assert np.isnan(
+            self.ts[::2].corr(self.ts[1::2], method=my_corr))
+
+        # dataframe example
+        df = pd.DataFrame([s1, s2])
+        expected = pd.DataFrame([
+            {0: 1., 1: 0}, {0: 0, 1: 1.}])
+        tm.assert_almost_equal(
+            df.transpose().corr(method=my_corr), expected)
+
     def test_cov(self):
         # full overlap
         tm.assert_almost_equal(self.ts.cov(self.ts), self.ts.std() ** 2)
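The review discussion on this test is about checking that the callable is dispatched to, rather than checking a particular correlation formula. One way to make that visible at the DataFrame level is sketched below, assuming a current pandas release. `constant_corr` is a made-up function that ignores its inputs, so any 0.5 in the result can only have come from it. The pandas documentation for `DataFrame.corr` notes that the result has 1 on the diagonal and is symmetric regardless of what the callable does, which is why only the off-diagonal entries show the callable's value.

```python
import pandas as pd


def constant_corr(a, b):
    """Toy 'correlation' that ignores its inputs entirely."""
    return 0.5


df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0],
    "y": [3.0, 1.0, 2.0],
    "z": [0.0, 5.0, 1.0],
})

# Pairwise matrix: off-diagonal entries come from constant_corr,
# diagonal entries are 1 as documented for DataFrame.corr.
result = df.corr(method=constant_corr)
```

This is a sketch of the idea only; the test in the diff uses an exact-equality function instead.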
Review comment: Can you add an example in Examples?
Reply: done