
ENH: Numerically stable Cython functions roll_cov and roll_corr #8326

Closed
jaimefrio wants to merge 3 commits

Conversation

jaimefrio
Contributor

No description provided.

@jaimefrio
Contributor Author

This is incomplete, and requires making changes to rolling_cov and rolling_corr to implement them using these new functions.

@jaimefrio
Contributor Author

Some examples and timings that I think will show the benefits of having three different Cython implementations:

In [1]: from pandas.algos import roll_corr, roll_cov, roll_var
In [2]: x = np.random.rand(1000000)
In [3]: y = np.random.rand(1000000)

It produces the right results in the general case, and exactly 1.0 when the sequences are identical:

In [4]: np.allclose(roll_corr(x, y, 100, 1)[1:],
   ...:             roll_cov(x, y, 100, 1)[1:] / np.sqrt(roll_var(x, 100, 1)[1:] * roll_var(y, 100, 1)[1:]))
Out[4]: True

In [5]: np.all(roll_corr(x, x, 100, 1)[1:] == 1)
Out[5]: True

roll_var(x) is almost 1.5x faster than roll_cov(x, x), which I think justifies keeping it around:

In [6]: %timeit roll_var(x, 100, 1)
10 loops, best of 3: 23.7 ms per loop

In [7]: %timeit roll_cov(x, x, 100, 1)
10 loops, best of 3: 35.2 ms per loop

In [8]: %timeit roll_cov(x, y, 100, 1)
10 loops, best of 3: 37.5 ms per loop

And the roll_corr implementation is about 2x faster than implementing it in terms of roll_cov and roll_var, which would also force making those functions slower:

In [9]: %timeit roll_corr(x, y, 100, 1)
10 loops, best of 3: 56.8 ms per loop

In [10]: %timeit roll_corr(x, x, 100, 1)
10 loops, best of 3: 42.4 ms per loop

In [11]: %timeit roll_cov(x, x, 100, 1) / np.sqrt(roll_var(x, 100, 1) * roll_var(y, 100, 1))
10 loops, best of 3: 116 ms per loop

Uses a Welford-like method for general computations, and includes
detection of exact zeros in the denominator and of exactly identical
sequences.
@jaimefrio
Contributor Author

Lastly, the failing tests have to do with what I think are wrong expectations: roll_corr should be able to detect NaNs in places where roll_cov / (roll_std * roll_std) may very well not, and that should not be considered an erroneous behavior.

@jreback
Copy link
Contributor

jreback commented Sep 19, 2014

you may want to add the identities that you have above as tests as well

@seth-p
Contributor

seth-p commented Sep 19, 2014

I think tests of those identities (and more) are already in _test_moments_consistency(), which is called by test_ewm_consistency(), test_expanding_consistency(), and test_rolling_consistency().

@jaimefrio
Contributor Author

This would force relaxing some of those tests, though, because the build is going to fail...

@seth-p
Contributor

seth-p commented Sep 19, 2014

Which tests will fail? Note that the consistency checks only apply when x and y have the same pattern of NaN values:

                        if not x.isnull().equals(y.isnull()):
                            # can only easily test two Series with similar structure
                            continue

@jaimefrio
Contributor Author

It's a size-3 sliding window going over two consecutive NaNs, e.g. [3, NaN, NaN, 4]: when going from [3, NaN, NaN] to [NaN, NaN, 4] with minp = 0 and ddof = 0, the resulting variance is not exactly zero, and a NaN is not produced.

I could put in an if nobs <= 1: ssqdm_x = 0 and be done with it...
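For concreteness, here is a hypothetical pure-Python reproduction of that residue, using the same add/remove update formulas as the Cython code but arbitrary data rather than the NaN example above:

vals = [0.3, 1.7, 0.2, 2.9, 1.1]     # arbitrary data

nobs, mean_x, ssqdm_x = 0, 0.0, 0.0
for v in vals:                       # add all five observations
    nobs += 1
    delta = v - mean_x
    mean_x += delta / nobs
    ssqdm_x += delta * (v - mean_x)
for v in vals[:-1]:                  # remove all but the last one
    nobs -= 1
    delta = v - mean_x
    mean_x -= delta / nobs
    ssqdm_x -= delta * (v - mean_x)

print(nobs, ssqdm_x)   # nobs == 1, but ssqdm_x is typically a tiny
                       # nonzero residue rather than exactly 0.0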

@seth-p
Contributor

seth-p commented Sep 19, 2014

Ah. Another way to deal with it is to first deal with the value exiting the window (so in this case there would be no values in the window) before dealing with the value entering the window.


@seth-p
Contributor

seth-p commented Sep 19, 2014

It looks like roll_corr() checks for repeated values, but roll_cov() and roll_var() do not. Any particular reason for that?

Putting aside for a moment the checks for repeated values, here are a couple questions:

  1. Is it worth having separate roll_var and roll_cov?
  2. Is it worth having separate roll_corr and roll_cov?

My thoughts:

  1. My gut tells me that there should just be roll_cov, and that it can be implemented in such a way that the time penalty for roll_cov(x, x) over a dedicated roll_var(x) is negligible, or at least tolerable. Easy for me to say...
  2. It's a good observation that a dedicated single-pass roll_corr(x, y) is much more efficient than roll_cov(x, y) / sqrt(roll_var(x) * roll_var(y)). Note that currently the code does the latter, so we're not talking about slowing things down (beyond point 1). Perhaps the cleanest solution is to have a single roll_cov_corr(x, y) that depending on a flag calculates either the covariance or the correlation? Am not sure what the time penalty for that would be (would it need an if inside the loop?).

Perhaps it's wishful thinking, but my inclination is to try to implement rolling_var, rolling_cov, and rolling_corr all in terms of a single single-pass roll_cov_corr -- which implements checks for all-constant series. Am I being too optimistic / delusional?

Note: As I mentioned previously, I think the code would be simpler (though perhaps not faster) if it first handled the old value (if any) exiting the window and then handled the new (if any) entering the window, rather than trying to handle all four combinations simultaneously.
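To make both points concrete, here is a hypothetical pure-Python sketch (not the PR's code) of a single-pass roll_cov_corr: the exiting value is handled before the entering one, and the corr flag is only consulted when an output is written, never inside the accumulation updates. NaN handling and minp are omitted, win >= 2 is assumed, and the name and signature are made up for illustration.

def roll_cov_corr(x, y, win, corr=False, ddof=1):
    n = len(x)
    out = [float("nan")] * n
    nobs = 0
    mean_x = mean_y = 0.0
    ssqdm_x = ssqdm_y = sxy = 0.0    # squared/cross deviation sums
    for i in range(n):
        if i >= win:
            # the old value exits the window first...
            px, py = x[i - win], y[i - win]
            nobs -= 1
            dx = px - mean_x
            dy = py - mean_y
            mean_x -= dx / nobs
            mean_y -= dy / nobs
            ssqdm_x -= dx * (px - mean_x)
            ssqdm_y -= dy * (py - mean_y)
            sxy -= dx * (py - mean_y)
        # ...then the new value enters
        nobs += 1
        dx = x[i] - mean_x
        dy = y[i] - mean_y
        mean_x += dx / nobs
        mean_y += dy / nobs
        ssqdm_x += dx * (x[i] - mean_x)
        ssqdm_y += dy * (y[i] - mean_y)
        sxy += dx * (y[i] - mean_y)
        if i >= win - 1:
            if corr:   # flag checked once per output, not per update
                denom = (ssqdm_x * ssqdm_y) ** 0.5
                out[i] = sxy / denom if denom > 0 else float("nan")
            else:
                out[i] = sxy / (nobs - ddof)
    return out

Whether that per-output branch costs anything measurable compared to the dedicated kernels is exactly the timing question above.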

@jreback added the Algos (Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff) and Numeric Operations (Arithmetic, Comparison, and Logical operations) labels on Sep 19, 2014
@jreback added this to the 0.15.1 milestone on Sep 19, 2014
@jaimefrio
Contributor Author

I have a sentimental attachment to the math for the case where one observation comes in and another goes out: you don't find it in the books or on Wikipedia, so I had to work it out myself. But of course this is not about not hurting my feelings; it does have its merits...

The basic code for removing an observation prev is:

nobs -= 1
delta = prev - mean_x
mean_x -= delta / nobs
ssqdm_x -= delta * (prev - mean_x)

To add an observation requires:

nobs += 1
delta = val - mean_x
mean_x += delta / nobs
ssqdm_x += delta * (val - mean_x)

If you do both in a single step, it is something like:

delta = val - prev
prev -= mean_x
mean_x += delta / nobs
val -= mean_x
ssqdm_x += (val + prev) * delta

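As a quick sanity check with made-up numbers (not from the PR), the two routes land on the same state:

nobs = 3
mean_x, ssqdm_x = 2.0, 8.0     # arbitrary running state
prev, val = 1.5, 4.25          # outgoing and incoming observations

# remove prev, then add val: two divisions
n, m, s = nobs, mean_x, ssqdm_x
n -= 1
d = prev - m
m -= d / n
s -= d * (prev - m)
n += 1
d = val - m
m += d / n
s += d * (val - m)

# combined update: one division
m2, s2 = mean_x, ssqdm_x
delta = val - prev
p = prev - m2
m2 += delta / nobs
v = val - m2
s2 += (v + p) * delta

print(m, m2)    # both ~2.9167
print(s, s2)    # both ~10.2917, up to rounding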
I think the key thing here is that, when doing both together, you only have to do one division, which is the expensive operation. I just put together a simple_roll_var Cython function implementing your idea and did some timings:

In [2]: from pandas.algos import roll_var, simple_roll_var

In [3]: x = np.random.rand(1000000)
In [6]: %timeit roll_var(x, 100, 1)
10 loops, best of 3: 26.4 ms per loop
In [7]: %timeit simple_roll_var(x, 100, 1)
10 loops, best of 3: 34.7 ms per loop

In [11]: x = np.random.rand(1000)
In [12]: %timeit roll_var(x, 10, 1)
10000 loops, best of 3: 26.3 µs per loop
In [13]: %timeit simple_roll_var(x, 100, 1)
10000 loops, best of 3: 37.5 µs per loop

So there seems to be a 50% penalty to pay for not doing the "remove first, then add" case in a single shot. The timings I posted before show roughly the same slowdown for implementing roll_var via roll_cov. Throw in checks for exact zeros, and you are looking at making roll_var something like 2x slower than the best it can be.

The "remove first, then add" strategy also seems to be a little less numerically stable, although the practical implications of that are negligible.

One last problem of implementing roll_var in terms of roll_cov is that you can no longer do the if ssqdm_x < 0: ssqdm_x = 0.

I guess I really like the idea of roll_var staying, and its deal-with-all-cases separately approach not changing. ;-)

I am not directly opposed to it detecting exact zeros, but I think it is worth understanding why we want that behavior, if roll_corr is going to handle it separately.

Lastly, I am much more open to implementing a monstrous roll_cov_corr that handles both functions, even if that makes roll_cov slower than it could be.

@seth-p
Contributor

seth-p commented Sep 20, 2014

Hmm. Interesting.


@jreback modified the milestones: 0.16.0, Next Major Release on Mar 2, 2015
@jreback
Contributor

jreback commented May 9, 2015

closing pls reopen if/when updated

@jreback closed this on May 9, 2015