-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reimplement .polyfit()
with apply_ufunc
#5933
Conversation
.polyfit()
with apply_ufunc
.polyfit()
with apply_ufunc
.polyfit()
with apply_ufunc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
@@ -3867,11 +3867,10 @@ def polyfit( | |||
Degree of the fitting polynomial. | |||
skipna : bool, optional | |||
If True, removes all invalid values before fitting each 1D slices of the array. | |||
Default is True if data is stored in a dask.array or if there is any | |||
invalid values, False otherwise. | |||
Default is True. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could do
- skipna : bool, optional
+ skipna : bool, default: True
instead.
return tuple( | ||
np.full(len(var) * [order], np.nan) for var in output_core_dims | ||
) | ||
output = np.polyfit(x, y, deg, rcond=rcond, full=full, w=w, cov=cov) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
numpy recommends to use Polynomial.fit <numpy.polynomial.polynomial.Polynomial.fit>
class - did you consider switching to this (no requirement just a question).
output_core_dims = [("degree",)] | ||
output_vars = ["{name}polyfit_coefficients"] | ||
|
||
def _wrapper(x, y): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might not be worth it, but you could avoid a if
conditional in the inner loop as follows:
def _wrapper_skipna(x, y):
...
def _wrapper_noskipna(x, y):
...
if skipna:
_wrapper = _wrapper_skipna
else:
_wrapper = _wrapper_noskipna
@slevang are you still interested to continue this PR? I think that would be a worthwhile addition and should not be too much left to do. (What would be nice, however, are tests for the issues this fixes.) |
Definitely! I got distracted is all, and @dcherian posted a nice solution in #5629 that could allow us to preserve the ability to fit along a chunked dimension using blockwise operations and the dask I'm happy to pick this back up and finish it off if there is consensus on the right way forward, but the blockwise approach seemed promising so I put this on hold. |
@slevang Yeah, the blockwise approach seems indeed nice. You're very welcome to continue with the blockwise approach in a different PR if you want to. |
Not sure I understand the blockwise approach well enough to make a PR, but maybe I'll give it a try at some point. |
I think you can use a lot of @dcherian's code as a base and then for starters see if it simply passes all the tests (including the ones you added here). If you make a draft PR here it's easier to help out as well if you're getting stuck. |
Underlying problem should be solved by #9766 |
polyfit()
#4554polyfit
with weights alters the DataArray in place #5644pre-commit run --all-files
whats-new.rst
Reimplement
polyfit
usingapply_ufunc
rather thandask.array.linalg.lstsq
. This should solve a number of issues with memory usage and chunking that were reported on the current version ofpolyfit
. The main downside is that variables chunked along the fitting dimension cannot be handled with this approach.There is a bunch of fiddly code here for handling the differing outputs from
np.polyfit
depending on the values of thefull
andcov
args. Depending on the performance implications, we could simplify some by keeping these inapply_ufunc
and dropping later. Much of this parsing would still be required though, because the only way to get the covariances is to setcov=True, full=False
.A few minor departures from the previous implementation:
rank
andsingular_values
diagnostic variables returned bynp.polyfit
are now returned on a pointwise basis, since these can change depending on skipped nans.np.polyfit
also returns thercond
used for each fit which I've included here.allow_rechunk=True
and warn about memory implications.skipna=True
, since the previous behavior seemed to be a limitation of the computational method.transpose
operation to putdegree
as the first dimension. This is arbitrary though, and actually the opposite of howcurvefit
returns ordering. So we could match up withcurvefit
but it would be breaking for polyfit.No new tests have been added since the previous suite was fairly comprehensive. Would be great to get some performance reports on real-world data such as the climate model detrending application in #5629.