Basic curvefit implementation #4849
Conversation
This is great, thanks for submitting this! I just had a go with it, and it worked nicely. I have a couple of suggestions for improving it though:
Also, the whole argument inspection thing probably deserves a few dedicated tests, in addition to testing the fitting functionality.
I think the way I configured things now does replicate the polyfit results. For example:

```python
ds = xr.tutorial.open_dataset('air_temperature')
ds['air2'] = ds.air.copy()
ds.polyfit(dim='time', deg=2)
```

```
<xarray.Dataset>
Dimensions:                    (degree: 3, lat: 25, lon: 53)
Coordinates:
  * degree                     (degree) int64 2 1 0
  * lat                        (lat) float64 75.0 72.5 70.0 ... 20.0 17.5 15.0
  * lon                        (lon) float64 200.0 202.5 205.0 ... 327.5 330.0
Data variables:
    air_polyfit_coefficients   (degree, lat, lon) float64 -1.162e-32 ... 1.13...
    air2_polyfit_coefficients  (degree, lat, lon) float64 -1.162e-32 ... 1.14...
```

Compared to this:

```python
def square(x, a, b, c):
    return a*np.power(x, 2) + b*x + c

ds.curvefit(x=ds.time, dim='time', func=square)
```

```
<xarray.Dataset>
Dimensions:                    (lat: 25, lon: 53, param: 3)
Coordinates:
  * lat                        (lat) float32 75.0 72.5 70.0 ... 20.0 17.5 15.0
  * lon                        (lon) float32 200.0 202.5 205.0 ... 327.5 330.0
  * param                      (param) <U1 'a' 'b' 'c'
Data variables:
    air_curvefit_coefficients  (param, lat, lon) float64 -1.162e-32 ... 1.13...
    air2_curvefit_coefficients (param, lat, lon) float64 -1.162e-32 ... 1.14...
```

In both cases, each variable in the dataset returns a separate coefficients variable, and all fittable coefficients are stacked along a dimension.
Yeah, this would be good. Should be easy to look for default values in the function itself using the same argument inspection.
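A rough sketch of that kind of default-value lookup, assuming Python's standard `inspect` module is used (the `get_params_and_defaults` helper here is hypothetical, not the PR's code):

```python
import inspect

import numpy as np


def exponential(x, a=1.0, xc=0.0):
    return np.exp((x - xc) / a)


def get_params_and_defaults(func):
    # Hypothetical helper: read the fit function's signature, skip the first
    # argument (assumed to be the independent coordinate), and collect any
    # default values declared on the remaining fit parameters.
    sig = inspect.signature(func)
    params = list(sig.parameters.values())[1:]
    names = [p.name for p in params]
    defaults = {
        p.name: p.default
        for p in params
        if p.default is not inspect.Parameter.empty
    }
    return names, defaults


names, defaults = get_params_and_defaults(exponential)
# names -> ['a', 'xc'], defaults -> {'a': 1.0, 'xc': 0.0}
```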
Looks like this could be possible with a call to ...
You're right! My bad. The consistency with polyfit looks good.
Oh nice! That looks like it would allow for ND functions fit to ND data. It looks like there is a dask version of ravel which might be useful. (And judging by the comments on that blog post I think @StanczakDominik would appreciate this feature too!)
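The raveling idea, sketched with plain numpy and scipy outside of xarray (the plane function and data here are made up for illustration; `dask.array.ravel` would presumably play the same role for chunked arrays). scipy's `curve_fit` wants flat inputs, so ND coordinates and data are flattened before the fit:

```python
import numpy as np
from scipy.optimize import curve_fit


def plane(xy, a, b, c):
    # A "2D" model: the independent variable is a tuple of coordinate arrays.
    x, y = xy
    return a * x + b * y + c


x, y = np.meshgrid(np.linspace(0, 1, 50), np.linspace(0, 1, 40))
z = plane((x, y), 2.0, -1.0, 0.5) + np.random.normal(scale=0.05, size=x.shape)

# Ravel the ND coordinates and data down to 1D before handing them to curve_fit.
popt, pcov = curve_fit(plane, (x.ravel(), y.ravel()), z.ravel())
# popt should come out close to [2.0, -1.0, 0.5]
```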
Some more progress here.
The best way to specify the fitting coordinates is a bit tricky to figure out. My original use case for this was needing to fit a relationship between two time/lat/lon dataarrays with the fit done over all time. But probably a more common use would be to just fit a curve over one or two dimensions that already exist in your data. So it would be great to handle these possibilities seamlessly. What I've settled on for now is a `coords` argument that covers these cases:

```python
# Fit a 1d function in time, returns parameters with dims (x, y)
da.curvefit(coords='time', ...)

# Fit a 2d function in space, returns parameters with dims (t)
da.curvefit(coords=['x', 'y'], ...)

# Fit a 1d function with another 3d dataarray and aggregate over time, returns parameters with dims (x, y)
da.curvefit(coords=da1, reduce_dim='time', ...)
```

The logic to make this work got a bit complicated, since we need to supply the right arguments to `apply_ufunc`. Will eventually need to add tests and improve docs and examples. Tests especially I could use some help on.
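For reference, a very stripped-down sketch of what wrapping scipy's `curve_fit` in `apply_ufunc` can look like, written under my own assumptions rather than taken from this PR (only the single-coordinate 1D case; the `simple_curvefit` helper and `param` output dimension are hypothetical):

```python
import numpy as np
import xarray as xr
from scipy.optimize import curve_fit


def _fit_1d(y, x, func, n_params):
    # Fit a single 1D slice, ignoring NaNs; return NaNs if the fit fails.
    mask = np.isfinite(y) & np.isfinite(x)
    if mask.sum() < n_params:
        return np.full(n_params, np.nan)
    try:
        popt, _ = curve_fit(func, x[mask], y[mask])
    except RuntimeError:
        popt = np.full(n_params, np.nan)
    return popt


def simple_curvefit(da, coord, func, param_names):
    # apply_ufunc consumes the fit dimension (`coord`) from both the data and
    # its coordinate and returns a new "param" dimension holding the fitted
    # coefficients; vectorize=True loops the 1D fit over all other dims and
    # dask="parallelized" lets it run chunk-by-chunk on dask-backed data.
    fit = xr.apply_ufunc(
        _fit_1d,
        da,
        da[coord],
        kwargs=dict(func=func, n_params=len(param_names)),
        input_core_dims=[[coord], [coord]],
        output_core_dims=[["param"]],
        vectorize=True,
        dask="parallelized",
        output_dtypes=[float],
        dask_gufunc_kwargs=dict(output_sizes={"param": len(param_names)}),
    )
    return fit.assign_coords(param=param_names)
```

With the exponential example above, this would be called roughly as `simple_curvefit(da, 'time', exponential, ['a', 'xc'])`.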
Added a couple of usage examples in the docs, including one that replicates the scipy example of fitting multiple peaks. Because of the wrapper function and variable args, this requires supplying the parameter names explicitly.
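For context, the multi-peak function in that kind of example looks roughly like the following (a sum of two Gaussians written from scratch for illustration, not copied from the docs example):

```python
import numpy as np


def two_gaussians(x, a0, x0, w0, a1, x1, w1):
    # Sum of two Gaussian peaks; the fit solves for all six parameters at once.
    g0 = a0 * np.exp(-((x - x0) / w0) ** 2)
    g1 = a1 * np.exp(-((x - x1) / w1) ** 2)
    return g0 + g1
```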
Added some checks that will raise errors for misspecified inputs. Also added minimal tests, but these should probably be expanded.
I've been playing around with this some more, and found the performance to be much better using a process-heavy dask scheduler. For example:

```python
import xarray as xr
import numpy as np
import time
import dask
import dask.distributed


def exponential(x, a, xc):
    return np.exp((x - xc) / a)


x = np.arange(-5, 5, 0.001)
t = np.arange(-5, 5, 0.01)
X, T = np.meshgrid(x, t)
Z1 = np.random.uniform(low=-5, high=5, size=X.shape)
Z2 = exponential(Z1, 3, X) + np.random.normal(scale=0.1, size=Z1.shape)

ds = xr.Dataset(
    data_vars=dict(var1=(["t", "x"], Z1), var2=(["t", "x"], Z2)),
    coords={"t": t, "x": x},
)
ds = ds.chunk({'x': 10})


def test_fit():
    start = time.time()
    fit = ds.var2.curvefit(
        coords=ds.var1,
        func=exponential,
        reduce_dim="t",
    ).compute()
    print(f'Fitting time: {time.time() - start:.2f}s')


with dask.config.set(scheduler='threads'):
    test_fit()
with dask.config.set(scheduler='processes'):
    test_fit()
with dask.distributed.Client() as client:
    test_fit()
with dask.distributed.Client(n_workers=8, threads_per_worker=1) as client:
    test_fit()
```

On my 8-core machine, the process-based and distributed runs come out well ahead of the threaded scheduler.
According to this, the underlying scipy routines should be thread safe.
Thanks @slevang. This is an amazing first PR!
It's very thorough and nicely written. I have just minor comments.
The only major comment is that I suggest refactoring some code out to a couple of helper functions which can then be tested independently. A few more test cases would be nice but I think you've covered most of the functionality.
Thanks for the review @dcherian! The latest commit has been refactored with a couple of helper functions and associated tests, and any steps that served no purpose other than consistency with `polyfit` have been removed. If you can think of any more specific test cases that should be included, happy to add them.
This seems ready to be merged?
I think so. I pushed a merge commit to get this up to date with the current release.
Thanks @slevang. Sorry for the delay!
- Passes `pre-commit run --all-files`
- User visible changes documented in `whats-new.rst`
- New functions/methods listed in `api.rst`
This is a simple implementation of a more general curve-fitting API as discussed in #4300, using the existing scipy `curve_fit` functionality wrapped with `apply_ufunc`. It works for arbitrary user-supplied 1D functions that ingest numpy arrays. Formatting and nomenclature of the outputs was largely copied from `.polyfit`, but could probably be improved.