Optimization #92

Draft · stijnvanhoey wants to merge 7 commits into master

Conversation
stijnvanhoey (Contributor):

  • I have checked to ensure there aren't other open Pull Requests for the same update/change
  • I made sure all new and existing data sets have a description in the data/README.md file
  • I have updated the documentation accordingly.

This PR is a blueprint and first attempt to provide an optimization functionality taking into account the extra_time or lag_time concept requested by @twallema:

  • it relies on xarray to define the data to fit against. Currently this is a single array, but it makes the approach extendable to fitting multiple measured variables at the same time.
  • I renamed the current mcmc.py file to optimize, as it does not yet contain any MCMC functionality.
  • I'm not completely satisfied with the split between model parameters and the other parameter (lag_time), but it is a first attempt to make the same sse function usable with both a fixed and an optimized lag_time (see the sketch at the end of this comment).

See the notebook 0.2-stijnvanhoey-refactor-optimization.ipynb for a usage example, which I tried to keep close to the current fit_pso function.

@twallema we should NOT keep both sse and SSE, nor both fit and fit_pso, but as this is a test and work in progress, I kept both for the time being.

Note that I only worked on the sse optimization as such, not on the MCMC part.
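
For reference, a minimal sketch of what the resulting SSE objective with a lag_time shift could look like, stitched together from the snippets in this PR (the exact signature here is illustrative, not the final API):

import numpy as np

def sse_with_lag(theta, model, data, model_parameters, lag_time):
    # set the proposed parameter values on the model
    for parameter, value in zip(model_parameters, theta):
        model.parameters.update({parameter: value})

    # simulate long enough to cover the lag plus the data period
    output = model.sim(len(data.time) + lag_time)

    # shift the model output by lag_time and compare against the data
    output_to_compare = output.sel(time=slice(lag_time, lag_time - 1 + len(data)))
    return np.sum((data.values - output_to_compare.values) ** 2)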

stijnvanhoey marked this pull request as draft on June 8, 2020 at 21:29.
stijnvanhoey (Contributor, Author):

@twallema feel free to test it out (maybe check out my branch to try).

@jorisvandenbossche, there are still a lot of shortcuts and hardcoded elements, which require further abstraction, so any input is welcome. We could actually make the fit a model method as well...

@@ -54,27 +54,30 @@ def get_COVID19_SEIRD_parameters():
-----------
parameters = get_COVID19_SEIRD_parameters()
"""
abs_dir = os.path.dirname(__file__)
rel_dir = os.path.join(abs_dir, "../../../data/raw/model_parameters")
Contributor: Can you do a merge of master? I think this has already changed.

Contributor (Author): Yes, I updated this morning and merged in the adjustments Jenna made earlier.


# run model
time = len(data.time) + lag_time # at least this length
output = model.sim(time)
Contributor: Could this also run with [-lag_time, time]? (Or does the model not work with negative time?)

Collaborator (@twallema, Jun 15, 2020): solve_ivp from SciPy requires the user to define the times at which the model output should be returned, and these can be negative without any problem. So assigning the time as [-lag_time, ..., time] in the model output xarray, and the time as [0, 1, 2, ..., 7] in the data xarray, should allow pairing without using .values as suggested below. However, this does not really abstract away the lag_time. I'm also wondering whether this approach will work when the times do not match, e.g. when using a discrete-time model with step length 0.235; some form of interpolation will be necessary to match the outputs. I've seen that xarray can be linked to SciPy's interpolation: see http://xarray.pydata.org/en/stable/generated/xarray.DataArray.interp.html
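
For illustration, a minimal sketch of that interpolation idea, assuming the model output is an xarray.DataArray with a "time" coordinate (the arrays here are made up):

import numpy as np
import xarray as xr

# model output at an awkward step length (e.g. 0.235)
model_time = np.arange(0, 8, 0.235)
output = xr.DataArray(np.exp(0.3 * model_time),
                      coords={"time": model_time}, dims="time")

# interpolate the model output onto the (integer) data times
data_time = np.arange(0, 8)
output_on_data_times = output.interp(time=data_time)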

Contributor: The default will be to only align values where the time coordinates match exactly, but you can indeed make this more flexible (e.g. http://xarray.pydata.org/en/stable/generated/xarray.DataArray.reindex.html also allows a simple "nearest" method, or a certain tolerance that should be allowed).
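
And the reindex variant from the link above, continuing the illustrative arrays from the previous sketch:

# snap the model output onto the data times, taking the nearest
# available model time point within a given tolerance
output_on_data_times = output.reindex(time=data_time, method="nearest",
                                      tolerance=0.12)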


# define new parameters in model
for i, parameter in enumerate(model_parameters):
    model.parameters.update({parameter: theta[i]})
Contributor: I am wondering if we should introduce a concept of "cloning" a model, to avoid changing the original one when setting parameters like this for an optimization run.

Contributor (Author): I propose to define this in a separate issue/PR to provide a more generic solution.
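
For illustration only, a minimal sketch of what such a cloning step could look like, assuming the model object can safely be deep-copied (clone_model is a hypothetical helper, not part of this repo):

import copy

def clone_model(model):
    # hypothetical helper: return an independent copy so the optimizer
    # never mutates the parameters of the original model instance
    return copy.deepcopy(model)

# inside the objective function one would then do, e.g.:
# candidate = clone_model(model)
# candidate.parameters.update({parameter: value})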

output_to_compare = subset_output.sel(time=slice(lag_time, lag_time - 1 + len(data)))

# calculate sse -> we could maybe enable more function options on this level?
sse = np.sum((data.values - output_to_compare.values)**2)
Contributor: Do we need to check that the "time" values actually match the provided data? If you adjust the time coordinate of the model output with lag_time, this calculation could be done on the xarray objects directly (so without .values), and then the alignment on the time dimension would be done automatically.
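
A minimal sketch of that automatic alignment, with made-up numbers:

import numpy as np
import xarray as xr

data = xr.DataArray([1.0, 4.0, 9.0], coords={"time": [0, 1, 2]}, dims="time")
output = xr.DataArray([0.5, 1.2, 4.1, 8.7], coords={"time": [-1, 0, 1, 2]}, dims="time")

# xarray aligns on the shared "time" labels (0, 1, 2) before subtracting,
# so the extra lagged point at time=-1 is dropped automatically
sse = ((data - output) ** 2).sum().item()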
