Execute all notebooks / clear notebook output #1246

Merged
merged 43 commits into develop from remove_example_notebook_output
Jan 15, 2024

Conversation

Doresic
Contributor

@Doresic Doresic commented Dec 13, 2023

All of the example notebooks are stored on GitHub with their output included, which amounts to a lot of figures and text and takes up storage unnecessarily. I've removed the output, but I still need to check that the notebook output is generated for the online docs.
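
A minimal sketch of the output-clearing step, assuming the notebooks live under doc/example/ (the path and the exact tooling used here may differ); it is equivalent to running `jupyter nbconvert --clear-output --inplace` on each notebook:

from pathlib import Path

import nbformat

# strip outputs and execution counts from every example notebook
for path in Path("doc/example").glob("*.ipynb"):
    nb = nbformat.read(str(path), as_version=4)
    for cell in nb.cells:
        if cell.cell_type == "code":
            cell.outputs = []
            cell.execution_count = None
    nbformat.write(nb, str(path))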

@codecov-commenter

codecov-commenter commented Dec 13, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison: base (ebdf99c) 84.31% vs. head (038d1b7) 84.22%.


Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1246      +/-   ##
===========================================
- Coverage    84.31%   84.22%   -0.09%     
===========================================
  Files          151      151              
  Lines        12331    12331              
===========================================
- Hits         10397    10386      -11     
- Misses        1934     1945      +11     


@dweindl dweindl changed the title Clear output of all notebooks (Drafting) Execute all notebooks / clear notebook output (Drafting) Dec 15, 2023
dweindl and others added 4 commits December 15, 2023 14:50
I reduced the notebook output and run time by tweaking small things like the number of starts, rerunning, removing PEtab model compilation output, and adding seeds for reproducibility. Changes per notebook:
- `amici.ipynb` -- TODO, not done yet
- `censored.ipynb` -- reduce n_starts 10 -> 3
- `getting_started.ipynb` -- n_starts 100 -> 10 (do we need more for a nicer waterfall plot?), added seed for reproducibility, Fides verbose=logging.ERROR, create_problem verbose=False (TODO amici.swig_wrappers logging)
- `hierarchical.ipynb` -- change to only create_problem with verbose=False, set a seed for reproducibility
- TODO rest...
I made the following changes in the example notebooks:
- `amici.py` -- verbose=False on model compilation, seed for reproducibility, clear outputs
- `julia.py` -- set number of starts 100 -> 20, set seed for reproducibility, samples 10000 -> 2000, clear outputs
- `nonlinear_monotone` -- verbose=False on model compilation
- `ordinal` -- verbose=False on model compilation
- `petab_import` -- verbose=False on model compilation
- `sampler_study` -- verbose=False on model compilation
- `store` -- verbose=False on model compilation, fides verbose ERROR
- `synthetic_data` -- n_starts 100 -> 20, verbose=False on model compilation
- `workflow_comparison` -- verbose=False on model compilation, n_starts 25 -> 10
@Doresic
Contributor Author

Doresic commented Jan 3, 2024

I reduced the notebook output and run time by tweaking small things like the number of starts, rerunning, removing PEtab model compilation output, and adding seeds for reproducibility. Changes per notebook (a sketch of the common pattern follows the list):

  • amici.py -- verbose=False on model compilation, seed for reproducibility, clear outputs
  • censored.ipynb -- reduce n_starts 10 -> 3
  • getting_started.ipynb -- n_starts 100 -> 10 (do we need more for a nicer waterfall plot?), added seed for reproducibility, Fides verbose=logging.ERROR, create_problem verbose=False (TODO amici.swig_wrappers logging)
  • hierarchical.ipynb -- change to only create_problem with verbose=False, set a seed for reproducibility
  • julia.py -- set number of starts 100 -> 20, set seed for reproducibility, samples 10000 -> 2000, clear outputs
  • nonlinear_monotone -- verbose=False on model compilation
  • ordinal -- verbose=False on model compilation
  • petab_import -- verbose=False on model compilation
  • sampler_study -- verbose=False on model compilation
  • store -- verbose=False on model compilation, fides verbose ERROR
  • synthetic_data -- n_starts 100 -> 20, verbose=False on model compilation
  • workflow_comparison -- verbose=False on model compilation, n_starts 25 -> 10
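
A minimal sketch of the recurring pattern behind these tweaks; the YAML path is hypothetical, the option names are taken from the list above:

import logging
import numpy as np
import pypesto.optimize as optimize
import pypesto.petab

np.random.seed(1)  # fixed seed so start points and samples are reproducible

# suppress the PEtab/AMICI model compilation output
importer = pypesto.petab.PetabImporter.from_yaml("problem.yaml")  # hypothetical path
problem = importer.create_problem(verbose=False)

# fewer starts and a quiet optimizer keep run time and log output small
optimizer = optimize.FidesOptimizer(verbose=logging.ERROR)
result = optimize.minimize(problem=problem, optimizer=optimizer, n_starts=10)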

@Doresic Doresic self-assigned this Jan 3, 2024
@Doresic
Contributor Author

Doresic commented Jan 5, 2024

The docs test is very slow and ultimately times out at this point:

reading sources... [ 56%] example
0.00s - Debugger warning: It seems that frozen modules are being used, which may
0.00s - make the debugger miss breakpoints. Please pass -Xfrozen_modules=off
0.00s - to python to disable frozen modules.
0.00s - Note: Debugging will proceed. Set PYDEVD_DISABLE_FILE_VALIDATION=1 to disable this validation.
0.00s - Debugger warning: It seems that frozen modules are being used, which may
0.00s - make the debugger miss breakpoints. Please pass -Xfrozen_modules=off
0.00s - to python to disable frozen modules.
0.00s - Note: Debugging will proceed. Set PYDEVD_DISABLE_FILE_VALIDATION=1 to disable this validation.
Warning:  Could not parse YAML metadata at line 5 column 1: not an object
Warning:  Could not parse YAML metadata at line 15 column 1: not an object
reading sources... [ 58%] example/amici
0.00s - Debugger warning: It seems that frozen modules are being used, which may
0.00s - make the debugger miss breakpoints. Please pass -Xfrozen_modules=off
0.00s - to python to disable frozen modules.
0.00s - Note: Debugging will proceed. Set PYDEVD_DISABLE_FILE_VALIDATION=1 to disable this validation.
reading sources... [ 60%] example/censored
0.00s - Debugger warning: It seems that frozen modules are being used, which may
0.00s - make the debugger miss breakpoints. Please pass -Xfrozen_modules=off
0.00s - to python to disable frozen modules.
0.00s - Note: Debugging will proceed. Set PYDEVD_DISABLE_FILE_VALIDATION=1 to disable this validation.

I'm not sure what the issue with these frozen modules is. As far as I could find out, it has something to do with Jupyter and Python 3.11. I'm searching for a solution to speed up the example build...

@dweindl
Member

dweindl commented Jan 5, 2024

I'm not sure what the issue with these frozen modules is. As far as I could find out, it has something to do with Jupyter and Python 3.11. I'm searching for a solution to speed up the example build...

Thanks for the cleanup so far.

Don't worry about this message; it's not relevant here. I've silenced it.
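
For reference, one way to silence the pydevd "frozen modules" warning is the environment variable that the log itself suggests; whether the actual fix looks exactly like this is an assumption:

import os

# set before the notebooks are executed, e.g. in doc/conf.py (location assumed)
os.environ["PYDEVD_DISABLE_FILE_VALIDATION"] = "1"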

Further opportunity for optimization:

getting_started.ipynb:

result = optimize.minimize(problem=problem, 
                           optimizer=optimizer_scipy_lbfgsb,
                           engine=engine,
                           n_starts=100)

... we probably can reduce n_starts there.

same notebook: computing profiles for 3 parameters should be sufficient, we don't need all.
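
A sketch of what restricting the profiles could look like; the profile_index argument and its exact form are assumptions about the pypesto profiling API:

import pypesto.profile as profile

# compute profiles for the first three parameters only, instead of all of them
result = profile.parameter_profile(
    problem=problem,
    result=result,
    optimizer=optimizer_scipy_lbfgsb,
    profile_index=[0, 1, 2],
)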

@Doresic
Contributor Author

Doresic commented Jan 5, 2024

Even when running locally (tox -e doc), reading the sources takes too long; it already stalls at the conversion_reaction.ipynb notebook.
I think it might have something to do with this: when I run the notebook alone with Python 3.10, everything runs fine. If I run it with Python 3.11, the first optimization starts printing a lot of these errors:
(screenshot: repeated error messages during the first optimization)
So many, in fact, that the whole notebook freezes.

@Doresic
Contributor Author

Doresic commented Jan 5, 2024

getting_started.ipynb: ... we probably can reduce n_starts there.

same notebook: computing profiles for 3 parameters should be sufficient, we don't need all.

If this is the point I think it is, I left it at 100 start points so that the global optimum found is good enough to yield nice profiles. It doesn't take that long to run (relative to the profiling). But computing only 3 profiles is definitely better.

@dweindl
Member

dweindl commented Jan 5, 2024

a lot of these errors:

see also #1159

@dweindl
Member

dweindl commented Jan 5, 2024

To speed things up:

  • I think we can use a faster model in getting_started; I'm not sure we need anything ODE/PEtab-based in there. We can just reference the other example.
  • I don't think we need to show profiles and sampling using 5 different types of objectives. Quite a few examples can be reduced. E.g., https://pypesto.readthedocs.io/en/latest/example/conversion_reaction.html seems rather redundant and could be removed after moving, say, the ensemble part somewhere else. Maybe the construction of the objective can be moved to the amici notebook, if it isn't already there. Also, for the julia notebook, I think it is sufficient to show optimization; the rest should be straightforward.

To avoid getting held up for too long, I'd suggest pulling a couple of changes out of this PR (started: #1268) and merging what's already good/fast enough.

dweindl added a commit that referenced this pull request Jan 10, 2024
dweindl added a commit that referenced this pull request Jan 10, 2024
#1278)

* Doc: Clear notebook output (fixed parameters, prior definition, store)

A subset of #1246.

* Really remove output / fix typos

* cleanup
@dweindl dweindl changed the title Execute all notebooks / clear notebook output (Drafting) Execute all notebooks / clear notebook output Jan 10, 2024
@Doresic
Contributor Author

Doresic commented Jan 10, 2024

There still isn't enough time for the readthedocs build. It stops at

...
reading sources... [ 79%] example/prior_definition
reading sources... [ 81%] example/sampler_study
reading sources... [ 82%] example/sampling_diagnostics
reading sources... [ 84%] example/store

Command killed due to timeout or excessive memory consumption

I'll take a look at the rest of the notebooks, specifically the sampling ones; I have a feeling they're taking too long to build.

@Doresic
Contributor Author

Doresic commented Jan 11, 2024

New changes:

  • sampling_diagnostics -- n_samples 10000 -> 1000 for both sampling runs
  • sampler_study -- n_samples 1e4 -> 1e3 where an extremely good sampling result is not necessary; added a seed for reproducibility (to get a consistently low burn-in, so that 1e3 samples suffice).
  • custom_objective_function -- using SingleCoreEngine is better for simple problems like these; the multiprocessing overhead is not worth it (7 seconds vs. 0.7 seconds). Left a note that for more expensive problems the MultiProcessEngine is generally recommended (see the sketch after this list).
  • getting_started -- It was still taking a bit too long. Reduced the starts before profiling 100 -> 25 and the samples 10000 -> 1000; the results are basically the same. Additionally, added a note on the number of starts one would generally use.
  • ordinal -- further reduce n_starts from 10 to 3 and use MultiProcessEngine with 3 processes.
  • nonlinear_monotone -- further reduce n_starts from 10 to 3 and use MultiProcessEngine with 3 processes.
  • censored -- use MultiProcessEngine with 3 processes.
  • hierarchical -- small change: set n_procs of the MultiProcessEngine from 6 to 3, as we run only 3 starts.
  • ... Will edit if more changes are made
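
A sketch of the engine choice mentioned for custom_objective_function; the class names come from pypesto.engine, while the surrounding objects (problem, start counts) are illustrative:

import pypesto.engine as engine
import pypesto.optimize as optimize

# cheap toy objective: the multiprocessing overhead dominates (~7 s vs. ~0.7 s),
# so a single core is faster
result = optimize.minimize(problem=problem, n_starts=10,
                           engine=engine.SingleCoreEngine())

# for more expensive objectives, parallelize the starts across processes instead
result = optimize.minimize(problem=problem, n_starts=3,
                           engine=engine.MultiProcessEngine(n_procs=3))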

@Doresic Doresic marked this pull request as ready for review January 11, 2024 11:55
Doresic added a commit that referenced this pull request Jan 11, 2024
Member

@dilpath dilpath left a comment


👍

Contributor

@stephanmg stephanmg left a comment


Thanks for cleaning this up.

Doresic added a commit that referenced this pull request Jan 12, 2024
…tor (#1245)

* Hier. par. plotting for fides

* hierarchical 'hess' bug

* Fix storing of inner parameters in result

Inner parameters were not being stored when the result was stored. This was due to a problem with putting a dictionary into HDF5 format. I've now transformed it into two lists: an INNER_PARAMETERS_VALUES list and an INNER_PARAMETER_NAMES list.

* adjusted documentation of hierarchical opt.

* Fix some type issues

* Small fix

* Save only values, not names

* Extend to all optimizers

* Remove redundant

* Small update

* Update parameters.py

* Small typo

* Fix some bugs, use lists for saving instead

* Fix tests

* Update hierarchical parameter plot test

* Large-scale renaming and restructuring

- Restructuring of the hierarchical scaling+offset method. The base classes used by all non-quantitative and semi-quantitative data types are put in the `base_...` files. All classes related to the scaling+offset method (for relative data) are moved to the `relative` folder, analogously to the other data types.
- Renaming: the naming scheme now follows the data type rather than the method. It's more direct and removes the ambiguity of `OptimalScaling` for ordinal data and `scaling_offset` for relative data.
- Optimal scaling -> ordinal
- Spline approximation -> semiquantitative
- Scaling + offset -> relative

* Update base_problem.py

* Fix Docstrings

* Daniel&Dilan review changes

* Small change

* More Dilan Review changes

* Fix testing

* Fix base test

* Rename notebooks (again)

* Move inner pars to decorator

* Collect decorators for minimize

* Maren review changes

* Rename & fix notebooks

* Include notebook changes

* Fix tests

* Add output

* Include relative into Collector

Included the relative calculator in the inner calculator collector. Now relative data can be used together with any other non-quantitative data type. There are still a lot of TODOs to clean up, but it works nicely on all examples except one, which I think is a bad one (Boehm_mixed_test with nonlinearities: 1 spline, 1 relative, 1 known: very bad convergence... fits look okay-ish...)

* Small documentation update

* Documentation update

* Update the notebooks

* Update spline_approximation.py

* Change data_types from list to set

* Use add_sim_grad_to_opt_grad

* Rename to semiquantitative in example

* mode RES with only RELATIVE data

* Add petab data type validation

* DO TODOs

* Small TODO

* Fix api.rst module name changes

* Update CODEOWNERS module name changes

* Fix underline too short & api.rst

* Fix plotting routines

* Inline literal error

* Testing error

* Testing error 2

* Test literal error

* Fix missing indentation

* Fix inline

* Daniel, Fabian & Stephan review changes

* Notebook changes

* Dilan & Daniel review changes

* Change title of relative_data.ipynb

* Remove observableParameter from solvers

* Notebook changes from #1246

* Fabian review change

---------

Co-authored-by: Lea@Mac <leaseep@gmail.com>
Member

@dweindl dweindl left a comment


👍

@Doresic Doresic merged commit 1c5580d into develop Jan 15, 2024
18 checks passed
@Doresic Doresic deleted the remove_example_notebook_output branch January 15, 2024 10:18