Multiple strategies to calibrate the FaIR model.
- `anaconda` for python 3
- `python>=3.7`
- for the Cummins calibration, `R>=4.1.1` and `cmake>=3.2`
To create and activate the conda environment:

```
conda env create -f environment.yml
conda activate fair-calibrate
```

To set up the R environment:

```
cd r_scripts
R
> source("setup.r")
```
Edit the `environment.yml` file and add the package you require to the list. If it is conda-installable and available from the list of channels provided, it should be picked up. To then update the environment, run

```
conda env update -f environment.yml --prune
```
The `.env` file contains environment variables that should be changed in order to produce the calibration. These are the FaIR version, calibration version, constraints context and number of samples.
```
# example .env
CALIBRATION_VERSION=1.4
FAIR_VERSION=2.1.3
CONSTRAINT_SET="all-2022"
PRIOR_SAMPLES=1600000    # how many prior samples to draw
POSTERIOR_SAMPLES=841    # final posterior ensemble size
BATCH_SIZE=500           # how many scenarios to run in parallel
WORKERS=40               # how many cores to use for parallel runs
FRONT_SERIAL=0           # for debugging, how many serial runs to do first
FRONT_PARALLEL=0         # after serial runs, how many parallel runs to test
PLOTS=True               # produce plots?
PROGRESS=False           # show progress bar? (good for interactive,
                         # bad on HPC batch jobs)
DATADIR=/path/to/datadir # a local location to download large external
                         # data files (a cache directory)
```
The output will be produced in `output/fair-X.X.X/vY.Y.Y/Z/`, where X.X.X is the FaIR version, Y.Y.Y is the calibration version and Z is the constraint set used. Multiple constraint philosophies can be applied to the same set of calibrations (e.g. AR6, 2022 observations, etc.). No posterior data will be committed to Git owing to its size, but the intention is that the full output data will be on Zenodo.
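The `.env` variables are plain KEY=VALUE pairs, so the output directory described above can be derived directly from them. A minimal stdlib-only sketch (the workflow's actual loader may differ, e.g. it may use python-dotenv):

```python
# Sketch only: parse the KEY=VALUE .env format shown above and build
# the output directory path from it. Not the project's actual loader.
def load_env(text):
    """Parse KEY=VALUE lines, ignoring blank lines and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop inline comments
        if not line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"')
    return env

env = load_env(
    'FAIR_VERSION=2.1.3\n'
    'CALIBRATION_VERSION=1.4\n'
    'CONSTRAINT_SET="all-2022"  # constraints context\n'
)
outdir = f"output/fair-{env['FAIR_VERSION']}/v{env['CALIBRATION_VERSION']}/{env['CONSTRAINT_SET']}/"
print(outdir)  # -> output/fair-2.1.3/v1.4/all-2022/
```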
To run an existing calibration:

- Create the `.env` file (see above)
- Set up environments for python and R (see above)
- Check the recipe inside the `run` bash script
- `./run`

During diagnosis and debugging, scripts can be run individually, but they must be run from the directories in which they reside (five subdirectories deep). If you do this, also activate your conda environment (with `conda activate fair-calibrate`).
Under the existing pattern -- which you are free to change in the `run` recipe -- scripts prefixed with a two-digit number and an underscore are automatically run by the workflow in numerical order, in these directories in this order:

- `calibration/`
- `sampling/`
- `constraining/`
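The two-digit prefix convention can be sketched as follows; the filenames here are hypothetical:

```python
# Illustrative sketch of the two-digit prefix convention: only files
# named NN_*.py are selected, and they run in numerical order.
import re

def ordered_scripts(names):
    """Return scripts with a two-digit numeric prefix, in run order."""
    numbered = [n for n in names if re.match(r"^\d{2}_", n)]
    return sorted(numbered, key=lambda n: int(n[:2]))

print(ordered_scripts(["10_sample.py", "01_setup.py", "README.md", "02_tune.py"]))
# -> ['01_setup.py', '02_tune.py', '10_sample.py']
```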
To create a new calibration:

- Create your workflow scripts inside `input/fair-X.X.X/vY.Y/Z` (copy an existing calibration to get started)
- Set up environments for python and R (see above)
- Update your `.env` file to point to the correct FaIR version (X.X.X), calibration version (vY.Y) and constraint set (Z)
- Check the recipe inside the `run` bash script
- `./run`
- Check output. Ensure the performance metrics are documented and diagnostic plots look sensible.
- If releasing a new calibration: update the relevant sections of the Wiki
- `./create_zenodo_zip`
- Upload to Zenodo
- I get different results from the three-layer model calibration when using the pre-compiled R binary for macOS compared to building the R binary from source on CentOS 7 (both using R 4.1.1), and again on the Arc4 HPC. The Arc4 results are used. A future TODO is to switch to `py-bobyqa`, which is the optimizer used in the R code, and remove the dependence on R, which may improve performance.
- Related to the above, scipy's multivariate normal and sparse matrix algebra routines seem fragile, and change between scipy versions (1.8, 1.9, 1.10). If anyone trying to reproduce this runs into "positive semidefinite" errors, raise an issue.
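If you do hit a "positive semidefinite" error, one common workaround (a sketch, not the project's code) is to symmetrize the covariance matrix and clip any tiny negative eigenvalues before handing it to scipy:

```python
# Workaround sketch (not project code): force a covariance matrix to be
# positive semidefinite by symmetrizing it and clipping negative
# eigenvalues, before passing it to scipy.stats.multivariate_normal.
import numpy as np

def nearest_psd(cov, eps=1e-9):
    """Symmetrize cov and clip eigenvalues below eps."""
    sym = (cov + cov.T) / 2.0
    vals, vecs = np.linalg.eigh(sym)
    return (vecs * np.clip(vals, eps, None)) @ vecs.T

# A covariance made slightly non-PSD by floating-point noise:
cov = np.array([[1.0, 1.0], [1.0, 1.0 - 1e-15]])
fixed = nearest_psd(cov)
print(np.linalg.eigvalsh(fixed).min() >= 0)  # -> True
```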
More details on each calibration version are in the Wiki.
It is critical that each calibration version and calibration set is well documented, as they may be used by others: often, differences in the responses in climate emulators are more a function of calibration than of model structural differences (I don't have a single good reference to prove this yet, but trust me). New calibrations will not be accepted without a Wiki entry.