Jupyter docker: new full build with latest of almost everything except xclim and ravenpy to smooth transition (#121)

# Overview

This new full build has the latest of almost everything except `xclim` and
`ravenpy`, as an intermediate step to smooth the transition to the `pandas` 2.2
frequency string changes.
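As a rough illustration of the notebook changes that the `pandas` 2.2 transition implies, here is a hypothetical helper (not part of pandas, xclim or ravenpy) that rewrites some of the offset aliases pandas 2.2 deprecates, for example `"M"` becoming `"ME"` for month end and `"H"` becoming `"h"` for hourly. The mapping is an illustrative subset and applies to offset aliases (as used by `resample` or `date_range`), not to `Period` frequencies, which keep `"M"`. The pandas 2.2 release notes remain the authority.

```python
import re

# Hypothetical helper (not part of pandas, xclim or ravenpy): rewrite offset
# aliases that pandas 2.2 deprecates into their new spellings.
# Illustrative subset only; check the pandas 2.2 release notes for the full list.
DEPRECATED_TO_NEW = {
    "M": "ME",   # month end
    "Q": "QE",   # quarter end
    "Y": "YE",   # year end
    "A": "YE",   # annual, older alias of year end
    "H": "h",    # hourly
    "T": "min",  # minutely
    "S": "s",    # secondly
    "L": "ms",   # milliseconds
    "U": "us",   # microseconds
    "N": "ns",   # nanoseconds
}

# Optional integer multiple, the alias itself, then an optional anchor like "-DEC".
_FREQ_RE = re.compile(r"^(\d*)([A-Za-z]+)(-[A-Za-z]{3})?$")


def upgrade_freq(freq: str) -> str:
    """Return ``freq`` with a deprecated alias replaced; anything else passes through."""
    match = _FREQ_RE.match(freq)
    if match is None:
        return freq
    mult, alias, anchor = match.groups()
    return f"{mult}{DEPRECATED_TO_NEW.get(alias, alias)}{anchor or ''}"


if __name__ == "__main__":
    for old in ["M", "3H", "QS-DEC", "A-JUL", "MS", "D"]:
        print(old, "->", upgrade_freq(old))
```

Start-anchored aliases such as `"MS"` or `"QS-DEC"` are not in the mapping, so they are left untouched.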

## Changes

- New: save the conda env export, DockerHub build logs and Jenkins test
results in the repo, to track changes between releases much more easily

- Jenkins: add `SAVE_RESULTING_NOTEBOOK_TIMEOUT` for slow notebooks or
slow machines

- Jupyter env changes:
  - add `conda-pack` so the conda env can be exported outside of the docker
    image when it needs to run locally without docker
  - upgrade from Python 3.9 to 3.11
  - Relevant changes (alphabetical order):
```diff
-  - birdy=0.8.4=pyh1a96a4e_0
+      - birdhouse-birdy==0.8.7

# major upgrade from v2 to v3
-  - bokeh=2.4.3=pyhd8ed1ab_3
+  - bokeh=3.4.1=pyhd8ed1ab_0

-  - cartopy=0.21.1=py39h6e7ad6e_0
+  - cartopy=0.23.0=py311h320fe9a_0

-  - cf_xarray=0.8.0=pyhd8ed1ab_0
+  - cf_xarray=0.9.0=pyhd8ed1ab_0

-  - cfgrib=0.9.10.4=pyhd8ed1ab_0
+  - cfgrib=0.9.11.0=pyhd8ed1ab_0

-  - cftime=1.6.2=py39h2ae25f5_1
+  - cftime=1.6.3=py311h1f0f07a_0

-  - climpred=2.3.0=pyhd8ed1ab_0
+  - climpred=2.4.0=pyhd8ed1ab_0

-  - clisops=0.9.6=pyh1a96a4e_0
+  - clisops=0.13.0=pyhca7485f_0

-  - dask=2023.5.1=pyhd8ed1ab_0
+  - dask=2024.5.0=pyhd8ed1ab_0

-  - geopandas=0.13.0=pyhd8ed1ab_0
+  - geopandas=0.14.4=pyhd8ed1ab_0

-  - hvplot=0.8.3=pyhd8ed1ab_0
+  - hvplot=0.9.2=pyhd8ed1ab_0

-  - numpy=1.23.5=py39h3d75532_0
+  - numpy=1.24.4=py311h64a7726_0

-  - numba=0.57.0=py39hb75a051_1
+  - numba=0.59.1=py311h96b013e_0

# major upgrade from v1 to v2
-  - pandas=1.3.5=py39hde0f152_0
+  - pandas=2.1.4=py311h320fe9a_0

# major upgrade to v1
-  - panel=0.14.4=pyhd8ed1ab_0
+  - panel=1.4.2=pyhd8ed1ab_0

# major upgrade from v1 to v2
-  - pydantic=1.10.8=py39hd1e30aa_0
+  - pydantic=2.7.1=pyhd8ed1ab_0

# Python 3.9 to 3.11
-  - python=3.9.16=h2782a2a_0_cpython
+  - python=3.11.6=hab00c5b_0_cpython

-  - raven-hydro=0.2.1=py39h8e2dbb5_1
+  - raven-hydro=0.2.4=py311h64a4d7b_0

-  - ravenpy=0.12.1=py39hf3d152e_0
+      - ravenpy==0.13.1

-  - rioxarray=0.14.1=pyhd8ed1ab_0
+  - rioxarray=0.15.5=pyhd8ed1ab_0

-  - roocs-utils=0.6.4=pyh1a96a4e_0
+  - roocs-utils=0.6.8=pyhd8ed1ab_0

-  - scipy=1.9.1=py39h8ba3f38_0
+  - scipy=1.13.0=py311h517d4fd_1

-  - xarray=2023.1.0=pyhd8ed1ab_0
+  - xarray=2023.8.0=pyhd8ed1ab_0

-  - xclim=0.43.0=py39hf3d152e_1
+  - xclim=0.47.0=py311h38be061_0

-  - xesmf=0.7.1=pyhd8ed1ab_0
+  - xesmf=0.8.5=pyhd8ed1ab_0

-  - xskillscore=0.0.24=pyhd8ed1ab_0
+  - xskillscore=0.0.26=pyhd8ed1ab_0

+  - xscen=0.8.2=pyhd8ed1ab_0

+      - figanos==0.3.0

-      - xncml==0.2
+      - xncml==0.4.0

```
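Since the conda env export is now saved in the repo, a table like the one above can be derived mechanically from two exports. A minimal sketch, assuming the usual `conda env export` layout (`name=version=build` for conda packages, `name==version` under `pip:`). `parse_export`, `version_key` and `diff_exports` are hypothetical helpers, and the version comparison is a loose numeric one, not full PEP 440 or conda ordering.

```python
import re
from typing import Dict, List

# conda entries look like "  - bokeh=3.4.1=pyhd8ed1ab_0",
# pip entries like "      - ravenpy==0.13.1".
_CONDA_RE = re.compile(r"^\s*-\s+([A-Za-z0-9_.\-]+)=([^=\s]+)(?:=\S+)?\s*$")
_PIP_RE = re.compile(r"^\s*-\s+([A-Za-z0-9_.\-]+)==(\S+)\s*$")


def version_key(version: str) -> tuple:
    """Loose numeric key, '0.9.10.4' -> (0, 9, 10, 4); trailing zeros dropped."""
    parts = []
    for piece in version.split("."):
        digits = re.match(r"\d+", piece)
        parts.append(int(digits.group()) if digits else 0)
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def parse_export(text: str) -> Dict[str, str]:
    """Map package name to version; channel, name and 'pip:' lines are skipped."""
    packages = {}
    for line in text.splitlines():
        match = _PIP_RE.match(line) or _CONDA_RE.match(line)
        if match:
            name, version = match.groups()
            packages[name] = version
    return packages


def diff_exports(old_text: str, new_text: str) -> Dict[str, List[tuple]]:
    """Classify every package as added, removed, upgraded or downgraded."""
    old, new = parse_export(old_text), parse_export(new_text)
    report = {"added": [], "removed": [], "upgraded": [], "downgraded": []}
    for name in sorted(old.keys() | new.keys()):
        if name not in old:
            report["added"].append((name, new[name]))
        elif name not in new:
            report["removed"].append((name, old[name]))
        elif version_key(new[name]) > version_key(old[name]):
            report["upgraded"].append((name, old[name], new[name]))
        elif version_key(new[name]) < version_key(old[name]):
            report["downgraded"].append((name, old[name], new[name]))
    return report


# Small excerpts built from the table above, not the real export files.
OLD_EXPORT = """\
dependencies:
  - bokeh=2.4.3=pyhd8ed1ab_3
  - numpy=1.23.5=py39h3d75532_0
  - pip:
      - xncml==0.2
"""
NEW_EXPORT = """\
dependencies:
  - bokeh=3.4.1=pyhd8ed1ab_0
  - numpy=1.24.4=py311h64a7726_0
  - xscen=0.8.2=pyhd8ed1ab_0
  - pip:
      - xncml==0.4.0
"""

if __name__ == "__main__":
    for kind, entries in diff_exports(OLD_EXPORT, NEW_EXPORT).items():
        print(kind, entries)
```

A package that moves between the conda and pip sections under a different name (for example `birdy` and `birdhouse-birdy`) shows up as one removal plus one addition.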


## Test

- Deployed as a "beta" image in production for bokeh visualization
performance regression testing.
- Manually tested notebook
https://github.com/Ouranosinc/PAVICS-landing/blob/master/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-5Visualization.ipynb
for bokeh visualization performance; it looks fine.
- Jenkins build:
  - Default notebooks, all passed:
    https://github.com/Ouranosinc/PAVICS-e2e-workflow-tests/blob/54792e6510adfcd1bb21e1bd31fdfa36c5c634e0/docker/saved_buildout/jenkins-buildlogs-default.txt
  - Raven notebooks, only the known `HydroShare_integration.ipynb` failure:
    https://github.com/Ouranosinc/PAVICS-e2e-workflow-tests/blob/931cfc924a147d07b59e88badff9f170e852a03b/docker/saved_buildout/jenkins-buildlogs-raven.txt


## Related Issue / Discussion

- Matching notebook fixes:
  - Pavics-sdi: PR Ouranosinc/pavics-sdi#321
  - Finch: PR url: None
  - PAVICS-landing: PR Ouranosinc/PAVICS-landing#78
  - RavenPy: PR CSHS-CWRA/RavenPy#356
  - Resolves Ouranosinc/PAVICS-landing#65
  - Resolves Ouranosinc/PAVICS-landing#66

- Deployment to PAVICS: bird-house/birdhouse-deploy#453

- Jenkins-config changes for new notebooks: PR url: None

- Other issues found while working on this one
  - computationalmodelling/nbval#204
  - jupyterlab-contrib/jupyter-archive#132
  - CSHS-CWRA/RavenPy#357
  - CSHS-CWRA/RavenPy#361
  - CSHS-CWRA/RavenPy#362

- Previous release: PR #134


## Additional Information

Full diff conda env export:

81deb99...931cfc9#diff-e8f2a6a53085ae29bb7cedc701c1d345a330651ae971555e85a5c005e94f4cd9


Full new conda env export:

https://github.com/Ouranosinc/PAVICS-e2e-workflow-tests/blob/931cfc924a147d07b59e88badff9f170e852a03b/docker/saved_buildout/conda-env-export.yml


DockerHub build log:

https://github.com/Ouranosinc/PAVICS-e2e-workflow-tests/blob/931cfc924a147d07b59e88badff9f170e852a03b/docker/saved_buildout/docker-buildlogs.txt
tlvu authored May 9, 2024
2 parents 81987bf + 54792e6 commit c7af8b8
Showing 13 changed files with 3,006 additions and 38 deletions.
`Jenkinsfile`: 6 changes (4 additions & 2 deletions)

```diff
@@ -10,7 +10,7 @@ pipeline {
 // https://jenkins.io/doc/book/pipeline/syntax/
 agent {
 docker {
-image "pavics/workflow-tests:py39-230601-1-update240116"
+image "pavics/workflow-tests:py311-240506-update240508"
 label 'linux && docker'
 }
 }
@@ -82,7 +82,7 @@ Requires 'weaver' component to be active on the target 'PAVICS_HOST' server
 string(name: 'RAVENPY_REPO', defaultValue: 'CSHS-CWRA/RavenPy',
 description: 'https://github.com/CSHS-CWRA/RavenPy repo or fork to test against.', trim: true)
 booleanParam(name: 'TEST_ESGF_COMPUTE_API_REPO', defaultValue: false,
-description: 'Check the box to test esgf-compute-api repo.')
+description: 'Check the box to test esgf-compute-api repo. Kept here for historical reasons only, not working anymore.')
 string(name: 'ESGF_COMPUTE_API_BRANCH', defaultValue: 'devel',
 description: 'ESGF_COMPUTE_API_REPO branch to test against.', trim: true)
 string(name: 'ESGF_COMPUTE_API_REPO', defaultValue: 'ESGF/esgf-compute-api',
@@ -100,6 +100,8 @@ Requires 'weaver' component to be active on the target 'PAVICS_HOST' server
 booleanParam(name: 'SAVE_RESULTING_NOTEBOOK', defaultValue: true,
 description: '''Check the box to save the resulting notebooks of the run.
 Note this is another run, will double the time and no guaranty to have same error as the run from py.test.''')
+string(name: 'SAVE_RESULTING_NOTEBOOK_TIMEOUT', defaultValue: '240',
+description: 'Timeout in sec for nbconvert. For slow notebooks or slow machine', trim: true)
 }
 
 triggers {
```
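The new `SAVE_RESULTING_NOTEBOOK_TIMEOUT` string parameter is described as the timeout in seconds for nbconvert. How the test scripts forward it is not part of this diff. A minimal sketch, assuming the value reaches the script as an environment variable of the same name (Jenkins normally exposes build parameters that way), could map it onto nbconvert's `--ExecutePreprocessor.timeout` option, which applies per cell and where `-1` disables the timeout. `build_nbconvert_cmd` is a hypothetical helper.

```python
import os
import shlex
from typing import List, Optional


def build_nbconvert_cmd(notebook: str, timeout: Optional[str] = None) -> List[str]:
    """Hypothetical helper: command that re-executes ``notebook`` and saves the result.

    ``timeout`` is a string, like a Jenkins string parameter. When omitted it is
    read from the SAVE_RESULTING_NOTEBOOK_TIMEOUT environment variable (assumed),
    falling back to the Jenkinsfile default of 240 seconds.
    """
    if timeout is None:
        timeout = os.environ.get("SAVE_RESULTING_NOTEBOOK_TIMEOUT", "240")
    seconds = int(timeout.strip())
    # nbconvert treats -1 as "no timeout"; anything else must be positive.
    if seconds <= 0 and seconds != -1:
        raise ValueError(f"timeout must be a positive number of seconds or -1, got {seconds}")
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        f"--ExecutePreprocessor.timeout={seconds}",
        notebook,
    ]


if __name__ == "__main__":
    # A slow notebook on a slow machine might need up to 10 minutes per cell.
    print(shlex.join(build_nbconvert_cmd("slow-notebook.ipynb", "600")))
```

Validating the value up front turns a typo in the Jenkins form into an immediate error instead of a confusing nbconvert failure.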
`binder/Dockerfile`: 2 changes (1 addition & 1 deletion)

```diff
@@ -1,4 +1,4 @@
-FROM pavics/workflow-tests:py39-230601-1-update240116
+FROM pavics/workflow-tests:py311-240506-update240508
 
 USER root
 
```
`docker/Dockerfile`: 12 changes (6 additions & 6 deletions)

```diff
@@ -3,7 +3,7 @@ FROM continuumio/miniconda3
 # Use mamba for much improved performance over conda.
 # The 'channel_priority strict' did help conda but it was not enough.
 RUN conda update conda -n base && \
-conda install mamba -n base -c conda-forge -c defaults && \
+conda install mamba conda-pack -n base -c conda-forge -c defaults && \
 conda clean --all --yes && \
 conda config --set channel_priority strict && \
 wget -qO- https://micro.mamba.pm/api/micromamba/linux-64/latest | tar -C /usr/local -xvj bin/micromamba
@@ -21,7 +21,7 @@ RUN apt-get update && \
 # Create user jenkins for our Jenkins e2e notebooks test suite.
 # Change /opt/conda folder permissions for jupyter-conda extension.
 RUN groupadd --gid 1000 jenkins \
-&& useradd --uid 1000 --gid jenkins --create-home jenkins && \
+&& useradd --uid 1000 --gid jenkins --shell /bin/bash --create-home jenkins && \
 chmod -R a+rwX /opt/conda
 
 COPY environment.yml /environment.yml
@@ -43,10 +43,8 @@ COPY environment.yml /environment.yml
 # Conda was stuck at this step:
 # DEBUG conda.common._logic:_run_sat(607): Invoking SAT with clause count: 2500273
 #
-# Python 3.10 cause this "ValueError: `popmean.shape[axis]` must equal 1." in
-# homepage nb 4, see https://github.com/Ouranosinc/PAVICS-landing/issues/65
 RUN umask 0000 && \
-mamba create --name birdy --channel conda-forge --channel defaults xclim ravenpy python=3.9 --yes && \
+mamba create --name birdy --channel conda-forge --channel defaults xclim ravenpy python=3.11 --yes && \
 mamba env update --name birdy --file /environment.yml && \
 mamba clean --all --yes
 
@@ -97,7 +95,9 @@ RUN wget https://raw.githubusercontent.com/jupyter/docker-stacks/$DOCKER_STACKS_
 wget https://raw.githubusercontent.com/jupyter/docker-stacks/$DOCKER_STACKS_COMMIT/base-notebook/jupyter_notebook_config.py --output-document /etc/jupyter/jupyter_notebook_config.py && \
 chmod a+rx /usr/local/bin/start.sh /usr/local/bin/start-singleuser.sh /usr/local/bin/start-notebook.sh /usr/local/bin/fix-permissions && \
 chmod a+r /etc/jupyter/jupyter_notebook_config.py && \
-mkdir /notebook_dir && chown jenkins /notebook_dir && \
+mkdir -p /notebook_dir/writable-workspace && chown jenkins /notebook_dir/writable-workspace && \
+mkdir -p /notebook_dir/pavics-homepage && chown jenkins /notebook_dir/pavics-homepage && \
+chown root:root /notebook_dir && chmod a-w /notebook_dir && \
 chmod a+rwX -R /opt/conda/envs/birdy/fonts && \
 mkdir /opt/conda/pkgs/cache && \
 chown jenkins:jenkins -R /opt/conda/pkgs/cache && \
```
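The `docker/Dockerfile` change installs `conda-pack` in the base env so the `birdy` env can be taken out of the image. A minimal sketch of that round trip, assuming the image tag from this commit, the `birdy` env name from the Dockerfile, and that `docker run IMAGE CMD...` runs `CMD` directly. The image's entrypoint and default user are not checked here, so a `--user` flag or a writable mount may be needed. The `conda pack --name/--output` call and the extract, `source bin/activate`, `conda-unpack` sequence follow the conda-pack documentation; `pack_command` and `unpack_script` are hypothetical helpers.

```python
import os
import shlex
from typing import List

# Assumed values: the image tag built in this commit and the env name used in
# docker/Dockerfile. Entrypoint and default user of the image are not verified.
IMAGE = "pavics/workflow-tests:py311-240506-update240508"


def pack_command(host_dir: str = ".", env_name: str = "birdy", image: str = IMAGE) -> List[str]:
    """Hypothetical helper: docker command running `conda pack` inside the image.

    The archive is written to /out in the container, bind-mounted from ``host_dir``.
    """
    return [
        "docker", "run", "--rm",
        "--volume", f"{os.path.abspath(host_dir)}:/out",
        image,
        "conda", "pack", "--name", env_name, "--output", f"/out/{env_name}.tar.gz",
    ]


def unpack_script(env_name: str = "birdy") -> List[str]:
    """Hypothetical helper: shell lines to run on the target Linux machine."""
    env = shlex.quote(env_name)
    return [
        f"mkdir -p {env}",
        f"tar -xzf {env}.tar.gz -C {env}",
        f"source {env}/bin/activate",  # bash/zsh; puts the unpacked env on PATH
        "conda-unpack",                # fixes the prefixes hardcoded in the env
    ]


if __name__ == "__main__":
    print(shlex.join(pack_command()))
    print("\n".join(unpack_script()))
```

The archive is platform specific: an env packed from this linux-64 image only unpacks onto a compatible Linux machine.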
`docker/Dockerfile.testing`: 10 changes (6 additions & 4 deletions)

```diff
@@ -1,6 +1,6 @@
 # For testing quickly without having to do a full rebuild.
 
-FROM pavics/workflow-tests:py39-230601-1-update231122
+FROM pavics/workflow-tests:py311-240506
 
 #ENV ESMFMKFILE="/opt/conda/envs/birdy/lib/esmf.mk"
 
@@ -11,9 +11,11 @@ USER root
 # Use 'update' for existing and 'install' for new package.
 # Keep same channel ordering to not revert anything.
 RUN umask 0000 \
-&& pip install --no-cache-dir https://github.com/bird-house/threddsclient/archive/refs/heads/master.zip
-# && mamba install -c conda-forge -c cdat -c bokeh -c plotly -c pyviz/label/dev -c defaults -n birdy salib \
-# && mamba clean --all --yes
+&& pip uninstall -y ravenpy birdhouse-birdy \
+&& pip install --no-cache-dir --no-deps ravenpy==0.13.1 birdhouse-birdy==0.8.7 \
+&& mamba install -c conda-forge -c cdat -c bokeh -c plotly -c pyviz/label/dev -c defaults -n birdy jupyterlab-git==0.44.0 \
+&& mamba clean --all --yes
+# && pip install --no-cache-dir --upgrade figanos
 # && pip uninstall -y ravenpy \
 # && mamba install -c conda-forge -c cdat -c bokeh -c plotly -c defaults -n birdy ravenpy aiohttp
 
```
`docker/environment.yml`: 52 changes (32 additions & 20 deletions)

```diff
@@ -4,7 +4,7 @@ channels:
 - conda-forge
 - cdat
 - bokeh
-- plotly # for jupyter-dash
+# - plotly # for jupyter-dash
 # - pyston
 - pyviz/label/dev # for jupyter-panel-proxy, panel
 - defaults
@@ -19,8 +19,13 @@ dependencies:
 
 # Pin latest xclim and ravenpy to avoid downgrading during the second installation phase.
 # Mamba is quicker to solve dependencies than conda, but it is less precise so accidental downgrades can happen.
-- xclim >= 0.43.0
-- ravenpy >= 0.12.1
+- xclim >= 0.47.0
+- ravenpy >= 0.13.0
+
+# https://anaconda.org/conda-forge/xscen
+# A climate change scenario-building analysis framework, built with xclim/xarray.
+# PIN to 0.8.2 for xclim 0.47.0 compat
+- xscen == 0.8.2
 
 #- dask # from xclim and ravenpy
 #- distributed
@@ -32,13 +37,10 @@ dependencies:
 - matplotlib
 # - xarray # from xclim and ravenpy
 # - numpy # from xclim and ravenpy
-# TODO: unpin numpy, pinned for hvplot.quadmesh(rasterize=True)
-# datashade=True is an alias
-# See https://github.com/holoviz/hvplot/issues/1073
-- numpy <= 1.23.5
+- numpy
 # TODO: unpin cf_xarray due to https://github.com/xarray-contrib/cf-xarray/issues/442
 - cf_xarray != 0.8.1
-- birdy
+- birdy >= 0.8.7
 # - owslib>=0.23.0 # from ravenpy
 # - netcdf4 # from ravenpy
 # TODO: remove libnetcdf PIN because https://github.com/Ouranosinc/PAVICS-landing/issues/66
@@ -49,6 +51,10 @@ dependencies:
 - cfgrib
 - pydap
 - cartopy >= 0.21.0
+# Fixes cartopy bug arising with scipy 1.11.
+# https://github.com/Ouranosinc/pavics-sdi/pull/298
+# https://github.com/Ouranosinc/pavics-sdi/issues/294
+- pykdtree
 - descartes # Is this really needed???
 # - rasterio # from ravenpy
 # - gdal # for osgeo, from ravenpy
@@ -60,15 +66,14 @@ dependencies:
 - pyogrio
 - scikit-image
 - ipyleaflet
-- threddsclient
+- threddsclient >= 0.4.5
 - bokeh
 - regionmask
 - siphon
 - jupyter_bokeh
 - pscript
 - h5netcdf
-# TODO: remove panel pin when Analogues dashboard works
-- panel <= 0.14.4
+- panel >= 1.2.2
 # https://github.com/holoviz/panel
 - pyviz_comms # (was labextension pyviz/jupyterlab_pyviz in jupyterlab v2)
 - holoviews
@@ -79,6 +84,8 @@ dependencies:
 # https://github.com/bird-house/birdhouse-deploy/pull/63#issuecomment-668270608
 # pinning hvplot did not solve the problem with violin plot.
 - hvplot
+# https://anaconda.org/conda-forge/dash
+- dash >= 2.16.1
 # https://streamlit.io/
 # https://anaconda.org/conda-forge/streamlit
 - streamlit
@@ -89,6 +96,9 @@ dependencies:
 # https://python-pptx.readthedocs.io/en/latest/
 # https://anaconda.org/conda-forge/python-pptx
 - python-pptx
+# openpyxl: library to read/write Excel 2010 xlsx/xlsm files
+# https://anaconda.org/conda-forge/openpyxl
+- openpyxl
 - nc-time-axis
 # - cftime # from xclim and ravenpy
 # - statsmodels # for ravenpy
@@ -102,11 +112,7 @@ dependencies:
 # Plugin for building and loading intake catalogs for earth system data sets
 # holdings, such as CMIP (Coupled Model Intercomparison Project) and CESM
 # Large Ensemble datasets.
-# Pin intake-esm since newer version activated validation of optional fields and broke our notebooks
-# ValidationError: 1 validation error for ESMCatalogModel
-# aggregation_control
-# field required (type=value_error.missing)
-- intake-esm <= 2021.8.17
+- intake-esm >= 2023.6.14
 # load netCDF, Zarr and other multi-dimensional data (xarray_image, netcdf,
 # grib, opendap, rasterio, remote-xarray, zarr)
 - intake-xarray
@@ -140,7 +146,8 @@ dependencies:
 - esgf-compute-api
 # https://anaconda.org/conda-forge/esgf-pyclient (for pavics-sdi esgf-dap.ipynb)
 - esgf-pyclient
-- cdms2
+# Disable cdms2 because it was forcing python downgrade to 3.10 and below.
+#- cdms2
 # Disable vcs because it was forcing python downgrade to below 3.9.
 # See https://github.com/CDAT/vcs/issues/457
 # package vcs-8.1-py_0 requires vtk-cdat >8.1, but none of the providers can be installed
@@ -168,10 +175,12 @@ dependencies:
 # extension to produce .py files from notebook .ipynb files
 - jupytext
 # jupyterlab extension for git
-- jupyterlab-git
+- jupyterlab-git >= 0.44.0
 # Voilà turns Jupyter notebooks into standalone web applications
 - voila
-- jupyter-archive
+# PIN jupyter-archive due to
+# https://github.com/jupyterlab-contrib/jupyter-archive/issues/132
+- jupyter-archive <= 3.3.4
 # https://github.com/jtpio/jupyterlab-topbar
 - jupyterlab-topbar
 # https://github.com/jtpio/jupyterlab-system-monitor (was from jupyterlab-topbar)
@@ -180,7 +189,6 @@ dependencies:
 - nbresuse # needed by jupyterlab-system-monitor
 # xeus-python: back-end kernel implementing the Jupyter Debug Protocol
 - xeus-python
-- jupyter-dash
 # https:://github.com/jupyterhub/jupyter-server-proxy
 - jupyter-server-proxy
 # https://github.com/dask/dask-labextension
@@ -225,6 +233,10 @@ dependencies:
 # https://pypi.org/project/fstd2nc/
 # Converts RPN standard files (from Environment Canada) to netCDF files.
 - fstd2nc
+# https://pypi.org/project/figanos/
+# Outils pour produire des graphiques informatifs sur les impacts des
+# changements climatiques.
+- figanos
 # visual debugger for Jupyter Notebook, not working with JupyterLab at this moment
 - pixiedust
 # block execution of 'run_all_cells' until user input finished
```
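The comment at the top of `dependencies:` explains that `xclim` and `ravenpy` are pinned with `>=` because mamba can accidentally downgrade packages during the second installation phase (`mamba env update`). One way to catch such a downgrade after a build is to check the resolved versions against the simple `name op version` pins in `environment.yml`. A minimal sketch with hypothetical helpers and a loose numeric version ordering; conda's own match-spec semantics are richer than this. The example versions are the new ones from the "Relevant changes" table of this commit.

```python
import operator
import re
from typing import Dict, List

_OPS = {
    ">=": operator.ge, "<=": operator.le, "==": operator.eq,
    "!=": operator.ne, ">": operator.gt, "<": operator.lt,
}
# "name op version", e.g. "xclim >= 0.47.0"; two-character operators come first.
_SPEC_RE = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*(>=|<=|==|!=|>|<)\s*(\S+)\s*$")


def version_key(version: str) -> tuple:
    """Loose numeric ordering (not PEP 440 or conda ordering): '0.47.0' -> (0, 47)."""
    parts = []
    for piece in version.split("."):
        digits = re.match(r"\d+", piece)
        parts.append(int(digits.group()) if digits else 0)
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def check_pins(specs: List[str], installed: Dict[str, str]) -> List[str]:
    """Return one message per pin that the installed versions violate."""
    problems = []
    for spec in specs:
        match = _SPEC_RE.match(spec)
        if match is None:
            continue  # unpinned entries such as "numpy" are not checked
        name, op, wanted = match.groups()
        have = installed.get(name)
        if have is None:
            problems.append(f"{name}: not installed")
        elif not _OPS[op](version_key(have), version_key(wanted)):
            problems.append(f"{name}: {have} does not satisfy {op} {wanted}")
    return problems


SPECS = ["xclim >= 0.47.0", "ravenpy >= 0.13.0", "xscen == 0.8.2", "cf_xarray != 0.8.1", "numpy"]
NEW_VERSIONS = {"xclim": "0.47.0", "ravenpy": "0.13.1", "xscen": "0.8.2",
                "cf_xarray": "0.9.0", "numpy": "1.24.4"}

if __name__ == "__main__":
    print(check_pins(SPECS, NEW_VERSIONS))  # no violations
    print(check_pins(SPECS, {**NEW_VERSIONS, "xclim": "0.43.0"}))  # simulated downgrade
```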