Set up reproducible Python environments with conda-lock (#2901)
* Add a conda-lock setup for discussion.
* Move python-snappy into project.dependencies in pyproject.toml
* Remove sphinx-autoapi from pypi deps, and no longer required snappy-python
* Switch to using conda-forge version of recordlinkage v0.16
* Update conda-lock.yml now that all dependencies are available on conda-forge
* Consolidate conda env files under environments/ dir
* Add a GitHub action to relock dependencies
* Quote the pip install command
* Remove pip install of pudl from environment.yml
* Rename workflow
* Only build lockfile from pyproject.toml, don't install extras.
* Just install conda-lock, not pudl, before running conda-lock.
* install conda-lock with pip
* Move all remaining dev-environment.yml deps to pyproject.toml
* Add other platforms; make draft PR against dev.
* Comment out dev base branch for now.
* Remove pandas extras and recordlinkage deps from pyproject.toml
* Use conda-lock --micromamba rather than --mamba
* Don't specify grpcio, or specific recordlinkage version
* Render platform-specific environment files in github action
* Fix paths relative to environments directory
* Add some comment notes to workflow
* Render environment for Read The Docs.
* Use environment not explicit rendered lockfile
* Add readthedocs specific sphinx extension
* Don't render explicit conda env for RTD since it can't read it.
* Build linux-aarch64 lockfile. Use conda-lock.yml in workflows.
* Comment out non-working linux-aarch64 platform for now.
* Switch to using rendered lockfiles.
* Remove deprecated environment files
* Switch to using a micromamba docker image
* Install git into the docker image.
* Use micromamba and unrendered multi-platform lockfile.
* Add main category to micromamba environment creation.
* Use conda-lock not base as env name
* Add linux-aarch64 platform back into conda-lock settings.
zaneselvans committed Nov 14, 2023
1 parent 6cb0793 commit 95a855c
Showing 22 changed files with 35,019 additions and 168 deletions.
27 changes: 9 additions & 18 deletions .github/workflows/tox-pytest.yml
@@ -35,13 +35,10 @@ jobs:
- name: Install Conda environment using mamba
uses: mamba-org/setup-micromamba@v1
with:
environment-file: test/test-environment.yml
environment-file: environments/conda-lock.yml
environment-name: pudl-dev
cache-environment: true
condarc: |
channels:
- conda-forge
- defaults
channel_priority: strict
create-args: --category main dev docs test datasette

- name: Log environment details
run: |
@@ -78,13 +75,10 @@ jobs:
- name: Install Conda environment using mamba
uses: mamba-org/setup-micromamba@v1
with:
environment-file: test/test-environment.yml
environment-file: environments/conda-lock.yml
environment-name: pudl-dev
cache-environment: true
condarc: |
channels:
- conda-forge
- defaults
channel_priority: strict
create-args: --category main dev docs test datasette

- name: Log environment details
run: |
@@ -131,13 +125,10 @@ jobs:
- name: Install Conda environment using mamba
uses: mamba-org/setup-micromamba@v1
with:
environment-file: test/test-environment.yml
environment-file: environments/conda-lock.yml
environment-name: pudl-dev
cache-environment: true
condarc: |
channels:
- conda-forge
- defaults
channel_priority: strict
create-args: --category main dev docs test datasette

- name: Log environment details
run: |
48 changes: 34 additions & 14 deletions .github/workflows/update-lockfile.yml
@@ -3,38 +3,56 @@ name: update-lockfile

on:
workflow_dispatch:
# schedule:
# At 5:28am UTC Monday and Thursday
# - cron: 28 5 * * MON,THU
#schedule:
# - cron: "0 9 * * 1-5" # Weekdays at 9AM UTC
#pull_request:
# paths:
# - "pyproject.toml"

jobs:
conda-lock:
# Don't run scheduled job on forks.
if: (github.event_name == 'schedule' && github.repository == 'catalyst-cooperative/pudl') || (github.event_name != 'schedule')
defaults:
run:
# Ensure the environment is activated
# <https://github.com/mamba-org/provision-with-micromamba#important>
shell: bash -l {0}
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
# If running on a schedule, run on dev.
# If running from workflow_dispatch, run on whatever the chosen branch/ref was.
# with:
# ref: dev
- name: Install Micromamba
uses: mamba-org/setup-micromamba@v1
with:
environment-file: environments/conda-lock.yml
environment-name: pudl-dev

- name: Install pudl from branch
run: pip install --editable "./[dev,docs,test,datasette]"
environment-name: conda-lock
create-args: >-
python=3.11
conda-lock
- name: Run conda-lock to recreate lockfile from scratch
run: |
rm environments/conda-lock.yml
cd environments
rm conda-lock.yml
conda-lock \
--file=environments/dev-environment.yml \
--file=pyproject.toml \
--lockfile=environments/conda-lock.yml
--micromamba \
--file=../pyproject.toml \
--lockfile=conda-lock.yml
conda-lock render \
--kind explicit \
--kind env \
--dev-dependencies \
--extras docs \
--extras datasette \
conda-lock.yml
conda-lock render \
--kind env \
--extras docs \
--platform linux-64 \
--filename-template "readthedocs-{platform}.conda.lock" \
conda-lock.yml
cd ..
- name: Open a pull request
uses: peter-evans/create-pull-request@v5
with:
@@ -55,3 +73,5 @@ jobs:
labels: dependencies, conda-lock
reviewers: zaneselvans
delete-branch: true
# base: dev
draft: true
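
The relock step in this workflow can be approximated locally. The sketch below is not part of the commit. It assumes `conda-lock` and `micromamba` are already on `PATH` and that it is run from the repository root. The `run` helper and the `DRY_RUN` variable are hypothetical additions for illustration. By default the commands are only printed, so nothing is deleted or re-solved until `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Local sketch of the "Run conda-lock to recreate lockfile from scratch" step.
# Assumptions (not from the commit): the run() helper and the DRY_RUN switch.
set -eu

# Print the command when DRY_RUN is 1 (the default); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# The function body is a subshell, so the cd does not leak into the caller.
relock() (
  # Paths are given relative to environments/, as in the workflow.
  run cd environments
  # Remove the old lockfile so every version is re-solved from scratch.
  run rm -f conda-lock.yml
  # Solve from pyproject.toml only, using micromamba as the solver.
  run conda-lock --micromamba --file=../pyproject.toml --lockfile=conda-lock.yml
  # Render per-platform files from the unified lockfile.
  run conda-lock render --kind explicit --kind env --dev-dependencies --extras docs --extras datasette conda-lock.yml
)

relock
```

Running the script as-is lists the four commands. Running it with `DRY_RUN=0` performs the relock and rewrites `environments/conda-lock.yml`, so it is best done on a scratch branch.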
9 changes: 3 additions & 6 deletions .github/workflows/zenodo-cache-sync.yml
@@ -47,13 +47,10 @@ jobs:
- name: Install Conda environment using mamba
uses: mamba-org/setup-micromamba@v1
with:
environment-file: test/test-environment.yml
environment-file: environments/conda-lock.yml
environment-name: pudl-dev
cache-environment: true
condarc: |
channels:
- conda-forge
- defaults
channel_priority: strict
create-args: --category main dev docs test datasette

- name: Log environment details
run: |
4 changes: 2 additions & 2 deletions .gitignore
@@ -23,8 +23,8 @@ codecov.sh
.env_pudl/
*wheel-metadata
dask-worker-space*
devtools/user-requirements.txt
devtools/user-environment.yml
environments/user-requirements.txt
environments/user-environment.yml
.vscode/*
commit.txt
devtools/profiles/
4 changes: 1 addition & 3 deletions .readthedocs.yaml
@@ -14,7 +14,7 @@ build:

# Define the python environment using conda / mamba
conda:
environment: docs/docs-environment.yml
environment: environments/readthedocs-linux-64.conda.lock.yml

# Build documentation in the docs/ directory with Sphinx
sphinx:
@@ -27,5 +27,3 @@ python:
install:
- method: pip
path: .
extra_requirements:
- doc
35 changes: 0 additions & 35 deletions devtools/environment.yml

This file was deleted.

16 changes: 0 additions & 16 deletions docker-compose.yml

This file was deleted.

47 changes: 20 additions & 27 deletions docker/Dockerfile
@@ -1,29 +1,29 @@
FROM condaforge/mambaforge:23.3.1-1
FROM mambaorg/micromamba:1.5.1

USER root

SHELL [ "/bin/bash", "-exo", "pipefail", "-c" ]

# Install curl and jq
# awscli requires unzip, less, groff and mandoc
# hadolint ignore=DL3008
RUN apt-get update && apt-get install --no-install-recommends -y curl jq unzip less groff mandoc \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
RUN apt-get update && \
apt-get install --no-install-recommends -y git curl jq unzip less groff mandoc && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*

# Configure gsutil authentication
# hadolint ignore=DL3059
RUN printf '[GoogleCompute]\nservice_account = default' > /etc/boto.cfg

# Install awscli2
RUN curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip" && unzip awscliv2.zip && ./aws/install

# Create a non-root user inside the container
# hadolint ignore=DL3059
RUN useradd -Ums /bin/bash catalyst

ENV CONTAINER_HOME=/home/catalyst
RUN curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip" && \
unzip awscliv2.zip && \
./aws/install

# Switch to being the catalyst user and go into the copied repo
USER catalyst
# Switch back to being non-root user and get into the home directory
USER $MAMBA_USER
ENV CONTAINER_HOME=/home/$MAMBA_USER
WORKDIR ${CONTAINER_HOME}

# Install flyctl
@@ -32,41 +32,34 @@ ENV PATH="${CONTAINER_HOME}/.fly/bin:$PATH"

ENV CONDA_PREFIX=${CONTAINER_HOME}/env
ENV PUDL_REPO=${CONTAINER_HOME}/pudl
ENV CONDA_RUN="conda run --no-capture-output --prefix ${CONDA_PREFIX}"
ENV PYTHON_VERSION="3.11"
ENV CONDA_RUN="micromamba run --prefix ${CONDA_PREFIX}"

ENV CONTAINER_PUDL_WORKSPACE=${CONTAINER_HOME}/pudl_work
ENV PUDL_INPUT=${CONTAINER_PUDL_WORKSPACE}/data
ENV PUDL_INPUT=${CONTAINER_PUDL_WORKSPACE}/input
ENV PUDL_OUTPUT=${CONTAINER_PUDL_WORKSPACE}/output
ENV DAGSTER_HOME=${CONTAINER_PUDL_WORKSPACE}/dagster_home

# Create data input/output directories
RUN mkdir -p ${PUDL_INPUT} ${PUDL_OUTPUT} ${DAGSTER_HOME}

# Copy dagster configuration file
COPY docker/dagster.yaml ${DAGSTER_HOME}/dagster.yaml

# Create a conda environment based on the specification in the repo
COPY test/test-environment.yml test/test-environment.yml
RUN mamba create --copy --prefix ${CONDA_PREFIX} --yes python=${PYTHON_VERSION} && \
# Then we can use mamba env update, which can parse the environment.yml file:
mamba env update --prefix ${CONDA_PREFIX} --file test/test-environment.yml && \
conda clean -afy


COPY environments/conda-lock.yml environments/conda-lock.yml
RUN micromamba create --prefix ${CONDA_PREFIX} --yes --category main dev docs test datasette --file environments/conda-lock.yml && \
micromamba clean -afy
# Copy the cloned pudl repository into the user's home directory
COPY --chown=catalyst:catalyst . ${CONTAINER_HOME}
COPY --chown=${MAMBA_USER}:${MAMBA_USER} . ${CONTAINER_HOME}

# TODO(rousik): The following is a workaround for sudden breakage where conda
# can't find libraries contained within the environment. It's unclear why!
ENV LD_LIBRARY_PATH=${CONDA_PREFIX}/lib
# We need information from .git to get version with setuptools_scm so we mount that
# directory without copying it into the image.
RUN --mount=type=bind,source=.git,target=${PUDL_REPO}/.git \
${CONDA_RUN} pip install --no-cache-dir -e './[dev,doc,test,datasette]' && \
${CONDA_RUN} pip install --no-cache-dir --editable . && \
# Run the PUDL setup script so we know where to read and write data
${CONDA_RUN} pudl_setup


# Run the unit tests:
CMD ["conda", "run", "--no-capture-output", "--prefix", "${CONDA_PREFIX}", "pytest", "test/unit"]
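
Outside Docker, the environment-creation step in this Dockerfile could be mirrored roughly as follows. This is a sketch, not something the commit ships. It assumes `micromamba` is installed and that it is run from the repository root. It uses a named environment (`pudl-dev`, the name the workflows use) rather than the `--prefix` used in the image. The `run` helper, `DRY_RUN`, and `ENV_NAME` are hypothetical names introduced for illustration, and the commands are only printed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: build the locked environment locally, mirroring the Dockerfile.
# Assumptions (not from the commit): run(), DRY_RUN and ENV_NAME.
set -eu

ENV_NAME="${ENV_NAME:-pudl-dev}"

# Print the command when DRY_RUN is 1 (the default); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

create_env() {
  # Create the environment from the multi-platform lockfile, selecting the
  # same dependency categories as the Dockerfile and the CI workflows.
  run micromamba create --name "$ENV_NAME" --yes --category main dev docs test datasette --file environments/conda-lock.yml
  # Install the package itself in editable mode; its dependencies are
  # already provided by the lockfile, so no extras are requested.
  run micromamba run --name "$ENV_NAME" pip install --no-cache-dir --editable .
}

create_env
```

Because the lockfile already pins every dependency, the editable install only adds the package itself, which matches the Dockerfile's switch from installing with extras to a plain `--editable .`.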
12 changes: 0 additions & 12 deletions docs/docs-environment.yml

This file was deleted.
