Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set up reproducible Python environments with conda-lock #2968

Merged
merged 8 commits into from
Nov 14, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .github/workflows/bot-auto-merge.yml
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ name: bot-auto-merge
on:
workflow_run:
types: [completed]
workflows: ["tox-pytest"]
workflows: ["pytest"]

jobs:
bot-auto-merge:
Expand Down
11 changes: 6 additions & 5 deletions .github/workflows/build-deploy-pudl.yml
Original file line number Diff line number Diff line change
Expand Up @@ -94,11 +94,12 @@ jobs:
gcloud compute instances update-container "$GCE_INSTANCE" \
--zone "$GCE_INSTANCE_ZONE" \
--container-image "docker.io/catalystcoop/pudl-etl:${{ env.GITHUB_REF }}" \
--container-command "conda" \
--container-command "micromamba" \
--container-arg="run" \
--container-arg="--no-capture-output" \
--container-arg="-p" \
--container-arg="/home/catalyst/env" \
--container-arg="--prefix" \
--container-arg="/home/mambauser/env" \
--container-arg="--attach" \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conda swallows all of the logs if you don't include --no-capture-output but it sounds like --attach "" should work:

Attach to stdin, stdout and/or stderr. -a "" for disabling stream redirection

Maybe we run part of a full build to make sure the logs are coming through before we merge into dev?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did run one full build, I think it should be in the build outputs if you want to take a look at the logs and see if they look like you'd expect them to. I think using pytest-xdist does obscure some of the test logging, but also shaves a couple of hours off the nightly builds.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the output from the parallelized tests if you want to take a look:

gcloud storage cp gs://nightly-build-outputs.catalyst.coop/ece863d762c9a8790c7baff5397a0c6ee30609bb-conda-lockfile/pudl-etl.log .

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These logs look good to me!

--container-arg='' \
--container-arg="bash" \
--container-arg="./docker/gcp_pudl_etl.sh" \
--container-env-file="./docker/.env" \
Expand All @@ -116,8 +117,8 @@ jobs:
--container-env DAGSTER_PG_PASSWORD="$DAGSTER_PG_PASSWORD" \
--container-env DAGSTER_PG_HOST="104.154.182.24" \
--container-env DAGSTER_PG_DB="dagster-storage" \
--container-env PUDL_SETTINGS_YML="/home/catalyst/src/pudl/package_data/settings/etl_full.yml" \
--container-env FLY_ACCESS_TOKEN=${{ secrets.FLY_ACCESS_TOKEN }} \
--container-env PUDL_SETTINGS_YML="/home/mambauser/src/pudl/package_data/settings/etl_full.yml" \
# Start the VM
- name: Start the deploy-pudl-vm
Expand Down
59 changes: 23 additions & 36 deletions .github/workflows/tox-pytest.yml → .github/workflows/pytest.yml
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
---
name: tox-pytest
name: pytest

on:
pull_request:
Expand All @@ -10,19 +10,15 @@ on:
- ready_for_review

env:
PUDL_OUTPUT: /home/runner/pudl-work/output
PUDL_INPUT: /home/runner/pudl-work/data/
PUDL_OUTPUT: /home/runner/pudl-work/output/
PUDL_INPUT: /home/runner/pudl-work/input/
DAGSTER_HOME: /home/runner/pudl-work/dagster_home/

jobs:
ci-static:
ci-docs:
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
tox-env:
- linters
- docs
defaults:
run:
shell: bash -l {0}
Expand All @@ -32,16 +28,12 @@ jobs:
with:
fetch-depth: 2

- name: Install Conda environment using mamba
- name: Install conda-lock environment with micromamba
uses: mamba-org/setup-micromamba@v1
with:
environment-file: test/test-environment.yml
environment-file: environments/conda-lock.yml
environment-name: pudl-dev
cache-environment: true
condarc: |
channels:
- conda-forge
- defaults
channel_priority: strict

- name: Log environment details
run: |
Expand All @@ -51,13 +43,13 @@ jobs:
conda config --show
printenv | sort
- name: Build ${{ matrix.tox-env}} with Tox
- name: Lint and build PUDL documentation with Sphinx
run: |
tox -e ${{ matrix.tox-env }}
pip install --no-deps --editable .
make docs-build
- name: Upload coverage
uses: actions/upload-artifact@v3
if: ${{ matrix.tox-env == 'docs' }}
with:
name: coverage-docs
path: coverage.xml
Expand All @@ -75,16 +67,12 @@ jobs:
with:
fetch-depth: 2

- name: Install Conda environment using mamba
- name: Install conda-lock environment with micromamba
uses: mamba-org/setup-micromamba@v1
with:
environment-file: test/test-environment.yml
environment-file: environments/conda-lock.yml
environment-name: pudl-dev
cache-environment: true
condarc: |
channels:
- conda-forge
- defaults
channel_priority: strict

- name: Log environment details
run: |
Expand All @@ -99,9 +87,10 @@ jobs:
which sqlite3
sqlite3 --version
- name: Run unit tests with Tox
- name: Run PUDL unit tests and collect test coverage
run: |
tox -e unit -- --durations 0
pip install --no-deps --editable .
make pytest-unit
- name: Upload coverage
uses: actions/upload-artifact@v3
Expand Down Expand Up @@ -131,13 +120,9 @@ jobs:
- name: Install Conda environment using mamba
uses: mamba-org/setup-micromamba@v1
with:
environment-file: test/test-environment.yml
environment-file: environments/conda-lock.yml
environment-name: pudl-dev
cache-environment: true
condarc: |
channels:
- conda-forge
- defaults
channel_priority: strict

- name: Log environment details
run: |
Expand Down Expand Up @@ -180,7 +165,8 @@ jobs:

- name: Run integration tests, trying to use GCS cache if possible
run: |
tox -e integration -- --gcs-cache-path=gs://zenodo-cache.catalyst.coop --durations 0
pip install --no-deps --editable .
make pytest-integration
- name: Upload coverage
uses: actions/upload-artifact@v3
Expand All @@ -194,13 +180,13 @@ jobs:
ci-coverage:
runs-on: ubuntu-latest
needs:
- ci-docs
- ci-unit
- ci-integration
- ci-static
steps:
- uses: actions/checkout@v4
- name: Download coverage
id: download-unit
id: download-coverage
uses: actions/download-artifact@v3
with:
path: coverage
Expand All @@ -216,6 +202,7 @@ jobs:
runs-on: ubuntu-latest
if: ${{ always() }}
needs:
- ci-docs
- ci-unit
- ci-integration
steps:
Expand Down
84 changes: 84 additions & 0 deletions .github/workflows/update-conda-lockfile.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,84 @@
---
name: update-conda-lockfile

on:
workflow_dispatch:
schedule:
- cron: "0 9 * * 1-5" # Weekdays at 9AM UTC
push:
paths:
- "pyproject.toml"
- "environments/*"
- ".github/workflows/update-conda-lockfile.yml"

# What branch does this action run on?
# - workflow_dispatch: Whatever branch it was run against.
# - schedule: Always the same branch (will be dev or main)
# - push: Base branch of the PR.

jobs:
update-conda-lockfile:
runs-on: ubuntu-latest
if: ${{ (github.event_name == 'push' && github.actor != 'pudlbot') || (github.event_name == 'schedule' && github.repository == 'catalyst-cooperative/pudl') || (github.event_name == 'workflow_dispatch') }}
defaults:
run:
shell: bash -l {0}
steps:
- name: Set GITHUB_REF for use with workflow_dispatch
if: ${{ (github.event_name == 'workflow_dispatch') }}
run: |
echo "GITHUB_REF="${{ github.ref_name }} >> $GITHUB_ENV
- name: Set GITHUB_REF for use with schedule
if: ${{ (github.event_name == 'schedule') }}
run: |
echo "GITHUB_REF=dev" >> $GITHUB_ENV
- name: Set GITHUB_REF for use with push
if: ${{ (github.event_name == 'push') }}
run: |
echo "GITHUB_REF="${{ github.ref_name }} >> $GITHUB_ENV
Comment on lines +36 to +38
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you're setting GITHUB_REF = github.ref_name when github.event_name == "push" or "workflow_dispatch". Why not handle it in one action step?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They were different at some point while I was developing the setup, and it felt more legible to enumerate the 3 cases we're trying to cover. But I could go either way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok! I'm down to keep it as is if it feels more legible.

- name: Log final value of GITHUB_REF
run: |
echo "Final GITHUB_REF:" ${{ env.GITHUB_REF }}
- uses: actions/checkout@v4
with:
token: ${{ secrets.PUDL_BOT_PAT }}
ref: ${{ env.GITHUB_REF }}
- name: Install Micromamba
uses: mamba-org/setup-micromamba@v1
with:
environment-name: conda-lock
create-args: >-
python=3.11
conda-lock
prettier
- name: Run conda-lock to recreate lockfile from scratch
run: |
make conda-clean
make conda-lock.yml
- name: Commit updated conda lockfiles to branch
# If running on push due to dependency changes, commit directly to the base
# branch of the existing PR. Don't trigger the workflow again if we're already
# running it as pudlbot (to avoid infinite recursion).
if: ${{ (github.event_name == 'push' && github.actor != 'pudlbot') }}
uses: stefanzweifel/git-auto-commit-action@v5
with:
file_pattern: "environments/*"
commit_message: "Update conda-lock.yml and rendered conda environment files."
Comment on lines +59 to +66
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's stopping the action from running again on a push?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second time, the action is being run by pudlbot. This felt a little hacky, but it worked. Is there a more elegant way to do it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at the docs for the push trigger again and it don't look like there is a way to filter based on who triggered the push.

Would it make sense test this condition and abort earlier in the workflow?

- name: Make a PR to merge updated conda lockfiles
# If we are relocking dependencies on a schedule or workflow_dispatch, we need
# to make our own PR to check whether the updated environment actually solves
# and the tests pass.
if: ${{ (github.event_name == 'schedule' && github.repository == 'catalyst-cooperative/pudl') || (github.event_name == 'workflow_dispatch') }}
uses: peter-evans/create-pull-request@v5
with:
commit-message: "Update conda-lock.yml and rendered conda environment files."
title: Update Lockfile
body: >
This pull request relocks the dependencies with conda-lock.
It is triggered by [update-conda-lockfile](https://github.com/catalyst-cooperative/pudl/blob/main/.github/workflows/update-conda-lockfile.yml).
labels: dependencies, conda-lock
reviewers: zaneselvans
zaneselvans marked this conversation as resolved.
Show resolved Hide resolved
branch: update-conda-lockfile
base: ${{ env.GITHUB_REF }}
draft: true
delete-branch: true
57 changes: 0 additions & 57 deletions .github/workflows/update-lockfile.yml

This file was deleted.

11 changes: 4 additions & 7 deletions .github/workflows/zenodo-cache-sync.yml
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@ env:
PUBLIC_ZENODO_CACHE_BUCKET: gs://zenodo-cache.catalyst.coop
GITHUB_REF: ${{ github.ref_name }} # This is changed to dev if running on a schedule
PUDL_OUTPUT: ~/pudl-work/output
PUDL_INPUT: ~/pudl-work/data/
PUDL_INPUT: ~/pudl-work/input/

jobs:
zenodo-cache-sync:
Expand Down Expand Up @@ -47,13 +47,10 @@ jobs:
- name: Install Conda environment using mamba
uses: mamba-org/setup-micromamba@v1
with:
environment-file: test/test-environment.yml
environment-file: environments/conda-lock.yml
environment-name: pudl-dev
cache-environment: true
condarc: |
channels:
- conda-forge
- defaults
channel_priority: strict
create-args: --category main dev docs test datasette

- name: Log environment details
run: |
Expand Down
4 changes: 2 additions & 2 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -23,8 +23,8 @@ codecov.sh
.env_pudl/
*wheel-metadata
dask-worker-space*
devtools/user-requirements.txt
devtools/user-environment.yml
environments/user-requirements.txt
environments/user-environment.yml
.vscode/*
commit.txt
devtools/profiles/
Expand Down
6 changes: 2 additions & 4 deletions .readthedocs.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -10,11 +10,11 @@ version: 2
build:
os: ubuntu-22.04
tools:
python: mambaforge-4.10
python: mambaforge-22.9

# Define the python environment using conda / mamba
conda:
environment: docs/docs-environment.yml
environment: environments/conda-linux-64.lock.yml

# Build documentation in the docs/ directory with Sphinx
sphinx:
Expand All @@ -27,5 +27,3 @@ python:
install:
- method: pip
path: .
extra_requirements:
- doc
Loading
Loading