MNT: Use conda-lock lock file for reproducible environment #32

matthewfeickert · 2023-08-11T04:31:00Z

This PR does 3 things:

Define the uploader's high level dependencies in a conda environment.yml file.
Provide a Bash script (could have also been a noxfile but I thought that people might want to keep things less fancy and just go with something very straightforward) that uses the same Docker image that the Action's Dockerfile uses as a base image to install conda-lock and then create a conda-lock.yml lock file for the environment.yml which fully specifies the environment down to the hash level.
Using this lock file in the Action's Dockerfile have the Action install the specified locked environment efficiently and reproducibly.

Advantages of this method

Updating to a new release of anaconda-client can be tested through the CI tests with high confidence that no breaking changes will follow. anaconda-client has many dependencies, that are (correctly) not restricted, which means that specifying the anaconda-client version number alone is not enough as all these dependencies can float beneath it and in addition to potentially breaking any anaconda-client release in the future can be a security risk. The only way around this is through a hash level lock file which specifies the exact binary that should be installed.
- With no blame towards the Anaconda open source team, I have had to submit patches to conda-forge to unbreak anaconda-client before as its SemVer nature is not checked explicitly, and so securing reproducible environments for it is rather important.
The update is easy:
1. Update the environment.yml and then run bash lock.sh.
2. Commit the updated environment.yml and conda-lock.yml to version control and make a new PR.
3. If the upload tests pass then this new environment can be released to everyone.
Using a lock file means that no environment solve is done during the build. It was already done when the lock file was created, so to create the environment micromamba just needs to download each binary listed in the lock file. (Of course, as this environment is so simple this saves very little time, but it is true in general regardless of the size of the environment.)
This (hopefully!) means that the SemVer nature of this project's versioning can be leaned into and making new releases is viewed as a trivially verifiable and fast chore.

I'd like to make release 0.2.0 after this is merged.

matthewfeickert · 2023-08-11T04:35:02Z

lock.sh

+#!/bin/bash
+
+# This should be the base image the Docker image built by the GitHub Action uses
+BUILD_IMAGE="mambaorg/micromamba:1.4.9-bullseye-slim"
+docker pull "${BUILD_IMAGE}"
+
+# Don't need to ensure emulation possible on non-x86_64 platforms at the
+# `docker run` level as the platform is specified in the conda-lock command.
+docker run \
+    --rm \
+    --volume "${PWD}":/work \
+    --workdir /work \
+    "${BUILD_IMAGE}" \
+    /bin/bash -c "\
+        micromamba install --yes --channel conda-forge conda-lock && \
+        conda-lock lock --micromamba --platform linux-64 --file environment.yml"


For maintainability I would like to know if this Bash script is clear enough as to what is happening and why (hopefully the name lock.sh helps it a bit as well).

matthewfeickert · 2023-08-11T04:39:54Z

Dockerfile

+# lock file created by running lock.sh in the top level of the repository
+COPY --chown=mambauser conda-lock.yml /conda-lock.yml


The lock file is now under version control so we have it accessible to us at the Docker image build time to be able to COPY in.

matthewfeickert · 2023-08-11T04:45:53Z

lock.sh

+    "${BUILD_IMAGE}" \
+    /bin/bash -c "\
+        micromamba install --yes --channel conda-forge conda-lock && \
+        conda-lock lock --micromamba --platform linux-64 --file environment.yml"


The default user for mambaorg/micromamba:1.4.9-bullseye-slim is a non-root user with id 1000 so the generated conda-lock.yml is under the ownership of the user who ran the lock.sh script. So you don't have to mess around with getting a file owned by root and then needing to do permission and ownership changes on it.

bsipocz · 2023-08-15T01:39:32Z

Do you, or anyone else know what this means on a PR? Is the action configured improperly?

Also, we haven't stated preferred dev workflow, but I always went with the assumption of fork's feature branch+PR is the preferred way rather than keeping feature branches on the main repo. Could we do that in the future?

matthewfeickert · 2023-08-15T04:06:40Z

Do you, or anyone else know what this means on a PR? Is the action configured improperly?

Things are working as intended. "deployed to remove-old-wheels" means that the remove-old-wheels environment (added in PR #12) was used in the workflow.

This bit that uses the environment

upload-nightly-action/.github/workflows/ci.yml

Lines 56 to 62 in 2b318b6

    
           cleanup: 
        
             runs-on: ubuntu-latest 
        
             needs: [test] 
        
             # Set required workflow secrets in the environment for additional security 
        
             # https://github.com/scientific-python/upload-nightly-action/settings/environments 
        
             environment: 
        
               name: remove-old-wheels

was added in PR #27 because you asked for the test upload to be removed in the same workflow it was uploaded in (#27 (comment)).

I always went with the assumption of fork's feature branch+PR is the preferred way rather than keeping feature branches on the main repo. Could we do that in the future?

No, as the CI needs to run

upload-nightly-action/.github/workflows/ci.yml

Lines 17 to 21 in 2b318b6

    
           jobs: 
        
             test: 
        
               name: "test upload via action" 
        
               runs-on: ubuntu-latest 
        
               if: github.repository == 'scientific-python/upload-nightly-action'

and the CI requires secrets that are repo/org specific.

upload-nightly-action/.github/workflows/ci.yml

Line 54 in 2b318b6

anaconda_nightly_upload_token: ${{ secrets.UPLOAD_TOKEN }}

upload-nightly-action/.github/workflows/ci.yml

Line 76 in 2b318b6

anaconda --token ${{ secrets.ANACONDA_TOKEN }} remove \

There is pull_request_target triggers but that also introduces security threats and so should be avoided unless there is a very good reason to use otherwise.

I see no problems in having (short lived dev) branches other than main exist.

matthewfeickert · 2023-10-04T16:06:17Z

This PR would have prevented Issue #43 from affecting users. @Carreau can you review?

(Relocked and bumped to reshow that this works.)

* Add lock.sh which uses the same Docker image used for deployment to install conda-lock and then build the conda-lock lock file from the high level environment.yml.

* Add high level environment.yml and use conda-lock v2.x to create a hash level conda-lock.yml lock file of the environment. ``` conda-lock lock --micromamba --platform linux-64 --file environment.yml ``` * COPY conda-lock.yml lock file to Dockerfile during build. * Use the conda-lock.yml lock file to create an upload-nightly-action micromamba environment and activate it to have access to anaconda-client.

MridulS

This looks good, thanks @matthewfeickert!

We have 2 reviews, not sure if we should wait for more?
If not I can merge this and maybe we can do a 0.2.0 release?

tupui · 2023-10-04T19:09:34Z

Just adding a note here. If we start to use lock files, we need to be careful when integrating changes. There have been cases of supply chain attack using lock files. It's very easy to miss a change in a lock file as these are usually very large and people rarely check the diffs.

bsipocz · 2023-10-04T19:17:50Z

Does it have to be a full on conda lock, cannot we simply stay within pip land and use a much smaller file there?

matthewfeickert · 2023-10-04T19:25:12Z

Just adding a note here. If we start to use lock files, we need to be careful when integrating changes. There have been cases of supply chain attack using lock files. It's very easy to miss a change in a lock file as these are usually very large and people rarely check the diffs.

@tupui Can you elaborate? If you mean by injection of malicious software by manually editing the lock file then this is checked in two ways:

Only allowing incremental updates (with conda-lock --update <target package>) and not entire re-locks by outside contributors.
You can also check the file by checking out the PR branch and rerunning lock.sh. The diff should immeditaltey show if there is anything suspicious.

Does it have to be a full on conda lock, cannot we simply stay within pip land and use a much smaller file there?

Anaconda does not support anaconda-client on PyPI, so no. pip-compile is not going to be singificanly smaller though, so that wouldn't help anyway.

MridulS · 2023-10-04T19:31:43Z

Just adding a note here. If we start to use lock files, we need to be careful when integrating changes. There have been cases of supply chain attack using lock files. It's very easy to miss a change in a lock file as these are usually very large and people rarely check the diffs.

We can also just make github run the lock.sh script and ask it to resolve and generate the lock file, but it would probably be a bit overkill. I don't really think there would be too much activity/changes in lock file here.

matthewfeickert · 2023-10-04T19:33:45Z

If not I can merge this and maybe we can do a 0.2.0 release?

@MridulS, yes once @tupui and @bsipocz respond (assuming they don't have any further concerns) then I would very much like to get this merged and to get a v0.2.0 out.

matthewfeickert · 2023-10-06T15:12:00Z

Gentle ping to @tupui and @bsipocz (before it gets to be Friday night in Europe). If there are no further comments by end of the day US time I'm going to merge this as we already have 2 approving reviews from @Carreau and @MridulS. If there are additional comments later we can always iterate in a new Issue.

tupui · 2023-10-06T15:13:00Z

Sorry all good 👍 I just wanted to bring that up.

bsipocz · 2023-10-06T16:19:11Z

It's an ok from me too. But would advise against a Friday night release.

matthewfeickert · 2023-10-06T16:23:20Z

But would advise against a Friday night release.

This is a fix for a currently broken procees.

matthewfeickert added the enhancement New feature or request label Aug 11, 2023

matthewfeickert added this to the 0.2.0 milestone Aug 11, 2023

matthewfeickert self-assigned this Aug 11, 2023

matthewfeickert temporarily deployed to remove-old-wheels August 11, 2023 04:32 — with GitHub Actions Inactive

matthewfeickert requested review from stefanv and bsipocz August 11, 2023 04:33

matthewfeickert commented Aug 11, 2023

View reviewed changes

matthewfeickert force-pushed the mnt/use-conda-lock branch from 337b9e7 to c4b9a4f Compare August 15, 2023 01:30

matthewfeickert temporarily deployed to remove-old-wheels August 15, 2023 01:31 — with GitHub Actions Inactive

matthewfeickert mentioned this pull request Aug 15, 2023

ENH: adding list of packages to exclude from cleanup #36

Merged

matthewfeickert requested review from jarrodmillman and Carreau August 26, 2023 00:55

matthewfeickert mentioned this pull request Oct 4, 2023

Action breaks with python 3.12 - clyent/anaconda-client #43

Closed

matthewfeickert requested a review from MridulS October 4, 2023 16:12

matthewfeickert added 2 commits October 4, 2023 11:19

MNT: Add lock file builder shell script

ce02075

* Add lock.sh which uses the same Docker image used for deployment to install conda-lock and then build the conda-lock lock file from the high level environment.yml.

matthewfeickert force-pushed the mnt/use-conda-lock branch from c4b9a4f to 08ffdeb Compare October 4, 2023 16:26

matthewfeickert temporarily deployed to remove-old-wheels October 4, 2023 16:28 — with GitHub Actions Inactive

Carreau approved these changes Oct 4, 2023

View reviewed changes

MridulS approved these changes Oct 4, 2023

View reviewed changes

matthewfeickert requested a review from tupui October 4, 2023 19:37

jarrodmillman approved these changes Oct 6, 2023

View reviewed changes

matthewfeickert merged commit 5fb764c into main Oct 6, 2023
2 checks passed

matthewfeickert deleted the mnt/use-conda-lock branch October 6, 2023 16:24

matthewfeickert mentioned this pull request Jan 23, 2024

ENH: Add optional anaconda_nightly_upload_organization argument #47

Merged

matthewfeickert mentioned this pull request Jun 6, 2024

Using hashes for all actions #81

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MNT: Use conda-lock lock file for reproducible environment #32

MNT: Use conda-lock lock file for reproducible environment #32

matthewfeickert commented Aug 11, 2023 •

edited

Loading

matthewfeickert Aug 11, 2023

matthewfeickert Aug 11, 2023

matthewfeickert Aug 11, 2023

bsipocz commented Aug 15, 2023

matthewfeickert commented Aug 15, 2023 •

edited

Loading

matthewfeickert commented Oct 4, 2023 •

edited

Loading

MridulS left a comment

tupui commented Oct 4, 2023

bsipocz commented Oct 4, 2023

matthewfeickert commented Oct 4, 2023 •

edited

Loading

MridulS commented Oct 4, 2023

matthewfeickert commented Oct 4, 2023

matthewfeickert commented Oct 6, 2023 •

edited

Loading

tupui commented Oct 6, 2023

bsipocz commented Oct 6, 2023

matthewfeickert commented Oct 6, 2023

		# lock file created by running lock.sh in the top level of the repository
		COPY --chown=mambauser conda-lock.yml /conda-lock.yml

MNT: Use conda-lock lock file for reproducible environment #32

MNT: Use conda-lock lock file for reproducible environment #32

Conversation

matthewfeickert commented Aug 11, 2023 • edited Loading

Advantages of this method

matthewfeickert Aug 11, 2023

Choose a reason for hiding this comment

matthewfeickert Aug 11, 2023

Choose a reason for hiding this comment

matthewfeickert Aug 11, 2023

Choose a reason for hiding this comment

bsipocz commented Aug 15, 2023

matthewfeickert commented Aug 15, 2023 • edited Loading

matthewfeickert commented Oct 4, 2023 • edited Loading

MridulS left a comment

Choose a reason for hiding this comment

tupui commented Oct 4, 2023

bsipocz commented Oct 4, 2023

matthewfeickert commented Oct 4, 2023 • edited Loading

MridulS commented Oct 4, 2023

matthewfeickert commented Oct 4, 2023

matthewfeickert commented Oct 6, 2023 • edited Loading

tupui commented Oct 6, 2023

bsipocz commented Oct 6, 2023

matthewfeickert commented Oct 6, 2023

matthewfeickert commented Aug 11, 2023 •

edited

Loading

matthewfeickert commented Aug 15, 2023 •

edited

Loading

matthewfeickert commented Oct 4, 2023 •

edited

Loading

matthewfeickert commented Oct 4, 2023 •

edited

Loading

matthewfeickert commented Oct 6, 2023 •

edited

Loading