Support hash checking against RECORD #888

Open
charliermarsh opened this issue Jan 11, 2024 · 4 comments
Labels
enhancement New feature or request security wish Not on the immediate roadmap

Comments

@charliermarsh
Member

We should validate the hash of each individual file in the wheel against the hash recorded in RECORD. (This is distinct from the hash-checking mode described in #131 and #474.)
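For reference, each RECORD row is `path,hash,size`. The hash field is `<algorithm>=<digest>`, and the digest is URL-safe base64 with the trailing `=` padding stripped. Every file except RECORD itself must carry a hash, and the optional `RECORD.jws`/`RECORD.p7s` signature files are not listed at all. Below is a minimal Python sketch of the per-file check under those rules. It is not uv's or install-wheel-rs's code, and `record_digest`/`verify_wheel` are hypothetical names.

```python
import base64
import csv
import hashlib
import io
import zipfile

# Signature files are not listed in RECORD, per the wheel format.
SIGNATURE_SUFFIXES = (".dist-info/RECORD.jws", ".dist-info/RECORD.p7s")


def record_digest(data: bytes, algorithm: str = "sha256") -> str:
    """Encode a digest the way RECORD does: URL-safe base64 without '=' padding."""
    digest = hashlib.new(algorithm, data).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def verify_wheel(wheel: zipfile.ZipFile) -> list[str]:
    """Check every archive member against the hash recorded in RECORD."""
    record_name = next(
        (n for n in wheel.namelist()
         if n.endswith(".dist-info/RECORD") and n.count("/") == 1),
        None,
    )
    if record_name is None:
        raise ValueError("wheel has no top-level .dist-info/RECORD")

    problems: list[str] = []
    listed: set[str] = set()
    text = wheel.read(record_name).decode("utf-8")
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        path, hash_field = (row + ["", ""])[:2]
        listed.add(path)
        if path == record_name:
            continue  # RECORD cannot contain a hash of itself
        if not hash_field:
            problems.append(f"no hash recorded: {path}")
            continue
        algorithm, _, expected = hash_field.partition("=")
        try:
            data = wheel.read(path)
        except KeyError:  # ZipFile raises KeyError for unknown members
            problems.append(f"listed but missing: {path}")
            continue
        if record_digest(data, algorithm) != expected:
            problems.append(f"hash mismatch: {path}")

    # Files in the archive that RECORD does not vouch for at all.
    for name in wheel.namelist():
        if name.endswith("/") or name in listed or name.endswith(SIGNATURE_SUFFIXES):
            continue
        problems.append(f"not in RECORD: {name}")
    return problems
```

Against a real file this would be `verify_wheel(zipfile.ZipFile(path))`. Doing it while unzipping into the cache would instead hash each member as it streams out, rather than re-reading it afterwards.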

@konstin
Member

konstin commented Jan 11, 2024

During installation or for an existing venv?

install-wheel-rs can do this during installation, but it's turned off by default for performance reasons (SHA-256 is slow) and because pip also didn't validate hashes last time I checked.

For an existing venv, https://github.com/konstin/poc-monotrail/blob/main/crates/monotrail/src/verify_installation.rs implements this.
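A rough Python sketch of the existing-venv variant follows. It is not a port of verify_installation.rs. It assumes RECORD paths resolve relative to the directory that holds the `.dist-info` folder (normally `site-packages`), and it skips rows without a hash: RECORD itself, plus any entries an installer wrote without one. `verify_installed` and `verify_site_packages` are hypothetical names.

```python
import base64
import csv
import hashlib
from pathlib import Path


def record_digest(data: bytes, algorithm: str = "sha256") -> str:
    """Encode a digest the way RECORD does: URL-safe base64 without '=' padding."""
    digest = hashlib.new(algorithm, data).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def verify_installed(dist_info: Path) -> list[str]:
    """Re-hash the files an installed distribution's RECORD lists."""
    # Assumption: paths are relative to the parent of the .dist-info folder;
    # "../" entries (e.g. console scripts) resolve against it too.
    root = dist_info.parent
    problems: list[str] = []
    with open(dist_info / "RECORD", newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            path, hash_field = (row + ["", ""])[:2]
            if not path or not hash_field:
                continue  # RECORD itself, or entries written without a hash
            algorithm, _, expected = hash_field.partition("=")
            target = root / path
            if not target.is_file():
                problems.append(f"missing: {path}")
            elif record_digest(target.read_bytes(), algorithm) != expected:
                problems.append(f"modified: {path}")
    return problems


def verify_site_packages(site_packages: Path) -> dict[str, list[str]]:
    """Check every distribution in a site-packages directory; only failures are reported."""
    report: dict[str, list[str]] = {}
    for dist_info in sorted(site_packages.glob("*.dist-info")):
        problems = verify_installed(dist_info)
        if problems:
            report[dist_info.name] = problems
    return report
```

For the running interpreter's venv, `site_packages` could come from `Path(sysconfig.get_path("purelib"))`. This pays the full SHA-256 cost on every check, which is the same performance concern as above.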

@charliermarsh
Member Author

For us, installation is linking, so it should really happen when unzipping into the cache.

@charliermarsh
Member Author

Or we could do it after the fact as part of our venv validation...

@charliermarsh charliermarsh added enhancement New feature or request wish Not on the immediate roadmap labels Jan 11, 2024
@charliermarsh charliermarsh added wish Not on the immediate roadmap and removed wish Not on the immediate roadmap labels Jan 11, 2024
@vors

vors commented Aug 13, 2024

Related topic (let me know if I should create a separate issue):

Currently, most Python packaging tooling has no guardrails against files from one wheel clobbering files from another. This is effectively undefined behavior: the final venv differs depending on installation order. There are packages in the wild that ship files like empty __init__.py that clobber each other without causing problems in practice (most notably the Google Cloud family of packages). There are also cases where people put README.md in the top-level folder, so it ends up at site-packages/README.md and conflicts.

@hauntsaninja proposed that we could do a hash check against the RECORD file entry, perhaps with a flag to fail loudly on a mismatch. That would still allow empty __init__.py files to clobber each other (their hashes are identical) while preventing the undefined behavior.

What do you think about this idea? Is there a better way to get stricter semantics?
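The proposal can be sketched in Python like this, assuming the installer has the incoming wheel's RECORD hash for each file before writing it. `install_file`, `ClobberError`, and the `strict` parameter are hypothetical stand-ins for the "fail loudly" flag, not an existing uv or pip option. An identical file such as an empty `__init__.py` passes because the existing copy already matches the new RECORD entry. A differing `README.md` does not pass.

```python
import base64
import hashlib
import warnings
from pathlib import Path


class ClobberError(Exception):
    """Hypothetical error: a wheel would overwrite a file with different contents."""


def record_digest(data: bytes, algorithm: str = "sha256") -> str:
    """Encode a digest the way RECORD does: URL-safe base64 without '=' padding."""
    digest = hashlib.new(algorithm, data).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def install_file(dest: Path, data: bytes, record_hash: str, owner: str,
                 strict: bool = True) -> None:
    """Write one wheel member, checking it and any existing file against RECORD."""
    algorithm, _, expected = record_hash.partition("=")
    if record_digest(data, algorithm) != expected:
        raise ValueError(f"{owner}: {dest.name} does not match its RECORD entry")
    if dest.exists() and record_digest(dest.read_bytes(), algorithm) != expected:
        # Another wheel already wrote different content here, so the
        # result would depend on installation order.
        message = f"{owner} would clobber {dest} with different contents"
        if strict:
            raise ClobberError(message)
        warnings.warn(message)
    # Identical content (e.g. an empty __init__.py) is harmless to rewrite.
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(data)
```

Keying on content rather than path is what lets byte-identical files coexist. The trade-off is one extra hash of the existing file whenever a destination already exists.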
