Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Library builders do not verify integrity of downloaded files #308

Open
wiktorn opened this issue Feb 17, 2020 · 9 comments · May be fixed by #312
Open

Library builders do not verify integrity of downloaded files #308

wiktorn opened this issue Feb 17, 2020 · 9 comments · May be fixed by #312

Comments

@wiktorn
Copy link

wiktorn commented Feb 17, 2020

Currently functions in library_builders.sh do not verify integrity of downloaded files.

The good practice would be to include SHA256 hashes of the files among the versions and verify the checksum and fail the build on mismatch.

This can be achieved by adding new optional argument to fetch_unpack in common_utils.sh, which would take sha256 sum to verify against downloaded file.

Is this something, you'd like to see PR on?

Originally reported as osmcode/pyosmium-wheel-build#2, but as we move towards manylinux2010 the only "unsafe" downloads will be those done by multibuild.

@matthew-brett
Copy link
Collaborator

Yes, sure, a PR would be good. But how will the hash get passed to fetch_unpack?

@wiktorn
Copy link
Author

wiktorn commented Feb 17, 2020

I was planning adding arguments up the chain - in build_simple and build_* functions for specific libraries. And for all versions referenced in beginning of the file would be extended with hash values.

Though, going through the code now I notice, that for openssl there is already a sha256 check after call to fetch_unpack.

I'm leaning towards my primary plan (verification in fetch_unpack) but maybe I should follow openssl way?

@matthew-brett
Copy link
Collaborator

matthew-brett commented Feb 17, 2020 via email

@wiktorn
Copy link
Author

wiktorn commented Feb 17, 2020

Just for your consideration, any other library (once maliciously rigged) may export the same symbols that openssl exports and if it's loaded before openssl, it will be it's function called and not openssl's. So guarding openssl works as long as openssl is the only library that you link against in your wheel.

As I understand, your concern is, that it might be difficult to have two optional arguments in fetch_unpack and have the possibility to provide only one of them.

Maybe the solution will be just to add separate function fetch_unpack_check that will do also hash checking. I'll play a bit and come back with possible approaches here.

@matthew-brett
Copy link
Collaborator

matthew-brett commented Feb 17, 2020 via email

@wiktorn
Copy link
Author

wiktorn commented Feb 18, 2020

I misunderstood because openssl download is now verified against the hash, so I thought you meant openssl built within wheels.

I did a bit of research and:

  1. hashtable is not possible as macOS is stuck with bash3 due to licensing
  2. indirect variables ("${!var_name}") like ZLIB_HASH=... looks reasonable, if it would be used within build_simple and not within fetch_unpack, where we have URL and filename. <filename>_HASH=... will be for most cases improper variable name
  3. create a file, e.g. archives/sha256.sums which would contain all hashes for all filenames. To check them, grep to separate file is necessary, as Centos 6's sha256sum doesn't support --ignore-missing option

As for now, I prefer option #2 above and to make it work, I would split fetch_unpack to two functions (fetch returning $out_archive and unpack), and call check_sha256sum in between them. There it will be easy to use specific variable or use indirect expansion, to get proper variable to get hash. This will also provide same interface to change version and hash value for wheel builders.

@matthew-brett
Copy link
Collaborator

Sorry - just checking - for option 2 - you are proposing that - say - in build_bzip - that after fetching the bzip archive, the code should check for a variable BZIP_HASH, which, if present would contain the correct hash for the downloaded file. The user would have to define BZIP_HASH ZLIB_HASH and so on for all packages they wanted to check, and create the hashes corresponding to the versions they plan to download.

@wiktorn wiktorn linked a pull request Feb 22, 2020 that will close this issue
@wiktorn
Copy link
Author

wiktorn commented Feb 22, 2020

Yes, that's my proposal. I've submitted a PR as it may be easier, what exactly I'm proposing

@native-api
Copy link
Contributor

native-api commented Oct 30, 2020

For a previous implementation that does this functionality, you can take a look at fetch_tarball() in Python-build. It consumes a URL that's optionally suffixed with #<hash>, either MD5 or SHA256. Quite a convenient way to specify checksums together with URLs if you ask me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants