
Builds are not (fully) reproducible due to file permissions stored in .whl #362

Open
miccoli opened this issue Aug 7, 2020 · 7 comments
Labels
needs-discussion Needs broader discussion / PyPA consensus

Comments

@miccoli

miccoli commented Aug 7, 2020

In order to obtain a fully reproducible build, one has to build the wheel with the same umask.

Here is how to reproduce the issue (tested with wheel 0.34.2, using pypa/sampleproject as an example):

$ export SOURCE_DATE_EPOCH=$(git log -n 1 --pretty=%ct)
$ echo $SOURCE_DATE_EPOCH 
1593523015
$ umask 022
$ python setup.py --quiet bdist_wheel
$ sha3-512sum -N 32 dist/sampleproject-2.0.0-py3-none-any.whl 
A64A8921  dist/sampleproject-2.0.0-py3-none-any.whl

but with a different umask I get:

$ umask 000
$ python setup.py --quiet bdist_wheel
$ sha3-512sum -N 32 dist/sampleproject-2.0.0-py3-none-any.whl 
FED67824  dist/sampleproject-2.0.0-py3-none-any.whl
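Both builds write the same file name under dist/, so the second run overwrites the first. It can be convenient to copy each wheel aside and compare the copies directly. A minimal sketch using SHA3-512 from Python's `hashlib`; the helper names and the copied file names are hypothetical:

```python
import hashlib


def file_digest(path, chunk_size=1 << 16):
    """Return the hex SHA3-512 digest of a file, reading it in chunks."""
    h = hashlib.sha3_512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_build(wheel_a, wheel_b):
    """True if the two wheel files are byte-for-byte identical."""
    return file_digest(wheel_a) == file_digest(wheel_b)
```

For example, `same_build("build-umask022.whl", "build-umask000.whl")` would be False for the two builds above, since their digests differ.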

This is because the file permissions stored in the .whl file are affected by the umask at build time.

$ zipinfo dist/sampleproject-2.0.0-py3-none-any.whl
Archive:  dist/sampleproject-2.0.0-py3-none-any.whl
Zip file size: 4208 bytes, number of entries: 10
-rw-rw-rw-  2.0 unx      111 b- defN 20-Jun-30 13:16 sample/__init__.py
-rw-r--r--  2.0 unx        9 b- defN 20-Jun-30 13:16 sample/package_data.dat
-rw-rw-rw-  2.0 unx       43 b- defN 20-Jun-30 13:16 sample/simple.py
-rw-r--r--  2.0 unx        9 b- defN 20-Jun-30 13:16 sampleproject-2.0.0.data/data/my_data/data_file
-rw-r--r--  2.0 unx     1081 b- defN 20-Jun-30 13:16 sampleproject-2.0.0.dist-info/LICENSE.txt
-rw-rw-rw-  2.0 unx     3043 b- defN 20-Jun-30 13:16 sampleproject-2.0.0.dist-info/METADATA
-rw-rw-rw-  2.0 unx       92 b- defN 20-Jun-30 13:16 sampleproject-2.0.0.dist-info/WHEEL
-rw-rw-rw-  2.0 unx       40 b- defN 20-Jun-30 13:16 sampleproject-2.0.0.dist-info/entry_points.txt
-rw-rw-rw-  2.0 unx        7 b- defN 20-Jun-30 13:16 sampleproject-2.0.0.dist-info/top_level.txt
?rw-rw-r--  2.0 unx      843 b- defN 20-Jun-30 13:16 sampleproject-2.0.0.dist-info/RECORD
10 files, 5278 bytes uncompressed, 2740 bytes compressed:  48.1%
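The permission column in that listing comes from the Unix mode bits that the zip format keeps in the upper 16 bits of each entry's external attributes. A small sketch that reads them back with the standard-library `zipfile` module and reports which members differ between two builds; the helper names are illustrative and not part of wheel:

```python
import stat
import zipfile


def entry_modes(wheel):
    """Map each member of a wheel (path or file object) to its stored permission bits.

    Archives written on Unix keep st_mode in the high 16 bits of
    ZipInfo.external_attr; S_IMODE drops the file-type bits.
    """
    with zipfile.ZipFile(wheel) as zf:
        return {info.filename: stat.S_IMODE(info.external_attr >> 16)
                for info in zf.infolist()}


def mode_diff(wheel_a, wheel_b):
    """Return {name: (mode_a, mode_b)} for members whose permission bits differ."""
    a, b = entry_modes(wheel_a), entry_modes(wheel_b)
    return {name: (a[name], b[name])
            for name in sorted(a.keys() & b.keys())
            if a[name] != b[name]}
```

To render a mode the way zipinfo does, `stat.filemode(stat.S_IFREG | mode)` turns 0o666 into `-rw-rw-rw-`.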

I think that file permissions should be normalised in the .whl file, and not dependent on the build environment. The same approach has been implemented in flit since v0.12.

@agronholm
Contributor

> I think that file permissions should be normalised in the .whl file, and not dependent on the build environment

I fully agree, and I've raised this point in discussions with other PyPA people. So far there have been no objections.

@agronholm agronholm added the needs-discussion Needs broader discussion / PyPA consensus label Aug 10, 2021
@nanonyme

Isn't it a reasonable assumption that if you want a reproducible build system, you set the umask in it?
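For context on why the umask matters: when a build step creates a file with Python's `open()`, the file is requested with mode 0o666 and the kernel clears whatever bits are set in the umask. A POSIX-only sketch (the helper name is hypothetical) showing the resulting permission bits under the two umasks used in the report:

```python
import os
import stat
import tempfile


def mode_created_under(umask_value):
    """Create a file under the given umask and return its permission bits (POSIX only)."""
    old = os.umask(umask_value)  # sets the new mask and returns the previous one
    try:
        with tempfile.TemporaryDirectory() as d:
            path = os.path.join(d, "generated.txt")
            with open(path, "w") as f:  # open() requests mode 0o666
                f.write("data")
            return stat.S_IMODE(os.stat(path).st_mode)
    finally:
        os.umask(old)  # the umask is process-wide, so restore it
```

Here `oct(mode_created_under(0o022))` gives `'0o644'` (rw-r--r--) and `oct(mode_created_under(0o000))` gives `'0o666'` (rw-rw-rw-), the two patterns visible in the zipinfo listing.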

@miccoli
Author

miccoli commented Mar 14, 2023

> Isn't it a reasonable assumption that if you want a reproducible build system, you set the umask in it?

Maybe, but the umask setting is an extra piece of information that you have to set explicitly in order to reproduce a build.
Other build systems (flit, hatch) do normalize permissions in the wheel file, and I do not see any reason against permission normalization.

Maybe this could be implemented in a new major release, if there are concerns about backward compatibility.

This is certainly not a critical topic; it is just a question of best practice when constructing .whl archives.

@nanonyme

nanonyme commented Mar 14, 2023

What about the executable permission? Apparently wheel is expected to pick that up from the workspace. Normalisation can't simply ignore file permissions.

@miccoli
Author

miccoli commented Mar 14, 2023

> What about executable permission?

I did not define what exactly "normalizing file permissions" means, but I'm not advocating just nuking all permissions. Maybe a sensible approach would be to mimic git and use

  • octal 755 for all executable files
  • octal 644 otherwise
    ...
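A minimal sketch of that git-like rule applied while writing zip entries, using the standard-library `zipfile` module rather than wheel's own writer (`normalize_mode` and `build_archive` are hypothetical names, not wheel's API). It uses only the owner-execute bit as the signal, which as far as I know is how git chooses between 100755 and 100644, and it also pins the timestamp and the "created on" system so the bytes do not depend on the build machine:

```python
import io
import stat
import zipfile

FIXED_DATE = (1980, 1, 1, 0, 0, 0)  # earliest date a zip entry can hold


def normalize_mode(st_mode):
    """Map any st_mode to 0o755 if the owner-execute bit is set, else 0o644."""
    return 0o755 if st_mode & stat.S_IXUSR else 0o644


def build_archive(entries, date_time=FIXED_DATE):
    """Write (arcname, data, st_mode) entries to an in-memory zip and return its bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for arcname, data, st_mode in sorted(entries):  # stable member order
            zinfo = zipfile.ZipInfo(arcname, date_time=date_time)
            zinfo.create_system = 3  # "Unix", so the mode bits are honored whatever the build OS
            zinfo.compress_type = zipfile.ZIP_DEFLATED
            zinfo.external_attr = (stat.S_IFREG | normalize_mode(st_mode)) << 16
            zf.writestr(zinfo, data)
    return buf.getvalue()
```

With this, an entry read from disk as 0o666 (umask 000) and the same entry read as 0o644 (umask 022) produce identical archive bytes, while a file with the execute bit still ends up as 0o755.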

@agronholm
Contributor

That would not be very reproducible on Windows.
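One possible workaround for the Windows concern (an assumption, not something proposed in the thread): on Windows, `os.stat()` does not report a POSIX executable bit the way it does on Unix, so the bit could instead be taken from the mode git records for each tracked file, as printed by `git ls-files -s -z`. A sketch with hypothetical helpers `git_tracked_modes` and `read_git_modes` that turns that output into 755/644 values that are the same on every OS:

```python
import stat
import subprocess


def git_tracked_modes(ls_files_z_output):
    """Parse `git ls-files -s -z` output into {path: 0o755 or 0o644} for regular files.

    Each NUL-terminated record looks like "<mode> <object> <stage>\t<path>",
    where <mode> is octal, e.g. 100644 or 100755. Symlinks (120000) and
    submodules (160000) are skipped.
    """
    modes = {}
    for record in ls_files_z_output.split("\0"):
        if not record:
            continue
        meta, path = record.split("\t", 1)
        mode = int(meta.split()[0], 8)
        if stat.S_ISREG(mode):
            modes[path] = stat.S_IMODE(mode)
    return modes


def read_git_modes(repo_dir="."):
    """Run git in repo_dir and return the parsed modes (requires git on PATH)."""
    result = subprocess.run(["git", "ls-files", "-s", "-z"], cwd=repo_dir,
                            capture_output=True, text=True, check=True)
    return git_tracked_modes(result.stdout)
```

The `-z` form is used because it terminates records with NUL and leaves unusual file names unquoted, which keeps the parsing simple.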

@Olindholm

For a long time I've been trying to create reproducible builds (wheels). I can't say why I didn't find any information about it earlier, but today I found out about the SOURCE_DATE_EPOCH environment variable. Amazing, I thought! Finally, building the same thing twice resulted in the same hash.

However, I ran into this problem soon after, when I started building inside Docker images. I fumbled around for quite some time before I figured out that the cause was a permissions discrepancy inside the zip files (wheels) between what I built on my host and what I built inside Docker images. A real pain, to be honest.

Anyway, it seems I have it working really nicely now. Inside my setup.py I have the following:

import os

# Set build date and umask (makes wheels reproducible)
os.environ["SOURCE_DATE_EPOCH"] = "315532800"  # 1980-01-01 00:00 UTC
os.umask(0o022)
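For reference, a sketch of how a SOURCE_DATE_EPOCH value maps onto the timestamp a zip entry can hold. It assumes the value is interpreted as UTC and clamped to 1980-01-01, the earliest date the zip format can represent; the helper names are hypothetical and this is not taken from wheel's source. It also shows why 315532800 is a convenient choice: it is exactly that minimum.

```python
import os
import time

ZIP_MIN_EPOCH = 315532800  # 1980-01-01 00:00:00 UTC


def zip_date_time(epoch):
    """Convert a Unix timestamp to a ZipInfo date_time tuple, clamped to 1980."""
    return time.gmtime(max(int(epoch), ZIP_MIN_EPOCH))[:6]


def zip_date_time_from_env():
    """Use SOURCE_DATE_EPOCH if it is set, otherwise the current time."""
    return zip_date_time(os.environ.get("SOURCE_DATE_EPOCH", time.time()))
```

For example, `zip_date_time(1593523015)` (the value from the original report) gives `(2020, 6, 30, 13, 16, 55)`, consistent with the `20-Jun-30 13:16` timestamps in the zipinfo listing.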
