Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use deterministic time for generated sdist files #142

Merged
merged 1 commit into from
Apr 19, 2021

Conversation

abn
Copy link
Member

@abn abn commented Mar 22, 2021

When generating setup.py and PKG-INFO files, ensure that generated
files use a deterministic timestamp to enhance reproducibility of
source distributions.

Resolves: python-poetry/poetry#1102

@abn abn requested a review from a team March 22, 2021 00:56
When generating setup.py and PKG-INFO files, ensure that generated
files use a deterministic timestamp to enhance reproducibility of
source distributions.
@abn abn force-pushed the deterministic-sdists branch from 7bca967 to ee3989e Compare March 22, 2021 01:05
@Natureshadow
Copy link

Natureshadow commented Mar 22, 2021

I propose that this code should use SOURCE_DATE_EPOCH from environment if set.

cf. https://reproducible-builds.org/docs/source-date-epoch/

(Maybe setting this should even be required, and the timestamp left unchanged if the variable is not set.)

Copy link

@Natureshadow Natureshadow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As suggested earlier, I would propose to use SOURCE_DATE_EPOCH.

Defaulting to 0 if it is not set might be ok, I am not sure of that. I somehow tend to leaving the current timestamp as default.

Comment on lines +99 to +100
tar_info.mtime = 0
tar_info = self.clean_tarinfo(tar_info)
Copy link

@Natureshadow Natureshadow Mar 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
tar_info.mtime = 0
tar_info = self.clean_tarinfo(tar_info)
tar_info.mtime = int(os.environ.get("SOURCE_DATE_EPOCH", 0))
tar_info = self.clean_tarinfo(tar_info)

Comment on lines +107 to +108
tar_info.mtime = 0
tar_info = self.clean_tarinfo(tar_info)
Copy link

@Natureshadow Natureshadow Mar 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
tar_info.mtime = 0
tar_info = self.clean_tarinfo(tar_info)
tar_info.mtime = int(os.environ.get("SOURCE_DATE_EPOCH", 0))
tar_info = self.clean_tarinfo(tar_info)

@Natureshadow
Copy link

Natureshadow commented Mar 22, 2021

Oh, quoting from https://reproducible-builds.org/specs/source-date-epoch/ , it seems that honouring this variable is a MUST to comply with the reproducible builds standard. I was not aware of this. As such, I make my proposal a bit harder ;).

It also talks about "timestamp clamping", which says that all timestamps that do not use the value from this variable must be earlier, so 0 looks good. It does, however, look wrong to me to use such a pseudo-random magic value. I would therefore stick with time.time() if the source epoch is not set.

@Natureshadow
Copy link

Oh, quoting from https://reproducible-builds.org/specs/source-date-epoch/ , it seems that honouring this variable is a MUST to comply with the reproducible builds standard. I was not aware of this. As such, I make my proposal a bit harder ;).

Had a chat with the people form reproducible-builds.org.

It seems there were a few misunderstnadings on my side:

  • SOURCE_DATE_EPOCH is mainly for binary builds, to align it with the source timestamp
  • 0 is perfectly fine for reproducibility
  • Using a timestamp that somehow reflects the last time the source was modified would be good. One idea is looking at git's HEAD if available, another would be to copy the mtime of pyproject.toml.

So if the maintainers want to keep it simple, just sticking with 0 seems fine.

@Natureshadow
Copy link

@stephsamson It seems you reviewed this PR. Did you take my comments into account?

@kasteph
Copy link
Member

kasteph commented Apr 17, 2021

@Natureshadow for the purposes of this PR, I think that keeping tar_info.mtime to 0 is fine as you mentioned here yourself or did I misunderstand? Also, SOURCE_DATE_EPOCH as you said is for binary builds so I find that that is out of scope for sdist and subsequently this PR and python-poetry#1102.

@Natureshadow
Copy link

@stephsamson Yes, I did say that using 0 is fine as well. I just wanted to know your opinion, as you left my change proposals open.

@kasteph
Copy link
Member

kasteph commented Apr 19, 2021

@Natureshadow I think that your suggestion, specifically for checking mtime of pyproject.toml to use as a timestamp for binary distributions, is a good candidate for a separate issue/refactoring.

@kasteph kasteph merged commit f478ec3 into python-poetry:master Apr 19, 2021
@abn abn deleted the deterministic-sdists branch April 19, 2021 12:16
@sdispater sdispater mentioned this pull request Apr 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reproducible sdist builds
3 participants