Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for generating buildinfo file as subpackage #1532

Closed
wants to merge 5 commits into from

Conversation

fepitre
Copy link

@fepitre fepitre commented Feb 7, 2021

TODO:

  • Add 'Binary' field with all built RPMs package names
  • Add Checksums for at least source RPM
  • Add additional rpm macros (--define, --with/--without, etc.)

Reference: https://wiki.debian.org/ReproducibleBuilds/BuildinfoFiles, Thread "Reproducible builds" on Fedora devel list.

TODO:
- Add 'Binary' field with all built RPMs package names
- Add Checksums for at least source RPM
- Add additional rpm macros (--define, --with/--without, etc.)
fepitre added a commit to fepitre/rpmreproduce that referenced this pull request Feb 11, 2021
- Few refactor
- Handle detached RPM sig from Koji

Related upstream change: rpm-software-management/rpm#1532
fepitre added a commit to fepitre/rpmreproduce that referenced this pull request Feb 11, 2021
- Few refactor
- Handle detached RPM sig from Koji
- Related RPM change: rpm-software-management/rpm#1532
@fepitre
Copy link
Author

fepitre commented Feb 21, 2021

Just a little ping here to know if anyone has already take a look on the global form for this. @Conan-Kudo any feedback if you have time? Thank you very much.

@pmatilai
Copy link
Member

pmatilai commented Mar 29, 2021

If you want feedback here, you should explain what it is and why we should care. Outside references are okay for additional background, but the what it does and why (as in rationale for the whole) needs to be explained right here so that it can be discussed.

So start by explaining what problem you're trying to solve, and not how other people might've chosen to solve it.

@fepitre
Copy link
Author

fepitre commented Mar 29, 2021

So start by explaining what problem you're trying to solve, and not how other people might've chosen to solve it.

The purpose of having an automatically generated subpackage buildinfo is to have an exact description of the build environment used to obtain RPMs from a spec file.

Having this, we can rebuild in the same condition any src.rpm (see my WIP tool called rpmreproduce and compare with the original RPMs. Also, if a given package is supposed to be reproducible and rebuild differs, something is wrong and needs to be investigated.

For example, https://github.com/fepitre/rpmreproduce/blob/master/tests/data/terminator-buildinfo-2.1.0-1.fc32.noarch.rpm is a generated buildinfo package.

Copy link
Contributor

@DemiMarie DemiMarie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid using “whitelist” and properly quote variables containing special characters

scripts/rpmbuildinfo Outdated Show resolved Hide resolved
Comment on lines 72 to 74
rpm -qa --queryformat '%{epoch}:%{name}-%{version}-%{release}.%{arch}\n' \
| LC_ALL=C sort -t: -k2 \
| sed -e 's/^(none)://; /\.(none)$/d; s/^/ /' >> "$BUILDINFO"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly this requires -o pipefail if we want to handle errors robustly.

scripts/rpmbuildinfo Show resolved Hide resolved
scripts/rpmbuildinfo Outdated Show resolved Hide resolved
Comment on lines 60 to 68
Format: 1.0-rpm
Build-Architecture: $(uname -m)
Source: $RPM_PACKAGE_NAME
Epoch: $RPM_PACKAGE_EPOCH
Version: ${RPM_PACKAGE_VERSION}
Release: ${RPM_PACKAGE_RELEASE}
Architecture: $RPM_ARCH
Build-Origin: $(getos)
Build-Path: $RPM_BUILD_DIR
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1.0-rpm makes no sense in this context. You're defining an rpm-based format.

RPM convention is to use CamelCase rather than hyphenation, so please be consistent with that (e.g. BuildArchitecture vs Build-Architecture). Please use RPM names (BuildOS vs Build-Origin, BuildDir vs Build-Path, etc.).

Also, please consistently call shell variables with ${} format.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're defining an rpm-based format.

This is actually a debian-defined format1 and at the time we was mocking this format2 we thought it would be good to maintain compatibility, because we expect much of the tooling will be shared. Obviously things like package version specification will be RPM-specific (hence -rpm in Format:), but the rest, like exchange, signature verification, comparison etc. does not need to be different. So the original intention is to maintain as close format to the original as possible.

Footnotes

  1. https://wiki.debian.org/ReproducibleBuilds/BuildinfoFiles, https://manpages.debian.org/bullseye/dpkg-dev/deb-buildinfo.5.en.html

  2. https://github.com/woju/rpmbuildinfo during Reproducible Builds Summit 2016

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that is valuable for us, given that we have properties in RPM that don't exist in Debian, and I'd rather see the format using terminology native to RPM.

Copy link
Member

@Conan-Kudo Conan-Kudo Oct 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also of note, the name Source: instead of Name: is not clear, so that clearly would also have to change, since Source means something else entirely in RPM land.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that is valuable for us, given that we have properties in RPM that don't exist in Debian

Can you please suggest which properties those are exactly, and specifically how differences in their content might affect reproducibility of the packages? The buildinfo file was purposefully designed to not include all available information, only the relevant to reproducible builds, because recording too much would not actually be useful when analysing (un)reproducibility. The underlying assumption is that the build process of a package will be made as robust as possible, that is, allow as much variability in environment as reasonably possible, which in theory should allow to record less information in buildinfo.

Build-Path: $RPM_BUILD_DIR
EOF

printf 'Installed-Build-Depends:\n' >> "$BUILDINFO"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not an appropriate name for this, since you are describing the environment. It's not just various BuildRequires. I would suggest EnvironmentRequires.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I'd rather not have Debian-isms in here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Conan-Kudo thank you for your feedback and comments. Generally I would have loved to have a generic format not being one distro specific to ease manipulating this file among several rebuild tools but I guess it would not be straightforward. I'm waiting some feedback from @bmwiedemann too then I would propose to adapt the work according to your comments.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you really insist on a different format for RPM, I'd suggest the file suffix be changed to something else than .buildinfo (maybe .rpmbuildinfo?). This will at least make distinguishing the files easier. Archlinux has .BUILDINFO (all caps) I think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with another filename.

# Toolchain.
ENV_WHITELIST="$ENV_WHITELIST CC CPP CXX OBJC OBJCXX PC FC M2C AS LD AR RANLIB MAKE AWK LEX YACC"
# Toolchain flags.
ENV_WHITELIST="$ENV_WHITELIST CFLAGS CPPFLAGS CXXFLAGS OBJCFLAGS OBJCXXFLAGS GCJFLAGS FFLAGS LDFLAGS ARFLAGS MAKEFLAGS"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These will not provide any relevant information, unless you set them, which is typically done via call to %configure macro in %build section.

Copy link
Author

@fepitre fepitre Oct 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand your point. This is not to set with %configure. It's the job of rebuilder like https://github.com/fepitre/rpmreproduce to ensure that same flags, env values would be the same.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What @voxik is saying is that you will not be able to capture the variables because they only exist in the %build step. We'd need a hook to export it from there.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see. Thank you for the enlightenment.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@voxik @Conan-Kudo any advice on the best way to create this hook?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it enough to catch them once, for example in %prep? Should be the same, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are not. Different variables are initialized at different stages.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to get env variables rpmbuild was started in (before it modified them)?

Copy link
Contributor

@DemiMarie DemiMarie Oct 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/proc/self/environ from Lua?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The environment in various stages of the build depends on various packages in the build environment, so I don't think we should try to capture it. It shoud be reproduced when the same packages are installed. But it seems that rpm passes the environment passed in from the outside into the build environment, so capturing that seems useful. /proc/self/environ sounds like a reasonable option.

OTOH, if the builds are done in mock, the environment would be sanitized anyway. mock now uses systemd-nspawn, and it standarizes the environment to a large degree. Nevertheless, it does inject some information, like container_host_*. This was added in systemd/systemd@e1bb4b0. I wonder if we should add a switch to systemd-nspawn to opt out of setting those variables and make mock use it. The environment inside of mock should not care at all about the host.

@fepitre
Copy link
Author

fepitre commented Oct 13, 2021

@bmwiedemann do you have some feedback to give from several comments here? As you work on reproducible builds for openSUSE you are certainly interested by this new feature.

@bmwiedemann
Copy link
Contributor

Interested in something like this for openSUSE. We already have something comparable called _buildenv (XML) e.g. in https://build.opensuse.org/package/binaries/openSUSE:Factory/bash/standard - but that is created on the obs_worker level.

ArchLinux seems to have their own format with = as delimiter.
In most places, the buildinfo files can probably be handled as opaque files and just things like source,binary,version would be interesting outside. That is not hard to extract in either format, so there is no strong reason to follow Debian here.

OTOH, this is not a .spec file, but just another output from the build process, similar to the build log, so it does not need to follow rpm conventions either.

So from my view, using either the Debian or ArchLinux (marshalling) format has some advantage over making up a third format.

Though, I would not keep confusing Debianisms in keys that will not be meaningful outside of the respective distribution.

@Conan-Kudo
Copy link
Member

Koji has a similar build environment record, though it's stored in the Koji database rather than as a file. We do archive environment artifacts from Mock with builds too, though.

@fepitre
Copy link
Author

fepitre commented Oct 14, 2021

Thank you all for your feedback, I'll prepare another iteration soon with all the comments here.


printf '%s' "\
Format: 1.0
BuildArchitecture: ${RPM_BUILD_ARCH}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably want to capture the host architecture for the noarch case, so we know if an architecture influenced the build?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it's a good info to add indeed.

Comment on lines +78 to +79
# Whitelist from Debian's Dpkg:
# https://salsa.debian.org/dpkg-team/dpkg/-/blob/main/scripts/Dpkg/Build/Info.pm#L51
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this comment anymore now that it doesn't match?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this does not make much sense now as we are certainly going to adapt this list in the context of rpmbuild.

@Conan-Kudo
Copy link
Member

@fepitre I suspect you'll want to capture the shell variables used in each stage (%prep, %build, %install, and %check), since those are all mutation environments. Unlike dpkg-buildpackage and makepkg, rpmbuild makes new shell environments for each stage.

@htgoebel
Copy link

@Conan-Kudo I strongly recommends using the Debianisic names, as this simplifies tooling. Having different names, in addition to already requiring different packaging tools, makes tooling for reproducibility even harder.

Yes, this makes names not match the RPM style. Anyhow, there is no use in creating yet another format, just for this reason. If the format is the same, tooling could share most of the code. I the format and names are diverging, this would require a lot of work adapting existing tooling. This will waste a lot of valuable and rare developer time.

@Conan-Kudo
Copy link
Member

@Conan-Kudo I strongly recommends using the Debianisic names, as this simplifies tooling. Having different names, in addition to already requiring different packaging tools, makes tooling for reproducibility even harder.

I am not willing to allow us to be boxed into Debian-isms. It will cause problems for us.

Yes, this makes names not match the RPM style. Anyhow, there is no use in creating yet another format, just for this reason. If the format is the same, tooling could share most of the code. I the format and names are diverging, this would require a lot of work adapting existing tooling. This will waste a lot of valuable and rare developer time.

And you'd be wasting ours to map RPM semantics to Debian ones. Our tooling works in RPM conventions and semantics, which are different than Debian.

@SantiagoTorres
Copy link

Bump here, is there anything here that could use help? I see the conversation died down a bit, but there were a couple of outstanding points

@fepitre
Copy link
Author

fepitre commented Jul 22, 2022

To be honest, I'm not sure how to converge for this in the case of RPM. As there is many build system like Koji, OBS, etc., they kind of hosting already what's needed as "buildinfo". I'm just wondering how much simpler it is to get these information from build platforms directly instead of trying to reach a consensus over distributions.

@marmarek
Copy link

Retrieving data from a build system may work as a temporary solution, but IMO we still need some file format (something redistributable) with build info. This will be necessary for example to compare build environments between build systems (like, between rebuilders), or just to have some way to have this information in authenticated (signed) form.

RPM_ARCH="$7"

RPM_BUILD_NAME="${RPM_PACKAGE_NAME}-${RPM_PACKAGE_VERSION}-${RPM_PACKAGE_RELEASE}.${RPM_ARCH}"
RPM_BUILD_ARCH="$(uname -m)"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will get you the host arch, not the build arch.

Copy link
Member

@pmatilai pmatilai Aug 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends on definition. Looking at https://manpages.debian.org/testing/dpkg-dev/deb-buildinfo.5.en.html, this perhaps tries to be Build-Architecture counterpart which means distro architecture, but that could differ from uname -m. It seems to me this was added from your request to track the build arch in case of noarch packages, and from that POV seems kinda reasonable to me, just that there may be a more rpm compatble source of information for that. But it's not really possible to discuss the correctness of a value without a definition of what that value is supposed to mean in the first place.


RPM_BUILD_NAME="${RPM_PACKAGE_NAME}-${RPM_PACKAGE_VERSION}-${RPM_PACKAGE_RELEASE}.${RPM_ARCH}"
RPM_BUILD_ARCH="$(uname -m)"
RPM_BUILD_OS="$(getos)"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will get you the host OS rather than the build OS.

}

RPM_BUILD_ROOT="$1"
RPM_BUILD_DIR="$2"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BUILDROOT and BUILD_DIR may leak relatively sensitive information about usernames etc and are a slap in the face of reproducability for no good reason at all. If anything, inclusion of these should be an opt-in feature (eg enable for troubleshooting purposes only)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, this is necessary because it can affect the build significantly. Those paths regularly get embedded in source files, debuginfo, and documentation metadata. That's why I had to change redhat-rpm-config last year to stop Koji builds with doxygen doc builds from failing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, this is necessary because it can affect the build significantly. Those paths regularly get embedded in source files, debuginfo, and documentation metadata. That's why I had to change redhat-rpm-config last year to stop Koji builds with doxygen doc builds from failing.

That embedding is a bug in the package in question and should be reported as such. BUILDROOT and BUILD_DIR should be used only when bugs require it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rpm checks against embedded BUILDROOT in the packages already and fails the build when found, we should probably extend that to cover BUILD DIR because references to that are equally a bug, always. Had to look up the redhat-rpm-config change - Doxygen needs a whack in the head, really. I also believe it has options to disable that, if not then we should file a bug on it.

Yeah the "security" aspect is lame, and source + debuginfo packages having them is one thing, it's just that for a moment I thought we're talking about the main binary packages ... but we're not, this is supposed to go into a subpackage. Which I'm not sure makes a whole lot of sense in the first place: you have to fish the plaintext file from inside an rpm. It'd be different if the relevant data could be queried in an rpm header itself, but we don't want to add two dozen new tags specific for just this either. Is there a rationale for this particular design somewhere? (it should be in the commit messages of this thing!)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, this is necessary because it can affect the build significantly. Those paths regularly get embedded in source files, debuginfo, and documentation metadata. That's why I had to change redhat-rpm-config last year to stop Koji builds with doxygen doc builds from failing.

That embedding is a bug in the package in question and should be reported as such. BUILDROOT and BUILD_DIR should be used only when bugs require it.

Sorry, I don't care whether it's a bug or not. It's reality. Since that's a required component of build reproducibility, it needs to be recorded.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that I suggested putting it into "an src.rpm like package", which would be a separate (sub-)package, but not necessarily a regular binary package which is expected to be installed, nor the main src.rpm (which would be the right place from rpm POV in many ways but not so much for the practical world). The main package is indeed the wrong place for it for various reasons.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to stick it somewhere, the debugsource package might be as good of a place as any?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to stick it somewhere, the debugsource package might be as good of a place as any?

Do all packages have debugsource? AFAIK noarch usually do not, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh that's true...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A separate buildinfo RPM seems like the best option to me.

@ffesti
Copy link
Contributor

ffesti commented Nov 22, 2022

This doesn't seem to go anywhere. So I am closing this PR here. Feel free to re-open or may be open a new one. The discussion is becoming a bit unwieldy.

@ffesti ffesti closed this Nov 22, 2022

getos() {
# shellcheck disable=SC1091
test -r /etc/os-release && . /etc/os-release
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

os-release is also /usr/lib/os-release. Please use this instead:

test -e /etc/os-release && os_release='/etc/os-release' || os_release='/usr/lib/os-release'
. "${os_release}"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your conditional assumes the file exists at all. It needs to handle when it's not present at all.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. So something like this:

if test -e /etc/os-release; then
   . /etc/os-release
elif test -e /usr/lib/os-release; then
   . /usr/lib/os-release
fi

Differences from the original code:

  • do not suppress permission errors by using -r instead of -e
  • use /usr/lib/os-release if appropriate

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense to me.


{ \
printf 'EnvironmentRequires:\n'; \
rpm -qa --queryformat '%{epoch}:%{name}-%{version}-%{release}.%{arch}\n' \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, why this ordering. I would expect:
'%{name}-%{epoch}:%{version}-%{release}.%{arch}\n'

(It seems --queryformat doesn't support more complex expressions, so the sed for (none): still seems needed ;( )

Copy link
Member

@Conan-Kudo Conan-Kudo Nov 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That old ordering is probably influenced by the old Debian tooling using YUM instead of DNF, and YUM was weird about this. We should absolutely use the normal format that RPM and DNF use instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.