Verify builds are reproducible in the CI #50205
Comments
cc @marc-hb |
I do not expect this to be something that breaks often. Bi-weekly build should be fine. |
We have many non-locked Python dependencies that are used somehow during the build process, they should be considered. |
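One way to take unpinned Python dependencies into account is to record exactly which package versions were installed when a build ran, so that two builds can be compared on that basis as well. The following is only a minimal sketch using the standard library; the function name `environment_fingerprint` is hypothetical and nothing like it is claimed to exist in the Zephyr tree.

```python
import hashlib
from importlib import metadata


def environment_fingerprint():
    """Return (sorted 'name==version' lines, SHA-256 of those lines)."""
    lines = sorted(
        f"{dist.metadata['Name']}=={dist.version}"
        for dist in metadata.distributions()
        if dist.metadata["Name"]  # skip broken installs that report no name
    )
    digest = hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()
    return lines, digest


if __name__ == "__main__":
    packages, digest = environment_fingerprint()
    # Printing this in the build log makes environment drift visible later.
    print(f"{len(packages)} packages, fingerprint {digest}")
```

If two builds report different fingerprints, a difference in the binaries may come from the Python environment rather than from the source tree.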
Here's a list of 20+ old reproducibility fixes: This should show what the most common problems are. In the same place there's an (obsolete) test script. The approach was crude but very effective:
|
Agreed. Reproducibility testing and fixing is rare, but reproducibility regressions are very rare too.
On the other hand, IF it's cheap and quick to run then why not run it every PR? |
Because of the amount of generated code, I'm in favor of checking on every PR. Maybe the GitHub workflow can be set up to run on any change to the ./scripts directory, and also as a weekly run to catch problems with the actual source code. |
Temporary bugs, corner cases and obsolete toolchains aside, the Zephyr build is most of the time reproducible: zephyrproject-rtos#50205 and zephyrproject-rtos#14593. This means two different build machines using the same toolchain will always produce the same binary output.

The one-line addition in this commit makes it trivial to verify that binary outputs are indeed the same by adding a single checksum line in the build logs:
```
[16/16] Linking C executable zephyr/zephyr.elf
Memory region         Used Size  Region Size  %age Used
             RAM:       53280 B         3 MB      1.69%
        IDT_LIST:          0 GB         2 KB      0.00%
fdd2ddf2ad7d5da5bbd79b41cef...7b16ef549a8281111d8e205 zephyr.strip
```
This commit makes a non-measurable build time difference.

Build reproducibility matters for (at least) two important reasons:
- Security / supply chain attacks, see https://www.cisa.gov/sbom, zephyrproject-rtos#50205, https://reproducible-builds.org/ and many others.
- Making sure build configurations are strictly identical when trying to reproduce elusive issues or when issuing releases.

Displaying a reproducible checksum accelerates the investigation of temporary reproducibility issues like zephyrproject-rtos#48195.

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
Temporary bugs, corner cases and obsolete toolchains aside, the Zephyr build is reproducible most of the time: zephyrproject-rtos#50205 and zephyrproject-rtos#14593. This means two different build machines using the same toolchain will always produce the same binary output.

The previous, one-line commit made it trivial to verify that binary outputs are indeed the same by adding this single line in the build logs:
```
[16/16] Linking C executable zephyr/zephyr.elf
Memory region         Used Size  Region Size  %age Used
             RAM:       53280 B         3 MB      1.69%
        IDT_LIST:          0 GB         2 KB      0.00%
fdd2ddf2ad7d5da5bbd79b41cef8d7...1a896b989a8281111d8e205 zephyr.strip
```
This commit enables that feature by default because build reproducibility matters for (at least) two important reasons:
- Security / supply chain attacks, see https://www.cisa.gov/sbom, zephyrproject-rtos#50205, https://reproducible-builds.org/ and many others.
- Making sure build configurations are strictly identical when trying to reproduce elusive issues or when issuing releases.

It was of course already possible to _manually_ make this Kconfig change and manually compute this checksum. However, this can be impossible when dealing with an automated build system that does not archive all _intermediate_ (zephyrproject-rtos#5009) files like `zephyr.elf`. Tweaking the build configuration can also be difficult and error-prone for people who are not Zephyr developers. Most automated CI systems preserve build logs by default. Displaying the reproducible checksum by default accelerates the discovery of reproducibility bugs like zephyrproject-rtos#48195.

When measured with `west build -p -b qemu_x86 samples/hello_world/`, the additional `build/zephyr/zephyr.strip` disk space required is 43 kilobytes compared to a total of 11 Megabytes. Measuring a more realistic SOF example, `zephyr.strip` weighed 690 kb which was about 0.1% of a total `build/` directory weighing 65M.
To measure the build time cost I ran `west build -p -b qemu_x86 samples/hello_world/` many times in a loop with and without this PR on my Linux workstation. Stripping and checksumming made literally no time difference compared to the "noise" observed when building the same configuration. This is not surprising considering how small `zephyr.strip` is: the extra cost is most likely dominated by process creation, and the total number of processes created during a Zephyr build dwarfs the few extra processes required by this feature.

More surprisingly, I measured incremental builds by running `touch kernel/timer.c; west build ...` in a loop and I could not observe any visible time difference either.

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
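Because the checksum ends up in the build log, two archived logs are enough to tell whether two builds produced the same stripped binary. The sketch below illustrates that idea; it assumes the checksum line has the shape "hex digest, whitespace, file name" seen in the log excerpt above, and the function names are hypothetical.

```python
import re

# Assumption: a checksum line is "<hex digest><whitespace><file name>".
# The digest length is not assumed beyond a minimum of 32 hex characters.
CHECKSUM_LINE = re.compile(r"^\s*([0-9a-fA-F]{32,})\s+\*?(\S+)\s*$")


def checksums_in_log(log_text):
    """Map file name -> lowercase hex digest for each checksum line in a log."""
    found = {}
    for line in log_text.splitlines():
        match = CHECKSUM_LINE.match(line)
        if match:
            found[match.group(2)] = match.group(1).lower()
    return found


def differing_files(log_a, log_b):
    """Return sorted file names whose checksum differs or is missing in one log."""
    a, b = checksums_in_log(log_a), checksums_in_log(log_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

An empty result from `differing_files` means both logs reported the same checksum for every file they mention; it says nothing about files that neither log lists.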
These 2 additional lines are IMHO a big step forward, please help review: |
Github Actions for the Zephyr+SOF project have been routinely and successfully comparing binaries built on Linux versus Windows in every PR for a few months now:
To achieve this I overrode the default config change in #51954 in an SOF-specific way: thesofproject/sof@945adb8d1660ed4

Building across two different operating systems provides a lot of differences "for free" that can be very difficult to achieve on the same operating system (see old #14593 attempt). Kudos to @aborisovich for implementing the Windows build in Github Actions.

This does not catch everything (e.g.:

Note a build is no more "reproducible" than a project is "bug-free"; fixing reproducibility bugs is a continuous activity exactly like fixing other bugs. Typically, building some code is reproducible in some Kconfiguration but fails when that Kconfiguration is changed - exactly like other bugs. Most recent example with CONFIG_ASSERT:
Switching to an old toolchain can also be very problematic: |
Introduction
Zephyr builds should be reproducible. A checkout of Zephyr from the same commit, built with the same toolchain, should generate an identical image binary.
Problem description
This has been proposed before (#11523 and #14593). However, there are currently no tests in the Zephyr tree that verify builds are reproducible.
Furthermore, reproducible builds were broken for an unknown amount of time, but fixed with #48195.
Proposed change
Add a new github workflow that verifies builds are reproducible. This workflow will be run on every PR.
The workflow can follow the blueprint of the Footprint Delta workflow. The new workflow would build TBD platforms, back to back, verifying the resulting binaries are identical.
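The back-to-back check described above could look roughly like the sketch below: build, hash the outputs, rebuild from scratch in the same directory, hash again, and compare. This is an illustration under assumptions, not the proposed workflow itself: the helper names, the `build_repro` directory and the artifact list are hypothetical, and the board and sample are the ones used for measurements elsewhere in this thread.

```python
import hashlib
import subprocess
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as stream:
        for chunk in iter(lambda: stream.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def digests(build_dir, artifacts):
    """Map each artifact (path relative to build_dir) to its SHA-256 digest."""
    return {name: sha256_of(Path(build_dir) / name) for name in artifacts}


def changed(first, second):
    """Return sorted artifact names whose digests differ between two runs."""
    names = first.keys() | second.keys()
    return sorted(name for name in names if first.get(name) != second.get(name))


def build(board, app, build_dir):
    """Run a pristine west build (requires an initialized Zephyr workspace)."""
    subprocess.run(
        ["west", "build", "-p", "always", "-b", board, "-d", str(build_dir), app],
        check=True,
    )


if __name__ == "__main__":
    # Hypothetical choices: build directory name and artifact list.
    board, app, out = "qemu_x86", "samples/hello_world", "build_repro"
    artifacts = ["zephyr/zephyr.elf"]
    build(board, app, out)
    first = digests(out, artifacts)
    build(board, app, out)  # pristine rebuild in the same directory
    bad = changed(first, digests(out, artifacts))
    raise SystemExit(f"not reproducible: {bad}" if bad else 0)
```

Rebuilding in the same directory keeps build paths identical between the two runs, so only genuine non-determinism shows up. Comparing builds from different directories or machines is a stricter test, and is the case where the stripped-binary checksum discussed in this thread is the more suitable artifact to compare.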
Note that the build command
west build -b native_posix tests/drivers/build_all/sensor
has been known to catch problems with devicetree generation that results in non-reproducible builds.
Dependencies
The new github workflow will block new PRs if the reproducible build test fails.
Concerns and Unresolved Questions
Running this check against every PR will incur additional computing time and resources.
Alternatives
Run the reproducible build check less frequently, such as nightly. However, this will require a significant bisect effort to identify the culprit PR when any failures are detected. The incremental cost of some additional builds on each PR seems worth the trouble.