-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Image build time improvements #10104
Image build time improvements #10104
Conversation
Currently, when the final image is being built (sonic-vs.img.gz, sonic-broadcom.bin, or similar), each invocation of sudo in the build_debian.sh script takes 0.8 seconds to run and execute the actual command. This is because the /proc filesystem in the slave container has been unmounted somehow. This is happening when debootstrap is running, and it incorrectly unmounts the host's (in our case, the slave container's) /proc filesystem because in the new image being built, /proc is a symlink to the host's (the slave container's) /proc. Because of that, /proc is gone, and each invocation of sudo adds 0.8 seconds overhead. As a side effect, docker exec into the slave container during this time will fail, because /proc/self/fd doesn't exist anymore, and docker exec assumes that that exists. Debootstrap has fixed this in 1.0.124 and newer, so backport the patch that fixes this into the version that Bullseye has. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
During package installations, dpkg calls fsync multiples times (for each package) to ensure that tht efiles are written to disk, so that if there's some system crash during package installation, then it is in at least a somewhat recoverable state. For our use case though, we're installing packages in a chroot in fsroot-* from a slave container and then packaging it into an image. If there were a system crash (or even if docker crashed), the fsroot-* directory would first be removed, and the process would get restarted. This means that the fsync calls aren't really needed for our use case. The eatmydata package includes a library that will block/suppress the use of fsync (and similar) system calls from applications and will instead just return success, so that the application is not blocked on disk writes, which can instead happen in the background instead as necessary. If dpkg is run with this library, then the fsync calls that it does will have no effect. Therefore, install the eatmydata package at the beginning of build_debian.sh and have dpkg be run under eatmydata for almost all package installations/removals. At the end of the installation, remove it, so that the final image uses dpkg as normal. In my testing, this saves about 2-3 minutes from the image build time. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
/azp run |
You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list. |
/azp run Azure.sonic-buildimage |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run Azure.sonic-buildimage |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run Azure.sonic-buildimage |
Azure Pipelines successfully started running 1 pipeline(s). |
build_debian.sh
Outdated
|
||
echo '[INFO] Install and setup eatmydata' | ||
sudo LANG=C chroot $FILESYSTEM_ROOT apt-get -y install eatmydata | ||
sudo ln -s /usr/bin/eatmydata $FILESYSTEM_ROOT/usr/local/bin/dpkg |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The symlink is being created within the chroot for use within the chroot (the link is being created at $FILESYSTEM_ROOT/usr/local/bin/dpkg
and pointing to /usr/bin/eatmydata
). When chrooted, this allows all calls to dpkg
to actually call eatmydata
, which will then look for the actual binary in PATH
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean something like below?
sudo LANG=C chroot $FILESYSTEM_ROOT ln -s /usr/bin/eatmydata /usr/local/bin/dpkg
If yes, one minor comment: it is not obvious that the intention is ln
inside chroot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I was trying to avoid the chroot here, but I'll change this to use chroot to make it clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line has now been changed to use chroot
to make it clear that the symlink is meant for use inside the chroot.
@@ -568,6 +574,10 @@ scripts/collect_host_image_version_files.sh $TARGET_PATH $FILESYSTEM_ROOT | |||
# Remove GCC | |||
sudo LANG=C DEBIAN_FRONTEND=noninteractive chroot $FILESYSTEM_ROOT apt-get -y remove gcc | |||
|
|||
# Remove eatmydata | |||
sudo rm $FILESYSTEM_ROOT/etc/apt/apt.conf.d/00image-install-eatmydata $FILESYSTEM_ROOT/usr/local/bin/dpkg | |||
sudo LANG=C DEBIAN_FRONTEND=noninteractive chroot $FILESYSTEM_ROOT apt-get -y remove eatmydata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The symlink is being created inside the chroot, and is being deleted on line 578, so there shouldn't be any dead symlink in the slave container.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
@qiluo-msft can you re-review this? @xumia looks like the change you did for making sure |
Yes, I have a PR to make sure the /proc is mounted in the slave container. It should not be a correct usage to umount /proc, has impact on multiple programs, including the debootstrap (there is a warning message, complain /proc not mounted). |
Correct, |
/azp run Azure.sonic-buildimage |
Azure Pipelines successfully started running 1 pipeline(s). |
@qiluo-msft Are we ok to merge this PR? |
Related work items: #49, #58, #107, sonic-net#247, sonic-net#249, sonic-net#277, sonic-net#593, sonic-net#597, sonic-net#1035, sonic-net#2130, sonic-net#2150, sonic-net#2165, sonic-net#2169, sonic-net#2178, sonic-net#2179, sonic-net#2187, sonic-net#2188, sonic-net#2191, sonic-net#2195, sonic-net#2197, sonic-net#2198, sonic-net#2200, sonic-net#2202, sonic-net#2206, sonic-net#2209, sonic-net#2211, sonic-net#2216, sonic-net#7909, sonic-net#8927, sonic-net#9681, sonic-net#9733, sonic-net#9746, sonic-net#9850, sonic-net#9967, sonic-net#10104, sonic-net#10152, sonic-net#10168, sonic-net#10228, sonic-net#10266, sonic-net#10288, sonic-net#10294, sonic-net#10313, sonic-net#10394, sonic-net#10403, sonic-net#10404, sonic-net#10421, sonic-net#10431, sonic-net#10437, sonic-net#10445, sonic-net#10457, sonic-net#10458, sonic-net#10465, sonic-net#10467, sonic-net#10469, sonic-net#10470, sonic-net#10474, sonic-net#10477, sonic-net#10478, sonic-net#10482, sonic-net#10485, sonic-net#10488, sonic-net#10489, sonic-net#10492, sonic-net#10494, sonic-net#10498, sonic-net#10501, sonic-net#10509, sonic-net#10512, sonic-net#10514, sonic-net#10516, sonic-net#10517, sonic-net#10523, sonic-net#10525, sonic-net#10531, sonic-net#10532, sonic-net#10538, sonic-net#10555, sonic-net#10557, sonic-net#10559, sonic-net#10561, sonic-net#10565, sonic-net#10572, sonic-net#10574, sonic-net#10576, sonic-net#10578, sonic-net#10581, sonic-net#10585, sonic-net#10587, sonic-net#10599, sonic-net#10607, sonic-net#10611, sonic-net#10616, sonic-net#10618, sonic-net#10619, sonic-net#10623, sonic-net#10624, sonic-net#10633, sonic-net#10646, sonic-net#10655, sonic-net#10660, sonic-net#10664, sonic-net#10680, sonic-net#10683
Why I did it
The last step of the SONiC build (building
sonic-vs.bin
orsonic-broadcom.bin
orsonic-aboot-broadcom.bin
) seemed to take a long time. I started looking at what it was doing and what could be done to reduce the time spent there.How I did it
There are two major changes in this PR (see the individual commits for more details on what was done there):
/proc
filesystem within the slave container to get umounted. This meant that/proc
is now an empty folder. There were two visible side effects to this: each invocation of sudo added 0.8 seconds to the runtime (sudo itself would take 0.8 seconds more, then execute the actual command), and from the host machine, trying todocker exec
into the container while the Debian image build was happening would fail, because it expects/proc/self/fd
to be available. This is fixed in debootstrap 1.0.124, but since there's no backported version available for Bullseye, I am rebuilding debootstrap with the patch added in.fsync
calls done during deb package installations by dpkg. dpkg callsfsync
many times, to make sure all new files are written to disk and that the filesystem is in a consistent state, so that if some system crash were to happen, it would be somewhat possible to recover the system. However, because of the nature of the build that we're doing, this is not really important for us. Theeatmydata
package can suppress thesefsync
calls and potentially improve application performance, at the risk of losing data or having the filesystem in an inconsistent state in the event of a system crash.How to verify it
On my dev VM, before any of these changes, building the virtual switch image took 28 minutes (this is only counting the part of the build where
build_debian.sh
is run to build the base Debian image, load in all of the containers, and make image files). After the debootstrap patch, this decreased to 22 minutes. With the use ofeatmydata
, this decreased to 19-20 minutes. Actual numbers will vary depending on the system.Which release branch to backport (provide reason below if selected)
Description for the changelog
Link to config_db schema for YANG module changes
A picture of a cute animal (not mandatory but encouraged)