Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

06.Version Cache - Support for wget #14612

Merged

Conversation

Kalimuthu-Velappan
Copy link
Contributor

When a package is referenced from the web through wget command, it downloads the package for every build.

This feature caches all the packages that are being downloaded from the web, so that subsequent build always loads the cache instead of from web.

Why I did it

How I did it

How to verify it

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211

Description for the changelog

Ensure to add label/tag for the feature raised. example - PR#2174 under sonic-utilities repo. where, Generic Config and Update feature has been labelled as GCU.

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

@zhangyanzhao
Copy link
Collaborator

@xumia can you please help to review? Thanks.

@k-v1
Copy link
Contributor

k-v1 commented May 11, 2023

@Kalimuthu-Velappan

Could you please clarify how to test a build with version cache correctly?
In https://github.com/sonic-net/SONiC/blob/master/doc/sonic-build-system/build-enhancements.md you mentioned multiple combinations like

  1. DPKG_CACHE=y and VERSION_CACHE=y
  2. DPKG_CACHE=n and VERSION_CACHE=y.

For DPKG_CACHE it's obvious:

  1. We set SONIC_DPKG_CACHE_METHOD=cache.
  2. We can use artifacts from SONIC_DPKG_CACHE_SOURCE after 1st build.

But what options should I choose to enable vcache?
Should I set SONIC_VERSION_CACHE_METHOD=cache for 1st build or only for builds when we already use artifacts from vcache directory?
Should I also enable SONIC_DPKG_CACHE_METHOD=cache or this feature is not required for vcache?
Should I also enable SONIC_VERSION_CONTROL_COMPONENTS?

@k-v1
Copy link
Contributor

k-v1 commented May 11, 2023

BTW we already have PR with number 06 for vcache: #13024

@Kalimuthu-Velappan
Copy link
Contributor Author

In order to use version vache(VCACHE), SONIC_VERSION_CACHE_METHOD=cache variable should be used.
VCACHE can work independently of SONIC_VERSION_CONTROL_COMPONENTS and SONIC_DPKG_CACHE_METHOD

Here are the recommended settings:
For the first time, only VCACHE can be set to cache, this will create a fresh cache.
SONIC_DPKG_CACHE_METHOD=none
SONIC_VERSION_CONTROL_COMPONENTS =none
SONIC_VERSION_CACHE_METHOD=cache

For the subsequent build to use the VCACHE, both the version variable needs to be set. DPKG cache can be independent of the version cache and it will superset of VCACHE.

       SONIC_VERSION_CONTROL_COMPONENTS =all
       SONIC_VERSION_CACHE_METHOD=cache

continue
fi
FLOCK ${SRC_FILENAME}
cp ${DST_FILENAME} ${SRC_FILENAME}
Copy link
Contributor

@k-v1 k-v1 May 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it work with sonic-slave docker containers?
I mean when we build sonic-slave container we download some files (e.g. golang binaries) using wget.

According to logs it saved something.

#20 88.54 Saving into web cache URL:https://sonicstorage.blob.core.windows.net/public/fips/bullseye/0.1/amd64/golang-1.15-go_1.15.15-1~deb11u4%2Bfips_amd64.deb, DST:/sonic/target/vcache/sonic-slave-bullseye/web/golang-1.15-go_1.15.15-1~deb11u4%2Bfips_amd64.deb-b60f6db62805696b47ab422ab798fe56.tgz, SRC:golang-go.deb
#20 117.0 Saving into web cache URL:https://sonicstorage.blob.core.windows.net/public/fips/bullseye/0.1/amd64/golang-1.15-src_1.15.15-1~deb11u4%2Bfips_amd64.deb, DST:/sonic/target/vcache/sonic-slave-bullseye/web/golang-1.15-src_1.15.15-1~deb11u4%2Bfips_amd64.deb-1c920937aa49b602a1bfec3c49747131.tgz, SRC:golang-src.deb
#61 2.627 Saving into web cache URL:https://github.com/bazelbuild/bazelisk/releases/latest/download/bazelisk-linux-amd64, DST:/sonic/target/vcache/sonic-slave-bullseye/web/bazelisk-linux-amd64-d9f6a4c3197625b1835b523697eb953e.tgz, SRC:/usr/local/bin/bazel

But there are no files in vcache directory.

➜  vcache tree 
.
├── sonic-slave-bullseye
└── sonic-slave-buster

2 directories, 0 files

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it works for slave containers as well
Vcache is disabled for now, it will get enabled once everything is done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to build your branch VERSION_CACHE_WGET with SONIC_VERSION_CACHE_METHOD=cache enabled.
But I didn't find files like golang binaries from sonic-slaves in vcache directory (SONIC_VERSION_CACHE_SOURCE) after build finished.
Is it expected behaviour?

I can retest this. I'll provide build logs if this bug is duplicated.

Copy link
Contributor

@k-v1 k-v1 May 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like these files are extracted as part of cache.tgz.
So we can ignore this issue for now and retest it lately when all parts of vcache are merged.

real_version=$(get_url_version $url) || { echo "get_url_version $url failed"; exit 1; }
if [ "$real_version" != "$version" ]; then
log_err "Failed to verify url: $url, real hash value: $real_version, expected value: $version_filename" 1>&2
real_version=$(get_url_version $url)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you call get_url_version two times: line 215 and 218?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

SRC_FILENAME=${WEB_CACHE_PATH}/${WEB_FILENAME}-${VERSION}.tgz
if [ -f ${SRC_FILENAME} ]; then
log_info "Loading from web cache URL:${url}, SRC:${SRC_FILENAME}, DST:${DST_FILENAME}"
cp ${SRC_FILENAME} ${DST_FILENAME}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe need to add double quote around SRC_FILENAME and DST_FILENAME variables.
Filenames unlike urls can contain whitespaces.
So need to check if vcache works correctly in this case:
wget $url -o "my file"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

for (( i=0; i<${#parameters[@]}; i++ ))
do
local para=${parameters[$i]}
local nexti=$((i+1))
if [[ "$para" == *://* ]]; then
if [[ "$para" == -o || "$para" == -O ]]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can find in SONiC Makefiles something like this:

wget -NO "$(DSC_FILE)" $(DSC_FILE_URL)

Probably this code won't be able to parse -NO correctly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW -NO is not correct combination. I got warning WARNING: timestamping does nothing in combination with -O. See the manual for details. I don't know why this is used in Makefiles.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@Kalimuthu-Velappan Kalimuthu-Velappan force-pushed the VERSION_CACHE_WGET branch 2 times, most recently from be60409 to 99e9fee Compare May 30, 2023 16:46
@adyeung
Copy link
Collaborator

adyeung commented May 30, 2023

@xumia @k-v1 pls review the latest patch, and signoff you agree with the change.

for (( i=0; i<${#parameters[@]}; i++ ))
do
local para=${parameters[$i]}
local nexti=$((i+1))
if [[ "$para" == *://* ]]; then
if [[ "$para" =~ -[^-]*o.* || "$para" =~ -[^-]*O.* ]]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still need to fix this regex.
My test build failed with this error:
#20 30.32 dpkg: error: cannot access archive 'golang-go.deb': No such file or directory

And I decided to test the regex using simple bash script:

#!/bin/bash
[[ "$1" =~ -[^-]*o || "$1" =~ -[^-]*O.* ]] && echo true || echo false

As a result many false positive:

./1.sh -o                
true

./1.sh golang.deb
false

./1.sh golang-go.deb
true

./1.sh www.test-url.com
true

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@k-v1
Copy link
Contributor

k-v1 commented May 31, 2023

@Kalimuthu-Velappan

One question about implementation.
Do we really need to save the same files in separate directories?

What I mean:

cache/vcache/sonic-slave-buster/web:

hiredis_0.14.0-3~bpo9+1.debian.tar.xz-ef340aedc6fd42c549cd503bffb498b2.tgz
hiredis_0.14.0-3~bpo9+1.dsc-be4ce11ef67268e59e4b6be587327c40.tgz
hiredis_0.14.0.orig.tar.gz-6d565680a4af0d2e261abbc3e3431b2b.tgz
libyang2_2.0.112-6.debian.tar.xz-aa798a069f506a243d337c18600477e5.tgz
libyang2_2.0.112-6.dsc-da4924086edc4911c6fca21ca46fdb1d.tgz
libyang2_2.0.112.orig.tar.gz-4ac414ef27f3c4d14f96c2f49c58c2be.tgz
socat_1.7.4.1-3.debian.tar.xz-e8d1e99b4b9e93f5dde860f6d55f42e3.tgz
socat_1.7.4.1-3.dsc-df3ed0dd965589fd09bf6a2920bc273e.tgz
socat_1.7.4.1.orig.tar.gz-780d14908dc1a6aa2790de376ab56b7a.tgz

cache/vcache/sonic-slave-bullseye/web:

hiredis_0.14.0-3~bpo9+1.debian.tar.xz-ef340aedc6fd42c549cd503bffb498b2.tgz
hiredis_0.14.0-3~bpo9+1.dsc-be4ce11ef67268e59e4b6be587327c40.tgz
hiredis_0.14.0.orig.tar.gz-6d565680a4af0d2e261abbc3e3431b2b.tgz
libyang2_2.0.112-6.debian.tar.xz-aa798a069f506a243d337c18600477e5.tgz
libyang2_2.0.112-6.dsc-da4924086edc4911c6fca21ca46fdb1d.tgz
libyang2_2.0.112.orig.tar.gz-4ac414ef27f3c4d14f96c2f49c58c2be.tgz
socat_1.7.4.1-3.debian.tar.xz-e8d1e99b4b9e93f5dde860f6d55f42e3.tgz
socat_1.7.4.1-3.dsc-df3ed0dd965589fd09bf6a2920bc273e.tgz
socat_1.7.4.1.orig.tar.gz-780d14908dc1a6aa2790de376ab56b7a.tgz

Probably we can save some space if we use a single directory cache/vcache/web for all web files.
So it's just a suggestion and not a bug.

Comment on lines 233 to 236
[ -f ${BUILD_WEB_VERSION_FILE} ] && build_version=$(grep "^${url}==${real_version}" ${BUILD_WEB_VERSION_FILE} | awk -F"==" '{print $NF}')
if [ -z ${build_version} ]; then
echo "$url==$real_version" >> ${BUILD_WEB_VERSION_FILE}
sort ${BUILD_WEB_VERSION_FILE} -o ${BUILD_WEB_VERSION_FILE} -u &> /dev/null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We append "$url==$real_version" to the file ${BUILD_WEB_VERSION_FILE} on the line 227 if $real_version is valid (not md5sum of empty string).
So I don't understand what we exactly do here (lines 233-236). Why do we need to save this info again.

Moreover if ${BUILD_WEB_VERSION_FILE} contains duplicated lines for ${url}==${real_version} then ${build_version} will contain multiple hashes and condition on the line 234 will failed: line 234: [: c1c557036197188a22ec285fa53149d8: binary operator expected.
Example:

1. wget www.url.com/file.tar.gz -O file
(line 227: append url==real-version to the web_version_file)
2. wget www.url.com/file.tar.gz -O file2
(line 227: append url==real-version to the web_version_file)
3. get error on line 233-234 due to duplicated strings in web_version_file

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, we add the ${url}==${real_version} combination into the file ${BUILD_WEB_VERSION_FILE} if the entry does not exist already and also remove any duplicate entry in the file ${BUILD_WEB_VERSION_FILE}.

line 234: [: => error is due to different shell version, I have fixed it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can clarify this.

Line 227:
[[ $real_version == "d41d8cd98f00b204e9800998ecf8427e" ]] || echo "$url==$real_version" >> ${BUILD_WEB_VERSION_FILE}

We add this entry (url==real_version) into the file unconditionally. The only condition here is comparison with a hash of empty string. If real_version is equal to the hash of empty string then probably there no reason to add this entry into the file.

You say if the entry does not exist already . But I don't understand how it's possible if the entry has a valid real_version and doesn't exist in file if we just added it at line 227. Probably we can remove all this fragment of code where you extract build_version and keep only line 236 where you remove possible duplicate entries from file.

That's my opinion. Maybe I missed something.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably it's last question that should be resolved before merge this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification. I realized the empty HASH part and thanks for pointing it out.
It will add the entry only when it is not empty or valid but not exists in the version file.
Let me know if this is OK.

for (( i=0; i<${#parameters[@]}; i++ ))
do
local para=${parameters[$i]}
local nexti=$((i+1))
if [[ "$para" == *://* ]]; then
if [[ "$para" =~ ^-[^-]*o.* || "$para" =~ ^-[^-]*O.* ]]; then
Copy link
Contributor

@k-v1 k-v1 May 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also possible:

  1. wget www.url.com/file.tar.gz -P directory
  2. wget www.url.com/file.tar.gz -O directory/filename.tar.gz
  3. (only for curl) curl www.url.com/file.tar.gz > filename.tar.gz
  4. wget www.url.com/file.tar.gz -O "file name.tar.gz" - not sure, but probably need to recheck this too.
  5. curl -O www.url.com/file.tar.gz
  6. curl -O www.url.com/file.tar.gz --output-dir test

I'm not sure should we try to fix this or not. I didn't find such a code in current SONiC codebase.
Maybe @xumia can help here and say should we fix or ignore this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we ran out of time for the current release, we can take this up on the subsequent release.
Please approve the PR if there are no further changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can ignore this for now and retest again when all parts of vcache are merged.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree to ignore for now. It may be a little hard to cover all the scenarios, covering the current used scenarios should be good enough.

@Kalimuthu-Velappan
Copy link
Contributor Author

@Kalimuthu-Velappan

One question about implementation. Do we really need to save the same files in separate directories?

What I mean:

cache/vcache/sonic-slave-buster/web:

hiredis_0.14.0-3~bpo9+1.debian.tar.xz-ef340aedc6fd42c549cd503bffb498b2.tgz
hiredis_0.14.0-3~bpo9+1.dsc-be4ce11ef67268e59e4b6be587327c40.tgz
hiredis_0.14.0.orig.tar.gz-6d565680a4af0d2e261abbc3e3431b2b.tgz
libyang2_2.0.112-6.debian.tar.xz-aa798a069f506a243d337c18600477e5.tgz
libyang2_2.0.112-6.dsc-da4924086edc4911c6fca21ca46fdb1d.tgz
libyang2_2.0.112.orig.tar.gz-4ac414ef27f3c4d14f96c2f49c58c2be.tgz
socat_1.7.4.1-3.debian.tar.xz-e8d1e99b4b9e93f5dde860f6d55f42e3.tgz
socat_1.7.4.1-3.dsc-df3ed0dd965589fd09bf6a2920bc273e.tgz
socat_1.7.4.1.orig.tar.gz-780d14908dc1a6aa2790de376ab56b7a.tgz

cache/vcache/sonic-slave-bullseye/web:

hiredis_0.14.0-3~bpo9+1.debian.tar.xz-ef340aedc6fd42c549cd503bffb498b2.tgz
hiredis_0.14.0-3~bpo9+1.dsc-be4ce11ef67268e59e4b6be587327c40.tgz
hiredis_0.14.0.orig.tar.gz-6d565680a4af0d2e261abbc3e3431b2b.tgz
libyang2_2.0.112-6.debian.tar.xz-aa798a069f506a243d337c18600477e5.tgz
libyang2_2.0.112-6.dsc-da4924086edc4911c6fca21ca46fdb1d.tgz
libyang2_2.0.112.orig.tar.gz-4ac414ef27f3c4d14f96c2f49c58c2be.tgz
socat_1.7.4.1-3.debian.tar.xz-e8d1e99b4b9e93f5dde860f6d55f42e3.tgz
socat_1.7.4.1-3.dsc-df3ed0dd965589fd09bf6a2920bc273e.tgz
socat_1.7.4.1.orig.tar.gz-780d14908dc1a6aa2790de376ab56b7a.tgz

Probably we can save some space if we use a single directory cache/vcache/web for all web files. So it's just a suggestion and not a bug.

Yes, we need to keep the separate cache contents for each slave container build, because the docker builder doesn't use shared volume mount during the build, like
docker build -v target/vcache:/vcache => not allowed
docker run -v target/vcache:/vcache => allowed

So we need to export these contents inside the docker builder as a context file. So this needs to be packed as a separate tgz file.

@Kalimuthu-Velappan Kalimuthu-Velappan force-pushed the VERSION_CACHE_WGET branch 2 times, most recently from c6062bf to f1a7e59 Compare June 1, 2023 14:01
Comment on lines 233 to 237
[ -f ${BUILD_WEB_VERSION_FILE} ] && build_version=$(grep "^${url}==${real_version}" ${BUILD_WEB_VERSION_FILE} | awk -F"==" '{print $NF}')
if [ ! -z "${build_version}" ]; then
echo "$url==$real_version" >> ${BUILD_WEB_VERSION_FILE}
sort ${BUILD_WEB_VERSION_FILE} -o ${BUILD_WEB_VERSION_FILE} -u &> /dev/null
fi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now we have incorrect logic:

  1. download new file from some url
  2. try to extract build_version from ${BUILD_WEB_VERSION_FILE}
  3. build_version is empty because it was 1st download from this url
  4. condition [ ! -z "${build_version}" ]; failed because build_version is empty, but you check if this is not empty using ! -z (it means not has no value).
  5. you never add new entry to this file

Maybe just replace this code with

echo "$url==$real_version" >> ${BUILD_WEB_VERSION_FILE}
sort ${BUILD_WEB_VERSION_FILE} -o ${BUILD_WEB_VERSION_FILE} -u

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, I forgot to remove ! in the if condition. Thanks for catching this.
I modified the code as you suggested.

When a package is referenced from the web through wget command,
it downloads the package for every build.

This feature caches all the packages that are being downloaded from the
web, so that subsequent build always loads the cache instead of from
web.
@Praveen-Brcm Praveen-Brcm merged commit 9dce453 into sonic-net:master Jun 5, 2023
sonic-otn pushed a commit to sonic-otn/sonic-buildimage that referenced this pull request Sep 20, 2023
When a package is referenced from the web through wget command,
it downloads the package for every build.

This feature caches all the packages that are being downloaded from the
web, so that subsequent build always loads the cache instead of from
web.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants