Skip to content
This repository has been archived by the owner on Jun 28, 2024. It is now read-only.

CC | Run the current tests offloading the image to the guest without the forked containerd #5774

Draft
wants to merge 10 commits into
base: CCv0
Choose a base branch
from

Conversation

fidencio
Copy link
Member

SSIA.

I'm opening the PR, but the majority of the content was written by (and the credits are properly given to) @ChengyuZhu6.

PLEASE, DO NOT RUN /test HERE YET.

@katacontainersbot katacontainersbot added the size/large Task of significant size label Sep 20, 2023
@fidencio
Copy link
Member Author

/fidencio-test

@fidencio fidencio force-pushed the topic/CC-add-job-to-test-image-pull-on-the-guest-without-forked-containerd branch from 6667a20 to 0cb184e Compare September 20, 2023 14:26
@fidencio
Copy link
Member Author

/fidencio-test

@fidencio fidencio force-pushed the topic/CC-add-job-to-test-image-pull-on-the-guest-without-forked-containerd branch from 0cb184e to e57b089 Compare September 20, 2023 21:46
@fidencio
Copy link
Member Author

/fidencio-test

Comment on lines 48 to 49
install -D -m 755 "bin/containerd-nydus-grpc" "${nydus_snapshotter_binary_target_dir}/containerd-nydus-grpc"
install -D -m 755 "bin/nydus-overlayfs" "${nydus_snapshotter_binary_target_dir}/nydus-overlayfs"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should add sudo.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We definity should! :-)

@fidencio fidencio force-pushed the topic/CC-add-job-to-test-image-pull-on-the-guest-without-forked-containerd branch from e57b089 to 4c6b741 Compare September 21, 2023 05:49
@fidencio
Copy link
Member Author

/fidencio-test

Copy link
Member

@ChengyuZhu6 ChengyuZhu6 Sep 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In f8cdfd2#diff-9ed0363d00e4ed02d42e6994eed9e2f806052abc3cd08d66bbb64d5c211f25a1, since it includes both host sharing and guest pulling tests, the snapshotter configuration file requires switching. This entails deleting the image remove_test_image used for the test and configuring the snapshotter configure_nydus_snapshotter. However, the test cases here solely involve guest pulling, a better approach would be to delete the test image before running the test scripts, followed by restarting the snapshotter process. The same operation can be performed after the test concludes, rather than repeating the deletion and configuration for each testcase. However, I believe it's fine to do in every test case. So we may need to add the related functions to integration/confidential/lib.sh.

@@ -31,7 +31,17 @@ RUNTIMECLASS="${RUNTIMECLASS:-kata}"
test_tag="[cc][agent][kubernetes][containerd]"

setup() {
setup_common
remove_test_image "$image_unsigned_protected" || true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we should remove all images in setup() like this:

 remove_test_image "$image_cosigned" || true
 remove_test_image "$image_cosigned_other" || true
 remove_test_image "$image_simple_signed" || true
 remove_test_image "$image_signed_protected_other" || true
 remove_test_image "$image_unsigned_protected" || true
 remove_test_image "$image_unsigned_unprotected" || true
 remove_test_image "$image_authenticated" || true

@@ -152,4 +162,7 @@ setup() {

teardown() {
teardown_common
remove_test_image "$image_unsigned_protected" || true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we should remove all images in teardown() like this:

 remove_test_image "$image_cosigned" || true
 remove_test_image "$image_cosigned_other" || true
 remove_test_image "$image_simple_signed" || true
 remove_test_image "$image_signed_protected_other" || true
 remove_test_image "$image_unsigned_protected" || true
 remove_test_image "$image_unsigned_unprotected" || true
 remove_test_image "$image_authenticated" || true

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even with the CI job only pulling the image on the guest?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An image is associated with a snapshotter. Therefore, prior to utilizing the snapshotter, it's necessary to remove all tested images. Otherwise, the image pull request will not be directed to nydus-snapshotter, as containerd interprets the image as already available. So we should remove all tested images before running the tests.

Copy link
Member

@ChengyuZhu6 ChengyuZhu6 Sep 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we neglect to remove the images, an error will occur when creating the container:

Warning  FailedCreatePodSandBox  3s    kubelet   Failed to create pod sandbox: 
rpc error: code = NotFound desc = failed to create containerd container: error unpacking image: 
failed to extract layer sha256:1021ef88c7974bfff89c5a0ec4fd3160daac6c48a075f74cff721f85dd104e68: 
failed to get reader from content store: content digest sha256:fbe1a72f5dcd08ba4ca3ce3468c742786c1f6578c1f6bb401be1c4620d6ff705: not found

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we neglect to remove the images, an error will occur when creating the container:

This is a huge drawback, IMHO.

It forces users to cleanup all the images on their systems before using nydus, making this approach only okay for a clean installation. :-/

@fidencio fidencio force-pushed the topic/CC-add-job-to-test-image-pull-on-the-guest-without-forked-containerd branch from 4c6b741 to 2e90792 Compare September 21, 2023 07:34
@fidencio
Copy link
Member Author

/fidencio-test

@fidencio fidencio force-pushed the topic/CC-add-job-to-test-image-pull-on-the-guest-without-forked-containerd branch from 2e90792 to 1536bac Compare September 21, 2023 09:10
@fidencio
Copy link
Member Author

/fidencio-test

2 similar comments
@fidencio
Copy link
Member Author

/fidencio-test

@fidencio
Copy link
Member Author

/fidencio-test

@fidencio fidencio force-pushed the topic/CC-add-job-to-test-image-pull-on-the-guest-without-forked-containerd branch from 3e1c451 to d1052a1 Compare September 21, 2023 12:20
@fidencio
Copy link
Member Author

/fidencio-test

@fidencio fidencio force-pushed the topic/CC-add-job-to-test-image-pull-on-the-guest-without-forked-containerd branch from d1052a1 to c30416d Compare September 21, 2023 18:04
@fidencio
Copy link
Member Author

/fidencio-test

@fidencio fidencio force-pushed the topic/CC-add-job-to-test-image-pull-on-the-guest-without-forked-containerd branch from c30416d to 57a1fff Compare September 22, 2023 08:15
@fidencio
Copy link
Member Author

/fidencio-test

@fidencio fidencio force-pushed the topic/CC-add-job-to-test-image-pull-on-the-guest-without-forked-containerd branch from 57a1fff to c070c2a Compare September 22, 2023 08:34
@fidencio
Copy link
Member Author

/fidencio-test

@fidencio fidencio force-pushed the topic/CC-add-job-to-test-image-pull-on-the-guest-without-forked-containerd branch from c070c2a to 43652ea Compare September 22, 2023 12:12
@fidencio
Copy link
Member Author

/fidencio-test

And let's hope for the buest. :-)

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
@fidencio fidencio force-pushed the topic/CC-add-job-to-test-image-pull-on-the-guest-without-forked-containerd branch from 43652ea to 2b5ca0a Compare September 22, 2023 12:14
@stevenhorsman
Copy link
Member

/fidencio-test

@stevenhorsman stevenhorsman force-pushed the topic/CC-add-job-to-test-image-pull-on-the-guest-without-forked-containerd branch from 4f17f93 to 0a967f9 Compare September 25, 2023 18:08
@stevenhorsman
Copy link
Member

/fidencio-test

@stevenhorsman
Copy link
Member

Chengyu - just to give you an update on what I've been trying today. The test that was 12 was still failing and after printing the containerd, kata and kubelet logs I couldn't spot an error, so I wondered if the problem was due to the fact that nydus snapshotter had already cached the docker config.json, so the missing credentials weren't a problem. I didn't know how to reset the credentials, so I tried moving the test above the others, but in http://jenkins.katacontainers.io/view/CCv0/job/tests-CCv0-ubuntu-20.04-x86_64-CC_CRI_CONTAINERD_K8S-IMAGE_OFFLOAD_TO_GUEST-PR/28/consoleFull it still fails and I can't see an obvious error related to the image pull failing on the host, so we might need to look elsewhere. I'll spend more time debugging tomorrow, but wanted to let you know what I've done in case you have any ideas whilst I'm asleep.

@ChengyuZhu6
Copy link
Member

2023-09-26.09-20-36.mp4

@ChengyuZhu6
Copy link
Member

@stevenhorsman I have tested in local machine and I can find the log "failed to resolve reference "quay.io/kata-containers/confidential-containers-auth:test": unexpected status from HEAD request".

@ChengyuZhu6
Copy link
Member

ChengyuZhu6 commented Sep 26, 2023

Consider printing the log after assert_pod_fail "${pod_config}"for debugging. This is because the image pulling process is initiated by assert_pod_fail. Another potential reason could be that the kubelet log exceeds a length of 100,000, preventing the display of the necessary log.

NYDUS_SNAPSHOTTER_TARFS_CONFIG="/usr/local/share/nydus-snapshotter/config-coco-host-sharing.toml"
NYDUS_SNAPSHOTTER_GUEST_CONFIG="/usr/local/share/nydus-snapshotter/config-coco-guest-pulling.toml"
NYDUS_SNAPSHOTTER_CONFIG="${NYDUS_SNAPSHOTTER_CONFIG:-${NYDUS_SNAPSHOTTER_TARFS_CONFIG}}"
NYDUS_SNAPSHOTTER_TARFS_EXPORT_MODE="${PULL_ON_HOST_EXPORT_MODE=:-image_block}"
Copy link
Member

@ChengyuZhu6 ChengyuZhu6 Sep 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fidencio I found why tests pulling image on the host are all failed. There is an extraneous '=' to get PULL_ON_HOST_EXPORT_MODE, causing the value of export_mode to be export_mode = ":-image_block". So if we remove '=' in "${PULL_ON_HOST_EXPORT_MODE=:-image_block}", it should be fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed - good spot

@ChengyuZhu6
Copy link
Member

image

Ensure nydus is setup to be used by containerd as part of our tests.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The reason for that being:
```
The test case 5 failed due to a specific behavior when using containerd
and the snapshotter to download images. Containerd needs to fetch both
the manifest and configuration of the image. In this case, both
images`quay.io/kata-containers/confidential-containers:signed` and
`quay.io/kata-containers/confidential-containers:unsigned` have the same
IDs (sha256). Consequently, test case 4 downloaded image
`quay.io/kata-containers/confidential-containers:signed`. So, in test
case 5, when containerd detected that the image ID already existed, it
used the manifest and image name from
`quay.io/kata-containers/confidential-containers:signed` and passed it
to kata instead of
`quay.io/kata-containers/confidential-containers:unsigned`, resulting in
the use of image
`quay.io/kata-containers/confidential-containers:signed`. This explains
the error in test case 5. As a temporary measure, deleting the image
before running each test case should address this.
```
from:
https://cloud-native.slack.com/archives/C039JSH0807/p1695618313572309?thread_ts=1695591000.697989&cid=C039JSH0807

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
@stevenhorsman stevenhorsman force-pushed the topic/CC-add-job-to-test-image-pull-on-the-guest-without-forked-containerd branch from cbfe02e to 2970527 Compare September 26, 2023 08:26
@stevenhorsman
Copy link
Member

/fidencio-test

@stevenhorsman
Copy link
Member

@stevenhorsman I have tested in local machine and I can find the log "failed to resolve reference "quay.io/kata-containers/confidential-containers-auth:test": unexpected status from HEAD request".

Locally I'm seeing

Unknown desc = failed to pull and unpack image \"quay.io/kata-containers/confidential-containers-auth:test\": failed to resolve reference \"quay.io/kata-containers/confidential-containers-auth:test\": pulling from host quay.io failed with status code [manifests test]: 401 UNAUTHORIZED" image="quay.io/kata-containers/confidential-containers-auth:test"

but I've fixed the logging now, so we'll see what the automation comes up with and pick the best message :)

@stevenhorsman
Copy link
Member

@stevenhorsman I have tested in local machine and I can find the log "failed to resolve reference "quay.io/kata-containers/confidential-containers-auth:test": unexpected status from HEAD request".

Locally I'm seeing

Unknown desc = failed to pull and unpack image \"quay.io/kata-containers/confidential-containers-auth:test\": failed to resolve reference \"quay.io/kata-containers/confidential-containers-auth:test\": pulling from host quay.io failed with status code [manifests test]: 401 UNAUTHORIZED" image="quay.io/kata-containers/confidential-containers-auth:test"

but I've fixed the logging now, so we'll see what the automation comes up with and pick the best message :)

I see the following messages in the CI logs as well:

10:41:10 # Sep 26 09:39:11 ubuntu20-f89750 kubelet[174155]: E0926 09:39:11.193013  174155 remote_image.go:236] "PullImage from image service failed" err="rpc error: code = Unknown desc = failed to pull and unpack image \"quay.io/kata-containers/confidential-containers-auth:test\": failed to resolve reference \"quay.io/kata-containers/confidential-containers-auth:test\": pulling from host quay.io failed with status code [manifests test]: 401 UNAUTHORIZED" image="quay.io/kata-containers/confidential-containers-auth:test"
10:41:10 # Sep 26 09:40:45 ubuntu20-f89750 containerd[191826]: time="2023-09-26T09:40:45.346872399Z" level=error msg="PullImage \"quay.io/kata-containers/confidential-containers-auth:test\" failed" error="failed to pull and unpack image \"quay.io/kata-containers/confidential-containers-auth:test\": failed to resolve reference \"quay.io/kata-containers/confidential-containers-auth:test\": pulling from host quay.io failed with status code [manifests test]: 401 UNAUTHORIZED"

so I'll work on fixing up the grep to try and match that properly

@stevenhorsman stevenhorsman force-pushed the topic/CC-add-job-to-test-image-pull-on-the-guest-without-forked-containerd branch from d11e8ef to 2970527 Compare September 26, 2023 10:55
@stevenhorsman
Copy link
Member

/fidencio-test

stevenhorsman and others added 2 commits September 26, 2023 13:52
Update image pull tests to provide registry creds on the host
for kubernetes and nydus snapshotter to use

Co-authored-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
@stevenhorsman stevenhorsman force-pushed the topic/CC-add-job-to-test-image-pull-on-the-guest-without-forked-containerd branch from 2970527 to c63008b Compare September 26, 2023 12:52
@stevenhorsman
Copy link
Member

/fidencio-test

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
@stevenhorsman
Copy link
Member

/fidencio-test

@stevenhorsman
Copy link
Member

@fidencio @stevenhorsman . Should we add encrypted image tests in pulling image in the guest with snapshotter? I believe the encrypted image tests would be passed with snapshotter to pull image in the guest.

So I think the plan beyond this draft PR is to switch the old jobs (that run the encrypted tests) to use image offload in https://github.com/confidential-containers/operator/compare/main...fidencio:cc-operator:remote_snapshotter?expand=1, so that should cover this. I think we are keeping this open for the pull on host testing

@ChengyuZhu6
Copy link
Member

ChengyuZhu6 commented Sep 26, 2023

Oh sorry. I found the tests have included encrypted tests.

@ChengyuZhu6
Copy link
Member

ChengyuZhu6 commented Sep 26, 2023

@fidencio @stevenhorsman The reason the tests for pulling images on the host are failing is because the latest version of nydus-image (2.2.3) does not include the tar-tarfs feature. Therefore, we need to build the main version of nydus-image from the nydus repository, similar to how we install nydus-snapshotter. It's important to note that this action will not impact the image offload tests.

23:45:33 # Sep 26 15:43:44 ubuntu20-384790 snapshotter[191643]: time="2023-09-26T15:43:44.034337972Z" level=warning msg="nydus image exec failed, error: 'tar-tarfs' isn't a valid value for '--type <type>'\n  [possible values: directory, dir-rafs, estargz-rafs, estargz-ref, estargztoc-ref, tar-rafs, targz-rafs, targz-ref, stargz_index]\n\n  Did you mean 'tar-rafs'?\n\nFor more information try '--help'\n"

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
@fidencio
Copy link
Member Author

/fidencio-test

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
size/large Task of significant size
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants