Skip to content
This repository has been archived by the owner on Mar 9, 2022. It is now read-only.

Add option for ignoring volumes defined in images #1504

Merged
merged 1 commit into from
Jun 14, 2020

Conversation

lorenz
Copy link
Contributor

@lorenz lorenz commented Jun 8, 2020

This adds a new config option for ignoring volumes defined in image metadata.

In Kubernetes volumes (even temporary ones) are generally specified in the pod configuration. Mapping the ones in the image metadata can lead to resource accounting/exhaustion issues (think writing an unlimited amount of inodes or data into the volume) and policy enforcement issues (a PSP cannot disable these).

Specifically for containers started with ReadOnlyRootFilesystem it can also lead to unexpected behavior, for example if there's a typo in the volume mount path the container will still start since the process can write to the data location, but the data ends up in the volume defined by the image which is ephemeral and subsequently gets lost when the pod gets descheduled. The user expected the process in the container to fail since there's no volume mounted at the data path and the root filesystem is readonly, but since the image volume is mounted there this works.

@k8s-ci-robot
Copy link

Hi @lorenz. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@mikebrow
Copy link
Member

mikebrow commented Jun 9, 2020

/ok-to-test

Copy link
Member

@mikebrow mikebrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comment

we also have a config document..
https://github.com/containerd/cri/blob/master/docs/config.md

pkg/server/container_create.go Show resolved Hide resolved
@mikebrow
Copy link
Member

mikebrow commented Jun 9, 2020

@lorenz sounds like this was a concrete issue you ran into. Can you provide link(s)?

@lorenz lorenz force-pushed the ignore-image-defined-volumes branch from ff69933 to 398fb1e Compare June 9, 2020 06:39
@lorenz
Copy link
Contributor Author

lorenz commented Jun 9, 2020

I've added a debug log output if we're ignoring volumes and documented the option in config.md. I looked at the commit for the last-added option (TolerateMissingHugePagesCgroupController) and added mine to all the files it touched, but it wasn't added to config.md, that's why I missed it.

The main reason why I've been running this patch for quite some time is because of the issue where incorrectly configured volumes end up writing data into an ephemeral containerd-allocated volume and misbehaving containers writing lots of data into the containerd directory.

@lorenz
Copy link
Contributor Author

lorenz commented Jun 9, 2020

/test pull-cri-containerd-verify

@mikebrow
Copy link
Member

mikebrow commented Jun 9, 2020

I've added a debug log output if we're ignoring volumes and documented the option in config.md. I looked at the commit for the last-added option (TolerateMissingHugePagesCgroupController) and added mine to all the files it touched, but it wasn't added to config.md, that's why I missed it.

The main reason why I've been running this patch for quite some time is because of the issue where incorrectly configured volumes end up writing data into an ephemeral containerd-allocated volume and misbehaving containers writing lots of data into the containerd directory.

yeah I figured as much.. Which of course was my bad, either I missed it .. or just let it go in that pr review and flagged it as a docs issue to be worked on later, can't remember which was a week ago and seems like months :-)

Signed-off-by: Lorenz Brun <lorenz@brun.one>
@lorenz lorenz force-pushed the ignore-image-defined-volumes branch from 398fb1e to 5a1d49b Compare June 9, 2020 19:02
Copy link
Member

@mikebrow mikebrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@fuweid fuweid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mikebrow Do we actually need the volume here? The user from eBay filed issue about this before https://github.com/containerd/containerd/issues/3511.

The volume handled by CRI plugin doesn't provide interface for user to handle this. The CRI-API spec defines volumes from kubelet, not image config. If the user uses devicemapper as snapshotter with disk quota, the size of rootfs will be ensured with quota. But volume from image config will be leak point.

cc @wenlxie

@mikebrow
Copy link
Member

LGTM

@mikebrow Do we actually need the volume here? The user from eBay filed issue about this before containerd/containerd#3511.

The volume handled by CRI plugin doesn't provide interface for user to handle this. The CRI-API spec defines volumes from kubelet, not image config. If the user uses devicemapper as snapshotter with disk quota, the size of rootfs will be ensured with quota. But volume from image config will be leak point.

cc @wenlxie

Hmm. Reading the other issue (I re-assigned it to cri) it sounds like @Random-Liu already had some ideas for properly handling image specified volumes (report disk usage and /or ask kublet to manage these volumes). Maybe we can recruit him for a discussion at a future sig-node call.

Either way I think this flag provides a good ignore option for now at least until we have a more elaborate solution to map and/or report usage for the image volumes.

@fuweid
Copy link
Member

fuweid commented Jun 10, 2020

Either way I think this flag provides a good ignore option for now at least until we have a more elaborate solution to map and/or report usage for the image volumes.

SGTM

@lorenz
Copy link
Contributor Author

lorenz commented Jun 13, 2020

Is this waiting on something to get merged?

@mikebrow
Copy link
Member

Is this waiting on something to get merged?

Just giving maintainers a sufficient opportunity to weigh in.

@mikebrow mikebrow merged commit b661ad7 into containerd:master Jun 14, 2020
leoluk pushed a commit to monogon-dev/monogon that referenced this pull request Mar 30, 2021
This unbreaks bbolt (as part of containerd) on 1.14+ (see etcd-io/bbolt#201 and
etcd-io/bbolt#220), pulls in my patch to ignore image-defined volumes
(containerd/cri#1504) and gets us some robustness fixes in containerd CNI/CRI integration
(containerd/cri#1405). This also updates K8s at the same time since they share a lot of
dependencies and only updating one is very annoying. On the K8s side we mostly get the standard stream of fixes
plus some patches that are no longer necessary.

One annoying on the K8s side (but with no impact to the functionality) are these messages in the logs of various
components:
```
W0714 11:51:26.323590       1 warnings.go:67] policy/v1beta1 PodSecurityPolicy is deprecated in v1.22+, unavailable in v1.25+
```
They are caused by KEP-1635, but there's not explanation why this gets logged so aggressively considering the operators
cannot do anything about it. There's no newer version of PodSecurityPolicy and you are pretty much required to use it if
you use RBAC.

Test Plan: Covered by existing tests

Bug: T753

X-Origin-Diff: phab/D597
GitOrigin-RevId: f6c447da1de037c27646f9ec9f45ebd5d6660ab0
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants