-
Notifications
You must be signed in to change notification settings - Fork 5.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
seccomp: add default profile #660
Conversation
/cc @kubernetes/sig-node-proposals @kubernetes/sig-auth-proposals |
@timstclair: These labels do not exist in this repository: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
How will changes to the default profile work? It will be tricky to do in a way that doesn't risk breaking people on upgrade. Maybe the profile should be versioned? That's still not enough though, because changing defaults is hard... Upgrading a node shouldn't automatically replace the profile. |
The profile shouldn't need major changes once we deploy it. The only
changes to the docker one was removing super old syscalls we didn't need to
begin with and then whenever a vuln comes out I try to make sure that the
default protects it, but like in the case of dirty cow it would have
required blocking madvise MADV_DONT_NEED and that would have broken
everything that has glibc so obviously we didn't block it.
Basically if the profile breaks someone on changing it there is a problem
with the profile... in that it should not have blocked essentially
|
IIRC the use of any profile is a change unless folks are using PSP already (they probably aren't). So we'll need to solve the upgrade issue early. |
Yeah well the initial setting of a default was always going to be A Thing™️
it was when we did it in docker and obviously would be here. But like I
said on the other issues about this, when we did it with docker I tested
EVERY public dockerfile and all the official image and was still surprised
when we didn't break anyone. Obviously I'm not claiming that we will be so
lucky but it's a good sign.
…On May 24, 2017 17:38, "Paul Weil" ***@***.***> wrote:
How will changes to the default profile work?
The only
changes to the docker one
IIRC the use of any profile
<https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/dockershim/helpers.go#L205>
is a change
<https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/dockershim/helpers.go#L60>
unless folks are using PSP already (they probably aren't). So we'll need to
solve the upgrade issue early.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#660 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABYNbIZjbx_2ibUx6jML8GGsU0vj8E8yks5r9KNsgaJpZM4Nll-6>
.
|
Yeah, I think we're pretty safe. If the solution is "add the annotation and use unconfined again" I don't think that's a bad thing. It's just something folks need to be aware of. |
Yeah definitely will need education and making sure people know what's
happening :), I will also add more copy to the proposal about being aflble
to turn off with unconfined
…On May 24, 2017 17:50, "Paul Weil" ***@***.***> wrote:
Obviously I'm not claiming that we will be so
lucky but it's a good sign.
Yeah, I think we're pretty safe. If the solution is "add the annotation
and use unconfined again" I don't think that's a bad thing. It's just
something folks need to be aware of.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#660 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABYNbFirTdeOZCbMWli4omGY9_z6WeQYks5r9KX_gaJpZM4Nll-6>
.
|
Well now it's definitely going to need changes ;-) Hopefully if we get there though, we'll at least have first class support at the API level. The docker default is represented in the annotation as |
Ya I was unsure of naming... I am def open to either haha
As far as deployment if you mean how will it get to the docker API is I
figured it would pass it directly. That way we don't have to worry about
saving to disk which was something we were just doing. The default can be
written in Go and passed directly to docker. Also helps us modify it on the
fly for adding capabilities etc
|
By deployment I meant how will we get the file on the nodes, but if we pass the spec through the CRI directly (rather than file path) then we don't need a file, and can just hard code it in the Kubelet (well, we could do that either way, but passing paths would require us to write it to disk somewhere). This has some crossover with the discussion on kubernetes/kubernetes#46332. |
Yeah I don't think there is any need to save on disk... that way also we
know the version is tied to that of the k8s version
|
I think we should spec the seccomp format first before this. See kubernetes/kubernetes#39128 for some spec ideas. |
I can spec it out
…On Thu, May 25, 2017 at 3:08 AM, Pengfei Ni ***@***.***> wrote:
I think we should spec the seccomp format first before this. See
kubernetes/kubernetes#39128
<kubernetes/kubernetes#39128> for some spec
ideas.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#660 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABYNbEnDJfJQyMXtGd1DYdbmcopcEFalks5r9SkIgaJpZM4Nll-6>
.
--
Jessie Frazelle
4096R / D4C4 DD60 0D66 F65A 8EFC 511E 18F3 685C 0022 BFF3
pgp.mit.edu <http://pgp.mit.edu/pks/lookup?op=get&search=0x18F3685C0022BFF3>
|
Signed-off-by: Jess Frazelle <acidburn@google.com>
### Spec | ||
|
||
We will start from the OCI specification. This API resource will be added to | ||
`settings.k8s.io` as an `alpha` resource. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just put this as a place holder but I have no idea where it belongs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also just added a spec but I am aware it needs more info with interaction with pods/containers and obviously examples but wanted to get early feedback, can add the rest as I get some time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is similar to the initial version of the seccomp proposal that we wrote, so I think we should think about the same questions of whether we think we want to have an API resource complex enough to model a fully materialize seccomp profile. Personally, I think there's definitely some value in doing so.
I do think you might want to split some of the updates into a separate PR and/or split the new content into a distinct proposal, see comments.
FreeBSD has a seccomp/capability-like facility called | ||
[Capsicum](https://www.freebsd.org/cgi/man.cgi?query=capsicum&sektion=4). | ||
|
||
|
||
## Proposed Design | ||
|
||
### Seccomp API Resource? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think your proposal alters this text about API resources a bit, and I wonder if this ought to be a new proposal (as we have tended to do in the past when making proposals for new additions to a feature). So, we should probably either add some new text here that continues the narrative about what API resource we're adding, or split your net-new changes into a separate follow-on proposal.
* [docker/22109](https://github.com/docker/docker/issues/22109): composable | ||
seccomp filters | ||
* [docker/21105](https://github.com/docker/docker/issues/22105): custom | ||
seccomp filters for builds | ||
|
||
#### rkt / appcontainers | ||
Implementation details: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is good stuff; you may want to split the updates to the content out into a separate PR, since they're non-controversial and can easily be merged without depending on the new content here being accepted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ya I was unsure what the protocol was for when things change but can totally split it out :)
`settings.k8s.io` as an `alpha` resource. | ||
|
||
``` | ||
// Seccomp represents syscall restrictions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not clear whether this is global or namespaced - seems like namespaced?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wdyt should it be namespaced?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kinda feels a lot like storage classes which are not namespaced. @pmorie what makes it feel like namespaced? Multi-tenancy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no preference on namespaced or un-namespaced :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would vote for non-namespaced, to match PodSecurityPolicy. It seems like something a ClusterAdmin should have control over... defining seccomp policies and then controlling who has permission to use them.
If an admin of a namespace has permission to create (and then use) new seccomp policies, they could potentially use known vulnerabilities in a multi-tenant cluster, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible for resources to be both namespaced & unnamespaced?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible for resources to be both namespaced & unnamespaced?
no
cc @destijl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few questiosn for you. Will be good to get this in.
| `keyctl` | Prevent containers from using the kernel keyring, which is not namespaced. | | ||
| `lookup_dcookie` | Tracing/profiling syscall, which could leak a lot of information on the host. Also gated by `CAP_SYS_ADMIN`. | | ||
| `mbind` | Syscall that modifies kernel memory and NUMA settings. Already gated by `CAP_SYS_NICE`. | | ||
| `mount` | Deny mounting, already gated by `CAP_SYS_ADMIN`. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably being dense, but what do you mean as "gated by"? Will the gates allow mount for instance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
meaning it is also controlled by CAP_SYS_ADMIN
aka CAP_SYS_ADMIN
is required to mount regardless
| `CAP_DAC_READ_SEARCH`| open_by_handle_at | | ||
| `CAP_IPC_LOCK` | mlock, mlock2, mlockall | | ||
| `CAP_SYS_ADMIN` | name_to_handle_at, bpf, clone, fanotify_init, lookup_dcookie, mount, perf_event_open, setdomainname, sethostname, setns, umount, umount2, unshare | | ||
| `CAP_SYS_BOOT` | reboot | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the only one that I would be raise a concern about. Do we have a usecase for reboot? But maybe I am overthinking it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's been in docker's default and there have been no complaints, do you reboot in a container, where is your cause for concern, just curious
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this reboot the node or the container? And frankly I have never typed reboot inside a container.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's referring to the reboot syscall, which is not for containers because they aren't real :) http://man7.org/linux/man-pages/man2/reboot.2.html
`settings.k8s.io` as an `alpha` resource. | ||
|
||
``` | ||
// Seccomp represents syscall restrictions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kinda feels a lot like storage classes which are not namespaced. @pmorie what makes it feel like namespaced? Multi-tenancy?
cc @davidopp |
I think I agree with CJ's reasoning. Another point for namespaced vs
cluster scoped - is the cardinality of the resource such that each
application might need their own profile (applications legitimately need to
suggest a profile in order to work) or the vast majority of clusters and
applications will all use one or two profiles that are equivalent with
special cases going through security review that is already centralized per
cluster.
Anything namespaced is more convenient for app authors, anything cluster
scoped is more convenient for admins to control (slightly).
…On Tue, Aug 22, 2017 at 3:54 PM, Brian Grant ***@***.***> wrote:
cc @davidopp <https://github.com/davidopp>
—
You are receiving this because your review was requested.
Reply to this email directly, view it on GitHub
<#660 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABG_p5UbD666fkfzNImWVvQeV307IfsLks5sazH9gaJpZM4Nll-6>
.
|
@jessfraz PR needs rebase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jessfraz will you have bandwidth to drive this to completion in 1.10, or should we look for another owner?
Also, seccomp profiles can be layered, right? (Is no_new_privs
required for that?) It might be useful to have a cluster-admin enforced default policy, but still allow devs to further tighten the restrictions with a custom profile layered on top.
|
||
If `capAdd` is used on a Container, the default profile will be adjusted to | ||
interact accordingly with the capability added. These are documented below in | ||
a table by the cap being added: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of these capabilities are in the default set (e.g. CAP_CHOWN
) - does that mean those syscalls would be allowed by default, or only if CAP_CHOWN is explicitly added?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, the defaults would stay the same this was merely a reference.
`settings.k8s.io` as an `alpha` resource. | ||
|
||
``` | ||
// Seccomp represents syscall restrictions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible for resources to be both namespaced & unnamespaced?
I will update this in the next few days I have bandwidth sorry for the delay. |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
@jessfraz @tallclair Is this proposal still active? It's currently been rotting, and is slated to close 30d from now. |
yes i just need to find the bandwidth to push it through
…On Tue, Mar 20, 2018 at 2:42 PM, Christoph Blecker ***@***.*** > wrote:
@jessfraz <https://github.com/jessfraz> @tallclair
<https://github.com/tallclair> Is this proposal still active? It's
currently been rotting, and is slated to close 30d from now.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#660 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABYNbIW6T1AYRiahHau02B98ehvIuWZwks5tgU2igaJpZM4Nll-6>
.
--
Jessie Frazelle
4096R / D4C4 DD60 0D66 F65A 8EFC 511E 18F3 685C 0022 BFF3
pgp.mit.edu <http://pgp.mit.edu/pks/lookup?op=get&search=0x18F3685C0022BFF3>
|
@jessfraz Sounds good. I'll remove the label so this stays open /remove-lifecycle rotten |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
Updates kubernetes/kubernetes#39845
Updates kubernetes/kubernetes#20870
Feature issue kubernetes/enhancements#135
This adds the notion of a default seccomp profile for kubernetes to the alpha seccomp support.
It details the various interactions with capabilities added to containers and covers some of the important syscalls that will be blocked by default.
Kinda related to the other sane hardening defaults set in #639 as well.
cc @timstclair