Executor support via Kubelet API (in lieu of docker) #902

Closed
jessesuen opened this issue Jul 4, 2018 · 10 comments

@jessesuen
Member

Is this a BUG REPORT or FEATURE REQUEST?: FEATURE REQUEST

What happened:

We need to support runtimes other than docker (e.g. containerd). The container runtime interaction in the executor has already been abstracted into an interface in anticipation of this need. The work here is to explore the undocumented kubelet API and implement the following interface on top of it:

// ContainerRuntimeExecutor is the interface for interacting with a container runtime (e.g. docker)
type ContainerRuntimeExecutor interface {
	// GetFileContents returns the file contents of a file in a container as a string
	GetFileContents(containerID string, sourcePath string) (string, error)

	// CopyFile copies a source file in a container to a local path
	CopyFile(containerID string, sourcePath string, destPath string) error

	// GetOutput returns the entirety of the container output as a string
	// Used to capture script results as an output parameter
	GetOutput(containerID string) (string, error)

	// Wait for the container to complete
	Wait(containerID string) error

	// Kill a list of containerIDs first with a SIGTERM then with a SIGKILL after a grace period
	Kill(containerIDs []string) error
}

The workflow-controller-configmap will need an option to use the kubelet, and workflowpod.go will need to consult this setting when constructing the pod spec (skipping the docker.sock volume mounts).
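
For illustration only, here is a minimal sketch of what a kubelet-backed implementation of this interface could look like; kubeletClient and its Exec/Logs helpers are hypothetical placeholders, not an existing API:

// kubeletExecutor is a hypothetical kubelet-backed implementation of
// ContainerRuntimeExecutor that talks to the kubelet API instead of docker.sock.
type kubeletExecutor struct {
	client *kubeletClient // hypothetical wrapper around the node's kubelet HTTPS endpoint
}

// GetFileContents could exec `cat <sourcePath>` in the container via the kubelet exec endpoint.
func (k *kubeletExecutor) GetFileContents(containerID string, sourcePath string) (string, error) {
	return k.client.Exec(containerID, "cat", sourcePath)
}

// GetOutput could read the container log via the kubelet containerLogs endpoint.
func (k *kubeletExecutor) GetOutput(containerID string) (string, error) {
	return k.client.Logs(containerID)
}

// CopyFile, Wait, and Kill would be implemented along the same lines.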

@jessesuen
Member Author

Had some more internal discussions about this, and we feel there's a path forward where we can use k8s APIs to achieve much of the same functionality we are currently getting with docker.sock access. Admittedly, relying on docker.sock is not ideal and will become less acceptable as more clusters are configured with tighter security and OPA, such as in #942.

Here are the alternatives for each method of the interface that needs to be implemented (a client-go sketch follows the list):

  • GetFileContents - kubectl cp API
  • CopyFile - kubectl cp API
  • GetOutput - kubectl logs API
  • Wait - the workflow-controller informer would signal the argoexec sidecar about main container completion
  • Kill - kubectl exec from controller or sidecar, which performs sh -c "kill 1". May not work with scratch containers with static binaries.
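
A minimal sketch of the kubectl logs / kubectl exec alternatives above using client-go (signatures follow recent client-go); it assumes the namespace, pod, and container corresponding to a containerID have been resolved elsewhere, and omits error handling and the SIGKILL grace period:

import (
	"context"
	"os"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/kubernetes/scheme"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/remotecommand"
)

// getOutput sketches GetOutput via the logs API.
func getOutput(clientset kubernetes.Interface, namespace, podName, container string) (string, error) {
	req := clientset.CoreV1().Pods(namespace).GetLogs(podName, &corev1.PodLogOptions{Container: container})
	data, err := req.DoRaw(context.TODO())
	if err != nil {
		return "", err
	}
	return string(data), nil
}

// kill sketches Kill via the exec API (sh -c "kill 1"), which, as noted above,
// would not work for scratch containers without a shell.
func kill(config *rest.Config, clientset kubernetes.Interface, namespace, podName, container string) error {
	req := clientset.CoreV1().RESTClient().Post().
		Resource("pods").Namespace(namespace).Name(podName).SubResource("exec").
		VersionedParams(&corev1.PodExecOptions{
			Container: container,
			Command:   []string{"sh", "-c", "kill 1"},
			Stdout:    true,
			Stderr:    true,
		}, scheme.ParameterCodec)
	exec, err := remotecommand.NewSPDYExecutor(config, "POST", req.URL())
	if err != nil {
		return err
	}
	return exec.Stream(remotecommand.StreamOptions{Stdout: os.Stdout, Stderr: os.Stderr})
}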

jessesuen changed the title "Executor support via kubelet API (in lieu of docker)" to "Executor support via Kubernetes API (in lieu of docker)" on Aug 9, 2018
@andreimc
Contributor

Would this mean that argo wouldn't work in cri-o/containerd installs?

@jessesuen
Member Author

> Would this mean that argo wouldn't work in cri-o/containerd installs?

Actually the opposite. By moving up the stack to Kubelet and/or Kubernetes API Server, it will accommodate all runtimes.

@JulienBalestra reached out and mentioned he has an upcoming kubelet implementation to contribute.

In the end, I think we just may need to provide different options since there are tradeoffs to each approach.

docker - least secure; scalable, but does not support any runtime other than docker. This could potentially be deprecated and/or replaced with a kubelet implementation.

kubelet - I'm still trying to understand the security implications and how the kubelet authorizes requests and works with K8s RBAC, but it is definitely more secure than docker and supports all runtimes. Scalability would be similar to docker, since each executor communicates with its own node.

k8s API - secure and supports all runtimes, but least scalable, since workflow pods would be copying artifacts and retrieving logs through the API server, which would then become a bottleneck.

@JulienBalestra
Contributor

JulienBalestra commented Aug 10, 2018

I hope I'll give enough details to help on this.

The kubelet can validate (service account) bearer tokens with its webhook authentication feature.
Basically, the client sets the service account bearer token in the request header to query the kubelet API directly.

This is covered by RBAC rules like:

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: kubelet
rules:
- apiGroups: [""]
  resources: ["nodes/log", "nodes/proxy"]
  verbs: ["get", "list"]

The kubelet authNZ is documented here.
The kubelet basically performs an authorization query against the apiserver and caches the response for a configurable TTL.
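
To make the bearer-token flow concrete, here is a rough sketch of an authenticated request against the kubelet API, authorized by an RBAC rule like the one above; KUBELET_HOST is an assumed env var fed by the downward API, /pods is just one example of a kubelet read-only endpoint, and certificate verification is skipped here (see the setups discussed below):

import (
	"crypto/tls"
	"fmt"
	"io/ioutil"
	"net/http"
	"os"
)

// queryKubelet performs a GET against the node's kubelet API using the pod's
// service account token as a bearer token (validated by the kubelet webhook).
func queryKubelet(path string) ([]byte, error) {
	token, err := ioutil.ReadFile("/var/run/secrets/kubernetes.io/serviceaccount/token")
	if err != nil {
		return nil, err
	}
	// KUBELET_HOST would be injected via the downward API (status.hostIP); 10250 is the default HTTPS port.
	url := fmt.Sprintf("https://%s:10250%s", os.Getenv("KUBELET_HOST"), path)
	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+string(token))
	// TLS verification depends on the cluster setup; see the certificate discussion below.
	client := &http.Client{Transport: &http.Transport{TLSClientConfig: &tls.Config{InsecureSkipVerify: true}}}
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	return ioutil.ReadAll(resp.Body)
}

// Example: queryKubelet("/pods") lists the pods running on that node.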

The challenges I see in this implementation are around communicating with the kubelet endpoint:

1) Finding the endpoint

  • downwardAPI - status.hostIP
  • default HTTPS port - 10250

2) Obtaining valid certificates, or assuming insecure HTTPS behavior

This could become more complex because there are a lot of possible setups (a transport-setup sketch follows the list):

  • self-signed certificates
  • TLS bootstrap: prior to 1.11, --hostname-override is the only subject alternative name, so the downwardAPI has to provide status.nodeName instead of status.hostIP. The given name has to be resolvable from the Pod, which could increase the number of DNS queries, especially when users have to set a cloud provider hostname.
  • third-party issuer: the kubelet certificates have to be trusted by the ca.crt provided in the service account (controller-manager --root-ca-file).
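
As a sketch of the transport options in the list above: trust the ca.crt provided in the service account (third-party issuer case) or fall back to insecure HTTPS (self-signed case):

import (
	"crypto/tls"
	"crypto/x509"
	"io/ioutil"
	"net/http"
)

// kubeletTransport builds the HTTP transport used to reach the kubelet endpoint.
// If the kubelet serving certificate is issued by a CA trusted via the service
// account ca.crt (controller-manager --root-ca-file), we can verify it; otherwise
// (e.g. self-signed) we have to fall back to InsecureSkipVerify.
func kubeletTransport(insecure bool) (*http.Transport, error) {
	if insecure {
		return &http.Transport{TLSClientConfig: &tls.Config{InsecureSkipVerify: true}}, nil
	}
	caCert, err := ioutil.ReadFile("/var/run/secrets/kubernetes.io/serviceaccount/ca.crt")
	if err != nil {
		return nil, err
	}
	pool := x509.NewCertPool()
	pool.AppendCertsFromPEM(caCert)
	// When verifying, the endpoint must match a SAN of the kubelet certificate,
	// i.e. status.nodeName rather than status.hostIP on pre-1.11 TLS bootstrap.
	return &http.Transport{TLSClientConfig: &tls.Config{RootCAs: pool}}, nil
}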

The Kubernetes approach should be easier, but the least scalable, as @jessesuen mentioned.

The apiserver will see more pressure from:

  • GetFileContents
  • CopyFile
  • GetOutput

By default these calls go through the apiserver, unless the setup is configured to upgrade the websocket connection directly to the container runtime, but that's rarely the case, I suppose 🤔

Wait could be fine if we already know the node and namespace where the given containerID is supposed to be found. After that, we need to open a watch on the Pod and track that containerID.
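
A rough sketch of that Wait using a client-go watch; the pod name and namespace are assumed to be known, and timeouts / re-establishing a closed watch are omitted:

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// waitForContainer watches the pod and returns once the container with the
// given containerID has terminated. Note that pod status prefixes containerIDs
// with the runtime scheme (e.g. docker://...), which is glossed over here.
func waitForContainer(clientset kubernetes.Interface, namespace, podName, containerID string) error {
	opts := metav1.ListOptions{FieldSelector: "metadata.name=" + podName}
	w, err := clientset.CoreV1().Pods(namespace).Watch(context.TODO(), opts)
	if err != nil {
		return err
	}
	defer w.Stop()
	for event := range w.ResultChan() {
		pod, ok := event.Object.(*corev1.Pod)
		if !ok {
			continue
		}
		for _, ctr := range pod.Status.ContainerStatuses {
			if ctr.ContainerID == containerID && ctr.State.Terminated != nil {
				return nil
			}
		}
	}
	return fmt.Errorf("watch closed before container %s terminated", containerID)
}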

I think we can agree on providing both the kubelet and the Kubernetes approaches.

jessesuen changed the title "Executor support via Kubernetes API (in lieu of docker)" to "Executor support via Kubelet API (in lieu of docker)" on Aug 10, 2018

@edlee2121
Contributor

edlee2121 commented Aug 10, 2018

For security/audit purposes, it would be desirable to limit use of the kubelet to just the performance-intensive read-only operations (GetFileContents, CopyFile, GetOutput) and use the API server for the rest.

Perhaps we should consider implementing our own service, which could proxy the kubelet, to support these operations. Then we would be more independent of the kubelet and could better limit privileges to just the ones we need.

@JulienBalestra
Contributor

@edlee2121 it's still possible to get some auditing via the SubjectAccessReview calls the kubelet itself makes to the apiserver.

Like:

{
  "kind": "SubjectAccessReview",
  "apiVersion": "authorization.k8s.io/v1beta1",
  "metadata": {
    "creationTimestamp": null
  },
  "spec": {
    "resourceAttributes": {
      "verb": "get",
      "version": "v1",
      "resource": "nodes",
      "subresource": "proxy",
      "name": "$nodeName"
    },
    "user": "system:serviceaccount:default:argo",
    "group": [
      "system:serviceaccounts",
      "system:serviceaccounts:default",
      "system:authenticated"
    ],
    "uid": "46ea9cea-a0c2-11e8-b0b8-5404a66983a9"
  },
  "status": {
    "allowed": true,
    "reason": "allowed by ClusterRoleBinding \"argo-admin\" of ClusterRole \"cluster-admin\" to ServiceAccount \"argo/default\""
  }
}

@edlee2121
Contributor

@JulienBalestra Cool!

@JulienBalestra
Contributor

Now that #952 is merged, I'll try to make progress on the pure Kubernetes API integration.

jessesuen added this to the v2.2 milestone on Aug 25, 2018
@jessesuen
Member Author

Implemented in 5739436

@JulienBalestra let's create a new issue for it. I'll go ahead and close this one. Thanks again!
