Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean-up Runtime Contract #4035

Closed
wants to merge 7 commits into from
Closed

Conversation

dgerd
Copy link

@dgerd dgerd commented May 7, 2019

This change makes numerous cleanups to the runtime contract in an
attempt to improve the readability of the document and make the document
more useful for the intended audience.

  • Moves developer facing statements to a new runtime-user-guide.
    Focuses runtime-contract on operator/platform-provider.
  • Add links to Conformance tests that test Runtime Contract statements.
  • Corrects, updates, or removes statements to more accurately represent
    today's Knative runtime.
  • Updates to informative or removes most untestable statements
  • Copies in important OCI runtime requirements we previously referenced
  • Removes reference to OCI specification that didn't bring new
    requirements.

Ref: #2539, #2973, #4014, #4027

/cc @evankanderson @mattmoor @tanzeeb @markusthoemmes @vaikas-google @tcnghia

This change makes numerous cleanups to the runtime contract in an
attempt to improve the readability of the document and make the document
more useful for the intended auidence.

* Moves developer facing statements to a new `runtime-user-guide`.
Focuses `runtime-contract` on operator/platform-provider.
* Add links to Conformance tests that test Runtime Contract statements.
* Corrects, updates, or removes statements to more accurately represent
today's Knative runtime.
* Updates to informative or removes most untestable statements
* Copies in important OCI runtime requirements we previously referenced
* Removes reference to OCI specification that didn't bring new
requirements.

Ref: knative#2539, knative#2973, knative#4014, knative#4027
@googlebot googlebot added the cla: yes Indicates the PR's author has signed the CLA. label May 7, 2019
@knative-prow-robot knative-prow-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 7, 2019
Co-Authored-By: dgerd <dangerd@google.com>
Co-Authored-By: dgerd <dangerd@google.com>
Copy link
Contributor

@tanzeeb tanzeeb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@@ -181,51 +157,39 @@ for purposes of scaling CPU and removing idle containers.

#### Protocols and Ports
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

Summary: http1 and h2c are supported, http1 by default. No automatic upgrades.

@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label May 8, 2019
@knative-prow-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: dgerd, tanzeeb
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: evankanderson

If they are not already assigned, you can assign the PR to them by writing /assign @evankanderson in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

docs/runtime-contract.md Outdated Show resolved Hide resolved
docs/runtime-contract.md Outdated Show resolved Hide resolved
docs/runtime-contract.md Outdated Show resolved Hide resolved
docs/runtime-user-guide.md Outdated Show resolved Hide resolved
docs/runtime-user-guide.md Show resolved Hide resolved
docs/runtime-user-guide.md Outdated Show resolved Hide resolved
docs/runtime-user-guide.md Outdated Show resolved Hide resolved
@knative-prow-robot knative-prow-robot removed the lgtm Indicates that a PR is ready to be merged. label May 8, 2019
docs/runtime-user-guide.md Outdated Show resolved Hide resolved
docs/runtime-user-guide.md Show resolved Hide resolved
docs/runtime-user-guide.md Outdated Show resolved Hide resolved
## Connection

A Knative container must package a webserver that can respond to HTTP/1.1
requests on a specified port.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about http/2?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HTTP/2.0 is covered in the Protocols section.

My understanding is that we expect clients to implement h2c and not h2 which requires an initial upgrade request over HTTP/1.1. This is why I have set the expectation that responding to an HTTP/1.1 request is required.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth clarifying that containers can implement h2c with the logic you provided here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I can add this.

docs/runtime-user-guide.md Show resolved Hide resolved
docs/runtime-user-guide.md Show resolved Hide resolved
docs/runtime-user-guide.md Outdated Show resolved Hide resolved
docs/runtime-user-guide.md Show resolved Hide resolved
@mattmoor
Copy link
Member

/lgtm

Will leave /approve for @evankanderson

@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label May 10, 2019
Copy link
Member

@evankanderson evankanderson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a slight preference for splitting up functional vs presentational changes in the future, as it allows the two conversations to proceed at a different pace.

Part of the reason for the structure of the existing spec was due to questions asked during the drafting about how different parts of the OCI contract applied -- in some cases, the mechanism was suitable for operator consumption but had multitenancy or security concerns, so needed to be cordoned off from developers. I'm not sure how much these notes add/remove from the spec, but most were added in response to questions. 😁

particularly concerned with the expectations of _developers_ (and _language and
tooling developers_, by extension) running code in the environment.
This document is aimed at _operators_ with the goal of providing a consistent
runtime environment for _developers_.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if you want to add the additional detail of language and tooling developers being additional roles that are concerned about the environment as consumers.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can work that back in.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
runtime environment for _developers_.
runtime environment for _developers_ (who may interact via tools created by _language and tooling developers_).

`poststop` hooks to platform operators rather than developers. All of these
hooks are defined in the context of the runtime namespace, rather than the
container namespace, and may expose system-level information (and are
container MUST be sent a `SIGTERM` signal when it is killed to allow for a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can at most be a SHOULD, because there are a number of conditions which could reasonable cause a container to stop running without receiving a SIGTERM:

  • Kernel SIGKILL due to cgroup resource exhaustion / machine resource exhaustion (particularly memory)
  • Kernel SIGKILL due to AppArmor / BPF policy violation
  • Operator SIGKILL due to exceptional circumstances
  • Machine (hardware) failure

There's also the fun case of network failure isolating or partially-isolating the machine, which may not affect SIGTERM/SIGKILL, but may screw up other shutdown.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree here, but I would like to distinguish between this may not happen due to the reason your container is being terminated versus this may vary from platform to platform. Perhaps we can bound the situations for which a SIGTERM must be sent (i.e. normal autoscale actions or deployment actions). We can work on the exact wording, but thoughts here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think saying that "normal orderly shutdown MUST send a SIGTERM" would be acceptable.

@@ -125,47 +112,36 @@ the OCI specification as long as:
- Platforms MAY provide mechanisms for post-mortem viewing of filesystem
contents from a particular execution. Because containers (particularly failing
containers) can experience frequent starts, operators or platform providers
SHOULD limit the total space consumed by these failures.
- A container should write its own termination message to `/dev/termination-log`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/dev/termination-log is inherited from the Kubernetes terminationMessagePolicy, which is pretty helpful for collecting information from crashing containers.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed it because:

  1. It is written from the standpoint of the container. Should the container write the termination message on all Knative platforms (i.e. All installs MUST implement and offer terminationMessagePolicy), or just some platforms?
  2. We do not have test coverage for this and I couldn't quite find where we actually consume this information and present it back to users in our API. It is not clear to me that we do this at all.
  3. Most importantly, this seems like an overly limiting runtime requirement that is not in the critical path at the MUST/SHOULD level and does not add much developer value at the MAY level.

I am not stuck on this one and could be convinced to just rewording it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah -- good catch. I don't think we need to keep this, because it could be a scalability / visibility concern to report these up to the control plane.

In particular, what happens if you have a simultaneous crash of 10k Pods -- aggregation termination messages back to the Revision could be fairly expensive, with each Pod crash causing another attempted update of the Revision. It looks like these don't get aggregated from Pod -> ReplicaSet in k8s, so let's follow that precedent.

### Hooks

Operation hooks SHOULD NOT be configurable by the Knative developer. Operators
or platform providers MAY use hooks to implement their own lifecycle controls.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was removed because it's not developer-facing?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only hooks I could find here were the OCI hooks ( https://github.com/opencontainers/runtime-spec/blob/master/config.md#posix-platform-hooks ). We already describe these hooks starting on line 90. This was duplicative.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM, I copied all the sections from OCI, I think.

collected and retained in a developer-accessible logging repository.
A read from the `stdin` file descriptor on the container [SHOULD](https://github.com/knative/serving/blob/master/test/conformance/file_descriptor_test.go)
always result in `EOF`. The `stdout` and `stderr` file descriptors on the container
MUST be collected and retained in a developer-accessible logging repository.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering about the upgrade from SHOULD to MUST.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to have a minimum logging requirement that is common across all Knative installs. Right now both /var/log and stdout are SHOULD which means that a developer need to go hunting through the docs or configuration for each Knative install to understand how logs are collected. My understanding is that right now in Knative Serving we collect stdout and stderr for all containers so we are already doing this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think one reason for SHOULD was that it wasn't entirely clear how to test that the logs were actually being retained.

I'm fine upgrading to MUST.

the container as the specified user ID if allowed by the platform (see below).
If no `runAsUser` is specified, a platform-specific default SHALL be used.
Platform Providers SHOULD document this default behavior.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is removing a documentation recommendation.

Would it make sense instead to have a section of the spec recommending the most significant documentation areas where platform providers may differ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am going to take another stab at updating this after some more recent conversations about how OpenShift handles runAsUser.

It would be nice to have documentation recommendations and requirements elsewhere so you don't have to go through this entire document to find them all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a <!-- TODO(dgerd): .... --> here?


The namespace configuration MUST be provided by the operator or platform
provider; developers or container providers MUST NOT set or assume a particular
namespace configuration.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this again I am going to put it back and update it. I was assuming Kubernetes Namespace and not Container Namespace when I was deleting this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is still missing?

This option MAY only be set by the operator or platform provider, and MUST NOT
be configurable by the developer. As masked paths may be part of the platform
security hardening, operators may tune this from time to time as the threat
environment changes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, wondering if it's worth documenting where the responsibility for these setting lies, given that they are in the OCI spec.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section seemed overly limiting and does not seem to prevent the operation of Knative. Why must this not be configurable by the developer? It seemed to me like you could let the developer have control over some of these things. Given that the OCI interface should not be available within the container the only way to configure this would be through the Knative API. An extension of the Knative API to allow certain paths to be masked or read-only didn't seem crazy to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with dropping this.

## Connection

A Knative container must package a webserver that can respond to HTTP/1.1
requests on a specified port.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth clarifying that containers can implement h2c with the logic you provided here.


# Logging

Log statements should be sent to `stdout` and `stderr`. Log statements sent to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to include /var/log or /dev/log as alternatives to the streams. /dev/log is particularly nice, as it natively supports multiline log statements (but I think our current fluentd configuration does not provide /dev/log).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are not guaranteed to be in a knative install. They are listed as SHOULD items.

@evankanderson
Copy link
Member

/hold

Will look at this tomorrow

@knative-prow-robot knative-prow-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 3, 2019
dgerd pushed a commit to dgerd/serving that referenced this pull request Jun 6, 2019
* Describe current behavior for HTTP1.1 and HTTP2
* Provide links to relevant tests for keywords

Ref: knative#4035
Fixes: knative#4283
docs/runtime-user-guide.md Show resolved Hide resolved
the Knative cluster.
- **Language and tooling developers** typically write tools used by
_developers_ to package code into containers. As such, they are concerned
that tooling which wraps developer code complies with this runtime contract.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a section clarifying that this is descriptive advice rather than prescriptive requirements? "Your container will fit better if it follows the following guidance..."

particularly concerned with the expectations of _developers_ (and _language and
tooling developers_, by extension) running code in the environment.
This document is aimed at _operators_ with the goal of providing a consistent
runtime environment for _developers_.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
runtime environment for _developers_.
runtime environment for _developers_ (who may interact via tools created by _language and tooling developers_).

`poststop` hooks to platform operators rather than developers. All of these
hooks are defined in the context of the runtime namespace, rather than the
container namespace, and may expose system-level information (and are
container MUST be sent a `SIGTERM` signal when it is killed to allow for a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think saying that "normal orderly shutdown MUST send a SIGTERM" would be acceptable.

[SHOULD](https://github.com/knative/serving/blob/master/test/conformance/container_test.go)
restrict the use of `prestart`, `poststart`, and `poststop` hooks to platform
operators rather than developers. All of these hooks are defined in the context
of the runtime namespace, rather than the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/runtime namespace/host runtime namespace/ to be extra clear?

non-portable).
- Failures of the developer-specified process MUST be logged to a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming this was removed because it's not currently testable?

Can we leave a comment suggesting this for future inclusion once we have appropriate log infrastructure in place? I think it would be testable if we had a standard log-tail interface:

  • start tailing
  • Deploy container which runs echo "Not gonna happen" 1>&2; exit 1
  • Check for string in tail results.

@@ -125,47 +112,36 @@ the OCI specification as long as:
- Platforms MAY provide mechanisms for post-mortem viewing of filesystem
contents from a particular execution. Because containers (particularly failing
containers) can experience frequent starts, operators or platform providers
SHOULD limit the total space consumed by these failures.
- A container should write its own termination message to `/dev/termination-log`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah -- good catch. I don't think we need to keep this, because it could be a scalability / visibility concern to report these up to the control plane.

In particular, what happens if you have a simultaneous crash of 10k Pods -- aggregation termination messages back to the Revision could be fairly expensive, with each Pod crash causing another attempted update of the Revision. It looks like these don't get aggregated from Pod -> ReplicaSet in k8s, so let's follow that precedent.

### Hooks

Operation hooks SHOULD NOT be configurable by the Knative developer. Operators
or platform providers MAY use hooks to implement their own lifecycle controls.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM, I copied all the sections from OCI, I think.


### Operations

[The OCI interface](https://github.com/opencontainers/runtime-spec/blob/v1.0.0-rc3/runtime.md#operations)
SHOULD NOT be exposed within the container. The operator or platform provider
MAY have the ability to directly interact with the OCI interface, but that is
beyond the scope of this specification.
MAY have the ability to directly interact with the OCI interface.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@evankanderson @dgerd did we want to reword the OCI interface parts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Dan was saying that we could probe at some well-known locations, but couldn't definitively guarantee that this was not exposed.

I'm fine with leaving as-is or moving to the developer guide.

Copy link
Member

@evankanderson evankanderson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finished my pass; a few missed items from last time and some more discussion, but I think many of these can be closed with <!-- TODO(dgerd): ... --> comments rather than needing to get the final details right in this pass.

Observability in particular I expect to improve over the next 3-5 months, so there may be some better answers there soon.

collected and retained in a developer-accessible logging repository.
A read from the `stdin` file descriptor on the container [SHOULD](https://github.com/knative/serving/blob/master/test/conformance/file_descriptor_test.go)
always result in `EOF`. The `stdout` and `stderr` file descriptors on the container
MUST be collected and retained in a developer-accessible logging repository.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think one reason for SHOULD was that it wasn't entirely clear how to test that the logs were actually being retained.

I'm fine upgrading to MUST.

#### Dev symbolic links

As specified by OCI.
The following symbolic links MUST exist within the container:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where did the OCI reference end up?

@@ -181,51 +157,39 @@ for purposes of scaling CPU and removing idle containers.

#### Protocols and Ports

The container MUST accept HTTP/1.1 requests from the environment. The
environment
[SHOULD offer an HTTP/2.0 upgrade option](https://http2.github.io/http2-spec/#discover-http)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Followed up there -- empty == HTTP/1.1 with later upgrade option seems reasonable.

Only one inbound `containerPort` SHALL be specified in the
The developer MAY specify a container port value at deployment; if the developer does not specify a port, the platform provider
[MUST](https://github.com/knative/serving/blob/master/test/conformance/service_test.go) provide a default.
Only one inbound `containerPort` [SHALL](https://github.com/knative/serving/blob/master/test/conformance/container_test.go) be specified in the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not 100% clear here whether not specifying a port in the spec (and just responding to $PORT in the container) is or should be acceptable.

Thoughts?

of its source, the selected port will be made available in the `PORT`
environment variable.

The platform provider SHOULD configure the platform to perform HTTPS termination
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a comment on HTTPS termination yet.

This option MAY only be set by the operator or platform provider, and MUST NOT
be configurable by the developer. As masked paths may be part of the platform
security hardening, operators may tune this from time to time as the threat
environment changes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with dropping this.

- name: "h2c"
containerPort: 8081
...
```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Define here or in connection that the proxy terminates SSL and may convert HTTP versions automatically, so the user container only needs to speak a single HTTP version.

(I found it strange to have "Connection" and "Protocols" so far separated, FWIW.

persistent data as it is not retained in the event of container termination.

It is recommended to store temporary state such as local caches or working data in `/tmp`
as this space is guaranteed to be present and mounted Read-Write.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably also worth pointing out here the option of off-node storage via API interfaces like Memcache/Redis, and recommending that for session storage.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In particular:

  1. Sessions may not be sticky; subsequent requests from the same user may not arrive at the same container.
  2. Local disk state for session storage management may be lost when containers are scaled down (possibly to zero).

It is recommended to store temporary state such as local caches or working data in `/tmp`
as this space is guaranteed to be present and mounted Read-Write.

# Probes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this H1 when the other sections above are H2?

# Probes

By default, the readiness of a Knative container is determined by a successful TCP
response on the specified port for incoming traffic. If this readiness check is not
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/response/connection/

@dgerd
Copy link
Author

dgerd commented Jul 29, 2019

Some of this has merged. Some of this is in conflict now. Going to close this and get the remaining updates put in separately.

@dgerd dgerd closed this Jul 29, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes Indicates the PR's author has signed the CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants