-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clean-up Runtime Contract #4035
Conversation
This change makes numerous cleanups to the runtime contract in an attempt to improve the readability of the document and make the document more useful for the intended auidence. * Moves developer facing statements to a new `runtime-user-guide`. Focuses `runtime-contract` on operator/platform-provider. * Add links to Conformance tests that test Runtime Contract statements. * Corrects, updates, or removes statements to more accurately represent today's Knative runtime. * Updates to informative or removes most untestable statements * Copies in important OCI runtime requirements we previously referenced * Removes reference to OCI specification that didn't bring new requirements. Ref: knative#2539, knative#2973, knative#4014, knative#4027
Co-Authored-By: dgerd <dangerd@google.com>
Co-Authored-By: dgerd <dangerd@google.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
@@ -181,51 +157,39 @@ for purposes of scaling CPU and removing idle containers. | |||
|
|||
#### Protocols and Ports |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
Summary: http1
and h2c
are supported, http1
by default. No automatic upgrades.
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: dgerd, tanzeeb If they are not already assigned, you can assign the PR to them by writing The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
## Connection | ||
|
||
A Knative container must package a webserver that can respond to HTTP/1.1 | ||
requests on a specified port. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about http/2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HTTP/2.0 is covered in the Protocols
section.
My understanding is that we expect clients to implement h2c
and not h2
which requires an initial upgrade request over HTTP/1.1. This is why I have set the expectation that responding to an HTTP/1.1 request is required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be worth clarifying that containers can implement h2c with the logic you provided here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. I can add this.
/lgtm Will leave |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a slight preference for splitting up functional vs presentational changes in the future, as it allows the two conversations to proceed at a different pace.
Part of the reason for the structure of the existing spec was due to questions asked during the drafting about how different parts of the OCI contract applied -- in some cases, the mechanism was suitable for operator consumption but had multitenancy or security concerns, so needed to be cordoned off from developers. I'm not sure how much these notes add/remove from the spec, but most were added in response to questions. 😁
particularly concerned with the expectations of _developers_ (and _language and | ||
tooling developers_, by extension) running code in the environment. | ||
This document is aimed at _operators_ with the goal of providing a consistent | ||
runtime environment for _developers_. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if you want to add the additional detail of language and tooling developers being additional roles that are concerned about the environment as consumers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can work that back in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
runtime environment for _developers_. | |
runtime environment for _developers_ (who may interact via tools created by _language and tooling developers_). |
`poststop` hooks to platform operators rather than developers. All of these | ||
hooks are defined in the context of the runtime namespace, rather than the | ||
container namespace, and may expose system-level information (and are | ||
container MUST be sent a `SIGTERM` signal when it is killed to allow for a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can at most be a SHOULD, because there are a number of conditions which could reasonable cause a container to stop running without receiving a SIGTERM:
- Kernel SIGKILL due to cgroup resource exhaustion / machine resource exhaustion (particularly memory)
- Kernel SIGKILL due to AppArmor / BPF policy violation
- Operator SIGKILL due to exceptional circumstances
- Machine (hardware) failure
There's also the fun case of network failure isolating or partially-isolating the machine, which may not affect SIGTERM/SIGKILL, but may screw up other shutdown.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree here, but I would like to distinguish between this may not happen due to the reason your container is being terminated versus this may vary from platform to platform. Perhaps we can bound the situations for which a SIGTERM
must be sent (i.e. normal autoscale actions or deployment actions). We can work on the exact wording, but thoughts here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think saying that "normal orderly shutdown MUST send a SIGTERM
" would be acceptable.
@@ -125,47 +112,36 @@ the OCI specification as long as: | |||
- Platforms MAY provide mechanisms for post-mortem viewing of filesystem | |||
contents from a particular execution. Because containers (particularly failing | |||
containers) can experience frequent starts, operators or platform providers | |||
SHOULD limit the total space consumed by these failures. | |||
- A container should write its own termination message to `/dev/termination-log` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/dev/termination-log
is inherited from the Kubernetes terminationMessagePolicy
, which is pretty helpful for collecting information from crashing containers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed it because:
- It is written from the standpoint of the container. Should the container write the termination message on all Knative platforms (i.e. All installs MUST implement and offer
terminationMessagePolicy
), or just some platforms? - We do not have test coverage for this and I couldn't quite find where we actually consume this information and present it back to users in our API. It is not clear to me that we do this at all.
- Most importantly, this seems like an overly limiting runtime requirement that is not in the critical path at the MUST/SHOULD level and does not add much developer value at the MAY level.
I am not stuck on this one and could be convinced to just rewording it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah -- good catch. I don't think we need to keep this, because it could be a scalability / visibility concern to report these up to the control plane.
In particular, what happens if you have a simultaneous crash of 10k Pods -- aggregation termination messages back to the Revision could be fairly expensive, with each Pod crash causing another attempted update of the Revision. It looks like these don't get aggregated from Pod -> ReplicaSet in k8s, so let's follow that precedent.
### Hooks | ||
|
||
Operation hooks SHOULD NOT be configurable by the Knative developer. Operators | ||
or platform providers MAY use hooks to implement their own lifecycle controls. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was removed because it's not developer-facing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only hooks I could find here were the OCI hooks ( https://github.com/opencontainers/runtime-spec/blob/master/config.md#posix-platform-hooks ). We already describe these hooks starting on line 90. This was duplicative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM, I copied all the sections from OCI, I think.
collected and retained in a developer-accessible logging repository. | ||
A read from the `stdin` file descriptor on the container [SHOULD](https://github.com/knative/serving/blob/master/test/conformance/file_descriptor_test.go) | ||
always result in `EOF`. The `stdout` and `stderr` file descriptors on the container | ||
MUST be collected and retained in a developer-accessible logging repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering about the upgrade from SHOULD to MUST.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to have a minimum logging requirement that is common across all Knative installs. Right now both /var/log and stdout are SHOULD which means that a developer need to go hunting through the docs or configuration for each Knative install to understand how logs are collected. My understanding is that right now in Knative Serving we collect stdout and stderr for all containers so we are already doing this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think one reason for SHOULD was that it wasn't entirely clear how to test that the logs were actually being retained.
I'm fine upgrading to MUST.
the container as the specified user ID if allowed by the platform (see below). | ||
If no `runAsUser` is specified, a platform-specific default SHALL be used. | ||
Platform Providers SHOULD document this default behavior. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is removing a documentation recommendation.
Would it make sense instead to have a section of the spec recommending the most significant documentation areas where platform providers may differ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am going to take another stab at updating this after some more recent conversations about how OpenShift handles runAsUser
.
It would be nice to have documentation recommendations and requirements elsewhere so you don't have to go through this entire document to find them all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a <!-- TODO(dgerd): .... -->
here?
|
||
The namespace configuration MUST be provided by the operator or platform | ||
provider; developers or container providers MUST NOT set or assume a particular | ||
namespace configuration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why remove this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at this again I am going to put it back and update it. I was assuming Kubernetes Namespace and not Container Namespace when I was deleting this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this is still missing?
This option MAY only be set by the operator or platform provider, and MUST NOT | ||
be configurable by the developer. As masked paths may be part of the platform | ||
security hardening, operators may tune this from time to time as the threat | ||
environment changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, wondering if it's worth documenting where the responsibility for these setting lies, given that they are in the OCI spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This section seemed overly limiting and does not seem to prevent the operation of Knative. Why must this not be configurable by the developer? It seemed to me like you could let the developer have control over some of these things. Given that the OCI interface should not be available within the container the only way to configure this would be through the Knative API. An extension of the Knative API to allow certain paths to be masked or read-only didn't seem crazy to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with dropping this.
## Connection | ||
|
||
A Knative container must package a webserver that can respond to HTTP/1.1 | ||
requests on a specified port. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be worth clarifying that containers can implement h2c with the logic you provided here.
|
||
# Logging | ||
|
||
Log statements should be sent to `stdout` and `stderr`. Log statements sent to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to include /var/log or /dev/log as alternatives to the streams. /dev/log is particularly nice, as it natively supports multiline log statements (but I think our current fluentd configuration does not provide /dev/log).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are not guaranteed to be in a knative install. They are listed as SHOULD items.
/hold Will look at this tomorrow |
* Describe current behavior for HTTP1.1 and HTTP2 * Provide links to relevant tests for keywords Ref: knative#4035 Fixes: knative#4283
the Knative cluster. | ||
- **Language and tooling developers** typically write tools used by | ||
_developers_ to package code into containers. As such, they are concerned | ||
that tooling which wraps developer code complies with this runtime contract. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a section clarifying that this is descriptive advice rather than prescriptive requirements? "Your container will fit better if it follows the following guidance..."
particularly concerned with the expectations of _developers_ (and _language and | ||
tooling developers_, by extension) running code in the environment. | ||
This document is aimed at _operators_ with the goal of providing a consistent | ||
runtime environment for _developers_. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
runtime environment for _developers_. | |
runtime environment for _developers_ (who may interact via tools created by _language and tooling developers_). |
`poststop` hooks to platform operators rather than developers. All of these | ||
hooks are defined in the context of the runtime namespace, rather than the | ||
container namespace, and may expose system-level information (and are | ||
container MUST be sent a `SIGTERM` signal when it is killed to allow for a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think saying that "normal orderly shutdown MUST send a SIGTERM
" would be acceptable.
[SHOULD](https://github.com/knative/serving/blob/master/test/conformance/container_test.go) | ||
restrict the use of `prestart`, `poststart`, and `poststop` hooks to platform | ||
operators rather than developers. All of these hooks are defined in the context | ||
of the runtime namespace, rather than the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/runtime namespace/host runtime namespace/ to be extra clear?
non-portable). | ||
- Failures of the developer-specified process MUST be logged to a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming this was removed because it's not currently testable?
Can we leave a comment suggesting this for future inclusion once we have appropriate log infrastructure in place? I think it would be testable if we had a standard log-tail interface:
- start tailing
- Deploy container which runs
echo "Not gonna happen" 1>&2; exit 1
- Check for string in tail results.
@@ -125,47 +112,36 @@ the OCI specification as long as: | |||
- Platforms MAY provide mechanisms for post-mortem viewing of filesystem | |||
contents from a particular execution. Because containers (particularly failing | |||
containers) can experience frequent starts, operators or platform providers | |||
SHOULD limit the total space consumed by these failures. | |||
- A container should write its own termination message to `/dev/termination-log` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah -- good catch. I don't think we need to keep this, because it could be a scalability / visibility concern to report these up to the control plane.
In particular, what happens if you have a simultaneous crash of 10k Pods -- aggregation termination messages back to the Revision could be fairly expensive, with each Pod crash causing another attempted update of the Revision. It looks like these don't get aggregated from Pod -> ReplicaSet in k8s, so let's follow that precedent.
### Hooks | ||
|
||
Operation hooks SHOULD NOT be configurable by the Knative developer. Operators | ||
or platform providers MAY use hooks to implement their own lifecycle controls. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM, I copied all the sections from OCI, I think.
|
||
### Operations | ||
|
||
[The OCI interface](https://github.com/opencontainers/runtime-spec/blob/v1.0.0-rc3/runtime.md#operations) | ||
SHOULD NOT be exposed within the container. The operator or platform provider | ||
MAY have the ability to directly interact with the OCI interface, but that is | ||
beyond the scope of this specification. | ||
MAY have the ability to directly interact with the OCI interface. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@evankanderson @dgerd did we want to reword the OCI interface parts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Dan was saying that we could probe at some well-known locations, but couldn't definitively guarantee that this was not exposed.
I'm fine with leaving as-is or moving to the developer guide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finished my pass; a few missed items from last time and some more discussion, but I think many of these can be closed with <!-- TODO(dgerd): ... -->
comments rather than needing to get the final details right in this pass.
Observability in particular I expect to improve over the next 3-5 months, so there may be some better answers there soon.
collected and retained in a developer-accessible logging repository. | ||
A read from the `stdin` file descriptor on the container [SHOULD](https://github.com/knative/serving/blob/master/test/conformance/file_descriptor_test.go) | ||
always result in `EOF`. The `stdout` and `stderr` file descriptors on the container | ||
MUST be collected and retained in a developer-accessible logging repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think one reason for SHOULD was that it wasn't entirely clear how to test that the logs were actually being retained.
I'm fine upgrading to MUST.
#### Dev symbolic links | ||
|
||
As specified by OCI. | ||
The following symbolic links MUST exist within the container: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where did the OCI reference end up?
@@ -181,51 +157,39 @@ for purposes of scaling CPU and removing idle containers. | |||
|
|||
#### Protocols and Ports | |||
|
|||
The container MUST accept HTTP/1.1 requests from the environment. The | |||
environment | |||
[SHOULD offer an HTTP/2.0 upgrade option](https://http2.github.io/http2-spec/#discover-http) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Followed up there -- empty == HTTP/1.1 with later upgrade option seems reasonable.
Only one inbound `containerPort` SHALL be specified in the | ||
The developer MAY specify a container port value at deployment; if the developer does not specify a port, the platform provider | ||
[MUST](https://github.com/knative/serving/blob/master/test/conformance/service_test.go) provide a default. | ||
Only one inbound `containerPort` [SHALL](https://github.com/knative/serving/blob/master/test/conformance/container_test.go) be specified in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not 100% clear here whether not specifying a port
in the spec (and just responding to $PORT
in the container) is or should be acceptable.
Thoughts?
of its source, the selected port will be made available in the `PORT` | ||
environment variable. | ||
|
||
The platform provider SHOULD configure the platform to perform HTTPS termination |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a comment on HTTPS termination yet.
This option MAY only be set by the operator or platform provider, and MUST NOT | ||
be configurable by the developer. As masked paths may be part of the platform | ||
security hardening, operators may tune this from time to time as the threat | ||
environment changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with dropping this.
- name: "h2c" | ||
containerPort: 8081 | ||
... | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Define here or in connection that the proxy terminates SSL and may convert HTTP versions automatically, so the user container only needs to speak a single HTTP version.
(I found it strange to have "Connection" and "Protocols" so far separated, FWIW.
persistent data as it is not retained in the event of container termination. | ||
|
||
It is recommended to store temporary state such as local caches or working data in `/tmp` | ||
as this space is guaranteed to be present and mounted Read-Write. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably also worth pointing out here the option of off-node storage via API interfaces like Memcache/Redis, and recommending that for session storage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In particular:
- Sessions may not be sticky; subsequent requests from the same user may not arrive at the same container.
- Local disk state for session storage management may be lost when containers are scaled down (possibly to zero).
It is recommended to store temporary state such as local caches or working data in `/tmp` | ||
as this space is guaranteed to be present and mounted Read-Write. | ||
|
||
# Probes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this H1 when the other sections above are H2?
# Probes | ||
|
||
By default, the readiness of a Knative container is determined by a successful TCP | ||
response on the specified port for incoming traffic. If this readiness check is not |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/response/connection/
Some of this has merged. Some of this is in conflict now. Going to close this and get the remaining updates put in separately. |
This change makes numerous cleanups to the runtime contract in an
attempt to improve the readability of the document and make the document
more useful for the intended audience.
runtime-user-guide
.Focuses
runtime-contract
on operator/platform-provider.today's Knative runtime.
requirements.
Ref: #2539, #2973, #4014, #4027
/cc @evankanderson @mattmoor @tanzeeb @markusthoemmes @vaikas-google @tcnghia