TEP-0011: Redirecting Step Output Streams #182

Merged 1 commit, Nov 2, 2020
221 changes: 221 additions & 0 deletions teps/0011-redirecting-step-output-streams.md
@@ -0,0 +1,221 @@
---
title: redirecting-step-output-streams
authors:
- "@chhsia0"
creation-date: 2020-08-17
last-updated: 2020-11-02
status: implementable
---

# TEP-0011: Redirecting Step Output Streams

<!-- toc -->
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
- [Requirements](#requirements)
- [Proposal](#proposal)
  - [Option 1: Allowing Users to Specify Redirection Paths](#option-1-allowing-users-to-specify-redirection-paths)
    - [Example Usage](#example-usage)
    - [Risks and Mitigations](#risks-and-mitigations)
  - [Option 2: Redirecting to Canonical Conventional Paths](#option-2-redirecting-to-canonical-conventional-paths)
    - [Example Usage](#example-usage-1)
    - [Risks and Mitigations](#risks-and-mitigations-1)
- [Design Details](#design-details)
- [Alternatives](#alternatives)
- [References](#references)
<!-- /toc -->

## Summary

Consuming outputs of a step in another step is a common pattern in writing Tasks. However, this is currently tedious to do. Task authors have to overwrite the image entrypoint with either `sh -c` or the `script` field to wrap up the command to run with an explicit output stream redirection. This is not even possible if the image does not come with a shell.

To achieve the functionality of output stream redirection between steps and even Tasks, Tekton will add new fields to steps for Task authors to specify paths to redirect stdout/stderr to.

## Motivation

This TEP extends Tekton Pipelines to support the following use cases:

* Allow Task authors to run the image `gcr.io/k8s-staging-boskos/boskosctl` with args in a step and process its output through a `jq` image in a subsequent step to acquire the project name, without overwriting the image entrypoint, building a custom image with both utilities, or looking into container logs ([tektoncd/pipeline#2925](https://github.com/tektoncd/pipeline/issues/2925)). Generally speaking, this allows Task authors to apply the [Unix philosophy](https://en.wikipedia.org/wiki/Unix_philosophy) to container images using steps and make multiple images work together in a Task.

* Allow Task authors to run images they do not control and still be able to use Task results (and potentially other path-based features such as output resources). Users commonly prefer third-party “official” utility images over maintaining their own fork that includes a shell, so it is impossible to guarantee that a shell is always available for the `script` field. In the current Tekton API, Task results cannot be used with certain images (e.g., images without a shell whose entrypoint does not provide an option to write outputs to files). This is a gap in the API, and this TEP proposes a way to close it.

* Allow tool developers to create CI pipeline tooling that can use Task results on *any* image specified by the end users without the above limitations. A Task author can choose which images to use to work around the limitations. A tool developer, however, cannot and should not control which images end users run, and cannot assume any details about those images.

### Goals

1. Allow the stdout/stderr of a step to be consumed by another step.
1. Enable a user to configure the path where the output streams are written.

### Non-Goals

1. Parse stdout/stderr of a step into a structured format (e.g., JSON) and extract information from certain fields.

## Requirements

* Add new fields for a step to specify paths to redirect stdout/stderr to.
* Users should be able to observe the output streams through Pod logs even if stdout/stderr redirections are specified. In other words, output streams should be duplicated instead of simply redirected.
* Clearly document the restrictions that apply when stdout/stderr redirections are set to Task result paths, and encourage Task authors to set the paths to workspace paths, especially if they want to exchange large amounts of data between Tasks.
* Provide examples to use step redirection for the use cases mentioned in [Motivation](#motivation).

## Proposal

### Option 1: Allowing Users to Specify Redirection Paths

The following new fields will be added to the `Step` struct:

```go
// StepOutputConfig stores configuration for a step output stream.
type StepOutputConfig struct {
	// Path to duplicate the output stream to on the container's local filesystem.
	// +optional
	Path string `json:"path,omitempty"`
}

type Step struct {
	...
	// Stores configuration for the stdout stream of the step.
	// +optional
	StdoutConfig StepOutputConfig `json:"stdoutConfig"`
	// Stores configuration for the stderr stream of the step.
	// +optional
	StderrConfig StepOutputConfig `json:"stderrConfig"`
}
```

Once `StdoutConfig.Path` or `StderrConfig.Path` is specified, the corresponding output stream will be duplicated to both the given file and the standard output stream of the container, so users can still view the output through the Pod log API. If both `StdoutConfig.Path` and `StderrConfig.Path` are set to the same value, outputs from both streams will be interleaved in the same file, but there will be no ordering guarantee on the data. If multiple steps' `StdoutConfig.Path` fields are set to the same value, the file content will be overwritten by the last outputting step.

Variable substitution will be applied to the new fields, so one could specify `$(results.<name>.path)` to the `StdoutConfig.Path` field to extract the stdout of a step into a Task result. No new variable substitution for accessing the values of `StdoutConfig.Path` and `StderrConfig.Path` fields will be provided so variable substitution can remain single-pass.
Review comment (Contributor):

yet! :D
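
The single-pass constraint on variable substitution can be illustrated with a minimal Go sketch. The `substitute` helper and the resolved value `/tekton/results/project-id` are assumptions made for illustration, not Tekton's actual implementation. Each `$(...)` reference is expanded once and the replaced text is never re-scanned, which is why a variable that itself pointed at another path field could not be resolved without a second pass.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// substitute expands every $(key) reference in s using vars in a single pass.
// Replaced text is never re-scanned, so a value that itself contains "$(...)"
// is emitted unchanged.
func substitute(s string, vars map[string]string) string {
	keys := make([]string, 0, len(vars))
	for k := range vars {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic argument order for the replacer

	pairs := make([]string, 0, 2*len(keys))
	for _, k := range keys {
		pairs = append(pairs, "$("+k+")", vars[k])
	}
	return strings.NewReplacer(pairs...).Replace(s)
}

func main() {
	vars := map[string]string{
		// Hypothetical resolved value, used only for illustration.
		"results.project-id.path": "/tekton/results/project-id",
	}
	fmt.Println(substitute("$(results.project-id.path)", vars))
}
```
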


#### Example Usage

Redirect the stdout of `boskosctl` to `jq` and publish the resulting `project-id` as a Task result:

```yaml
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: boskos-acquire
spec:
  results:
  - name: project-id
  steps:
  - name: boskosctl
    image: gcr.io/k8s-staging-boskos/boskosctl
    args:
    - acquire
    - --server-url=http://boskos.test-pods.svc.cluster.local
    - --owner-name=christie-test-boskos
    - --type=gke-project
    - --state=free
    - --target-state=busy
    stdoutConfig:
      path: /data/boskosctl-stdout
    volumeMounts:
    - name: data
      mountPath: /data
  - name: parse-project-id
    image: imega/jq
    args:
    - -r
    - .name
    - /data/boskosctl-stdout
    stdoutConfig:
      path: $(results.project-id.path)
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
```

Review thread on the line `image: gcr.io/k8s-staging-boskos/boskosctl`:

Review comment (Member):

Is the issue here that boskosctl doesn't include a shell, and so script is not usable, and therefore you'd have to jump through hoops to get its output? Would it be possible to get a shell added to the boskosctl image, perhaps by simply basing it on alpine?

Review comment (Contributor Author):

I'm not the one using boskosctl. @bobcatfish is ;) And it seems to have a shell.

IMO the problem here is that Tekton shouldn't limit how users build and use images. In particular, some images are built to act like command line tools (i.e., they come with decent default entrypoints, so running such images has the same UX as running utility programs), and right now if you want to compose multiple utility images to achieve a complex task, you have to ensure that every image has a shell, or that the image entrypoints have an option to put output in a file.

Review comment (Contributor Author, @chhsia0, Aug 20, 2020):

The actual use case I have is, I'm working on a DSL and a tool for users to run any image, and I can redirect the stdout to either task results or somewhere else, without the need to know the details of an image. How to run the image is provided by the user and completely transparent to my tool. In other words, I don't have control over the image, and thus it is impossible to ensure that there is always a shell and synthesize a script for that shell.

Review comment (Contributor Author):

It seems common for users to prefer some third-party “official” images instead of maintaining their own fork to include a shell.

Review comment (Contributor):

boskosctl is actually a bad example b/c it DOES have a shell, but otherwise: what @chhsia0 said :D

Review thread on the `volumes` entry `- name: data`:

Review comment (Contributor):

curious why you decided to use volume mounts here - workspaces are intended to replace volumes and volumemounts as much as possible (i would have liked to have removed them entirely tektoncd/pipeline#2058) b/c they allow volumes to be supplied at runtime vs task authoring time

Review comment (Contributor Author):

Using volumes in Tasks instead of workspaces provides us a way to encapsulate storage information that's internal to Tasks. IIUC if you use a workspace in a Task, you have to surface that to your Pipeline definition, even if you just want to provide a temporary storage for steps within a Task.

So if the storage is not used to pass data between Tasks in a pipeline (e.g., emptyDir, configMap and secret), I prefer using volumes and not exposing that to the Pipeline definition.

Also IMO using emptyDir for a workspace might give people an illusion that the emptyDir is shared.

#### Risks and Mitigations

* Users might mistakenly specify paths not shared among steps for redirection. This should be clearly documented. Alternatively, Tekton could put restrictions on `StdoutConfig.Path` or `StderrConfig.Path` or warn users about such misuses.

* If the stdout/stderr of a step is set to the path of a Task result and the step prints too much data, the result manifest would become too large. Currently the entrypoint binary would [fail if that happens](https://github.com/tektoncd/pipeline/blob/v0.15.2/cmd/entrypoint/main.go#L86). We could enhance the error message to identify which step caused the termination message bloat, hinting at how users can fix the problem.

### Option 2: Redirecting to Canonical Conventional Paths
Review comment (Contributor):

is there a reasonable option that provides both, e.g. the ability to write to a specific location AND the ability to specify an explicit path? e.g.

type Step struct {
    Stdout PathConfig `json:"stdout,omitempty"`
}

type PathConfig struct{
   Auto bool
   Path string
}

So if you dont care where stdout ends up and you just want to pass it between steps you can use:

  - name: boskosctl
    image: gcr.io/k8s-staging-boskos/boskosctl
    ...
    stdout:
      auto: true
  - name: parse-project-id
    image: imega/jq
    args:
    - -r
    - .name
    - $(steps.boskosctl.stdout.path)

But if you do care you can use:

  - name: boskosctl
    image: gcr.io/k8s-staging-boskos/boskosctl
    ...
    stdout:
      path: /data/boskosctl-stdout
  - name: parse-project-id
    image: imega/jq
    args:
    - -r
    - .name
    - $(steps.boskosctl.stdout.path)

Review comment (Contributor Author):

This boils down to whether we allow people to use $(workspaces.<name>.path) or $(results.<name>.path) or $(resources.outputs.<name>.path) in the specified path. If yes and if we want to keep a single-pass variable substitution, then we cannot provide $(steps.<name>.stdout.path).


The following new fields will be added to the `Step` struct:

```go
type Step struct {
	...
	// Whether to capture the stdout stream. If set, the stream will be duplicated to `/tekton/steps/<step_index>/stdout`.
	// +optional
	Stdout bool `json:"stdout,omitempty"`
	// Whether to capture the stderr stream. If set, the stream will be duplicated to `/tekton/steps/<step_index>/stderr`.
	// +optional
	Stderr bool `json:"stderr,omitempty"`
}
```

Once `Stdout` or `Stderr` is set, the corresponding output stream will be duplicated to both the conventional path indicated above and the standard output stream of the container, so users can still view the output through the Pod log API. Variable substitutions for `$(steps.<name>.stdoutPath)` and `$(steps.<name>.stderrPath)` will be provided if the corresponding field is set to grant users easy access to the conventional paths.
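
The mapping from step names to conventional paths could be sketched as below. The `step` type and the `stepPathVars` helper are hypothetical, and the sketch assumes that `<step_index>` is the zero-based position of the step within the Task, which the proposal does not specify.

```go
package main

import "fmt"

// step holds only the fields relevant to this sketch.
type step struct {
	Name   string
	Stdout bool
	Stderr bool
}

// stepPathVars builds the substitution variables Option 2 would expose.
// A variable is defined only when the matching capture field is set.
func stepPathVars(steps []step) map[string]string {
	vars := map[string]string{}
	for i, s := range steps {
		if s.Stdout {
			vars["steps."+s.Name+".stdoutPath"] = fmt.Sprintf("/tekton/steps/%d/stdout", i)
		}
		if s.Stderr {
			vars["steps."+s.Name+".stderrPath"] = fmt.Sprintf("/tekton/steps/%d/stderr", i)
		}
	}
	return vars
}

func main() {
	vars := stepPathVars([]step{
		{Name: "boskosctl", Stdout: true},
		{Name: "parse-project-id", Stdout: true},
		{Name: "copy-result"},
	})
	fmt.Println(vars["steps.parse-project-id.stdoutPath"])
}
```
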

#### Example Usage

Redirect the stdout of `boskosctl` to `jq` and publish the resulting `project-id` as a Task result:

```yaml
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: boskos-acquire
spec:
  results:
  - name: project-id
  steps:
  - name: boskosctl
    image: gcr.io/k8s-staging-boskos/boskosctl
    args:
    - acquire
    - --server-url=http://boskos.test-pods.svc.cluster.local
    - --owner-name=christie-test-boskos
    - --type=gke-project
    - --state=free
    - --target-state=busy
    stdout: true
  - name: parse-project-id
    image: imega/jq
    args:
    - -r
    - .name
    - $(steps.boskosctl.stdoutPath)
    stdout: true
  - name: copy-result
    image: alpine
    args:
    - cp
    - $(steps.parse-project-id.stdoutPath)
    - $(results.project-id.path)
```

#### Risks and Mitigations

* Task authors cannot configure the redirection path, so it takes extra steps to implement the following use cases: 1) write the stdout of a step into `.docker/config.json` and consume the generated auth configuration in a subsequent step; or 2) extract stdout into a Task result. In both cases, if the step image provides a shell, users can instead set the `Script` field to overwrite the image entrypoint, invoke the original entrypoint command, and redirect stdout into the target file. That workaround has two main drawbacks: 1) information encapsulated by the image (e.g., the entrypoint) is "leaked" into the Task specification, so users cannot treat a third-party utility image as a black box and just pass in appropriate arguments, which tightly couples the Task to the image; 2) users lose the ability to see stdout through the Pod log API, unless they maintain a forked image that packages a `tee` program.

* If multiple steps output large amounts of data and there is a disk limit, users cannot reuse the disk space to store redirected output data, because each step writes to its own fixed path.

## Design Details

The following flags will be added to the `entrypoint` command to support I/O redirection of the sub-process:

* `-stdout_path`: If specified, the stdout of the sub-process will be duplicated to the given path on the local filesystem.

* `-stderr_path`: If specified, the stderr of the sub-process will be duplicated to the given path on the local filesystem. It can be set to the same value as `-stdout_path` so both streams are copied to the same file. However, there is no ordering guarantee on data copied from both streams.

A proof-of-concept implementation is presented in [tektoncd/pipeline#3103](https://github.com/tektoncd/pipeline/pull/3103).

## Alternatives

* Parsing stdout/stderr into a structured format ([example](https://github.com/tektoncd/pipeline/issues/2925#issue-654319361)): This approach requires the step image to produce JSON output, which limits what images can be used. It also hides the parsing magic in Tekton, which can be hard to debug if the output is malformed.

* [Allowing subsequent steps to specify a filter expression to apply to step outputs](https://github.com/tektoncd/pipeline/issues/2925#issuecomment-657529820): If there are multiple subsequent "consumer" steps, then all consumers must use the same filter in order to save disk space. The filtering logic would also be hidden inside Tekton, creating unnecessary complexity. It is not hard to use `Script` or add an extra step to perform filtering and achieve the same result with more transparency.
Review comment (Contributor):

another alternative would be to do nothing in particular in tekton and leave this up to the Task and image authors, which effectively means if you want to communicate data between steps, or you want to capture data for results, you need a shell or you need your binaries to support outputting to a file


## References

* Make it possible to extract results from a container's stdout ([tektoncd/pipeline#2925](https://github.com/tektoncd/pipeline/issues/2925)).

* Added `-stdout_file` and `-stderr_file` flags to entrypoint ([tektoncd/pipeline#3103](https://github.com/tektoncd/pipeline/pull/3103)).