TEP-0011: Redirecting Step Output Streams #182

Merged: 1 commit into tektoncd:master on Nov 2, 2020

Conversation

@chhsia0 (Contributor) commented on Aug 17, 2020:

TEP for pipeline#2925 and pipeline#3103.

Consuming the output of one step in another step is a common pattern when writing Tasks. However, this is currently tedious to do: Task authors have to overwrite the image entrypoint, using either `sh -c` or the `script` field, to wrap the command with an explicit output-stream redirection. This is not even possible if the image does not come with a shell, e.g., images built by ko.

To support output-stream redirection between steps and even Tasks, this TEP proposes adding new fields to steps that let Task authors specify the paths where stdout/stderr should be stored.
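As a rough illustration, a Task using this idea might look like the hedged sketch below. The `stdout` field name and its exact shape are precisely what this TEP is meant to settle, and `/data` is assumed to be a mount shared by both steps:

```yaml
# Hedged sketch only: "stdout" is the kind of field this TEP proposes (exact name/shape TBD),
# and /data is assumed to be a volume or workspace mounted into both steps.
steps:
  - name: boskosctl
    image: gcr.io/k8s-staging-boskos/boskosctl
    # args omitted; the point is that the image's own entrypoint runs unmodified
    stdout: /data/boskosctl-stdout      # proposed: duplicate stdout here while it still reaches the Pod log
  - name: parse-project-id
    image: imega/jq
    args: ["-r", ".name", "/data/boskosctl-stdout"]
```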

@tekton-robot tekton-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Aug 17, 2020

### Risks and Mitigations

* Users might mistakenly specify paths not shared among steps for redirection. Potentially Tekton could implement a validation to warn users about such misuses.
Member:

Indeed, that might be a problem. We may want to restrict it to workspaces but it's gonna be tricky to validate

Member:

I think this is an argument for just always writing stdout/stderr to some Tekton-owned path and documenting where that is. If a later step wants to read it, they can.

@chhsia0 (Contributor, Author), Aug 18, 2020:

Yeah this is why I explicitly call this out in this section so we can discuss if "canonical paths" as you suggest is a better way.

@chhsia0 (Contributor, Author):

On the other hand, if we make it explicit that this is a path, most users would naturally be aware that if they want to use the same path in different containers, they have to make a shared mount point.


* Run image `gcr.io/k8s-staging-boskos/boskosctl` with args in a step and process its output through another `jq` image in a subsequent step to acquire the project name, without overwriting the image entrypoint, building a custom image with both utilities, or looking into container logs ([pipeline#2925](https://github.com/tektoncd/pipeline/issues/2925)).

* Run image `alpine/git` with args `["describe", "--tags"]` in a step and extract the stdout as a Task result, to be used by subsequent Tasks, without overwriting the image entrypoint.
Member:

FWIW, this should be doable with a script, instead of requiring a whole new image entrypoint.

script: git describe --tags > /tekton/results/tags

@chhsia0 (Contributor, Author), Aug 18, 2020:

script is essentially overwriting the entrypoint ;)

But yeah this is generally not a big problem, just a bit tedious. The only problem, again, is when the image doesn't have a shell and the command to run doesn't have a flag to write output to a file.

Member:

That is true. Maybe I'm not understanding what the downside is in that case. Using script mode in this way is pretty straightforward and doesn't require maintaining another image.

@ghost left a comment:

My main concern would be to document and push users towards using workspaces for the stdout / stderr files so that they can be shared around without accidentally exploding the size of results and the termination message.

I was wondering if you considered any of the alternatives described in the comments on tektoncd/pipeline#2925 - in particular inverting the control a little bit and allowing a subsequent step to provide a filter to run over a previous step's stdout, limiting the amount of data shuttled from step to step. Even if those alternatives are rejected it's probably still worth adding them to this doc to capture them, I think?

@chhsia0 (Contributor, Author) commented on Aug 18, 2020:

@sbwsg Sure, I'll go through pipeline#2925 and add the discussions here. As for promoting workspaces, I feel it's better to give users options but document the pros and cons and the suggested ways, as well as a validation and/or runtime check (which we already have, as you pointed out). I can imagine the most common use cases for sharing data through results are sending just a couple of bytes, and using workspaces (backed by PVC, I believe?) for such a purpose is too heavyweight.

@ghost commented on Aug 18, 2020:

I can imagine the most common use cases for sharing data through results are sending just a couple of bytes, and using workspaces (backed by PVC, I believe?) for such a purpose is too heavyweight.

Having read your example yaml in the TEP I think I understand a bit better where you're coming from. It looks like you're talking about the sort of approach where one step writes its stdout to a shared location on disk, and another step then filters through the data with jq and writes only a tiny portion of it to a result. That's great and I totally understand that use case and I think this is the perfect place for results. Similarly to @imjasonh I'm not actually sure this needs to be explicitly called out as a goal - Tekton will already work like this today so long as the user writes the stdout to a location shared by multiple steps.

However, I do want to be careful that when we come to document / promote the feature we have to be super clear to users that they cannot slam all of stdout/stderr from every step into a task result and expect it to "just work". I think that's where I'm getting stuck a bit - if a user sees this feature and adds stdout: $(results.stepXstdout.path) to every single Step they are going to have a very bad time. I think we could safely expect a number of very similar github bug reports to be generated around it very quickly and ideally I'd like to avoid that.

@ghost commented on Aug 18, 2020:

Also, just to add - PVCs are only required if the data needs to be shared between Tasks of a Pipeline. An emptyDir workspace is shared by all steps in a task and is as close to zero-configuration as a volume definition can get.
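For reference, a minimal sketch of the near-zero-configuration emptyDir workspace binding mentioned above, supplied at TaskRun time (resource names are illustrative):

```yaml
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: example-run          # illustrative name
spec:
  taskRef:
    name: example-task       # illustrative name
  workspaces:
    - name: data             # matches a workspace declared by the Task
      emptyDir: {}           # shared by every step in the Task, discarded with the Pod
```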

@chhsia0 (Contributor, Author) commented on Aug 18, 2020:

For sharing things between steps within a task, workspaces aren't even needed. An emptyDir volume in the task is sufficient. See my example between the 1st and 2nd steps.

@chhsia0 (Contributor, Author) commented on Aug 18, 2020:

I call it out as a goal in the sense that I'd like to enable users to configure the paths, instead of using some built-in paths, in which case you wouldn't be able to make it work with Task results without an additional step. If this is not needed as a goal I could remove it.

@ghost commented on Aug 18, 2020:

I call it out as a goal in the sense that I'd like to enable users to configure the paths

Gotcha! Makes sense. I don't think you have to remove it in that case, but my suggestion would be to rephrase the goal to be "Enable a user to configure the path where the stream is written"? I got caught up on the result part and I missed the broader idea.

@chhsia0 chhsia0 requested review from vdemeester, imjasonh and a user August 18, 2020 21:11
@chhsia0 chhsia0 changed the title TEP-0011: Redirecting Step Inputs and Outputs TEP-0011: Redirecting Step Output Streams Aug 18, 2020
## Requirements

* Add new fields for a step to specify paths to redirect stdout/stderr to.
* Users should be able to observe the output streams through Pod logs even if `StdoutPath` or `StderrPath` is specified. In other words, output streams should be duplicated instead of simply redirected.
Reviewer:

Do you think this might become configurable one day? i.e. a user may decide they don't want the stdout logged to terminal, only the file?

I am wondering if that would change the desirable shape of the struct if so:

type Step struct {
  // ...
  StdoutConfig struct {
    ToPath string
    ToTerminal bool
  }
}

@chhsia0 (Contributor, Author):

It probably makes sense to have an additional layer of structure. Do you have anything else in mind that stdout could be redirected to? About the ToTerminal option, it should at least be turned on by default, mainly from a debuggability point of view: it would be impossible to retrieve the stdout from, e.g., an emptyDir volume if the task fails.

This extra layer, on the other hand, would make the config structure more verbose.

Reviewer:

Having thought about this a bit more - I wonder if "hiding" or limiting the stdout log would be more a runtime concern. Something to configure at the taskrun level instead of task. I'm o.k. to move ahead with the existing proposal of just the path string!


Once `StdoutPath` or `StderrPath` is specified, the corresponding output stream will be duplicated to both the given file and the standard output stream of the container, so users can still view the output through the Pod log API. If both `StdoutPath` and `StderrPath` are set to the same value, the two streams will be written to two file descriptors referring to the same file description, so outputs from both streams will be interleaved in the same file, but there will be no ordering guarantee on the data. If multiple steps' `StdoutPath` fields are set to the same value, the file content will be overwritten by the last outputting step.
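To make the duplication and interleaving concrete, a hedged sketch using the proposed fields (image and workspace names are illustrative), with both streams pointed at one shared file:

```yaml
steps:
  - name: build-and-test
    image: example.com/builder                  # illustrative image
    stdout: $(workspaces.data.path)/build.log   # still visible via the Pod log API
    stderr: $(workspaces.data.path)/build.log   # same file: lines interleave with no ordering guarantee
```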
Reviewer:

i'm curious about the interleaving - would it be line-buffered or would the user end up with halves of lines from each stream spliced together?

Reviewer:

I also wonder if we'd want a "prefix: " in the interleaved scenario like:

stdout: tests 3 / 6 completed
stdout: tests 4 / 6 completed
stderr: warning: foo bar
stdout: tests 5 / 6 completed

and, if so, whether that'd also be something the user wants to configure?

@chhsia0 (Contributor, Author), Aug 19, 2020:

The "interleaving" behaves similar to redirecting both stdout and stderr to the same stream in shell, e.g,

./command > log 2>&1

IIUC the Go runtime doesn't implement a buffer and writes go directly to the write syscall, and thus line buffering only happens if the *os.File is associated with a TTY. You can find some implementation details about how we "redirect" stdout in our entrypoint customization here:
https://github.com/tektoncd/pipeline/pull/3103/files#diff-18256282f9059ed3c2b641b9c2d8ac6fR29
https://github.com/tektoncd/pipeline/pull/3103/files#diff-8c0170fa2830f4fdfc8067ba745cf734R49

We could use a bufio to implement a line buffer before writing to the file, but I'd imagine there would still be corner cases (e.g., super long lines) where the interleaving happens within a line. And most of the time the interleaving is not a big deal so I'm not sure if it is worth complicating the code to have line buffering.

An alternative is to forbid users from specifying the same path for stdout and stderr. If the user really wants that, they can use the Pod logs API.

Prefixing is a fancy idea, but IMO it's incremental and can be added when there's an actual need. If users really want separation, they should put the streams in different files. I'm also afraid that we'd be opening a can of worms, e.g., one may want to add timestamps or other information. Timestamping in particular is something we should not touch; we should leave it to the Pod logs API :)

Reviewer:

OK, understood!

- name: project-id
steps:
- name: boskosctl
image: gcr.io/k8s-staging-boskos/boskosctl
Member:

Is the issue here that boskosctl doesn't include a shell, and so script is not usable, and therefore you'd have to jump through hoops to get its output? Would it be possible to get a shell added to the boskosctl image, perhaps by simply basing it on alpine?

@chhsia0 (Contributor, Author):

I'm not the one using boskosctl. @bobcatfish is ;) And it seems to have a shell.

IMO the problem here is that Tekton shouldn't limit how users build and use images. In particular, some images are built to act like command-line tools (i.e., they come with decent default entrypoints so running them has the same UX as running utility programs), and right now if you want to compose multiple utility images to achieve a complex task, you have to ensure that every image has a shell, or that every image entrypoint has an option to write its output to a file.

@chhsia0 (Contributor, Author), Aug 20, 2020:

The actual use case I have is that I'm working on a DSL and a tool for users to run any image, where I can redirect the stdout to either Task results or somewhere else without needing to know the details of the image. How to run the image is provided by the user and is completely transparent to my tool. In other words, I don't have control over the image, and thus it is impossible to ensure that there is always a shell and to synthesize a script for that shell.

@chhsia0 (Contributor, Author):

It seems common for users to prefer some third-party “official” images instead of maintaining their own fork to include a shell.

Contributor:

boskosctl is actually a bad example b/c it DOES have a shell, but otherwise: what @chhsia0 said :D

1. Write stdout of a step into `.docker/config.json` and consume the generated auth configuration in a subsequent step.
1. Extract stdout into a Task result.

If the step image provides a shell, users can specify the `Script` field to overwrite the image entrypoint to invoke the original entrypoint command and redirect the stdout into the target file. There are two main drawbacks: 1) information encapsulated by the image (e.g., the entrypoint) will be leaked into the Task specification, creating a tight coupling between the Task and the image; 2) users will lose the ability to see stdout through the Pod log API.
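For comparison, a hedged sketch of that Script-based workaround (illustrative image, result name, and binary path, the latter following the ko layout mentioned later in this thread): the Task hardcodes the image-internal entrypoint path and the image must ship a shell.

```yaml
steps:
  - name: run-tool
    image: example.com/tool        # illustrative image
    script: |
      # entrypoint knowledge leaks into the Task; this breaks if the binary moves
      /ko-app/tool --flag=value > $(results.output.path)
```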
Member:

I'm not sure I understand the first point: the entrypoint doesn't leak into the Task definition if it uses script -- if anything, the presence of a shell is assumed when using script, but the original entrypoint value is ignored completely.

For the second point, the Task author could tee to stdout and wherever else they want that data to end up.

@chhsia0 (Contributor, Author):

By "leaking" I mean that users cannot treat any third-party utility image as a black box and just pass in appropriate arguments. For example, if you use ko to build an image with shell, you have to know that the binary sits in /ko-app/... and bake this information into your shell.

If I later update my image to move the entrypoint binary around, steps passing args can still run, but steps with scripts will fail.

@pritidesai (Member):

/kind tep

@tekton-robot tekton-robot added the kind/tep Categorizes issue or PR as related to a TEP (or needs a TEP). label Aug 20, 2020
@ghost left a comment:

minor nit but otherwise LGTM

@chhsia0 chhsia0 requested a review from a user August 25, 2020 22:09
@chhsia0 (Contributor, Author) commented on Aug 25, 2020:

Attaching some great discussions related to this change:

https://tektoncd.slack.com/archives/CLCCEBUMU/p1597969084015700

Pierre Tasci, 5:18 PM
FWIW it feels a bit odd to make this change at the Tekton level to alleviate something related to the internals of the image. I totally see how users might not want to author their own images, but to me at least, it seems like the gap there is in tooling around the image rather than Tekton's tooling in how it is executed, if that makes sense.

chhsiao 5:24 PM
IMO Tekton should be a platform to run any image without assumptions, i.e., treat them as black boxes, like the k8s Pod API. (It seems we both agree on this 🙂) But now there are Tekton features (say, task results) that some images cannot use, so it seems like an incompleteness/limitation in the API.

Jason Hall 5:37 AM
That's a reasonable argument. I do think that I've tended to think of Tasks as the unit of pluggability/reusability, and the specifics of the images inside those Tasks is up to the Task author, and should generally be a black box to end users and even operators. I'd love to hear from other folks who are generating TaskRuns/PipelineRuns from some other configuration though, since that's not a use case I have as much first-hand experience with.

https://tektoncd.slack.com/archives/CLCCEBUMU/p1598024780034800?thread_ts=1598023428.029200&cid=CLCCEBUMU

Pierre Tasci, 4 days ago
I think you make some excellent points @Christie Wilson. If I am hearing it correctly (let me know if I am not), the core of the assumption here is that small, focused images are better in some way than larger images.
I only have my own experience to go on here so pardon the naïveté, but I haven't found that to be particularly true. It often hurts debuggability and testability to have bare-bones images. I think of Kaniko as a prime example. Great tool, but because it is built off a scratch base, everyone I know who uses it actually uses the debug version, because it includes a busybox shell. I will take ease of development and familiarity over additional complexity most times.
I do think there is a point here. The way steps are hooked up together right now is minimal at best. Even passing a file between steps requires a hardcoded, explicit handoff. For that reason, I often end up writing larger scripts than I would otherwise desire. Rather than Unix pipe redirection, perhaps there is a larger story about inter-step interaction here?

https://tektoncd.slack.com/archives/CLCCEBUMU/p1598040443039700?thread_ts=1598023428.029200&cid=CLCCEBUMU

Cory Cross 4 days ago
Pipes are untyped, hard to test, and full of edge cases. I think PowerShell would be the better inspiration if something's needed intra-Task.

Jason Hall 4 days ago
Good point! Now you're making an even stronger argument that there's lots of complexity down this path, and lots of prior art we can learn from before we jump in and implement the next easy enhancement. 🙃

@vdemeester (Member) left a comment:

Overall looks good to me. I am definitely voting for Option 1 as it is way more flexible.
For Option 2, what happens if we refer to a step's stdoutPath when that step doesn't have stdout: true? Should it fail, or should we default to empty? Having $(step.<name>.stdoutPath) gives the user (at least me) the impression that it's going to be available for all steps (and hence that we capture the stdout on all of them).

@chhsia0 (Contributor, Author) commented on Aug 26, 2020:

Good point. I can think of two options:

* Option 2.1: Validate the task definition and reject the task if users refer to a non-existent step stdout path.
* Option 2.2: No explicit Stdout/Stderr flags at all. If $(steps.<name>.stdoutPath) exists in the task definition, capture the stdout stream of the corresponding step.

In general I also like Option 1 (allowing users to specify paths) better. Anyway, we should settle on one option and then I'll move the other to "alternatives."

@bobcatfish (Contributor) left a comment:

Thanks @chhsia0 !

I have 3 main pieces of feedback:

  1. Is it possible and reasonable to do stdout only, at least at first? I'm deeply suspicious that stderr is mostly only relevant at runtime
  2. I'd like to revisit the variable interpolation and not provide a way to automatically expose stdout/stderr as results
  3. Can we support both an "auto" mode AND allow the user to specify an explicit path?


* Parsing stdout/stderr into a structured format ([example](https://github.com/tektoncd/pipeline/issues/2925#issue-654319361)): This approach requires the step image to produce JSON output, which limits what images can be used. It also hides the parsing magic in Tekton, which can be hard to debug if the output is malformed.

* [Allowing subsequent steps to specify a filter expression to apply to step outputs](https://github.com/tektoncd/pipeline/issues/2925#issuecomment-657529820): If there are multiple subsequent "consumer" steps, then all consumers must use the same filter to save disk space. Also, the magic of filtering will be hidden by Tekton from users, creating unnecessary complexity. It is not hard to use `Script` or add an extra step to perform filtering to achieve the same result with more transparency.
Contributor:

another alternative would be to do nothing in particular in tekton and leave this up to the Task and image authors, which effectively means if you want to communicate data between steps, or you want to capture data for results, you need a shell or you need your binaries to support outputting to a file


Once `StdoutPath` or `StderrPath` is specified, the corresponding output stream will be duplicated to both the given file and the standard output stream of the container, so users can still view the output through the Pod log API. If both `StdoutPath` and `StderrPath` are set to the same value, the two streams will be written to two file descriptors referring to the same file description, so outputs from both streams will be interleaved in the same file, but there will be no ordering guarantee on the data. If multiple steps' `StdoutPath` fields are set to the same value, the file content will be overwritten by the last outputting step.

Variable substitution will be applied to the new fields, so one could specify `$(results.<name>.path)` to the `StdoutPath` field to extract the stdout of a step into a Task result. No new variable substitution for accessing the values of `StdoutPath` and `StderrPath` fields will be provided so variable substitution can remain single-pass.
Contributor:

$(results.<name>.path) to the StdoutPath field to extract the stdout of a step into a Task result

I'm a bit confused by this, you've got an example below: stdout: $(results.project-id.path)

in this case, what path does stdout actually get written to? is this magically writing stdout to a location on disk and then also exposing it as a result?

I don't think we should automatically wire stdout/stderr to any results. If we do, I think we could do it in a subsequent proposal after this one.

my alternative suggestion for the example below is that if you want to expose stdout as a result, you'd need to add a step after parse-project-id that writes stdout as a result

No new variable substitution for accessing the values of StdoutPath and StderrPath fields will be provided so variable substitution can remain single-pass.

Can you explain a bit more? I think it would be very convenient to use $(steps.foo.stdout.path) or similar, and that's exactly how I'd want to use this feature; I wouldn't want to have to hardcode the path in multiple places.

Member:

in this case, what path does stdout actually get written to? is this magically writing stdout to a location on disk and then also exposing it as a result?

$(results.project-id.path) would be interpolated to the path of the result (so /tekton/results/project-id or something), and thus stdout would be written there. Just like if you use $(workspaces.some.path)/foo.

can you explain a bit more? i think it would be very convenient to use $(steps.foo.stdout.path) or similar, and that's exactly how id want to use this feature, i wouldnt want to have to hardcode the path in multiple places

So one of the problems with $(steps.foo.stdout.path) is that it may be used without the previous task having stdout captured. You don't necessarily hardcode the path, or at least the full path; most likely you would use $(workspace.foo.path)/somewhere/afile as a path, and re-use the same later on.
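A minimal sketch of the interpolation described above, reusing the project-id result from the TEP's example (`stdout` is the proposed field):

```yaml
results:
  - name: project-id
steps:
  - name: boskosctl
    image: gcr.io/k8s-staging-boskos/boskosctl
    # $(results.project-id.path) resolves to the result's file path (e.g. /tekton/results/project-id),
    # so this step's stdout lands directly in the Task result
    stdout: $(results.project-id.path)
```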

Contributor:

$(results.project-id.path) would be interpolated to the path of the result (so /tekton/results/project-id or something), and thus stdout would be written there. Just like if you use $(workspaces.some.path)/foo.

Oh, that makes sense! I didn't realize you could do that and I've been hardcoding the path everywhere. Just ignore that comment from me then @chhsia0!

So one of the problems with $(steps.foo.stdout.path) is that it may be used without the previous task having stdout captured. You don't necessarily hardcode the path, or at least the full path; most likely you would use $(workspace.foo.path)/somewhere/afile as a path, and re-use the same later on.

Could you explain a bit more about this? We could check if folks are using $(steps.foo.stdout.path) for a step that doesn't have stdout capturing configured. It still seems really useful to me.

Contributor:

we could check if folks are using $(steps.foo.stdout.path) for a step that doesn't have stdout capturing configured

actually without this feature, we wouldn't be able to help out anyone making this mistake

@chhsia0 (Contributor, Author), Aug 28, 2020:

This is one of the main differences between Option 1 and Option 2 if we also want the variable substitution mechanism to remain single-pass. With the single-pass assumption, Option 1 gives us a way to capture stdout into results, workspaces, volumes, ... but it's harder to validate whether the specified path exists (and IMO that's the Task authors' responsibility: they're responsible for ensuring any path used in their Tasks is valid); Option 2 provides more guarantees and validations, but is more limited.

- name: data
mountPath: /data
volumes:
- name: data
Contributor:

Curious why you decided to use volume mounts here - workspaces are intended to replace volumes and volumeMounts as much as possible (I would have liked to remove them entirely, tektoncd/pipeline#2058) b/c they allow volumes to be supplied at runtime vs. task authoring time.

@chhsia0 (Contributor, Author):

Using volumes in Tasks instead of workspaces gives us a way to encapsulate storage information that's internal to a Task. IIUC, if you use a workspace in a Task, you have to surface it in your Pipeline definition, even if you just want to provide temporary storage for steps within a Task.

So if the storage is not used to pass data between Tasks in a Pipeline (e.g., emptyDir, configMap, and secret), I prefer using volumes and not exposing them in the Pipeline definition.

Also, IMO, using emptyDir for a workspace might give people the illusion that the emptyDir is shared.
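A hedged sketch of the pattern being described (illustrative images, proposed `stdout` field): an emptyDir volume declared inside the Task, mounted only where needed, with nothing surfaced in the Pipeline definition.

```yaml
steps:
  - name: producer
    image: example.com/producer      # illustrative image
    stdout: /data/out                # proposed field
    volumeMounts:
      - name: data
        mountPath: /data
  - name: consumer
    image: example.com/consumer      # illustrative image
    args: ["/data/out"]
    volumeMounts:
      - name: data
        mountPath: /data
volumes:
  - name: data
    emptyDir: {}
```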


* If the stdout/stderr of a step is set to the path of a Task result and the step prints too much data, the result manifest would become too large. Currently the entrypoint binary would [fail if that happens](https://github.com/tektoncd/pipeline/blob/v0.15.2/cmd/entrypoint/main.go#L86). We could enhance the error message to provide more information about which step is to blame for the termination message bloat, to hint users to fix the problem.

### Option 2: Redirecting to Canonical Conventional Paths
Contributor:

Is there a reasonable option that provides both, e.g. the ability to have the output written to an automatic location AND the ability to specify an explicit path? e.g.

type Step struct {
    Stdout PathConfig `json:"stdout,omitempty"`
}

type PathConfig struct{
   Auto bool
   Path string
}

So if you don't care where stdout ends up and you just want to pass it between steps, you can use:

  - name: boskosctl
    image: gcr.io/k8s-staging-boskos/boskosctl
    ...
    stdout:
      auto: true
  - name: parse-project-id
    image: imega/jq
    args:
    - -r
    - .name
    - $(steps.boskosctl.stdout.path)

But if you do care you can use:

  - name: boskosctl
    image: gcr.io/k8s-staging-boskos/boskosctl
    ...
    stdout:
      path: /data/boskosctl-stdout
  - name: parse-project-id
    image: imega/jq
    args:
    - -r
    - .name
    - $(steps.boskosctl.stdout.path)

@chhsia0 (Contributor, Author):

This boils down to whether we allow people to use $(workspaces.<name>.path) or $(results.<name>.path) or $(resources.outputs.<name>.path) in the specified path. If yes and if we want to keep a single-pass variable substitution, then we cannot provide $(steps.<name>.stdout.path).
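A sketch of that tension, assuming the proposed fields (the `$(steps.*)` variable here is hypothetical): resolving it for the second step would require the first step's stdout path, which itself contains a variable, to have been resolved already, i.e. a second substitution pass.

```yaml
steps:
  - name: boskosctl
    image: gcr.io/k8s-staging-boskos/boskosctl
    stdout: $(workspaces.data.path)/boskosctl-stdout         # pass 1 resolves the workspace path
  - name: parse-project-id
    image: imega/jq
    args: ["-r", ".name", "$(steps.boskosctl.stdout.path)"]  # would need a second pass over the resolved value
```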

bobcatfish added a commit to bobcatfish/pipeline that referenced this pull request Aug 27, 2020
I didn't realize this variable existed until @chhsia0 used it in
tektoncd/community#182
All this time ive been hardcoding this path XD hoping to help some other
folks realize this is possible as well.
@vdemeester (Member):

See #182 (comment), I don't see why we would only add stdout and not stderr. From my point of view, it's not up to tekton to know which one is valuable/used by the user.

@bobcatfish (Contributor):

From my point of view, it's not up to tekton to know which one is valuable/used by the user.

I don't think we've had any users say they actually need stderr so it feels like we don't need it, and if we don't need it, why not wait to add it until we do.

Sounds like you feel strongly @vdemeester tho

@chmouel (Member) commented on Sep 14, 2020:

I believe there is a use case for having stderr passed along.

For example, as a user I would like to write a Tekton pipeline for Python-based software to do some linting of my Python code.

I look around and find this container provided by the pylint authors, https://hub.docker.com/r/eeacms/pylint/, which I can reuse directly in my Tekton task.

I would like, in a following task, to be able to capture the output (stdout) or the errors (stderr) of the task so I can, for example, post it back to GitHub or somewhere else...

Being able to do this directly, without having to write a custom script based on eeacms/pylint/ to redirect stderr to stdout or store the stderr in my workspace, makes it more useful and easier for me.
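A hedged sketch of that pylint use case with the proposed `stderr` field (the reporter image and workspace name are illustrative): capture stderr into a workspace file and post it in a later step.

```yaml
steps:
  - name: lint
    image: eeacms/pylint
    stderr: $(workspaces.output.path)/pylint-errors.txt     # still visible in the Pod log
  - name: report
    image: example.com/github-reporter                      # illustrative image
    args: ["$(workspaces.output.path)/pylint-errors.txt"]
```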

bobcatfish added a commit to bobcatfish/pipeline that referenced this pull request Sep 21, 2020
I didn't realize this variable existed until @chhsia0 used it in
tektoncd/community#182
All this time ive been hardcoding this path XD hoping to help some other
folks realize this is possible as well.
tekton-robot pushed a commit to tektoncd/pipeline that referenced this pull request Sep 21, 2020
I didn't realize this variable existed until @chhsia0 used it in
tektoncd/community#182
All this time ive been hardcoding this path XD hoping to help some other
folks realize this is possible as well.
@tekton-robot tekton-robot removed the lgtm Indicates that a PR is ready to be merged. label Sep 28, 2020
@vdemeester (Member) left a comment:

/cc @tektoncd/core-maintainers

@tekton-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vdemeester

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 28, 2020
@ghost commented on Oct 2, 2020:

All my comments are addressed and I'd like to see this move ahead. Adding hold to give @bobcatfish time to respond to the remaining open threads of conversation.

/lgtm

/hold

@tekton-robot tekton-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 2, 2020
@tekton-robot tekton-robot assigned ghost Oct 2, 2020
@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 2, 2020
@bobcatfish (Contributor) left a comment:

I've added a few comments but I think they're all nitpicks :D Generally looks great! Thanks for your patience with the back and forth @chhsia0 !

I'm leaving the hold just because I think one of those nitpicks is in an example that isn't quite right, feel free to remove the hold!


Once `StdoutConfig.Path` or `StderrConfig.Path` is specified, the corresponding output stream will be duplicated to both the given file and the standard output stream of the container, so users can still view the output through the Pod log API. If both `StdoutConfig.Path` and `StderrConfig.Path` are set to the same value, the two streams will be written to two file descriptors referring to the same file description, so outputs from both streams will be interleaved in the same file, but there will be no ordering guarantee on the data. If multiple steps' `StdoutConfig.Path` fields are set to the same value, the file content will be overwritten by the last outputting step.

Variable substitution will be applied to the new fields, so one could specify `$(results.<name>.path)` to the `StdoutConfig.Path` field to extract the stdout of a step into a Task result. No new variable substitution for accessing the values of `StdoutConfig.Path` and `StderrConfig.Path` fields will be provided so variable substitution can remain single-pass.
Contributor:

yet! :D

- name: project-id
steps:
- name: boskosctl
image: gcr.io/k8s-staging-boskos/boskosctl
Contributor:

boskosctl is actually a bad example b/c it DOES have a shell, but otherwise: what @chhsia0 said :D

@tekton-robot tekton-robot removed the lgtm Indicates that a PR is ready to be merged. label Nov 2, 2020
@bobcatfish (Contributor):

@chhsia0 one last nitpick, it looks like this PR has 14 commits right now, can those be squashed into 1?

Feel free to remove:
/hold

@chhsia0 (Contributor, Author) commented on Nov 2, 2020:

/hold cancel

@tekton-robot tekton-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 2, 2020
@chhsia0 (Contributor, Author) commented on Nov 2, 2020:

I usually don't squash the commits in a PR because it allows reviewers to see the diff since the last review, and the squash could be done through "Squash and merge." But sure, I'm happy to do that now 👍

@vdemeester (Member):

@chhsia0 😝 but it's a bot that merges, and only through rebase, so it would be nice, now that it's validated, to squash the commits before merging 😛

@vdemeester (Member):

… and as I commented, you did it @chhsia0 🤗 😻
/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Nov 2, 2020
@tekton-robot tekton-robot merged commit f70e525 into tektoncd:master Nov 2, 2020