
Implement reference Unstructured store API to upload TaskRun logs GCS store #107

Closed
tejal29 opened this issue Oct 6, 2018 · 5 comments
Labels
design This task is about creating and discussing a design

Comments

tejal29 commented Oct 6, 2018

Expected Behavior

The Pipeline TaskRun logs should be uploaded to an endpoint and available to download later.
In our initial reference implementation we should support uploading to GCS. In the long run we should support other kinds of stores, and provide a default that does not require GCS.

Actual Behavior

As of #167 the logs will be streamed to a PVC. This volume will continue to exist after the TaskRun has completed. Once this task is done, that PVC should no longer be needed. (This functionality was removed in #443)

Since we moved from init containers to containers for steps in #564, logs are available via the pod logs through the Kubernetes API; however, there are still only limited guarantees about how long the logs will remain available.

Steps to Reproduce the Problem

  1. Create a Task
  2. Create a TaskRun
  3. Wait for TaskRun to complete
  4. Download the uploaded results.

Additional Info

@tejal29 tejal29 added this to the Mid October Demo milestone Oct 6, 2018
@bobcatfish

@tejal29 this is a stretch goal for the milestone, so I'm going to remove it from the required milestone tasks.

@bobcatfish bobcatfish removed this from the Mid October Demo milestone Oct 8, 2018
knative-prow-robot pushed a commit that referenced this issue Oct 8, 2018
This PR implements a simple TaskRun controller that creates a knative/build Build and updates the TaskRun status to reflect the Build status. We delegate to the knative/build controller to do the work of actually fulfilling the Build itself - meaning we have a hard dependency on knative/build.

The integration test doesn't actually assert on the logs output by the
build step because the pods disappear immediately after completion, so
we need a better solution here (e.g. writing to a PVC in the test) - in
the long run we need to implement better log support (#107).

Remaining work for #59 is to improve unit test coverage, and add some
docs on running + debugging.
@bobcatfish bobcatfish added the meaty-juicy-coding-work This task is mostly about implementation!!! And docs and tests of course but that's a given label Oct 12, 2018
@tejal29 tejal29 self-assigned this Oct 15, 2018
@bobcatfish bobcatfish added design This task is about creating and discussing a design and removed meaty-juicy-coding-work This task is mostly about implementation!!! And docs and tests of course but that's a given labels Oct 16, 2018
@bobcatfish bobcatfish changed the title Implement a Unstructured store API to upload TaskRun logs and results to GCB store. Implement reference Unstructured store API to upload TaskRun logs and results to GCB store. Oct 24, 2018
@bobcatfish bobcatfish assigned tejal29 and unassigned tejal29 Oct 29, 2018

tejal29 commented Oct 30, 2018

After feedback from the Build working group, we decided to go with the following approach.
The API will provide a Sink interface.
A Sink service definition will contain:

  • Endpoint: the URL where the sink is hosted.
  • Path to upload to: for now we will focus only on uploading logs.
    • Note: We will add support for uploading the status of a TaskRun or PipelineRun in a subsequent iteration.
      When we want to upload a task run's logs, the reconciler will call the service running at Endpoint and upload all the logs under the path prefix <Path>, i.e. <Endpoint>:<Path>/taskruns/<task run id>.

This will be an HTTPS service running in our cluster, or in any other cluster that your cluster can access.

The Sink Base definition will look like this.

package sink

type URL string

// Base holds the fields common to all sink definitions.
type Base struct {
    Endpoint URL    // URL where the sink service is hosted
    Path     string // path prefix to upload logs under
}

The reason we have Path is so that the same service can be used to upload logs for multiple projects; they will reside under different paths and not overlap.

e.g. I have implemented a GCS sink and installed it in my cluster. It is running at "104.198.205.71:8080".
The GCS sink definition will look like this:

package sink

// GCS is a sink backed by a Google Cloud Storage bucket.
type GCS struct {
    *Base
    Project string // GCP project
}

I can define two GCS sinks which point to two buckets, "cluster1" and "cluster2":

sink1 := sink.GCS{
    Base: &sink.Base{
        Endpoint: "104.198.205.71:8080",
        Path:     "cluster1",
    },
    Project: "test1", // only test1 has write access to GCS bucket gs://cluster1
}

sink2 := sink.GCS{
    Base: &sink.Base{
        Endpoint: "104.198.205.71:8080",
        Path:     "cluster2",
    },
    Project: "test2", // similarly, only test2 has write access to GCS bucket gs://cluster2
}

Along with that, the Sink interface needs to handle 4 URL requests: "upload/taskruns/", "download/taskruns/", "upload/pipelineruns/", "download/pipelineruns/".
(We could also support Upload and Get with a run type; it's up to the implementer.)
Note: We can support partial upload and download of logs later.
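
One possible Go shape for that interface (the method names and signatures here are illustrative assumptions, not a settled API):

package sink

import "io"

// Sink is a hypothetical interface covering the four operations listed above.
type Sink interface {
    UploadTaskRunLogs(id string, logs io.Reader) error
    DownloadTaskRunLogs(id string, w io.Writer) error
    UploadPipelineRunLogs(id string, logs io.Reader) error
    DownloadPipelineRunLogs(id string, w io.Writer) error
}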

The TaskRun reconciler will now make an HTTP request to sink.Endpoint/sink.Path/upload/taskruns/<id=x>,<contentstream=>.
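
As a rough sketch of that call (the URL shape, helper name, and error handling are assumptions for illustration only):

package reconciler

import (
    "fmt"
    "io"
    "net/http"
)

// Base mirrors the sink.Base definition above (Endpoint + Path).
type Base struct {
    Endpoint string
    Path     string
}

// uploadTaskRunLogs POSTs a log stream to <Endpoint>/<Path>/upload/taskruns/<id>.
func uploadTaskRunLogs(s Base, taskRunID string, logs io.Reader) error {
    url := fmt.Sprintf("http://%s/%s/upload/taskruns/%s", s.Endpoint, s.Path, taskRunID)
    resp, err := http.Post(url, "application/octet-stream", logs)
    if err != nil {
        return err
    }
    defer resp.Body.Close()
    if resp.StatusCode >= http.StatusMultipleChoices {
        return fmt.Errorf("uploading logs: unexpected status %s", resp.Status)
    }
    return nil
}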

The design question here is how to define a Sink for a pipeline or a task run.
Should Sink be defined as a cluster-wide custom resource? (A rough sketch of this option follows after the lists below.)

  • this would mean we will create a new custom resource for Sink.
  • Admins can create multiple Sinks.
  • The Reconciler will fetch all sinks installed in your cluster and then upload logs to all sinks.

Should Sink be defined per Pipeline or Task?

  • this would mean Sink will be added to PipelineParams
  • this would mean the task and task run definitions will now list the sink they want to send results to
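
Purely to make the cluster-wide option concrete, here is a rough sketch of what a Sink custom resource's Go types could look like (nothing here is decided; field and type names are assumptions):

package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// SinkSpec mirrors the Base definition above.
type SinkSpec struct {
    Endpoint string `json:"endpoint"`
    Path     string `json:"path"`
}

// Sink would be a cluster-scoped resource; the reconciler would list all
// Sinks installed in the cluster and upload logs to each of them.
type Sink struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`

    Spec SinkSpec `json:"spec"`
}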

/cc @imjasonh and @bobcatfish and @aaron-prindle does this all make sense?

aaron-prindle commented Oct 30, 2018

Nice! These are some initial thoughts/questions:

  1. For something like a GCS sink, how is authorization done for the log uploading? I think GCS w/ GKE will just work but I'm wondering about other clouds/providers?
  2. Is there a default path: value that we should populate for users if none is supplied? (path is optional?)
  3. For GCS, project should maybe be renamed bucket, as I think you can have multiple GCS buckets per project. Also, would the endpoint there be the gs:// URL or the full URL?

tejal29 commented Oct 31, 2018

Nice! These are some initial thoughts/questions:

  1. For something like a GCS sink, how is authorization done for the log uploading? I think GCS w/ GKE will just work but I'm wondering about other clouds/providers?

Yes, you would need something like a credentials file added to the sink.GCS definition and then pass that along.
Maybe it could be a k8s ConfigMap object, and we define the name of the ConfigMap in the GCS sink definition.
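
A minimal sketch of what that could look like (the name of the ConfigMap-reference field is a hypothetical):

package sink

// GCS extended with a reference to the Kubernetes object holding credentials.
type GCS struct {
    *Base
    Project string // GCP project
    // CredentialsConfigMap names a ConfigMap (or this could be a Secret)
    // containing the service account key the sink uses to write to GCS.
    CredentialsConfigMap string
}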

  2. Is there a default path: value that we should populate for users if none is supplied? (path is optional?)

Not sure what would happen if we provide a default path. If we have taskruns with the same id running in separate clusters, they might end up writing to the same path. Maybe we could add some validation to make sure path is always specified (a small sketch follows below).
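
A minimal sketch of that validation (the method name is an assumption):

package sink

import "errors"

// Validate rejects a sink definition whose Path is empty rather than
// guessing a default that could collide across clusters.
func (b Base) Validate() error {
    if b.Path == "" {
        return errors.New("sink: Path must be specified")
    }
    return nil
}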

  3. For GCS, project should maybe be renamed bucket, as I think you can have multiple GCS buckets per project. Also, would the endpoint there be the gs:// URL or the full URL?

Ahh! For GCS I was thinking path represents the bucket, and project was something I saw in the Go storage API:

package main

import (
    "context"
    "fmt"
    "log"

    "cloud.google.com/go/storage"
    "google.golang.org/api/iterator"
)

func main() {
    ctx := context.Background()

    // For API packages whose import path starts with "cloud.google.com/go",
    // such as cloud.google.com/go/storage in this case, if there are no
    // credentials provided, the client library will look for credentials
    // in the environment.
    storageClient, err := storage.NewClient(ctx)
    if err != nil {
        log.Fatal(err)
    }

    it := storageClient.Buckets(ctx, "project-id")
    for {
        bucketAttrs, err := it.Next()
        if err == iterator.Done {
            break
        }
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(bucketAttrs.Name)
    }
}

The endpoint will actually be the GCS sink implementation's HTTP service, e.g. "10.x.x.x:8080", which will have all the code to upload and download content from GCS.
We are providing a GCS implementation which others can use.
Users will have to write a sink implementation and deploy it in their cluster as an HTTP service. They have to make sure they handle the "upload/taskruns" and "download/taskruns" requests.
The TaskRun controller is now agnostic to what the service implements.
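
A minimal sketch of such a service's upload handler, assuming the bucket comes from the sink's Path and that logs are stored one object per taskrun (all names here are illustrative, not the final design):

package main

import (
    "io"
    "net/http"
    "strings"

    "cloud.google.com/go/storage"
)

const bucket = "cluster1" // in practice this would come from the sink's Path/config

// uploadTaskRun writes the POSTed log stream to an object named after the taskrun id.
func uploadTaskRun(w http.ResponseWriter, r *http.Request) {
    ctx := r.Context()
    // A production service would reuse one client; a per-request client keeps the sketch short.
    client, err := storage.NewClient(ctx) // falls back to ambient credentials, as noted above
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    defer client.Close()

    // e.g. POST /upload/taskruns/<id> -> object "taskruns/<id>"
    id := strings.TrimPrefix(r.URL.Path, "/upload/taskruns/")
    wc := client.Bucket(bucket).Object("taskruns/" + id).NewWriter(ctx)
    if _, err := io.Copy(wc, r.Body); err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    if err := wc.Close(); err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    w.WriteHeader(http.StatusCreated)
}

func main() {
    http.HandleFunc("/upload/taskruns/", uploadTaskRun)
    http.ListenAndServe(":8080", nil)
}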

bobcatfish referenced this issue in bobcatfish/pipeline Dec 1, 2018
When a user kicks off a run, they will provide an endpoint to upload
logs to (initial implementation will be in #107). The corresponding
fields in `status` will indicate where the logs actually got uploaded
to.

Once we actually get to #107, and especially once we start supporting
endpoints other than GCS, we may find this isn't useful and remove it.

Fixes tektoncd#146
bobcatfish referenced this issue in bobcatfish/pipeline Jan 30, 2019
We noticed early on that logs from init containers are often cleaned up
immediately by k8s, particularly if the containers are short running
(e.g. just echoing "hello world"). We started down a path to correct
that, which takes an approach based on Prow's entrypoint solution
(https://github.com/kubernetes/test-infra/tree/master/prow/cmd/entrypoint)
(even using the same image at the moment!) which wraps the user's
provided command and streams logs to a volume, from which the logs can
be uploaded/streamed by a sidecar.

Since we are using init containers for step execution, we can't yet use
sidecars, but we are addressing that in tektoncd#224 (also an entrypoint
re-writing based solution). Once we have that, we can add sidecar support,
starting with GCS as a POC (#107) and moving into other types.

In the meantime, to enable us to get logs (particularly in tests), we
had the taskrun controller create a PVC on the fly to hold logs. This
has two problems:
* The PVCs are not cleaned up so this is an unexpected side effect for
  users
* Combined with PVC based input + output linking, this causes scheduling
  problems for the resulting pods (tektoncd#375)

Now that we want to have an official release, this would be a bad state
to release in, so we will remove this magical log PVC creation logic,
which was never our intended end state anyway.

Since we _do_ need the entrypoint rewriting and log interception logic
in the long run, this commit leaves most functionality intact, removing
only the PVC creation and changing the volume being used to an
`emptyDir`, which is what we will likely use for #107 (and this is how
Prow handles this as well). This means the released functionality will
be streaming logs to a location where nothing can read them, however I
think it is better than completely removing the functionality b/c:
1. We need the functionality in the long run
2. Users should be prepared for this functionality (e.g. dealing with
   edge cases around the taskrun controller being able to fetch an
   image's entrypoint)

Fixes tektoncd#387
knative-prow-robot pushed a commit that referenced this issue Jan 31, 2019
bobcatfish referenced this issue in bobcatfish/pipeline Feb 28, 2019
In tektoncd#549 @hrishin pointed out that it's hard to understand from the step
status exactly which step did what. While looking at this I realized
that we have included a field `logsURL` which we never populate - I
thought this was copied over from Build but it was actually from our
original prototype API and we have never used it. In #107 we should be
revisiting making logs available and we may add in something like this,
but since we're not using it and it's not clear if we ever will, let's
remove it for now.
knative-prow-robot pushed a commit that referenced this issue Mar 1, 2019
@bobcatfish bobcatfish changed the title Implement reference Unstructured store API to upload TaskRun logs and results to GCS store. Implement reference Unstructured store API to upload TaskRun logs GCS store Apr 25, 2019
@bobcatfish bobcatfish assigned bobcatfish and unassigned tejal29 Apr 25, 2019
@bobcatfish bobcatfish assigned ghost and unassigned bobcatfish Apr 26, 2019
bobcatfish referenced this issue in bobcatfish/pipeline May 30, 2019
bobcatfish referenced this issue in bobcatfish/pipeline May 30, 2019
As @cmoulliard pointed out, it's not obvious how to get to the logs for
a PipelineRun or a TaskRun. If you know how the underlying kubernetes
resources work you can figure it out but it can be hard to know where to
start. Plus, folks may not realize that we are working on better ways of
accessing logs.

And once we work on #107 we can build up these docs with more detail
about how to upload logs too.

Fixes tektoncd#898
tekton-robot pushed a commit that referenced this issue May 30, 2019

ghost commented Aug 5, 2019

I'm closing this issue out as we have now circulated a design doc for logging in Tekton, and the utility of the information retained in this issue is limited due to its age.

I've opened #1155 to encompass the work of validating and implementing the proposed design, and I encourage anyone looking to get involved on this topic to add commentary, use cases and counterpoints to the design doc or GitHub issue linked above. Cheers!

@ghost ghost closed this as completed Aug 5, 2019
bobcatfish added a commit that referenced this issue Mar 26, 2020
In #107 and related issues we decided to let tools dedicated to this (e.g. fluentd) take care of it!
tekton-robot pushed a commit that referenced this issue Mar 27, 2020
In #107 and related issues we decided to let tools dedicated to this (e.g. fluentd) take care of it!
This issue was closed.