Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

0.4 release must not try to create multiple PVCs across controllers #387

Closed
bobcatfish opened this issue Jan 10, 2019 · 0 comments
Closed
Assignees

Comments

@bobcatfish
Copy link
Collaborator

Expected Behavior

Folks who try to use Pipelines after our 0.4 release should not have to work around the issue we found in #375, which effectively makes it infeasible to use Pipelines with regional GKE clusters.

Actual Behavior

If you try to use Pipelines with a GKE regional cluster, and you try to use a Task with an output, the resulting pod for that Task will be unschedulable.

This is because two PVCs will be created, one by the PipelineRun controller for the output resource, and one by the TaskRun controller for the logs. These will be assigned round robin to different zones within the region, and then the resulting pod will be unable to be scheduled such that it can use both (it can't be in 2 zones at once).

Additional Info

We have several options for how to resolve this before we release:

  1. If we finish Explore moving away from init containers #224 before the 0.4 release, we can remove the log PVC because users should be able to reliably (enough) get logs from the step containers themselves
  2. If we complete Use storage bucket to link outputs and inputs of tasks #384, we could potentially get rid of the output PVC entirely
  3. If we neither of the above complete, we should remove the PVC we added only for logs

It is arguable that (2) is not enough, and we should remove the extra log PVC regardless, since we shouldn't be leaking PVCs. The main benefit of the log PVC as it stands at the moment is that it allows us to check log output in end to end tests.

@bobcatfish bobcatfish added this to the 0.4 Knative Release milestone Jan 10, 2019
@bobcatfish bobcatfish self-assigned this Jan 27, 2019
bobcatfish added a commit to bobcatfish/pipeline that referenced this issue Jan 27, 2019
We noticed early on that logs from init containers are often cleaned up
immediately by k8s, particularly if the containers are short running
(e.g. just echoing "hello world"). We started down a path to correct
that, which takes an approach based on Prow's entrypoint solution
(https://github.com/kubernetes/test-infra/tree/master/prow/cmd/entrypoint)
(even using the same image at the moment!) which wraps the user's
provided command and streams logs to a volume, from which the logs can
be uploaded/streamed by a sidecar.

Since we are using init containers for step execution, we can't yet use
sidecars, but we are addressing that in tektoncd#224 (also an entrypoint
re-writing based solution). Once we have that, we can sidecar support,
starting with GCS as a POC (#107) and moving into other types.

In the meantime, to enable us to get logs (particularly in tests), we
had the taskrun controller create a PVC on the fly to hold logs. This
has two problems:
* The PVCs are not cleaned up so this is an unexpected side effect for
  users
* Combined with PVC based input + ouput linking, this causes scheduling
  problems for the resulting pods (tektoncd#375)

Now that we want to have an official release, this would be a bad state
to release in, so we will remove this magical log PVC creation logic,
which was never our intended end state anyway.

Since we _do_ need the entrypoint rewriting and log interception logic
in the long run, this commit leaves most functionality intact, removing
only the PVC creation and changing the volume being used to an
`emptyDir`, which is what we will likely use for #107 (and this is how
Prow handles this as well). This means the released functionality will
be streaming logs to a location where nothing can read them, however I
think it is better than completely removing the functionality b/c:
1. We need the functionality in the long run
2. Users should be prepared for this functionality (e.g. dealing with
   edge cases around the taskrun controller being able to fetch an
   image's entrypoint)

Fixes tektoncd#387
bobcatfish added a commit to bobcatfish/pipeline that referenced this issue Jan 28, 2019
Three caveats to this integration test:
- It was created before tektoncd#320, so the resource binding is not correct
- It was created before tektoncd#387, so it relies on the log PVC which will no
  longer exist (can work around this by mounting a PVC explicitly in the
  test and writing directly to it instead of echoing?)
- It doesn't exercise `runAfter` functionality
bobcatfish added a commit to bobcatfish/pipeline that referenced this issue Jan 28, 2019
Three caveats to this integration test:
- It was created before tektoncd#320, so the resource binding is not correct
- It was created before tektoncd#387, so it relies on the log PVC which will no
longer exist (can work around this by mounting a PVC explicitly in the
test and writing directly to it instead of echoing?)
- It doesn't exercise `runAfter` functionality
bobcatfish added a commit to bobcatfish/pipeline that referenced this issue Jan 28, 2019
This tests DAG functionality by defining a pipeline with both fan in and
fan out. The idea is that each Task echoes the current time, so after
the pipeline compeletes, we can look at which Task echoes which which
time to make sure they run in order. The tasks are also declared in the
Pipeline in the wrong order on purpose, to make sure that the order of
declaration doesn't affect how they are run (the opposite of the current
functionality)

Three caveats to this integration test:
- It was created before tektoncd#320, so the resource binding is not correct
- It was created before tektoncd#387, so it relies on the log PVC which will no
longer exist (can work around this by mounting a PVC explicitly in the
test and writing directly to it instead of echoing?)
- It doesn't exercise `runAfter` functionality
bobcatfish added a commit to bobcatfish/pipeline that referenced this issue Jan 30, 2019
We noticed early on that logs from init containers are often cleaned up
immediately by k8s, particularly if the containers are short running
(e.g. just echoing "hello world"). We started down a path to correct
that, which takes an approach based on Prow's entrypoint solution
(https://github.com/kubernetes/test-infra/tree/master/prow/cmd/entrypoint)
(even using the same image at the moment!) which wraps the user's
provided command and streams logs to a volume, from which the logs can
be uploaded/streamed by a sidecar.

Since we are using init containers for step execution, we can't yet use
sidecars, but we are addressing that in tektoncd#224 (also an entrypoint
re-writing based solution). Once we have that, we can sidecar support,
starting with GCS as a POC (#107) and moving into other types.

In the meantime, to enable us to get logs (particularly in tests), we
had the taskrun controller create a PVC on the fly to hold logs. This
has two problems:
* The PVCs are not cleaned up so this is an unexpected side effect for
  users
* Combined with PVC based input + ouput linking, this causes scheduling
  problems for the resulting pods (tektoncd#375)

Now that we want to have an official release, this would be a bad state
to release in, so we will remove this magical log PVC creation logic,
which was never our intended end state anyway.

Since we _do_ need the entrypoint rewriting and log interception logic
in the long run, this commit leaves most functionality intact, removing
only the PVC creation and changing the volume being used to an
`emptyDir`, which is what we will likely use for #107 (and this is how
Prow handles this as well). This means the released functionality will
be streaming logs to a location where nothing can read them, however I
think it is better than completely removing the functionality b/c:
1. We need the functionality in the long run
2. Users should be prepared for this functionality (e.g. dealing with
   edge cases around the taskrun controller being able to fetch an
   image's entrypoint)

Fixes tektoncd#387
bobcatfish added a commit to bobcatfish/pipeline that referenced this issue Jan 30, 2019
We noticed early on that logs from init containers are often cleaned up
immediately by k8s, particularly if the containers are short running
(e.g. just echoing "hello world"). We started down a path to correct
that, which takes an approach based on Prow's entrypoint solution
(https://github.com/kubernetes/test-infra/tree/master/prow/cmd/entrypoint)
(even using the same image at the moment!) which wraps the user's
provided command and streams logs to a volume, from which the logs can
be uploaded/streamed by a sidecar.

Since we are using init containers for step execution, we can't yet use
sidecars, but we are addressing that in tektoncd#224 (also an entrypoint
re-writing based solution). Once we have that, we can sidecar support,
starting with GCS as a POC (#107) and moving into other types.

In the meantime, to enable us to get logs (particularly in tests), we
had the taskrun controller create a PVC on the fly to hold logs. This
has two problems:
* The PVCs are not cleaned up so this is an unexpected side effect for
  users
* Combined with PVC based input + ouput linking, this causes scheduling
  problems for the resulting pods (tektoncd#375)

Now that we want to have an official release, this would be a bad state
to release in, so we will remove this magical log PVC creation logic,
which was never our intended end state anyway.

Since we _do_ need the entrypoint rewriting and log interception logic
in the long run, this commit leaves most functionality intact, removing
only the PVC creation and changing the volume being used to an
`emptyDir`, which is what we will likely use for #107 (and this is how
Prow handles this as well). This means the released functionality will
be streaming logs to a location where nothing can read them, however I
think it is better than completely removing the functionality b/c:
1. We need the functionality in the long run
2. Users should be prepared for this functionality (e.g. dealing with
   edge cases around the taskrun controller being able to fetch an
   image's entrypoint)

Fixes tektoncd#387
bobcatfish added a commit to bobcatfish/pipeline that referenced this issue Jan 30, 2019
We noticed early on that logs from init containers are often cleaned up
immediately by k8s, particularly if the containers are short running
(e.g. just echoing "hello world"). We started down a path to correct
that, which takes an approach based on Prow's entrypoint solution
(https://github.com/kubernetes/test-infra/tree/master/prow/cmd/entrypoint)
(even using the same image at the moment!) which wraps the user's
provided command and streams logs to a volume, from which the logs can
be uploaded/streamed by a sidecar.

Since we are using init containers for step execution, we can't yet use
sidecars, but we are addressing that in tektoncd#224 (also an entrypoint
re-writing based solution). Once we have that, we can sidecar support,
starting with GCS as a POC (#107) and moving into other types.

In the meantime, to enable us to get logs (particularly in tests), we
had the taskrun controller create a PVC on the fly to hold logs. This
has two problems:
* The PVCs are not cleaned up so this is an unexpected side effect for
  users
* Combined with PVC based input + ouput linking, this causes scheduling
  problems for the resulting pods (tektoncd#375)

Now that we want to have an official release, this would be a bad state
to release in, so we will remove this magical log PVC creation logic,
which was never our intended end state anyway.

Since we _do_ need the entrypoint rewriting and log interception logic
in the long run, this commit leaves most functionality intact, removing
only the PVC creation and changing the volume being used to an
`emptyDir`, which is what we will likely use for #107 (and this is how
Prow handles this as well). This means the released functionality will
be streaming logs to a location where nothing can read them, however I
think it is better than completely removing the functionality b/c:
1. We need the functionality in the long run
2. Users should be prepared for this functionality (e.g. dealing with
   edge cases around the taskrun controller being able to fetch an
   image's entrypoint)

Fixes tektoncd#387
knative-prow-robot pushed a commit that referenced this issue Jan 31, 2019
We noticed early on that logs from init containers are often cleaned up
immediately by k8s, particularly if the containers are short running
(e.g. just echoing "hello world"). We started down a path to correct
that, which takes an approach based on Prow's entrypoint solution
(https://github.com/kubernetes/test-infra/tree/master/prow/cmd/entrypoint)
(even using the same image at the moment!) which wraps the user's
provided command and streams logs to a volume, from which the logs can
be uploaded/streamed by a sidecar.

Since we are using init containers for step execution, we can't yet use
sidecars, but we are addressing that in #224 (also an entrypoint
re-writing based solution). Once we have that, we can sidecar support,
starting with GCS as a POC (#107) and moving into other types.

In the meantime, to enable us to get logs (particularly in tests), we
had the taskrun controller create a PVC on the fly to hold logs. This
has two problems:
* The PVCs are not cleaned up so this is an unexpected side effect for
  users
* Combined with PVC based input + ouput linking, this causes scheduling
  problems for the resulting pods (#375)

Now that we want to have an official release, this would be a bad state
to release in, so we will remove this magical log PVC creation logic,
which was never our intended end state anyway.

Since we _do_ need the entrypoint rewriting and log interception logic
in the long run, this commit leaves most functionality intact, removing
only the PVC creation and changing the volume being used to an
`emptyDir`, which is what we will likely use for #107 (and this is how
Prow handles this as well). This means the released functionality will
be streaming logs to a location where nothing can read them, however I
think it is better than completely removing the functionality b/c:
1. We need the functionality in the long run
2. Users should be prepared for this functionality (e.g. dealing with
   edge cases around the taskrun controller being able to fetch an
   image's entrypoint)

Fixes #387
pradeepitm12 pushed a commit to pradeepitm12/pipeline that referenced this issue Jan 28, 2021
Fixes tektoncd#387
This PR fixes a pointer bug where an EventListener with `n` triggers
would have the last trigger execute `n` times instead of all `n`
triggers executing once.

This PR also updates the TestHandleEvent function to test multiple
triggers. In the future, it might be worth refactoring the
TestHandleEvent function into a testing table, so we can increase our
test coverage.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant