Remove extra log PVC #443

Merged: knative-prow-robot merged 1 commit into tektoncd:master on Jan 31, 2019

Conversation

bobcatfish
Collaborator

We noticed early on that logs from init containers are often cleaned up
immediately by k8s, particularly if the containers are short running
(e.g. just echoing "hello world"). We started down a path to correct
that, which takes an approach based on Prow's entrypoint solution
(https://github.com/kubernetes/test-infra/tree/master/prow/cmd/entrypoint)
(even using the same image at the moment!) which wraps the user's
provided command and streams logs to a volume, from which the logs can
be uploaded/streamed by a sidecar.

Since we are using init containers for step execution, we can't yet use
sidecars, but we are addressing that in #224 (also an entrypoint
re-writing based solution). Once we have that, we can add sidecar support,
starting with GCS as a POC (#107) and moving into other types.

In the meantime, to enable us to get logs (particularly in tests), we
had the taskrun controller create a PVC on the fly to hold logs. This
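has two problems:

* The PVCs are not cleaned up so this is an unexpected side effect for
  users
* Combined with PVC based input + output linking, this causes scheduling
  problems for the resulting pods (#375)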

Now that we want to have an official release, this would be a bad state
to release in, so we will remove this magical log PVC creation logic,
which was never our intended end state anyway.

Since we do need the entrypoint rewriting and log interception logic
in the long run, this commit leaves most functionality intact, removing
only the PVC creation and changing the volume being used to an
emptyDir, which is what we will likely use for #107 (and this is how
Prow handles this as well). This means the released functionality will
be streaming logs to a location where nothing can read them, however I
think it is better than completely removing the functionality b/c:

  1. We need the functionality in the long run
  2. Users should be prepared for this functionality (e.g. dealing with
    edge cases around the taskrun controller being able to fetch an
    image's entrypoint)

Fixes #387

(@pivotal-nader-ziada I think you said you've been refactoring the PVC logic a bunch, so if this conflicts with your changes plz just ignore it, and I'll deal with any rebasing later after you've merged!)

@googlebot googlebot added the cla: yes Trying to make the CLA bot happy with ppl from different companies work on one commit label Jan 27, 2019
@knative-prow-robot knative-prow-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jan 27, 2019
@knative-prow-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bobcatfish

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow-robot knative-prow-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 27, 2019
@bobcatfish
Collaborator Author

I0127 20:35:41.594]  clusterConditions: []
I0127 20:35:41.595]  detail: u"Deploy error: Not all instances running in IGM after 4m17.390316985s. Expect 1. Current errors: [ZONE_RESOURCE_POOL_EXHAUSTED_WITH_DETAILS]: Instance 'gke-kbuild-pipeline-e2e--default-pool-11f3aa8c-vsgj' creation failed: The zone 'projects/knative-boskos-05/zones/us-central1-f' does not have enough resources available to fulfill the request.  '(resource type:pd-standard)'. - ; ."

🤔

/test pull-knative-build-pipeline-integration-tests

@bobcatfish
Collaborator Author

Looks like something is wrong with the cluster setup :S gonna have to ping @adrcunha and folks once we're back in regular working hours.

@bobcatfish
Collaborator Author

/test pull-knative-build-pipeline-integration-tests

Member

@nader-ziada nader-ziada left a comment

looks good to me, this has no conflict with the work I'm doing to refactor the pvc used to share artifacts between tasks

@nader-ziada
Member

/test pull-knative-build-pipeline-integration-tests

@nader-ziada
Member

/test pull-knative-build-pipeline-integration-tests

@nader-ziada
Member

/lgtm

@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 28, 2019
@adrcunha
Contributor

/retest

@adrcunha
Contributor

/retest

@shashwathi
Contributor

/retest

@nader-ziada
Member

@bobcatfish this PR just needs to resolve conflicts and would be good to get merged

@knative-prow-robot knative-prow-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 30, 2019
@bobcatfish
Collaborator Author

I0130 22:04:32.662] # github.com/knative/build-pipeline/test
I0130 22:04:32.663] test/embed.go:68:18: undefined: setup
I0130 22:04:32.663] test/embed.go:70:42: undefined: tearDown
I0130 22:04:32.663] test/embed.go:71:8: undefined: tearDown

what have i done

docs/using.md Outdated
overwritten with a custom binary. The plan is to use this custom binary for
controlling the execution of step containers ([#224](https://github.com/knative/build-pipeline/issues/224)) and log streaming
[#107](https://github.com/knative/build-pipeline/issues/107), though currently
it will write logs only to an [`emptyDir`]() (which cannot be read from after
Collaborator Author

whoops 😅

We noticed early on that logs from init containers are often cleaned up
immediately by k8s, particularly if the containers are short running
(e.g. just echoing "hello world"). We started down a path to correct
that, which takes an approach based on Prow's entrypoint solution
(https://github.com/kubernetes/test-infra/tree/master/prow/cmd/entrypoint)
(even using the same image at the moment!) which wraps the user's
provided command and streams logs to a volume, from which the logs can
be uploaded/streamed by a sidecar.

Since we are using init containers for step execution, we can't yet use
sidecars, but we are addressing that in tektoncd#224 (also an entrypoint
re-writing based solution). Once we have that, we can add sidecar support,
starting with GCS as a POC (#107) and moving into other types.

In the meantime, to enable us to get logs (particularly in tests), we
had the taskrun controller create a PVC on the fly to hold logs. This
has two problems:
* The PVCs are not cleaned up so this is an unexpected side effect for
  users
* Combined with PVC based input + output linking, this causes scheduling
  problems for the resulting pods (tektoncd#375)

Now that we want to have an official release, this would be a bad state
to release in, so we will remove this magical log PVC creation logic,
which was never our intended end state anyway.

Since we _do_ need the entrypoint rewriting and log interception logic
in the long run, this commit leaves most functionality intact, removing
only the PVC creation and changing the volume being used to an
`emptyDir`, which is what we will likely use for #107 (and this is how
Prow handles this as well). This means the released functionality will
be streaming logs to a location where nothing can read them, however I
think it is better than completely removing the functionality b/c:
1. We need the functionality in the long run
2. Users should be prepared for this functionality (e.g. dealing with
   edge cases around the taskrun controller being able to fetch an
   image's entrypoint)

Fixes tektoncd#387
@knative-metrics-robot

The following is the coverage report on pkg/.
Say /test pull-knative-build-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/v1alpha1/taskrun/taskrun.go 73.0% 74.0% 0.9

@imjasonh
Member

/lgtm

@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 31, 2019
@knative-prow-robot knative-prow-robot merged commit 9425e01 into tektoncd:master Jan 31, 2019
bobcatfish added a commit to bobcatfish/pipeline that referenced this pull request Feb 6, 2019
Verifying the pipelinerun example worked was disabled due to the combo
of tektoncd#375 and tektoncd#443, but now that we've removed the extra log PVC in tektoncd#443
we shouldn't run into this issue anymore :D
knative-prow-robot pushed a commit that referenced this pull request Feb 6, 2019
Verifying the pipelinerun example worked was disabled due to the combo
of #375 and #443, but now that we've removed the extra log PVC in #443
we shouldn't run into this issue anymore :D
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cla: yes Trying to make the CLA bot happy with ppl from different companies work on one commit lgtm Indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
8 participants