Document bug with sidecar usage of nop container
Sidecars are stopped by having their Image field swapped out to the
`nop` image. When the `nop` image starts up in the sidecar container it
is supposed to exit immediately because `nop` doesn't include the
sidecar's command. However, when the `nop` image *does* contain the
command that the sidecar is running, the sidecar container never stops
and the Task eventually times out.

For most sidecars this issue will not manifest: the `nop` container
that Tekton provides out of the box includes only a very limited set of
commands. However, if a Tekton operator overrides the `nop` image when
deploying the Tekton controller (for example, because their organization
requires images used with Tekton to be built on the org's own base
image), the replacement `nop` image may offer more commands. This raises
the risk that a sidecar's command is runnable by the `nop` image, and
therefore the likelihood of Tasks with sidecars running until they time
out.

This issue is a known bug in the way sidecars currently operate. It is
being tracked in #1347, but should be documented clearly in the
meantime.
Scott authored and tekton-robot committed Oct 25, 2019
1 parent c8e15fe commit 6c132b9
Showing 3 changed files with 26 additions and 2 deletions.
14 changes: 12 additions & 2 deletions docs/developers/README.md
@@ -183,5 +183,15 @@ begin.
On completion of all steps in a Task the TaskRun reconciler stops any
sidecar containers. The `Image` field of any sidecar containers is swapped
to the nop image. Kubernetes observes the change and relaunches the container
with the updated container image. The nop container image exits immediately
*because it does not provide the command that the sidecar is configured to run*.
The container is considered `Terminated` by Kubernetes and the TaskRun's Pod
stops.
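
For illustration, here is a minimal sketch of what that swap looks like on the
Pod's container spec; the sidecar name, command, and image references below
are placeholders rather than the exact values Tekton uses:

```yaml
# Before the steps complete: the sidecar container as created for the
# TaskRun's Pod (names and images are placeholders for illustration).
containers:
- name: sidecar-metrics-collector
  image: registry.example.com/metrics-collector:1.0
  command: ["/usr/local/bin/collector"]
---
# After the steps complete: the reconciler swaps only the image; the name and
# command are left untouched. Because the nop image does not ship
# `/usr/local/bin/collector`, the relaunched container exits immediately.
containers:
- name: sidecar-metrics-collector
  image: gcr.io/example/nop:latest   # placeholder for the configured nop image
  command: ["/usr/local/bin/collector"]
```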

There is a known issue with this implementation of sidecar support. When the
`nop` image does provide the sidecar's command, the sidecar will continue to
run even after `nop` has been swapped into the sidecar container's image
field. See https://github.com/tektoncd/pipeline/issues/1347 for the issue
tracking this bug. Until it is resolved, the best way to avoid the problem is
to avoid overriding the `nop` image when deploying the Tekton controller, or
to ensure that any overridden `nop` image contains as few commands as possible.
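
For context, the `nop` image is supplied to the controller as a command-line
argument on its Deployment. The excerpt below is only a sketch of what such an
override looks like; the flag name, image references, and field layout are
assumptions based on a typical installation and may differ between Tekton
releases:

```yaml
# Illustrative excerpt of the tekton-pipelines-controller Deployment's pod
# template; values are placeholders and may not match your release.
containers:
- name: tekton-pipelines-controller
  image: gcr.io/tekton-releases/controller:latest   # placeholder controller image
  args:
  - "-nop-image"                                    # assumed flag name for the nop override
  - "registry.example.com/base/nop:latest"          # hypothetical org-specific override
  # If the override image ships extra commands, any sidecar whose command
  # happens to exist in it will keep running instead of exiting.
```
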
6 changes: 6 additions & 0 deletions docs/taskruns.md
@@ -590,6 +590,12 @@ order to terminate the sidecars they will be restarted with a new
Pod will include the sidecar container with a Retry Count of 1 and
with a different container image than you might be expecting.

Note: The configured "nop" image must not provide the command that the
sidecar is expected to run. If it does provide that command, the sidecar
will not exit. This results in the sidecar running forever and the Task
eventually timing out. This bug is being tracked in
https://github.com/tektoncd/pipeline/issues/1347.
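
As a hypothetical illustration (the images, commands, and names below are
examples, not taken from the Tekton docs), a sidecar like the following only
stops cleanly if the configured "nop" image does not also provide
`redis-server`:

```yaml
# Hypothetical Task with a sidecar; images and commands are illustrative only.
apiVersion: tekton.dev/v1alpha1
kind: Task
metadata:
  name: test-with-redis
spec:
  steps:
  - name: run-tests
    image: golang:1.13
    command: ["go", "test", "./..."]
  sidecars:
  - name: redis
    image: redis:5
    # After the steps finish, Tekton swaps this container's image to "nop".
    # If that "nop" image also provided `redis-server`, this sidecar would
    # keep running and the TaskRun would hang until its timeout.
    command: ["redis-server"]
```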

---

Except as otherwise noted, the content of this page is licensed under the
8 changes: 8 additions & 0 deletions docs/tasks.md
@@ -447,6 +447,14 @@ volumes:
emptyDir: {}
```

Note: There is a known bug with Tekton's existing sidecar implementation.
Tekton uses a specific image, called "nop", to stop sidecars. The "nop" image
is configurable using a flag of the Tekton controller. If the configured "nop"
image contains the command that the sidecar was running before the sidecar
was stopped, then the sidecar will keep running, causing the TaskRun's Pod to
remain running and eventually causing the TaskRun to time out rather than
exit successfully. This bug is tracked in
https://github.com/tektoncd/pipeline/issues/1347.

### Variable Substitution
