Remote Artifacts #616
Conversation
@@ -74,6 +75,7 @@ The `Build` definition supports the following fields:
- Optional:
  - `spec.parameters` - Refers to a list of `name-value` that could be used to loosely type parameters in the `BuildStrategy`.
  - `spec.dockerfile` - Path to a Dockerfile to be used for building an image. (_Use this path for strategies that require a Dockerfile_)
  - `spec.sources` - [Sources](#Sources) describes a slice of artifacts that will be imported into project context, before the actual build process starts.
Separately, should we add a deprecation notice on `source`?
I think we should head toward deprecating the `source` field; let's tackle that change in a dedicated pull-request coming next, based on #652.
pkg/config/config.go
Outdated
@@ -22,6 +22,9 @@ const (
	kanikoDefaultImage = "gcr.io/kaniko-project/executor:v1.5.1"
	kanikoImageEnvVar  = "KANIKO_CONTAINER_IMAGE"

	remoteArtifactsDefaultImage = "busybox:latest"
Let's use the full image reference?
Yes, this will default to docker.io, which can result in throttling. Suggestion: `quay.io/quay/busybox:latest`
Indeed. Fixed on latest commit, using quay.io hosted image.
Made a first pass @otaviof
@@ -247,6 +249,33 @@ spec:
  name: icr-knbuild
```

### Sources

Represents remote artifacts, i.e. external entities that will be added to the build context before the actual build starts. Therefore, you may employ `.spec.sources` to download artifacts from external repositories.
Although your first pass @otaviof is only supporting remote artifacts, we do know we have plans to expand that over time. IMO we should go ahead and explain that here.
WDYT?
Indeed, Gabe! I'll make sure the documentation reflects the plan to make `.spec.sources` an extension point by adding more "source types" later on.
Please consider latest change.
// TODO: define credentials to reach external artifacts.
// +optional
// Credentials *corev1.LocalObjectReference `json:"credentials"`
In addition to per BuildSource creds, I could see the image used for the download varying on a per source basis
I'm good with maintaining the configurable global default you currently have, but an optional per BuildSource image override seems like a good idea. It might not be as immediate with the one type (i.e. remote) but could be more prevalent as we add types.
Also, we do have this global default, per build override pattern in openshift build v1, and this config item feels similar to what we cover with that pattern.... for whatever that is worth.
I did go back to the EP @otaviof to see if this was discussed previously and I did not see anything. Apologies if I missed it. Or apologies if I missed making this suggestion back when reviewing the EP :-)
WDYT?
I like this suggestion very much; that's an elegant approach to extend the ability to support different types of remote artifacts, and it also gives users an opportunity to extend this behavior elegantly.
It was not part of the EP itself, so my suggestion is that we create an issue to discuss (and document) this approach, and make sure we tackle this change right after.
Additionally, perhaps we could make this change part of #652 EP that @adambkaplan is working on.
I'm fine with transferring the credentials granularity concern to #652
In fact I already see there where it has been covered. A yaml snippet from part of the proposed API there:
- name: git
  type: Git
  git:
    url: https://github.com/shipwright-io/build.git
    ref: main
    credentials:
      name: git-creds

`credentials` is part of the specific source entry ... so finer granularity provided.
I consider my suggestion resolved @otaviof
I will try to review this one tomorrow.
@otaviof could you resolve the conflicts on this PR pls
Looking forward to this in action. Just found a couple of things; I would like to hear what you think about changing the test style towards Ginkgo/Gomega. I know it is all about preferences, but I think these test cases could look a bit more readable when written in Gomega style.
"github.com/tektoncd/pipeline/pkg/apis/pipeline/v1beta1" | ||
) | ||
|
||
func TestRemoteArtifacts_outputPathForBuildSource(t *testing.T) { |
We use Ginkgo/Gomega style pretty much everywhere else. Wondering if we should aim for a more common style.
I ended up writing vanilla Go tests, since it was simpler to implement the test tables for unit testing.
@HeavyWombat thanks for the changes/fixes! 👍🏼
Thanks @otaviof! Just found some spots where I have questions and one where we have to tweak the download script a bit.
items:
  description: BuildSource remote artifact definition.
  properties:
    http:
I do not have a strong opinion on this, but after reading over the name `http` a couple of times I am wondering whether this could be a confusing name. We support both HTTP and HTTPS with this, but one could argue that, purely based on the name, one could be tricked into thinking it is HTTP only. Did a quick check and saw that `brew` calls this kind of transport mechanism a "CurlDownloadStrategy", which kind of indicates that whatever location you configure, it needs to be downloadable via `curl` (or `wget`, in this respect).
I do understand the concerns, and I believe we can address them via API documentation, so users can make sure the protocol used in the `url` attribute is respected.
Personally I'm a bit reluctant to adopt alternative names; I prefer common definitions, like "http" in this case.
Image:      image,
WorkingDir: workspaceDir,
Command:    []string{"/bin/sh"},
Args:       append([]string{"-x", "-c"}, strings.Join(script, " ")),
I am wondering: if users put authorization credentials into the URL (`https://foo:bar@foobar.com/foobar`), then we would expose them in the logs when using `-x`. Could be a total edge case, but it just crossed my mind.
This situation could happen. For the initial version of the remote artifacts we are not supporting authentication, but later on we should add proper support for it, so we can make sure users would not rely on alternative mechanisms, like you mentioned.
On this thread then, please consider https://github.com/shipwright-io/build/pull/616/files#r603435889 @otaviof
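The leak scenario above could eventually be guarded against before the script is even rendered, by rejecting URLs that carry userinfo. A minimal sketch under that assumption — `validateSourceURL` is a hypothetical helper, not part of this PR:

```go
package main

import (
	"fmt"
	"net/url"
)

// validateSourceURL rejects URLs that embed userinfo, so credentials can
// never leak into the logs when the generated script runs under "sh -x".
// Hypothetical helper; not the PR's actual code.
func validateSourceURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return err
	}
	if u.User != nil {
		return fmt.Errorf("auth credentials in URL %q are not supported", u.Redacted())
	}
	return nil
}

func main() {
	fmt.Println(validateSourceURL("https://example.com/artifact.tar"))
	fmt.Println(validateSourceURL("https://foo:bar@example.com/artifact.tar"))
}
```

`url.Redacted` keeps the error message itself from echoing the password back.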
// renderRemoteArtifactsDownloadScript returns a slice of commands, a shell script, based on informed
// BuildSources slice. Scripting lines are bind together with "&&".
func renderRemoteArtifactsDownloadScript(sources []buildv1alpha1.BuildSource) []string {
Only a comment, I am wondering if in another iteration, we could introduce GNU parallel to do the downloads of all sources concurrently.
For now we are using `wget` to download the data, but in the future we might decide to create our own tool to download data. Parallelism would be one of the reasons I would use for creating a standalone tool.
yeah, the recent "multi-arch" topics / issues registered for shipwright could come into play here as well
`wget` is safe wrt having multi-arch versions .... correct my reading @HeavyWombat of GNU parallel if I am wrong, but while I saw some AIX RPM refs, I did not see any for s390x, for example
So I think it is good you registered the point to help us remember such things in the future, but I also think @otaviof's simplifying assumption here for now, with thoughts on how to make it more pluggable in the future, seems like the right thing
}
cmd := []string{"wget", source.URL}
if filePath := outputPathForBuildSource(source); filePath != "" {
	cmd = append(cmd, fmt.Sprintf("%s=%s", outputDocumentFlag, filePath))
If the output directory, or some parts of it, do not exist, I think `wget` will not create those and will fail the download. Should we add `mkdir -p` or similar here?
Creating the directory path before downloading the payload is another use-case that pushes for a dedicated application to handle this logic. The new actor should take care of creating the directory structure, respecting authentication, UID/GID and more, before fetching the payload. Furthermore, we can consider parallelism and other investments in performance-related subjects.
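In the meantime, the `mkdir -p` suggestion above could be folded into the script generator itself, prefixing each download with the creation of its parent directory. A sketch under that assumption — `downloadCommands` and its shape are hypothetical, not the PR's actual code:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// downloadCommands emits a "mkdir -p" for the target's parent directory
// (when there is one) ahead of the wget call, so wget never fails on a
// missing directory. Hypothetical helper for illustration only.
func downloadCommands(filePath, url string) []string {
	cmds := []string{}
	if dir := filepath.Dir(filePath); dir != "." {
		cmds = append(cmds, fmt.Sprintf("mkdir -p %q", dir))
	}
	cmds = append(cmds, fmt.Sprintf("wget -O %q %q", filePath, url))
	return cmds
}

func main() {
	for _, c := range downloadCommands("assets/logo.png", "https://example.com/logo.png") {
		fmt.Println(c)
	}
}
```

A bare file name like `logo.png` skips the `mkdir -p` line entirely, since its parent is the working directory.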
OK I'm a little confused / need some help on this thread?
So, we have
func outputPathForBuildSource(source buildv1alpha1.BuildSource) string {
	if source.HTTP == nil {
		return ""
	}
	return source.HTTP.Path
}
And I see this in the new doc:
Under `.spec.sources` we have the following attributes:
- `.name`: represents the name of resource, required attribute.
- `.type`: data source type, currently only `http` type is supported, required attribute.
- `.url`: universal resource location (URL), required attribute.
- `.http.path`: alternative path to save URL contents, can be either a full or relative path, optional attribute.
When downloading artifacts the process is executed in the same directory where the application source-code is located, by default `/workspace/source`, therefore when using relative path on `.http.path` mind the starting location.
Are you saying @otaviof that another piece of code outside of this PR will be needed for creating `source.HTTP.Path`?
If so, I'm not sure we should be adding fields we don't have a viable "default implementation" for in all cases. In which case, perhaps we have to be more precise / restrictive in what we support "out of the box" for the field.
One possible way to restrict: perhaps we tweak the last sentence of the doc to say that, at least for now, we only support relative paths off of the source code location, in which case a `mkdir -p` could be more viable. Though the point about owners and file permissions outside of the file we are downloading is a good one ... once we allow dirs to be created, we almost have to ask for that metadata as well, and perhaps have underlying `chown` and `chmod` calls.
Or perhaps the destination dir location always has to be `/workspace/source` and we allow the use of `wget -O` to change the name of the file that we store the contents in, i.e. the remote artifact is `foo.tar` but we let them call it `bar.tar`.
Or I'm just confused / missing the point as I said up top and I need further explanation ;-)
Forgot — not that this has to dictate which way we go, but for the analogous function in openshift builds, making relative subdirs is hard-coded perms-wise to
err := os.MkdirAll(destDir, 0755)
and there are no chown's done. And we have not had any complaints so far. And it only supports relative paths off of a predefined / hardcoded mount in the pod.
Are you saying @otaviof that another piece of code outside of this PR will be needed for creating `source.HTTP.Path`?

I'm afraid that's currently the case already, Gabe. Here in this PR we have `wget` being employed to download, and store, the remote artifacts; that's "another piece of code".
However, the current PR is evolving to become a "shell script" generator baked into the operator code. I would like to address this concern by having a clear separation of concerns, where our "custom `wget`" would deal with such environment changes, like creating the underlying directory path, and many others to surface.
I'm going to make changes in this PR to only accept relative locations (to `/workspace/source`), since this is the only mount-point the user is allowed to use. I hope that's a good starting point until we can properly address the separation of concerns.
Thanks for the input; the field `.http.path` is indeed a sharp corner.
So @otaviof @coreydaley @adambkaplan @sbose78 and myself discussed this collectively in a team meeting this morning (US time) / afternoon (Europe time).
I'll try to summarize where we landed:
- we went from only accepting relative paths for `.http.path` to removing the introduction of this field from this PR; instead, it will be introduced when some of the next steps @otaviof alluded to here show up in PRs. For the sake of brevity, I'll call the upcoming work "pluggable download mechanism".
- where needed (if anywhere) @otaviof will look at what happens when multiple sources download to the same file name under `/workspace/sources`, making sure that it is either mitigated or surfaced in a suitable way to the user.
Of course @otaviof @coreydaley @adambkaplan @sbose78 chime in if I've missed something that should be noted here.
Folks, please consider latest changes.
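The duplicate-file-name follow-up from the meeting summary above could be surfaced with a small validation pass over the source URLs; since `wget` names the downloaded file after the URL's last path segment, two sources ending in the same segment would collide. A sketch — `checkDuplicateTargets` is an assumed helper, not code from this PR:

```go
package main

import (
	"fmt"
	"path"
)

// checkDuplicateTargets reports when two source URLs would download to the
// same file name under the shared workspace directory. wget derives the
// file name from the last path segment of the URL.
// Hypothetical validation helper for illustration only.
func checkDuplicateTargets(urls []string) error {
	seen := map[string]string{}
	for _, u := range urls {
		name := path.Base(u)
		if prev, ok := seen[name]; ok {
			return fmt.Errorf("sources %q and %q both download to %q", prev, u, name)
		}
		seen[name] = u
	}
	return nil
}

func main() {
	err := checkDuplicateTargets([]string{
		"https://a.example.com/logo.png",
		"https://b.example.com/logo.png",
	})
	fmt.Println(err) // both URLs end in logo.png, so a collision is reported
}
```

Running such a check at admission time would surface the collision to the user instead of letting the second download silently clobber the first.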
When downloading artifacts the process is executed in the same directory where the application source-code is located, by default `/workspace/source`, therefore when using relative path on `.http.path` mind the starting location.

Additionally, we have plans to keep evolving `.spec.sources` by adding more supported types; this API attribute works as an extension point to support external and internal resource locations.
Per #616 (comment) @otaviof I think you need to put the disclaimer about not supporting auth in the URL here in this doc.
Even with the latest version of commits @otaviof I don't see the disclaimer in docs/build.md that the current implementation does not support URL's with auth credentials included.
Or did you change the code to support that?
Took my next pass over your latest changes @otaviof since the last time I reviewed this.
At least for things I previously commented on, the only outstanding item I found was updating the `docs/build.md` file and the godoc for the new URL (i.e. `spec.sources.url`) to cite not supporting auth credentials in the URL, per your comment #616 (comment).
If that comment still holds, the build.md and godoc should cite that IMO. See the specific spots below.
Or if that restriction no longer applies for you, then all my prior asks have been addressed.
Once we sort that out, I'll make an additional overall pass before adding my sign-off on this.
thanks
// Name instance entry.
Name string `json:"name"`

// URL remote artifact location.
any restriction around not supporting auth credentials in the URL should be included in the godoc IMO
bump @otaviof - I saw the auth restriction in the markdown doc for this, but still, IMO the godoc should note the restriction as well.
Update here too, and then I'll add the approve label.
@gabemontero it's on the type declaration, on lines 10-11.
ah ok sorry I missed that ... was expecting it on the field
I will defer on the location
/approve
Between @imjasonh who just commented, @HeavyWombat's prior comments (making sure he is good), and @qu1queee who wanted to review but was waiting for the needs-rebase to be cleared up: once we get those sorted out and squash commits, one of those folks can lgtm.
Sure. I pushed more changes adding godoc and documentation to highlight the lack of authentication at this point. Thanks!
@@ -0,0 +1,271 @@
apiVersion: apiextensions.k8s.io/v1
Are these *_crd.yaml files necessary and intentional? They seem to be redundant with the non-_crd.yaml files. Having two copies might be confusing.
Since those files are not created in this PR, we should move this conversation to an issue. Would it work for you?
Just a quick +1 from my side to tackle unnecessary suffixes in filenames in a separate issue.
script := []string{}
for _, source := range sources {
	if len(script) > 0 {
		script = append(script, "&&")
Would it be useful at all to use `sh -e` instead of injecting `&&`s?
Indeed 👍🏼 Thanks, that's more appropriate for it.
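The `sh -e` alternative agreed on above can be sketched as a generator that emits one command per line and relies on the shell aborting at the first failure. This is an illustrative sketch, with a simplified `BuildSource` stand-in rather than the real `buildv1alpha1.BuildSource` type:

```go
package main

import (
	"fmt"
	"strings"
)

// BuildSource is a simplified stand-in for buildv1alpha1.BuildSource,
// carrying only the fields this sketch needs.
type BuildSource struct {
	Name string
	URL  string
}

// renderDownloadScript emits one wget command per line; running the result
// with "/bin/sh -e -c" makes the shell abort on the first failing download,
// replacing the explicit "&&" chaining discussed above.
func renderDownloadScript(sources []BuildSource) string {
	lines := make([]string, 0, len(sources))
	for _, s := range sources {
		lines = append(lines, fmt.Sprintf("wget %q", s.URL))
	}
	return strings.Join(lines, "\n")
}

func main() {
	fmt.Println(renderDownloadScript([]BuildSource{
		{Name: "logo", URL: "https://example.com/logo.png"},
		{Name: "data", URL: "https://example.com/data.json"},
	}))
}
```

The container step would then run the output via `/bin/sh -e -c "<script>"`, keeping the fail-fast behavior without hand-assembled `&&` separators.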
if len(script) > 0 {
	script = append(script, "&&")
}
cmd := []string{"wget", source.URL}
Is there anything in place to sanitize the `source.URL` value here? What happens if a malicious user passes `url: "https://redhat.com; rm -rf /"`?
Yes, it should be handled here, please consider.
Nice, I totally missed that. Thanks! 👍
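A complementary way to sidestep the injection risk discussed above is to never splice the URL into the script text at all, and instead pass it to the shell as a positional parameter. A sketch under that assumption — `containerShellArgs` is a hypothetical helper, not the PR's actual code:

```go
package main

import "fmt"

// containerShellArgs passes the URL as a positional parameter ("$1")
// instead of splicing it into the script text, so shell metacharacters
// such as ";" in a malicious URL are treated as data, never executed.
// Hypothetical helper for illustration only.
func containerShellArgs(url string) []string {
	// equivalent to running: /bin/sh -e -c 'wget "$1"' download <url>
	return []string{"-e", "-c", `wget "$1"`, "download", url}
}

func main() {
	fmt.Println(containerShellArgs(`https://redhat.com; rm -rf /`))
}
```

With `sh -c 'script' name arg1…`, the argument after the script becomes `$0` and subsequent arguments become `$1` onward, so the quoted `"$1"` expansion never re-enters the parser.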
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: gabemontero. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Creating an initial implementation for this feature, using `wget` as the initial web download tool.
All commits have been squashed, please consider.
/lgtm
Changes
This pull-request implements feature #615 (Remote Artifacts) by enhancing the CRD with the `.spec.sources` attribute, and generating a download script to fetch remote artifacts into the build context.

Usage Example
As a developer, I can describe and consume remote artifacts; those will be downloaded before the actual build starts. In this example, we have a simple Node.js application which downloads the logo image from an external source.
Submitter Checklist
Release Notes