Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi-source apps should get an option to combine sources and render them together #12471

Open
lippertmarkus opened this issue Feb 15, 2023 · 24 comments
Labels
enhancement New feature or request

Comments

@lippertmarkus
Copy link
Member

lippertmarkus commented Feb 15, 2023

Summary

For multi-source apps there are cases where it would make sense to put the files of one source into the directory of another source and render them together.

Motivation

Source 1: Helm Chart which uses .Files.Glob "**.json" to put configuration files into configmaps.
Source 2: Git Repo with additional json files that should be put into the configmaps by the Helm Chart (Source 1) as well.

Proposal

Could be implemented similarly like with the helm.valueFiles. Example:

  sources:
    - repoURL: https://github.com/example/my-json-files.git
      targetRevision: HEAD
      path: '.'
      ref: myjsonfiles
    - chart: myhelmchart
      repoURL: example.com/myhelmrepo
      targetRevision: 0.1.0
      additionalFiles:
         - $myjsonfiles
@lippertmarkus lippertmarkus added the enhancement New feature or request label Feb 15, 2023
@crenshaw-dev
Copy link
Member

Would you just put the files in a subdirectory called “myjsonfiles”?

@lippertmarkus
Copy link
Member Author

lippertmarkus commented Feb 15, 2023

For the example outlined above, that would be enough as .Files.Glob "**.json" would find the files in any subdirectory. For other cases however this might should be configurable, what do you think?

@crenshaw-dev
Copy link
Member

Fair. Maybe:

  sources:
    - repoURL: https://github.com/example/my-json-files.git
      targetRevision: HEAD
      path: '.'
      ref: myjsonfiles
    - chart: myhelmchart
      repoURL: example.com/myhelmrepo
      targetRevision: 0.1.0
      from:
         - ref: $myjsonfiles
           destination: ./myjsonfiles

That would leave room for things like exclude: **/*.tar.gz or renderFirst: true.

@lippertmarkus
Copy link
Member Author

sounds great! @gczuczy do you want to give it a try?

@gczuczy
Copy link
Contributor

gczuczy commented Feb 15, 2023

@lippertmarkus Yes. hopefully @crenshaw-dev can provide guidance regarding the project's bigger picture on the tiny details, and we can get to a point where everyone is sufficiently satisfied.

Btw, is it the usual fork-PR-review-merge flow?

@gczuczy
Copy link
Contributor

gczuczy commented Feb 15, 2023

@crenshaw-dev Adding a from element into the structure requires a CRD change as well, is that fine for a patch release, or is this targeted for 2.7?

Regarding the details, my thoughts so far:

  1. I think having 3 directives would be optimal: ref, src, dst (with whatever names), so we can copy any part of the referenced source, not restricted to the whole tree. Or src could be $ref/path as well. But I think the source path is much needed.
  2. Thinking about it, probably only copying files and directories unconditionally, checking whether symlinks are within the copied subtree, and skipping all other inode types would be the best.
  3. What about hardlinks? Should the logic keep track of inode numbers and keep hardlinks as they are, or else they will be duplicated in the copy
  4. Do we need the option (even if left unused) for a filter? This way later include/exclude logic could be implemented by passing a simple callback.

@crenshaw-dev
Copy link
Member

Btw, is it the usual fork-PR-review-merge flow?

Yep!

is that fine for a patch release, or is this targeted for 2.7?

It should be in a minor release.

If I review it, it'll have to wait for 2.8. I've maxed out what I can reasonably hope to review before 2.7.0-rc1 (Mar. 20, 2023).

You might be able to get another approver to review. I'd recommend looking for reviewers in the contributor office hours meeting tomorrow (and every Thursday) or in the #argo-contributors Slack channel.

I think having 3 directives would be optimal

Can you provide an example of the spec you imagine?

probably only copying files and directories unconditionally, checking whether symlinks are within the copied subtree, and skipping all other inode types would be the best

I dig that. We already have a function to check for out-of-bounds symlinks, so we could reuse that rather than yeeting all symlinks.

What about hardlinks?

My OS fundamentals are rusty. Do we need to consider them? I don't think we'll have them in git repos. Maybe also not in Helm charts and OCI artifacts?

Do we need the option (even if left unused) for a filter?

I think we should use an object in the source field so that we can add stuff like filters later. Maybe even an array of blocks, so you can reference multiple sources.

@gczuczy
Copy link
Contributor

gczuczy commented Feb 15, 2023

If I review it, it'll have to wait for 2.8. I've maxed out what I can reasonably hope to review before 2.7.0-rc1 (Mar. 20, 2023).

You might be able to get another approver to review. I'd recommend looking for reviewers in the contributor office hours meeting tomorrow (and every Thursday) or in the #argo-contributors Slack channel.

Thank you. I've added this PR under my name as a suggestion to the agenda. Can be there easily.

I think having 3 directives would be optimal

Can you provide an example of the spec you imagine?

Not exactly. I rather have requirements in my mind, and I think the actual wording of the spec as a minor detail. Everything works, as far as the ref, the srcpath and dst can be specified.

probably only copying files and directories unconditionally, checking whether symlinks are within the copied subtree, and skipping all other inode types would be the best

I dig that. We already have a function to check for out-of-bounds symlinks, so we could reuse that rather than yeeting all symlinks.

Could you please give me a hint to where to look for that function? That would simplify this part, reusing code is much welcome.

What about hardlinks?

My OS fundamentals are rusty. Do we need to consider them? I don't think we'll have them in git repos. Maybe also not in Helm charts and OCI artifacts?

Actually I don't think it is an issue here. I was thinking about it purely from a filesystem perspective, rather then taking into account the nature of the source of the data. I don't think we'll encounter any, helm and git don't support it to the best of my knowledge, and I don't really know much about OCI artifacts. But I think it's safe to ignore this aspect.

Do we need the option (even if left unused) for a filter?

I think we should use an object in the source field so that we can add stuff like filters later. Maybe even an array of blocks, so you can reference multiple sources.

Personally I would prefer to leave this kind of detail for later, and just keep this in mind when fiddling with the code, so expanding it later will be easier. I think the first goal should be getting the basic feature itself done, then we can discuss extending it, even maybe in another PR.

@crenshaw-dev
Copy link
Member

Could you please give me a hint to where to look for that function? That would simplify this part, reusing code is much welcome.

func CheckOutOfBoundsSymlinks(basePath string) error {

then we can discuss extending it, even maybe in another PR.

Makes sense, we'll just want to make sure that the first PR leaves room in the spec for any expansion we anticipate.

@gczuczy
Copy link
Contributor

gczuczy commented Feb 16, 2023

Could you please give me a hint to where to look for that function? That would simplify this part, reusing code is much welcome.

func CheckOutOfBoundsSymlinks(basePath string) error {

Using that function will make the complete copy operation fail in case of a single symlink is out of bounds. I would prefer an implementation where symlinks are checked individually when processed, and out-of-bound entries are skipped, instead of the whole operation erroring out in case of a single out-of-bound one.

Getting here seems rather straightforward, at path.go:50 the lambda can be moved to a separate function, called as a callback here, and in this PR's logic called directly, and not by the CheckOutOfBoundsSymlinks function.

@gczuczy
Copy link
Contributor

gczuczy commented Feb 16, 2023

I've created a draft PR for now, so the work is linked here and can be tracked.

@gczuczy
Copy link
Contributor

gczuczy commented Feb 16, 2023

May I ask what's the way of updating the protobuf specs after adding structures to the go code? From the source it seems like hack/generate-proto.sh is ought to do the job, however it fails with the following error:

2023/02/16 14:09:48 ./vendor: warning: directory does not exist.
github.com/gogo/protobuf/gogoproto/gogo.proto: File not found.
k8s.io/apimachinery/pkg/util/intstr/generated.proto:8:1: Import "github.com/gogo/protobuf/gogoproto/gogo.proto" was not found or had errors.
2023/02/16 14:09:48 protoc -I . -I ./ -I ./vendor -I /home/gczuczy/go/src/github.com/argoproj/argo-cd/dist/protoc-include --gogo_out=./ k8s.io/apimachinery/pkg/util/intstr/generated.proto
2023/02/16 14:09:48 Unable to generate protoc on k8s.io.apimachinery.pkg.util.intstr: exit status 1

I've installed the tools with hack/install.sh for codegen-go-tools, codegen-tools, protoc.

@crenshaw-dev
Copy link
Member

I would prefer an implementation where symlinks are checked individually

I'd prefer that as well. Unfortunately we've found through experience (and a handful of CVEs) that we can't reliably account for every build tool's (Helm, Kustomize, Jsonnet, plugins) ways of using symlinks. So instead we block them all by default if they leave the repo directory.

May I ask what's the way of updating the protobuf specs after adding structures to the go code?

Make sure Argo CD is cloned into $GOPATH/src/github.com/argoproj/argo-cd. Unfortunately some of our codegen stuff relies on that exact placement.

@gczuczy
Copy link
Contributor

gczuczy commented Feb 16, 2023

I would prefer an implementation where symlinks are checked individually

I'd prefer that as well. Unfortunately we've found through experience (and a handful of CVEs) that we can't reliably account for every build tool's (Helm, Kustomize, Jsonnet, plugins) ways of using symlinks. So instead we block them all by default if they leave the repo directory.

I think skipping out-of-bound symlinks individually satisfies that security requirement. If needed, a hardfail directive can be added which makes the whole operation error out if any found, instead of skipping them.

May I ask what's the way of updating the protobuf specs after adding structures to the go code?

Make sure Argo CD is cloned into $GOPATH/src/github.com/argoproj/argo-cd. Unfortunately some of our codegen stuff relies on that exact placement.

GOPATH wasn't set, but everything is under ~/go/src/github.com/argoproj/argo-cd/. I've re-ran the installers:

~/go/src/github.com/argoproj/argo-cd$ echo $GOPATH
/home/gczuczy/go
~/go/src/github.com/argoproj/argo-cd$ for i in codegen-go-tools codegen-tools protoc; do hack/install.sh $i; done
(...)
~/go/src/github.com/argoproj/argo-cd$ hack/generate-proto.sh
(...)
topological order github.com/argoproj/argo-cd/v2/pkg/apis/application/v1alpha1
2023/02/16 15:19:55 ./vendor: warning: directory does not exist.
/home/gczuczy/go/src/k8s.io/apimachinery/pkg/util/intstr/generated.proto: Input is shadowed in the --proto_path by "k8s.io/apimachinery/pkg/util/intstr/generated.proto".  Either use the latter file as your input or reorder the --proto_path so that the former file's location comes first.
2023/02/16 15:19:55 protoc -I . -I /home/gczuczy/go/src -I ./vendor -I /home/gczuczy/go/src/github.com/argoproj/argo-cd/dist/protoc-include --gogo_out=/home/gczuczy/go/src /home/gczuczy/go/src/k8s.io/apimachinery/pkg/util/intstr/generated.proto
2023/02/16 15:19:55 Unable to generate protoc on k8s.io.apimachinery.pkg.util.intstr: exit status 1
$ go version
go version go1.18.1 linux/amd64

@crenshaw-dev
Copy link
Member

I think skipping out-of-bound symlinks individually satisfies that security requirement. If needed, a hardfail directive can be added which makes the whole operation error out if any found, instead of skipping them.

When you say "skipping," do you mean skipping copying them to the new shared build directory?

./vendor: warning: directory does not exist.

Ah I think you need to run go mod vendor. I believe the make codegen-local target would cover that for you. But it would also do a lot of codegen you don't need (like CLI docs, manifests, etc.).

@gczuczy
Copy link
Contributor

gczuczy commented Feb 16, 2023

I think skipping out-of-bound symlinks individually satisfies that security requirement. If needed, a hardfail directive can be added which makes the whole operation error out if any found, instead of skipping them.

When you say "skipping," do you mean skipping copying them to the new shared build directory?
Exactly. If there is a symlink under the sourcepath that points out of sourcepath, I would just skip that. If needed, introduce a hardfail mode to error out if any found. This logic should satisfy the security requirement, and if we have a hardfail mode, that provides debugging info for the users.

Also, there might be usecases when the source of the copy has a symlink outside of the copy's scope for other purposes. Then that purpose and this feature would conflict. So, ignoring out-of-bound links selectively also makes it work for those cases.

./vendor: warning: directory does not exist.

Ah I think you need to run go mod vendor. I believe the make codegen-local target would cover that for you. But it would also do a lot of codegen you don't need (like CLI docs, manifests, etc.).

Let's see. go mod vendor exited without any output, now trying to regenerate it. make codegen-local errored out with like two pages of stuff:

go run ./hack/gen-catalog docs
# github.com/argoproj/argo-cd/v2/pkg/apiclient/applicationset
pkg/apiclient/applicationset/applicationset.pb.go:645:34: ambiguous selector m.Applicationset.MarshalToSizedBuffer
pkg/apiclient/applicationset/applicationset.pb.go:701:34: ambiguous selector m.Applicationset.MarshalToSizedBuffer
pkg/apiclient/applicationset/applicationset.pb.go:808:24: ambiguous selector m.Applicationset.Size
pkg/apiclient/applicationset/applicationset.pb.go:824:24: ambiguous selector m.Applicationset.Size
pkg/apiclient/applicationset/applicationset.pb.go:1149:31: ambiguous selector m.Applicationset.Unmarshal
pkg/apiclient/applicationset/applicationset.pb.go:1236:31: ambiguous selector m.Applicationset.Unmarshal
pkg/apiclient/applicationset/applicationset.pb.gw.go:59:9: cannot use msg (variable of type *v1alpha1.ApplicationSet) as type protoiface.MessageV1 in return statement:
        *v1alpha1.ApplicationSet does not implement protoiface.MessageV1 (missing ProtoMessage method)
pkg/apiclient/applicationset/applicationset.pb.gw.go:86:9: cannot use msg (variable of type *v1alpha1.ApplicationSet) as type protoiface.MessageV1 in return statement:
        *v1alpha1.ApplicationSet does not implement protoiface.MessageV1 (missing ProtoMessage method)
pkg/apiclient/applicationset/applicationset.pb.gw.go:106:9: cannot use msg (variable of type *v1alpha1.ApplicationSetList) as type protoiface.MessageV1 in return statement:
        *v1alpha1.ApplicationSetList does not implement protoiface.MessageV1 (missing ProtoMessage method)
pkg/apiclient/applicationset/applicationset.pb.gw.go:122:9: cannot use msg (variable of type *v1alpha1.ApplicationSetList) as type protoiface.MessageV1 in return statement:
        *v1alpha1.ApplicationSetList does not implement protoiface.MessageV1 (missing ProtoMessage method)
pkg/apiclient/applicationset/applicationset.pb.gw.go:122:9: too many errors
# github.com/argoproj/argo-cd/v2/reposerver/apiclient
reposerver/apiclient/repository.pb.go:2472:21: v.MarshalToSizedBuffer undefined (type *v1alpha1.RefTarget has no field or method MarshalToSizedBuffer)
reposerver/apiclient/repository.pb.go:2508:31: m.HelmOptions.MarshalToSizedBuffer undefined (type *v1alpha1.HelmOptions has no field or method MarshalToSizedBuffer)
reposerver/apiclient/repository.pb.go:2568:41: m.HelmRepoCreds[iNdEx].MarshalToSizedBuffer undefined (type *v1alpha1.RepoCreds has no field or method MarshalToSizedBuffer)
(...)
reposerver/apiclient/repository.pb.go:2638:33: m.Repos[iNdEx].MarshalToSizedBuffer undefined (type *v1alpha1.Repository has no field or method MarshalToSizedBuffer)
reposerver/apiclient/repository.pb.go:2651:37: m.ApplicationSource.MarshalToSizedBuffer undefined (type *v1alpha1.ApplicationSource has no field or method MarshalToSizedBuffer)
reposerver/apiclient/repository.pb.go:2701:24: m.Repo.MarshalToSizedBuffer undefined (type *v1alpha1.Repository has no field or method MarshalToSizedBuffer)
reposerver/apiclient/repository.pb.go:2912:24: m.Repo.MarshalToSizedBuffer undefined (type *v1alpha1.Repository has no field or method MarshalToSizedBuffer)
reposerver/apiclient/repository.pb.go:2995:23: ambiguous selector m.App.MarshalToSizedBuffer
reposerver/apiclient/repository.pb.go:2995:23: too many errors

and similar messages for multiple pb.go files.

Found a protogen target in the Makefile which does run go mod vendor, how it's still failing:

topological order github.com/argoproj/argo-cd/v2/pkg/apis/application/v1alpha1
2023/02/16 16:06:01 github.com/argoproj/argo-cd/vendor/k8s.io/apimachinery/pkg/apis/meta/v1/generated.proto:10:1: warning: Import k8s.io/apimachinery/pkg/runtime/schema/generated.proto is unused.
2023/02/16 16:06:02 github.com/argoproj/argo-cd/vendor/k8s.io/api/core/v1/generated.proto:12:1: warning: Import k8s.io/apimachinery/pkg/runtime/schema/generated.proto is unused.
2023/02/16 16:06:04 github.com/argoproj/argo-cd/vendor/k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1/generated.proto:11:1: warning: Import k8s.io/apimachinery/pkg/runtime/schema/generated.proto is unused.
2023/02/16 16:06:04 /home/gczuczy/go/src/github.com/argoproj/argo-cd/v2/pkg/apis/application/v1alpha1/generated.proto: Input is shadowed in the --proto_path by "github.com/argoproj/argo-cd/v2/pkg/apis/application/v1alpha1/generated.proto".  Either use the latter file as your input or reorder the --proto_path so that the former file's location comes first.
2023/02/16 16:06:04 protoc -I . -I /home/gczuczy/go/src -I ./vendor -I /home/gczuczy/go/src/github.com/argoproj/argo-cd/dist/protoc-include --gogo_out=/home/gczuczy/go/src /home/gczuczy/go/src/github.com/argoproj/argo-cd/v2/pkg/apis/application/v1alpha1/generated.proto
2023/02/16 16:06:04 Unable to generate protoc on github.com.argoproj.argo_cd.v2.pkg.apis.application.v1alpha1: exit status 1
make: *** [Makefile:194: protogen] Error 1

However, it's not missing vendor anymore.

@crenshaw-dev
Copy link
Member

Also, there might be usecases when the source of the copy has a symlink outside of the copy's scope for other purposes. Then that purpose and this feature would conflict. So, ignoring out-of-bound links selectively also makes it work for those cases.

I'm not sure what you're proposing... how would out-of-bound links be selectively ignored? In general, allowing links outside the source repo isn't acceptable, because the environment (the repo-server) contains sensitive information.

make codegen-local errored out with like two pages of stuff:

Can you push the current state of your repo? I can try to reproduce it locally.

@gczuczy
Copy link
Contributor

gczuczy commented Feb 16, 2023

Also, there might be usecases when the source of the copy has a symlink outside of the copy's scope for other purposes. Then that purpose and this feature would conflict. So, ignoring out-of-bound links selectively also makes it work for those cases.

I'm not sure what you're proposing... how would out-of-bound links be selectively ignored? In general, allowing links outside the source repo isn't acceptable, because the environment (the repo-server) contains sensitive information.

When walking the source tree for copying, check every entry to be a symlink, if it's a symlink then check the target, if the target is outside of the sourcepath of the copy then simply continue to the next entry, ignore it.

Hardfail would be running a check for out-of-bound symlinks before starting the copy, and not even starting the copy if any are found.
Softfail would be not doing a check preemptively, but during the recursion of the source path for the copy checking the links, and if one points out of the sourcepath, then skipping it.

make codegen-local errored out with like two pages of stuff:

Can you push the current state of your repo? I can try to reproduce it locally.
Pushed, 39ed5348 should be it.

@crenshaw-dev
Copy link
Member

Ah I understand now. I agree, skip and move on makes sense.

I think we could actually allow symlinks up to the boundary of the new top-level shared directory, rather than the bounds of the original repo.

Pushed, 39ed534 should be it.

Taking a look.

@gczuczy
Copy link
Contributor

gczuczy commented Feb 16, 2023

Ah I understand now. I agree, skip and move on makes sense.

I think we could actually allow symlinks up to the boundary of the new top-level shared directory, rather than the bounds of the original repo.

Yup. The only thing I think not worth diving into is doing relative checks. There might be a few edge cases where symlinks temporarily are pointing out of the source/destinationpath but getting back inside again, and/or ending up with a different destination after the copy.

For an example an src like: $ref/foo/bar/blah -> ../../foo/b, sourcePath is $ref/foo, destinationPath is irrelevant, but there's directory with the same name at $destinationPath/../foo. In this case when the symlink's target is checked relatively, the step out of bound would be seen and actioned, but when checked absolutely (by appending (src|dst)path/linktarget it could be missed, and after copying would point to a completely different entity. I'm not sure it's worth dealing with such a detail explicitly, despite it being able to provide "funny" side effects.

@gczuczy
Copy link
Contributor

gczuczy commented Feb 16, 2023

@crenshaw-dev If it's something broken in the forked branch, I've added you to the fork's contributors, so you can directly push to it.

@crenshaw-dev
Copy link
Member

@gczuczy pushed the fixes. I think there was a race condition in the codegen-local make target. notifications-docs depended on codegen having already passed.

@bumarcell
Copy link

Thank you guys for the great effort 🙏
Any updated on this?

@NikOverflow
Copy link

Thank you guys for the great effort 🙏 Any updated on this?

i'm also interested. I would love to see this feature.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
5 participants