cmd/go: [modules + integration] use several goproxy sources simultaneously #31304
Comments
For this use case, a single multiplexing goproxy that forwards to other Go proxies could be implemented. Or simply use a file-based proxy URL in conjunction with go mod pack. |
It is intended to be used mostly with file goproxies¹ and those proxies are intended to be populated via However, a single file goproxy can not be used, due to the distinction between:
For build reliability and security, the first class of modules must be deployed on a read-only goproxy. The second class, however, has to be deployed on a read-write goproxy (because the aim of the CI/CD job is to create and write those modules). Having to instantiate a Go-specific proxy server in each Go CI/CD job, just because the go tools can not read more than one directory of modules, would be complex, inefficient and fragile. As long as it's just "copy files, make directory available, run directory indexing command", it's within the easy capabilities of any CI/CD system. Adding a Go-specific network server to the mix is not. Remember that a lot of CI/CD systems are not Go-specific. They can't be, since a lot of software is not written in a single language. Anything that requires Go-specific processing by the CI/CD system (and is not included in the CI/CD system's default extension points) causes problems. ¹ The usual case will be local file goproxy sources (because it's simpler to populate CI/CD job-specific directories than to handle the security aspects of allowing access to some URLs but not others). Of course, some of the file goproxies may be deployed on network filesystems (but the go command does not need to be aware of that). |
I see, this is for use with file-based goproxies. I agree that having a go mod pack command to easily populate a file-based goproxy would be very useful. I would think that the problem you are describing could be solved by having two different users: one that runs the current job that creates the packed modules and has read-only permissions on the goproxy file location, and another with write permissions on the goproxy directories who installs the modules there if the build is successful. Also, Go modules are in essence source-only modules without binaries. I wonder how distribution maintainers solve this problem for a library in a different programming language, such as Ruby, which also has gem files with source-only modules? |
Thanks
On a FHS Linux system, the read-only goproxy directory would not be just protected against writing, it would be owned by root and deployed at a fixed filesystem location. CI/CD read-write directories however can exist wherever the CI/CD job wants to create them in its own filesystem space. So it's not just a read-only/read-write separation, you also have a strong filesystem location separation.
As far as I know, the CI/CD environment works the same for other languages. It creates a contained environment, where things that are not owned by the CI/CD job are deployed in standard locations and locked against modification. The CI/CD job can create files and directories in its own separate read-write filesystem hierarchy. If the CI/CD job is successful, a subset of the created files and directories is collected, with a mapping to canonical filesystem locations¹. Another CI/CD run can then request to use all or part of the result, which will then be exposed in the read-only filesystem space. ¹ The rpm idea on how to define the mapping is very basic, here is an empty directory, pretend it's |
BTW, part of the motivation of the report series (especially #31300) is to help Go software benefit from and catch up to the state of the art in CI/CD systems on the Linux side, which is evolving right now due to requests from the Rust and Golang Fedora SIGs (rpm-software-management/rpm#104 rpm-software-management/rpm#593 rpm-software-management/mock#245). Other language SIGs like Python and Java were also involved in the design in a less direct way. A lot of our system tools use Python, so anything that does not work for Python would have been DOA. Java would really need this too, but is hampered by the multiplicity of its component systems and years of code rot (due to the "peg specific commit" / "rename and fork on change" / "never merge back" Java dev mindset). I’m pretty pessimistic about Java being able to leverage any CI/CD improvement in the short term. |
This task seems best handled by a multiplexing proxy server like @beoran suggested. Or if a file based approach is preferred, then building a tool that creates a symlink tree on disk. I don't see a need to complicate cmd/go. Moreover, it seems like a separate tool/server would make it easier to adapt to the CI/CD system's evolving needs, rather than being blocked waiting on the Go project to review changes.
Making tools responsible for more tasks when they can be split out separately seems contrary to UNIX design. E.g., we have tee(1) instead of adding the ability to every command to write to both stdout and one or more files.
The CI/CD system is running Go-specific commands though when building a Go package, right? What's the difference if one of those commands starts an extra background process? A proxy server doesn't have to be long-lived. It can be ephemerally launched during a build, run while Go is building, and then get torn down with the rest of the build container. |
Pointing a command to a directory with module files is simple, fast and without side effects. Simple is good. Simple is reliable. Even the init process is able to read several directories of unit files without needing help, let alone the network. Directories exist to help organize and manage files (here, to organize external-to-job and internal-to-job modules). What's so strange or difficult about using multiple directories? Even Go modules use multiple package directories inside their zip file. No other computing language has a problem reading more than one directory of components. Both symlink trees and server processes are a management headache. Housekeeping a symlink tree is always surprisingly tricky, way more complex than reading several directories. Launching a server process, no matter how ephemeral, is a can of worms in terms of IP/port collisions, network filtering and access rules. Remember that the CI/CD is a secure environment. It's not an open bar: the network layer is contained, and the kind of containment and filtering varies from environment to environment. In Unix everything is a file and reading files is encouraged. Dumping everything on the network to avoid file reads is definitely not Unix philosophy. |
It's complexity that most users don't need. E.g., GOPATH supports multiple directories, and based on the number of blog posts and shell snippets I see (even from expert Go programmers) that use $GOPATH as though it'll always expand to a single directory, I suspect usage and even awareness of that feature is very low. Moreover, it's unnecessary complexity. The extension point to build an external tool that makes multiple directories look like a single one already exists.
So use an overlay or union filesystem then. |
It's complexity so basic and common it's even included in the init process.
And we use it heavily. Modules are breaking our CI/CD setup. The CI/CD setup is used for more than Go software and is not going to change drastically just for Go modules. What will probably happen is either some form of disabling of Go modules in Fedora and RHEL, or years of kludges giving Fedora and Go a bad rep because the end result won't work well.
That's a drastic CI/CD rework for Go land. Again, one of the core objectives of the CI/CD system is to be simple, understandable and without side effects. Heavy hammers like custom overlays and network access aren't in this category. Besides, because Go modules specify list indexes intermingled with module payload files, it is not possible to separate writes into different overlay layers (even if that were not a huge can of worms to start with) without deep overlay awareness to write the corresponding indexes at the correct layer. |
Wait how is this not what $GOPROXY does already?
Go now does support multiple module proxies; it did not when this issue was opened. But I still think that this problem is best solved by third-party tools, rather than by changes to Go itself. |
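For readers wondering what that looks like in practice: the go command accepts a comma-separated list of proxy URLs in GOPROXY, including file:// URLs, and moves on to the next entry when a proxy reports that a module is not found. The sketch below only builds such a value from a list of directories. The goproxyValue helper and the two example paths are hypothetical, chosen to mirror the read-only system directory and job-owned directory discussed in this thread.

```go
package main

import (
	"fmt"
	"net/url"
	"path/filepath"
	"strings"
)

// goproxyValue is a hypothetical helper that turns a list of directories
// into a comma-separated GOPROXY value made of file:// URLs.
func goproxyValue(dirs ...string) (string, error) {
	urls := make([]string, 0, len(dirs))
	for _, d := range dirs {
		// file:// URLs need an absolute path.
		abs, err := filepath.Abs(d)
		if err != nil {
			return "", err
		}
		u := url.URL{Scheme: "file", Path: filepath.ToSlash(abs)}
		urls = append(urls, u.String())
	}
	return strings.Join(urls, ","), nil
}

func main() {
	// Example paths are assumptions: a root-owned read-only tree first,
	// then a directory owned by the CI/CD job.
	v, err := goproxyValue("/srv/goproxy", "/tmp/job/goproxy")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("GOPROXY=" + v)
}
```

Order matters here: entries are consulted left to right, so the directory listed first wins when both contain the same module version.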
Let's close it anyway |
This report is part of a series, filed at the request of @mdempsky, focused on making Go modules integrator-friendly.
Please do not close it or mark it as a duplicate before making sure you’ve read and understood the general context. A lot of work went into identifying problem points precisely.
Needed feature
Go needs to allow using multiple goproxy sources simultaneously (at least 2).
Constraints
/etc/go/something, completed or masked by ~/.config/go/something on Linux systems, as per the Filesystem Hierarchy Standard and the XDG Base Directory Specification
Motivation
In integrator workflows, a single goproxy can not be used, due to the distinction between:
For build reliability and security, the first class of modules must be deployed on a read-only goproxy. The second class however, has to be deployed on a read-write goproxy (because the aim of the CI/CD job is to create and write those modules).