-
Notifications
You must be signed in to change notification settings - Fork 17.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/cgo: arrange to pass unmodified Go source files to compiler, go/types #16623
Comments
This means making cgo part of the compiler?
One of the reason cmd/cgo is separate it that
other Go compilers could potentially benefit
from that to support cgo. If it supports enough
of cgo magic comments.
|
My idea is to keep cmd/cgo behaving the same as it does today, but to extract much of its functionality as a new package like "internal/cgo" that can be used by cmd/cgo, cmd/compile/internal/gc, and go/types. Like I said above (downside 1), we need cmd/cgo for gccgo (because it's written in C++ and can't rely on a Go library). |
If much of the functionality is extracted into another package,
would we still use cmd/cgo when building cgo programs?
If not, cmd/cgo might bit rot eventually.
|
I expect by default we would not execute cmd/cgo when building cgo programs. I think we could make that an option though (e.g., cgomode=internal/external), or at least run the current cgo regress tests both with and without cmd/cgo, if we're concerned about bit rot. |
It is indeed troublesome to write refactoring tools for cgo-using packages, but I don't see how the proposal solves the problem. Refactoring tools based on |
@alandonovan The proposal is that instead of generating intermediary Go source files, cmd/compile and go/types would consume the original .go source files directly. The refactored cmd/cgo package would provide the logic for processing the It's true that there are other tools that generate Go source code, but those produce .go files that the user is responsible for managing themselves. Cgo is distinguished from them because it's transparently handled by cmd/go, not to mention having explicit runtime and toolchain support. |
I see. While developing the loader package, I tried a similar approach, modifying go/types to accept source files in the cgo dialect, but gri convinced me that it was feature creep and that cgo should be treated as a black box. Here are some of the problems that came up:
Also, we should consider that future extensions to cgo might not be possible to support at all in go/types. |
Thanks for the list of problems. My thoughts:
It seems unlikely to me that we'd add functionality to cmd/cgo that go/types could not support at all. The source Cgo files will need to remain representable by go/ast so that cmd/gofmt can format them. If it ever becomes necessary to process additional inputs beyond the .go files for Cgo processing, we can extend go/types.Checker's API to allow callers to provide them. Lastly, for completeness: to support cgo in go/types, users will need to provide an AST with the C preamble comments, but by default go/parser discards comments. There are a couple ways to resolve this:
|
Anyone have more comments for @mdempsky? |
LGTM. I prefer option 3(ii) over 3(i). We are likely to discover several awkward corner cases as we implement this and use it across the tools, but the result will still be an improvement on the status quo. Having ParseCgo as the default would be nice, but it's fiddly because you have to select the scanner disposition towards comments before you begin. Two approaches to consider are: |
This may be interesting but it seems lower priority than just about anything else we could be doing. And doing this is going to break lots of existing cgo code in mysterious way that will require a lot of debugging. Is there something that just can't cope with the current way cgo works? It sounds like the answer is no. |
I'm also generally skeptical about making cmd/compile include cmd/cgo. That's a layering violation: cgo is a build system-issue, not a compiling Go code issue. I don't want the compiler to turn into the build system. |
Does that mean we should put SWIG (a C++ program) into cmd/compile too? |
I'm curious why you think this. My expectation is that cgo code will compile identically as before.
My motivating use case is it's currently impossible to write robust type-sensitive refactoring tools that work on Go source files that use cgo, because you have to apply go/types to the post-processed source.
I don't expect cmd/compile and go/types to take on any more responsibility for compiling C code than cmd/cgo already does internally. And x/tools/go/loader already has to run cmd/cgo.
Point taken. Though I'll note the source for SWIG bindings are .swig, not .go, so I feel like there's a lower expectation that refactoring tools should---or even need to---support them. Certainly it would be nice if gorename could rename an identifier exported by C++ and have it also update the entire C++ code base, but I don't think anyone expects that. However, I do think it would be nice if gorename would work for renaming symbols in files that happen to also use cgo. Currently, it replaces the file with the cgo-processed source. (I.e., you end up with a source file that says "Created by cgo - DO NOT EDIT" at the top.) |
For refactor tools, it just needs to know the types of each cgo symbol,
right? In that case, all it needs is to know the symbol mangling rule
of cmd/cgo and then it can figure out the necessary information from
cgo output.
|
Wow, I was not aware of that. I've filed #17839. |
@minux cgo does more invasive AST rewrites than just mangling C.bar to _Cfoo_bar. For example, it also rewrites C function calls into function literal invocations to instrument _cgoCheckPointer calls. Even if the solution is to just reconstruct the pre-cgo AST, there should be a standard library or x/tools package for handling that. Otherwise, everytime we change cgo's implementation details, we end up breaking third-party tools. |
But for a refactor tool, the cgo rewrite shouldn't matter (it's cgo
implementation
detail.) I imagine that refactor only cares that C.bar(1) is a function of
type
func (int32) (int, error), and then it should be able to handle C.bar as a
special
kind of function call and carry on (as the refactor tool should only care
about
the pre-cgo Go source files, right?)
|
@minux go/types doesn't currently support that. That's the point of this proposal. |
To be clear, the primary expected benefit is not the three in the list but instead that go/types be modified to be able to more easily analyze packages that use cgo. There's a lot of configuration that goes into the invocations of cgo. Today that configuration is partly in the environment and partly baked into the toolchain, specifically the go command. If cgo moves into a package, then that configuration has to be baked into the package too. That would mean you have to rebuild your, say, godoc just to start viewing a package with a different baked-in way to build arm binaries (say). It seems like it would be better to introduce an easy way for the go command to provide the generated files, and then have go/types invoke the go command. Then you can substitute a different go command easily (via PATH changes) without having to rebuild everything that uses go/types when this kind of thing changes. Also:
Obviously that's the goal. My experience is that this is basically impossible for any significant change to cgo. Something always breaks. Many things, usually. |
That's my motivating case, yes. I meant for that to be covered by items 2 and 3. Sorry for the ambiguity.
As far as I can tell, the information baked into cmd/go is just defaultCC, defaultCXX, and defaultPkgConfig. We could fetch them via
The problem I'm proposing to address here isn't that invoking cgo is tricky (though it is): x/tools/go/loader already supports invoking cgo as needed. The problem is when writing type-sensitive AST refactoring tools, we want to work with the original AST, not the rewritten AST. Otherwise you get issues like #17839 or mdempsky/unconvert#3. |
Another point: if we create this library then the next request will be a way to use it in other tools instead of invoking cmd/cgo. So then we have to design the library well. Cgo, for better or worse, is already designed and works. I think part of the problem with this proposal is that it specifies mechanism. It sounds like the problem you want to solve is how to make go/parser and go/types work on cgo programs in a way that allows, for example, gorename or gorefactor to edit cgo programs. I think it might make sense to shift this proposal to talk about ways to solve that problem. It may not involve cgo as a library at all. It still seems to me that cmd/cgo can be used in a solution to that problem, much more quickly than this refactoring. |
I thought some more about this on my way home. Apologies for the high latency of this conversation overall. I will try to do better. It wasn't clear to me until your last comment that the fundamental problem being addressed here is the problem of not being able to apply code rewriting tools to Go source files that import "C". Rereading the initial post I see that listed as benefit 2, but I did not appreciate that the problem was the lack of access to the original source code in order to write the source code back out. The core of this proposal is really to eliminate source-to-source translation as the implementation mechanism for defining what import "C" means in cgo inputs. Whether cgo is a library is a separate implementation detail - a binary is just a library with a command line as its API - as is the choice of which other program calls into the code in whatever form it takes. Let's put those details aside and focus on what additional functionality would have to be added to cgo to avoid a source-to-source rewrite. It seems to me that if cgo can't rewrite the code, it would instead need to write out a definition of the "C" package that answers whatever questions the compiler might have; the compiler would consult that definition as needed. Cgo would still need to write out _cgo_gotypes.go with the wrappers and type definitions; it just wouldn't need to write out modified copies of the original sources (x.cgo1.go for x.go). For example, cgo might write out a file, say, _cgo_defs.txt. Then the go command could invoke the compiler with -cgodefs _cgo_defs.txt to pass the cgo definitions in. Similarly, whatever x/tools/go/loader does to invoke cgo today it could still roughly do but expect to read _cgo_defs.txt instead of loading modified source files. The next question is what _cgo_defs.txt looks like. A simple but extensible choice would be to define a Go struct
With that extra information, the compiler and go/types could both work on the original source files. Cgo could keep writing the *.cgo1.go files for a while, to let older tools or build systems continue to use them. What I've described handles everything in cgo except the new pointer check calls that @ianlancetaylor added. Those would need to be recreated in the compiler, but of course the compiler is in a better position to write them, and go/types need not care. If you want to move forward with this, my suggestion would be:
The key part of the original proposal was the ability to have cmd/compile and go/types load unmodified source files and do the "translation" themselves. The _cgo_defs.txt file accomplishes that without significant changes to cgo itself or to the build process. The specific format of _cgo_defs.txt must be defined, but that API would have had to have been defined for "cgo as a library" too (along with a lot of other work that _cgo_defs.txt avoids). In fact, you might actually get away with not having _cgo_defs.txt at all. You could instead just hard-code in the compiler the idea that after import "C", C.name is resolved by looking for _Cconst_name or _Ctype_name or _Cfunc_name or _C2func_name. That seems fine to me, especially for a quick prototype. Then the only change to the build process is to invoke the compiler on the original x.go instead of x.cgo1.go. Another possibility would be some kind of special comment on the definitions already in _cgo_gotypes.go.
Whether that makes more sense depends a bit on what is best for x/tools/go/loader. Cgo could always write the comments for the compiler and the _cgo_defs.txt file for go/loader, I suppose. The most important point is that we should focus on what additional functionality is needed from cgo instead of whether it is wrapped up in a library or a separate binary. The functionality can be added and used either way. (And if we leave it as a separate binary, that probably creates less work overall.) |
That's an interesting approach. I just want to clarify that the name translation is in fact context-specific, because of course Also, an example of a case where we need to know more than just the translated name is union types; in order to generate the cgo checks correctly to address issue #15942, we need to know whether the union type can contain a pointer. These notes don't invalidate the basic idea, but they do complicate it. |
To be clear, after invoking cgo, all that x/tools/go/loader currently does with the modified source files is:
If we simply parse the original files instead of the cgo-rewritten files, that would seem to require that go/types needs to process the _cgo_defs.txt file instead. So unless I'm misunderstanding your proposal, it seems to require one or more of:
I've been advocating for 1 (as an internal library instead, but as you point out this is an implementation detail to be considered separately). 2 seems acceptable too, but means committing to a public API between cmd/cgo and go/types. And if we're going to do that, it seems simpler to just make go/types invoke cmd/cgo instead (i.e., approach 1). 3 seems undesirable to me for maintainability. Do you support any of these three approaches? Or did you have any alternatives in mind? |
That would likely also make #13467 more feasible to address, which may be relevant to the cost/benefit analysis here. |
If I understand correctly, the intention is to potentially revisit that restriction (see #13467). From @rsc: #13467 (comment)
|
Oh, I probably spoke too early. #13467 is about allowing C.bar to refer to the same C type across different Go packages. Thus, thus is not at odds with the restriction you pointed out. |
Change https://golang.org/cl/77153 mentions this issue: |
Cgo has always operated by rewriting the AST and invoking go/printer. This CL converts it to use the AST to make decisions but then apply its edits directly to the underlying source text. This approach worked better in rsc.io/grind (used during the C to Go conversion) and also more recently in cmd/cover. It guarantees that all comments and line numbers are preserved exactly. This eliminates a lot of special concern about comments and problems with cgo not preserving meaningful comments. Combined with the CL changing cmd/cover to use the same approach, it means that the combination of applying cgo and applying cover still guarantees all comments and line numbers are preserved exactly. This sets us up to fix some cgo vs cover bugs by swapping the order in which they run during the go command. This also sets up #16623 a bit: the edit list being accumulated here is nearly exactly what you'd want to pass to the compiler for that issue. Change-Id: I7611815be22e7c5c0d4fc3fa11832c42b32c4eb3 Reviewed-on: https://go-review.googlesource.com/77153 Reviewed-by: Ian Lance Taylor <iant@golang.org>
@mdempsky Just curious if you think this will make it into Go1.11? |
@joegrasse Sorry, no. CL 33677 is still unsubmitted. |
@mdempsky Is there a plan to submit it for Go1.12? |
This CL adds a UsesCgo config setting to go/types to specify that the _cgo_gotypes.go file generated by cmd/cgo has been provided as a source file. The type checker then internally resolves C.bar qualified identifiers to _Cfoo_bar as appropriate. It also adds support to srcimporter to automatically run cgo. Unfortunately, this functionality is not compatible with overriding OpenFile, because cmd/cgo and gcc will directly open files. Updates #16623. Updates #35721. Change-Id: I1e1965fe41b765b7a9da3431f2a86cc16025dee2 Reviewed-on: https://go-review.googlesource.com/c/go/+/33677 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>
Change https://golang.org/cl/231459 mentions this issue: |
(Reland of golang.org/cl/33677.) This CL adds a UsesCgo config setting to go/types to specify that the _cgo_gotypes.go file generated by cmd/cgo has been provided as a source file. The type checker then internally resolves C.bar qualified identifiers to _Cfoo_bar as appropriate. It also adds support to srcimporter to automatically run cgo. Unfortunately, this functionality is not compatible with overriding OpenFile, because cmd/cgo and gcc will directly open files. Updates #16623. Updates #35721. Change-Id: Ib179d55c8c589916f98ceeae0b9a3e746157253a Reviewed-on: https://go-review.googlesource.com/c/go/+/231459 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>
Change https://golang.org/cl/234526 mentions this issue: |
This CL adds a UsesCgo config setting to go/types to specify that the _cgo_gotypes.go file generated by cmd/cgo has been provided as a source file. The type checker then internally resolves C.bar qualified identifiers to _Cfoo_bar as appropriate. It also adds support to srcimporter to automatically run cgo. Unfortunately, this functionality is not compatible with overriding OpenFile, because cmd/cgo and gcc will directly open files. Updates golang#16623. Updates golang#35721. Change-Id: I1e1965fe41b765b7a9da3431f2a86cc16025dee2 Reviewed-on: https://go-review.googlesource.com/c/go/+/33677 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>
(Reland of golang.org/cl/33677.) This CL adds a UsesCgo config setting to go/types to specify that the _cgo_gotypes.go file generated by cmd/cgo has been provided as a source file. The type checker then internally resolves C.bar qualified identifiers to _Cfoo_bar as appropriate. It also adds support to srcimporter to automatically run cgo. Unfortunately, this functionality is not compatible with overriding OpenFile, because cmd/cgo and gcc will directly open files. Updates golang#16623. Updates golang#35721. Change-Id: Ib179d55c8c589916f98ceeae0b9a3e746157253a Reviewed-on: https://go-review.googlesource.com/c/go/+/231459 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>
Change https://golang.org/cl/237417 mentions this issue: |
This API and functionality was added late in the Go 1.15 release cycle, and use within gopls has revealed some shortcomings. It's possible (but not decided) that we'll want a different API long-term, so for now this CL renames UsesCgo to a non-exported name to avoid long-term commitment under the Go 1 compat guarantee. Updates #16623. Updates #39072. Change-Id: I04bc0c161a84adebe43e926df5df406bc794c3db Reviewed-on: https://go-review.googlesource.com/c/go/+/237417 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Heschi Kreinick <heschi@google.com>
Change https://golang.org/cl/237423 mentions this issue: |
Change https://golang.org/cl/237422 mentions this issue: |
Updates golang/go#16623. Updates golang/go#39072. Change-Id: I6612cdac14faa5ad724ceb805e1d6c823b4046ff Reviewed-on: https://go-review.googlesource.com/c/tools/+/237423 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Heschi Kreinick <heschi@google.com>
Sorry for digging out this old issue... I just wanted to note that maybe the priority of this issue is underestimated. Google is developing Carbon just because of interop with existing language, which confirms that this is something very important. At the same time, Go C bindings after many years are still kind of painful to use, also due to #975, which has been around for years and keeps waiting for this very issue to be implemented. Migration of existing applications to Go really could use some more endorsement, so it might be an option to reconsider the priority of this issue. |
Currently cgo support is implemented with cmd/cgo as a separate tool that analyzes cgo-using Go code and transforms it into standard Go code. This proposal is to extract the analysis logic into a separate internal package that can be reused by cmd/compile and go/types directly without needing source rewrites. I.e., turn them into native cgo compilers, rather than needing a separate "cgofront" preprocessor step.
The expected benefits are:
Potential downsides and counter-arguments:
Alternative ideas that might achieve similar benefits while generalizing to tools other than cmd/cgo:
/cc @ianlancetaylor @alandonovan @griesemer
The text was updated successfully, but these errors were encountered: