-
Notifications
You must be signed in to change notification settings - Fork 17.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
embed: remove support for embedding directives on local variables #43216
Comments
I'm torn. I made a demo repo to play around with embedding. Here's a function to toggle between using an os.DirFS and an embed.FS at runtime: func getFileSystem(useOS bool) http.FileSystem {
var fsys fs.FS
if useOS {
log.Print("using live mode")
fsys = os.DirFS("static")
} else {
log.Print("using embed mode")
var (
//go:embed static
files embed.FS
err error
)
fsys, err = fs.Sub(files, "static")
if err != nil {
panic(err)
}
}
return http.FS(fsys)
} Having to move In the end, I could be convinced otherwise, but a restriction is easier to loosen by go.mod directive in Go 1.16+ than it would be to add a restriction later, so I gave the 👍 . |
One more thought. It's nice to be able to include version.txt in a file. But version.txt might have a trailing newline. So my version.go example in the demo repo is: var (
Version string = strings.TrimSpace(version)
//go:embed version.txt
version string
) What would be really nice would be if you could write: //go:embed version.txt
var Version string = strings.TrimSpace(_) But it's not clear how something like that could actually work syntactically. An alternative that would work is: var Version = func() string {
//go:embed version.txt
var version embed.String
return strings.TrimSpace(version)
}() Again, it would be a shame to lose the convenience of the function version just because |
@carlmjohnson, thanks for the examples. The rewrites would of course be:
In both cases, the code does not seem noticeably worse to me, and in fact I think I like being able to see more clearly where the embedding uses are. "Clear is better than clever" and all that. |
The other option would be to remove support for []byte, leaving only support for string. That makes clear that it's an immutable value. And it's very easy to create the []byte if needed (just one line). And it preserves the ability to use function-local embeds, which I think is a great property to preserve, given that these values are all immutable for the lifetime of the process. |
At the risk of cluttering this issue with a lot of waffling, I am tentatively moving my 👍 to @ugorji's proposal to remove embed.Bytes. Here's another bit of demo code: var S = func() (s struct {
Number float64
Weather string
Alphabet []string
}) {
//go:embed value.gob
var b []byte
dec := gob.NewDecoder(bytes.NewReader(b))
if err := dec.Decode(&S); err != nil {
panic(err)
}
return
}() The demo was supposed to illustrate that S is some struct that is difficult to construct, so you generate it somehow and write it out to a gob to read in on initialization. When I wrote my demo code, I wondered if I should use Here is what I should have done: var S = func() (s struct {
Number float64
Weather string
Alphabet []string
}) {
//go:embed value.gob
var es embed.String
dec := gob.NewDecoder(strings.NewReader(es))
if err := dec.Decode(&s); err != nil {
panic(err)
}
return
}() And here's what it looks like without function locals (IMO, this is less clear): //go:embed value.gob
var b embed.Bytes
var S = func() (s struct {
Number float64
Weather string
Alphabet []string
}) {
dec := gob.NewDecoder(bytes.NewReader(b))
if err := dec.Decode(&s); err != nil {
panic(err)
}
return
}() |
I don't think the compiler can optimize them away, in the most common cases - like writing them to an So, I absolutely think that using
FTR, I don't think there really is any advantage to this, in and off itself. It would only be an advantage for |
This seems like a tricky case. It's nice to be able to embed local variables to avoid polluting the package's namespace, but local []byte is confusing/error-prone. You can kind of solve this by disallowing embedded []byte, so you get local variables without the mutability issue, but then you lose the clarity of saying "this data is a slice of bytes explicitly". That is, it would be somewhat confusing to readers if I embedded an executable binary as a string, since its not meant to be read by humans. So it seems like what we're doing is deciding if:
is more compelling. Personally my main use cases for //go:embed are embedding executables in my go binary (weird I know) and embedding templates. I think embedding templates is going to be far and away more common than embedding []byte, but I think []byte is going to be fairly common itself, and I don't care too much about being able to embed local variables, so my preference is that we disallow local variables, as in the original comment. |
If you do this at package scope, it should be just as good and let us deal with the removal of embed.Bytes, no? var filedata = func() []byte {
//go:embed file.txt
var s embed.String
return []byte(s)
}() ISTM, the dilemma is you can’t have all three of 1) embed.Bytes 2) local embeds and 3) consistent rules. Of the three, 1 is the best to jettison because it is only sorta convenient but 2 is extremely convenient. (And 3 is just a good language property in general.) |
@carlmjohnson if you don't mind me asking, why do you characterize local variables embeds as "extremely convenient"? While I don't like having things be unnecessarily global, I wouldn't go so far as to call it extremely convenient personally. Is it that you find global namespace pollution really irksome? or is there another issue at play? |
@carlmjohnson I disagree with "just as good" (that's very awkward code) but I agree it performs the same. But I think of all the options brought up so far, that's my least favorite:
TBH the only thing removing |
I think it's a mistake to optimize for what is conceptually a convenience function. This is what I mean. Data embedded in a binary is by definition, static and immutable. Consequently, per go's types, the 2 types that should exist at a minimum are:
With this minimum, the question of restricting where they are defined doesn't come up at all. There are immutable, and thus can exist at a package global level, or within any function/method/function variable, even init(). Folks can then build whatever they want above this. Go prefers to have a few orthogonal features that people can easily build atop, eschewing multiple ways of doing the same thing. Adding embed.Bytes ,which, by definition, is not an immutable sequence of bytes, causes more questions than answers (what happens if someone modifies the global variable, what happens if used in a loop, what happens if used in a recursive function, etc). These questions spurred this late proposal very late in the cycle. The answer (remove a feature and consequently put more work on the programmers) has a clear cost. And for a minimum convenience, as a 1-liner can give the same convenience. package xyz
//go:embed file.txt
var s embed.String
func sBytes() { return []byte(s) } // even gofmt keeps this as a 1-liner
func doSomething() {
//go:embed file.txt
var s embed.String
b := []byte(s) // 1-liner
use(b)
} In my mind, we have better options, if we start with the minimum requirements that
If we want, we might add a convenience use-case along the lines below (which I am ok with, but is clearly not necessary for performance or ease of programming as shown above):
I think these rules are clear, easy to understand, and fit into the expectation that go programmers expect. My 2 cents, in full alignment with @carlmjohnson comments above. |
@ugorji I read your proposal, but I was hoping you could explain a bit more what is the real-world downside of optimizing for a convenient api here? And what is the real-world upside of supporting local variables? |
@hherman1 optimize may have been the wrong term to use. Let me clarify. I think it's a mistake to remove support for local variables because we want support for the mutable use-case ([]byte) to be consistent with immutable use-cases (string, fs.FS), especially since that support is a convenience that can easily be replicated by a 1-liner. Downside of optimizing for the convenient API is this proposal - that we lose support for local variables. Upside of local variables is that you do not pollute global variables unnecessarily due to a limitation of the embed support, so we write and organize code naturally. Middle ground compromise is option (b) is @rsc proposal above, that each declaration of embed.Bytes results in a copy just like []byte(string) conversion does. This intuitively makes sense to users, and can easily be documented in embed.Bytes documentation i.e. equivalent to []byte(string) conversion. Furthermore, it resolves further issues as below: package xyz
//go:embed file1.txt
var s1 embed.Bytes
// ... many lines of code goes here
//go:embed file1.txt
var s2 embed.Bytes Imagine All of these questions can be resolved if we just do a copy as needed, and the answer is intuitive i.e. each variable declaration is like a call to []byte(string). |
Is there a trap here for users? Imagine I’m embedding some decently large template file and I take advantage of the local variable embed support.. how big is the performance penalty? In practice is everyone just going to use global anyways to avoid the cost of copying around larger files on every invocation of a function? |
@ugorji Just to re-state the obvious:
The ability to embed into a
Not to be nitpicky, but so is source code. And you can still write
None of these (except the first, which a) has an obvious answer and b) also arises for
This is circular logic. You are making "
I want to go on record, that I don't want mutability, I want allocation-freedom. Mutability is just the natural (but, in conjunction with local variables, surprising) consequence of using a
Assuming you meant to write So, yes, you are correct that without To put it another way: Assuming local variables where not currently supported by |
This comment has been minimized.
This comment has been minimized.
@ugorji It sounds like you are suggesting that the go tool should permit using |
I tend to agree that if you want a
and you can do that in either local or global scope. The only downside I see here is that there's a copy at startup to initialize So I'm in favor of removing the ability to embed to Disallowing locals would be my second choice. Definitely let's not allow local-scope |
This comment has been minimized.
This comment has been minimized.
Yes, that's my preferred position. |
I agree that there is no strong reason to prefer one of these directions over the other — so why not both? I think we should remove both local-variable embeds and That would give us the option to add support for either local variables or |
I think the two code snippets in #43216 (comment) are only equivalent if the function |
@ianlancetaylor cmd/compile desugars package-block N:N initializations into N individual initializations before applying the package-initialization sorting algorithm. If gccgo does something different that leads to observably different behavior, I think that merits an issue to discuss resolving the inconsistency. |
@mdempsky You're right. |
I prefer one way to two ways to do something. "Compiler optimizations are possible sometimes" is a good support point for "one way to do something", I think I would use more immutable embedded data than mutable one in practice. |
Sorry, I didn't read the So now my opinion agaist two ways to do embedding is not strong as before, BTW, is it possible to support declaring |
It is possible, but const declaration syntax requires an initializer expression (unlike variable declarations). Legacy type checkers like go/types would mistakenly use this initializer as the const value instead of the actual string. I'd imagine that in any scenario where it's useful to have the embedded string as a constant, this would cause problems. |
Based on the discussion above, this seems like a likely accept. |
It is still my preference to remove embed.Bytes and tell users who need to a slice of bytes to either explicitly allocate it from a string in their function or do a conversion as a package level var and reuse that, but I am happier for something to be done than nothing. 👍 |
I don’t recall seeing this specific argument in favor of removing embed.Bytes above: I’m thinking of writing a blog post to introduce file embedding. If this proposal is accepted, I need to add a sentence or two saying “only use embedding at the top level”. On the other hand, if embed.Bytes is removed, there is nothing to for me to explain because I can assume my readers already know about bytes vs strings etc. Having fewer things to explain is better! |
@carlmjohnson I don't feel that's a significant difference though. It amounts to adding a "package-scoped" and an "embed.Bytes" respectively - plus embellishments and redundancies. I'd also argue that there is some anchoring bias - you have already written the version without this proposal and I wouldn't discount the possibility that the article could end up shorter starting from scratch with it (though, again, I don't think the difference is actually significant either way). |
I initially voted thumbs-up for this proposal, before I thought that maybe it is better to think of a different option beyond those listed above. I think others that voted yes may pick a different option if more were defined at the beginning. One of go's design principles is orthogonality of concepts. IMO, the minimum MVP for embedded is: provide read-only semantics at runtime to static data/assets available at compile time, either via a collection (embed.FS) or for a single asset (string). With this, the question of package level vs function-level might not come up, as read-only things (similar to const) can be safely used and passed around anywhere in code. This issue arises only because of support for []byte, which does not provide read-nly semantics, and consequently raises many questions regardless of function-level or package-scope. For example, if multiple package-level definitions occur for the same asset, will they share same memory? This question is not fundamentally different from the questions raised in @rsc issue description regarding function-scoped variables, with options a vs b vs c i.e. we could share array, or make new copy for each defined var, or make array non-writable. Yet d is not an option here, because that means no embedding support. In summary, I prefer string and embed.FS at all scopes, and am ambivalent regarding bytes support (which can be explained easily because bytes do not provide read-only semantics and introduce further questions). Furthermore, I think b is a valid option for bytes support, regardless of the scope, and something that go users will intuitively understand. Having said that, I am comfortable with whatever final decision is made by the core go team, |
The usage of "MVP" seems to imply to me, that the MVP should be possible to extend in the future. In that case, this would be an argument to actually remove both
To re-iterate: Saying this is just as correct as saying "the issue arises only because of local variable embeds". The issue is that the two are incompatible - putting the fault for that on one or the other is misleading.
It seems pretty obvious to me, that the semantics should be "each Note that, unlike for local variable embeds, this doesn't cause follow-up issues with package-scoped variables. There is no performance-problem, because package-scoped variables are only initialized once, whereas block-scoped variables are initialized every time they are executed. |
Removing both is the conservative choice, and it’s probably my second favorite after just removing embed.Bytes, but I don’t think we’re likely to learn much from real world experience that will add to the discussion about whether to add embed.Bytes or local vars in 1.17. I could be wrong, but my feeling is that we’ve pretty much talked it out and it’s up to the Go team to weigh pros and cons. |
There are a few comments but no change in consensus here. Accepted. |
No change in consensus, so accepted. 🎉 |
Change https://golang.org/cl/282714 mentions this issue: |
Change https://golang.org/cl/288072 mentions this issue: |
//go:embed variables can be type aliases. //go:embed variables can't be local to a function. For #43216 For #43602 Fixes #43978 Change-Id: Ib1d104dfa32b97c91d8bfc5ed5d461ca14da188f Reviewed-on: https://go-review.googlesource.com/c/go/+/288072 Trust: Ian Lance Taylor <iant@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>
@mdempsky points out in #41191 (comment) that the late addition of string and []byte to the embedding proposal has some unfortunate side effects that we might want to make sure we are happy with.
There are three details of string and []byte embedding that are at least surprising. This issue is one of them. See #43217 for the other two.
The original design had only embedding in globals and only embed.FS. When I circulated the draft privately to a few people who had written embedding tooling, one person asked about local variables, and it seemed easy to add, so I did. (This was fine because the embed.FS was immutable, so it was really just a question of variable scope, not semantics.)
When we had the initial public discussions before making a formal proposal, many people asked for string and []byte. Those too seemed easy to add, so I did. But the two different additions interact poorly, because they created the intersection "function-local, mutable embedded data".
It seems like there are three options for how that would work:
(a) the []byte data is shared by all invocations of the function.
(b) the []byte data is freshly allocated (copied) on each call to the function.
(c) the []byte data is magically unwritable, causing a panic if written
Right now the behavior is (a), but I did not do that intentionally. It is difficult to explain to users, because it differs from the way every other local variable behaves. It seems like a clear bug.
The clearest alternative is (b): when the declaration statement in a function is executed, it is initialized with a fresh copy of the data instead of the actual data. That is less surprising than (a) but potentially very expensive. That's at least not a bug like (a) but certainly a performance surprise that would be good to avoid.
I listed (c) for completeness (mmap the data read-only) but it's not really on the table: we've considered the idea of that kind of data in the past in other, more compelling contexts and decided against it. This case is not nearly special enough to warrant breaking the data model by introducing "unwritable slices" into the core of the language.
That leads me to “(d) none of the above,” which I think is the right answer, not just for Go 1.16 but generally.
The string and []byte functionality seems important ergonomically. There are plenty of times when you just want to embed a single file as data, and having to go through the embed.FS machinery to get just a single string or []byte is a lot more work.
On the other hand, the function-local variable functionality seems much less important ergonomically. It's always easy to move the two lines (//go:embed and var declaration) up above the function.
So if these two can't coexist, the choice seems clear: string and []byte support is doing real work, while function-local variables are not. The underlying data is essentially a global anyway. Writing all embedded data as globals makes it very clear for []byte variables that there's only one instance of the data (and that there's no implicit copying either).
So that's what I suggest: remove support for embedding in local variables. It's easy to put the embedded variables just above the function that needs them, and then it's very clear that they are globals, there are no aliasing surprises, and so on.
What do people think? (Thumbs up / thumbs down is fine.)
Thank you!
The text was updated successfully, but these errors were encountered: