proposal: cmd/go: support embedding static assets (files) in binaries #35950

Closed
bradfitz opened this issue Dec 3, 2019 · 176 comments

Comments

@bradfitz
Contributor

bradfitz commented Dec 3, 2019

There are many tools to embed static asset files into binaries:

Actually, https://tech.townsourced.com/post/embedding-static-files-in-go/ lists more:

Proposal

I think it's time to do this well once & reduce duplication, adding official support for embedding file resources into the cmd/go tool.

Problems with the current situation:

  • There are too many tools
  • Using a go:generate-based solution bloats the git history with a second (and slightly larger) copy of each file.
  • Not using go:generate means not being go install-able or making people write their own Makefiles, etc.

Goals:

  • don't check in generated files
  • don't generate *.go files at all (at least not in user's workspace)
  • make go install / go build do the embedding automatically
  • let user choose per file/glob which type of access is needed (e.g. []byte, func() io.Reader, io.ReaderAt, etc)
  • Maybe store assets compressed in the binary where appropriate (e.g. if user only needs an io.Reader)? (edit: but probably not; see comments below)
  • No code execution at compilation time; that is a long-standing Go policy. go build or go install can not run arbitrary code, just like go:generate doesn't run automatically at install time.

The two main implementation approaches are //go:embed Logo logo.jpg or a well-known package (var Logo = embed.File("logo.jpg")).

go:embed approach

For a go:embed approach, one might say that any go/build-selected *.go file can contain something like:

//go:embed Logo logo.jpg

Which, say, compiles to:

func Logo() *io.SectionReader

(adding a dependency to the io package)

Or:

//go:embedglob Assets assets/*.css assets/*.js

compiling to, say:

var Assets interface{
     Files() []string
     Open(name string) *io.SectionReader
} = runtime.EmbedAsset(123)

Obviously this isn't fully fleshed out. There'd need to be something for compressed files too that yield only an io.Reader.
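As a rough illustration of the accessor shape described above, here is a minimal sketch of what a toolchain-generated `Logo` function could look like. The `logoData` constant and the `readLogo` helper are assumptions made for this example; in the proposal the bytes would come from the binary itself, not from a hard-coded string.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// logoData is a placeholder for the bytes of logo.jpg. In the proposal the
// toolchain would supply them; here they are hard-coded for illustration.
const logoData = "placeholder bytes standing in for logo.jpg"

// Logo returns a new read-only view over the embedded bytes on every call,
// so callers cannot disturb each other's read offsets.
func Logo() *io.SectionReader {
	return io.NewSectionReader(strings.NewReader(logoData), 0, int64(len(logoData)))
}

// readLogo drains one view and returns its contents as a string.
func readLogo() (string, error) {
	b, err := io.ReadAll(Logo())
	return string(b), err
}

func main() {
	s, err := readLogo()
	if err != nil {
		panic(err)
	}
	fmt.Printf("read %d bytes\n", len(s))
}
```

Backing the reader with a string rather than a `[]byte` keeps the data immutable from the caller's point of view, at the cost of a copy on every `Read`.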

embed package approach

The other high-level approach is to not have a magic //go:embed syntax and instead just let users write Go code in some new "embed" or "golang.org/x/foo/embed" package:

var Static = embed.Dir("static")
var Logo = embed.File("images/logo.jpg")
var Words = embed.CompressedReader("dict/words")

Then have cmd/go recognize the calls to embed.Foo("foo/*.js") etc. and do the glob work in cmd/go, rather than at runtime. Or maybe certain build tags or flags could make it fall back to doing things at runtime instead. Perkeep (linked above) has such a mode, which is nice to speed up incremental development where you don't care about linking one big binary.

Concerns

  • Pick a style (//go:embed* vs a magic package).
  • Block certain files?
    • Probably block embedding ../../../../../../../../../../etc/shadow
    • Maybe block reaching into .git too

gopherbot added this to the Proposal milestone Dec 3, 2019
@ianlancetaylor
Member

It's worth considering whether embedglob should support a complete file tree, perhaps using the ** syntax supported by some Unix shells.

@ghost

ghost commented Dec 4, 2019

Some people would need the ability to serve the embedded assets with HTTP using the http.FileServer.

I personally use either mjibson/esc (which does that) or in some cases my own file embedding implementation which renames files to create unique paths and adds a map from the original paths to the new ones, e.g. "/js/bootstrap.min.js": "/js/bootstrap.min.827ccb0eea8a706c4c34a16891f84e7b.js". Then you can use this map in the templates like this: href="{{ static_path "/css/bootstrap.min.css" }}".
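The renaming scheme described in this comment could be sketched roughly as follows. The helper names (`hashedPath`, `buildPathMap`, `render`) are invented for this example, and MD5 is assumed only because the sample hash in the comment is 32 hex digits long; the commenter's actual implementation is not shown in the thread.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"html/template"
	"path"
	"strings"
)

// hashedPath inserts the hex MD5 of content before the file extension,
// e.g. /js/app.js -> /js/app.<hash>.js.
func hashedPath(p string, content []byte) string {
	sum := md5.Sum(content)
	ext := path.Ext(p)
	return strings.TrimSuffix(p, ext) + "." + hex.EncodeToString(sum[:]) + ext
}

// buildPathMap maps every original asset path to its content-addressed path.
func buildPathMap(assets map[string][]byte) map[string]string {
	m := make(map[string]string, len(assets))
	for p, content := range assets {
		m[p] = hashedPath(p, content)
	}
	return m
}

// render executes tmplText with a static_path function backed by paths.
func render(tmplText string, paths map[string]string) (string, error) {
	funcs := template.FuncMap{
		"static_path": func(p string) (string, error) {
			hp, ok := paths[p]
			if !ok {
				return "", fmt.Errorf("unknown asset %q", p)
			}
			return hp, nil
		},
	}
	t, err := template.New("page").Funcs(funcs).Parse(tmplText)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, nil); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	paths := buildPathMap(map[string][]byte{
		"/css/bootstrap.min.css": []byte("body { margin: 0 }"),
	})
	out, err := render(`<link href="{{ static_path "/css/bootstrap.min.css" }}">`, paths)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Because the hash changes whenever the content changes, such paths can be served with long cache lifetimes.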

@cespare
Contributor

cespare commented Dec 4, 2019

I think a consequence of this would be that it would be nontrivial to figure out what files are necessary to build a program.

The //go:embed approach introduces another level of complexity too. You'd have to parse the magic comments in order to even typecheck the code. The "embed package" approach seems friendlier to static analysis.

(Just musing out loud here.)

@bradfitz
Contributor Author

bradfitz commented Dec 4, 2019

@opennota,

would need the ability to serve the embedded assets with HTTP using the http.FileServer.

Yes, the first link above is a package I wrote (in 2011, before Go 1) and still use, and it supports using http.FileServer: https://godoc.org/perkeep.org/pkg/fileembed#Files.Open

@bradfitz
Contributor Author

bradfitz commented Dec 4, 2019

@cespare,

The //go:embed approach introduces another level of complexity too. You'd have to parse the magic comments in order to even typecheck the code. The "embed package" approach seems friendlier to static analysis.

Yes, good point. That's a very strong argument for using a package. It also makes it more readable & documentable, since we can document it all with regular godoc, rather than deep in cmd/go's docs.

@agnivade
Contributor

agnivade commented Dec 4, 2019

@bradfitz - Do you want to close this #3035 ?

@bradfitz
Contributor Author

bradfitz commented Dec 4, 2019

@agnivade, thanks for finding that! I thought I remembered that but couldn't find it. Let's leave it open for now and see what others think.

@balasanjay
Contributor

If we go with the magic package, we could use the unexported type trick to ensure that callers pass compile-time constants as arguments: https://play.golang.org/p/RtHlKjhXcda.

(This is the strategy referenced here: https://groups.google.com/forum/#!topic/golang-nuts/RDA9Hag8RZw/discussion)
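For readers unfamiliar with the unexported type trick, here is a minimal sketch of how it could apply to a hypothetical `File` function. The names `constString`, `Asset`, and `File` are assumptions for illustration, and the code is written as a single `main` package so it runs on its own; the restriction only bites when the caller lives in a different package.

```go
package main

import "fmt"

// constString is deliberately unexported. Code in other packages cannot name
// it, so it cannot convert a string variable to it.
type constString string

// Asset is a hypothetical handle for an embedded file.
type Asset struct{ Name string }

// File accepts only values of the unexported type. An untyped string constant
// converts implicitly, so literal arguments keep working for every caller.
func File(name constString) Asset {
	return Asset{Name: string(name)}
}

func main() {
	logo := File("images/logo.jpg") // untyped constant: accepted
	fmt.Println(logo.Name)

	// From another package the following would be rejected at compile time,
	// because a string variable is not assignable to the parameter type and
	// the caller has no way to spell the conversion:
	//
	//	name := os.Args[1]
	//	embed.File(name)
}
```

This gives the build tool a guarantee that every argument is a constant it can evaluate without running the program.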

@AlexRouSg
Contributor

One concern I have is how it would handle individual or all assets being too big to fit into memory, and whether there would be a build tag or per-file access option to choose between prioritizing access time vs memory footprint, or some middle-ground implementation.

@urandom

urandom commented Dec 4, 2019

The way I've solved that problem (because of course I also have my own implementation :) ) is to provide an http.FileSystem implementation that serves all embedded assets. That way, you don't have to rely on magic comments in order to appease the typechecker, the assets can easily be served by http, a fallback implementation can be provided for development purposes (http.Dir) without changing the code, and the final implementation is quite versatile, as http.FileSystem covers quite a bit, not only in reading files, but listing directories as well.

One can still use magic comments or whatever to specify what needs to be embedded, though it's probably easier to specify all the globs via a plain text file.

@ianlancetaylor
Member

@AlexRouSg This proposal would only be for files which are appropriate to include directly in the final executable. It would not be appropriate to use this for files that are too big to fit in memory. There's no reason to complicate this tool to handle that case; for that case, just don't use this tool.

@bradfitz
Contributor Author

bradfitz commented Dec 4, 2019

@ianlancetaylor, I think the distinction @AlexRouSg was making was between having the files provided as global []bytes (unpageable, potentially writable memory) vs providing a read-only, on-demand view of an ELF section that can normally live on disk (in the executable), like via an Open call that returns an *io.SectionReader. (I don't want to bake in http.File or http.FileSystem into cmd/go or runtime... net/http can provide an adapter.)

@urandom

urandom commented Dec 4, 2019

@bradfitz, http.File itself is an interface with no technical dependencies on the http package. It might be a good idea for any Open method to provide an implementation that conforms to that interface, because both the Stat and Readdir methods are quite useful for such assets.

@bradfitz
Contributor Author

bradfitz commented Dec 4, 2019

@urandom, it couldn't implement http.FileSystem, though, without referring to the "http.File" name (https://play.golang.org/p/-r3KjG1Gp-8).
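The point about naming can be illustrated with a small sketch. The `file`, `assets`, and `adapter` types below are hypothetical stand-ins, not anything from the linked playground: `file` has the same method set as `http.File`, yet an `Open` method returning it does not satisfy `http.FileSystem`, because Go compares method signatures by their exact result types. A thin adapter of the kind net/http could provide closes the gap.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"os"
)

// file has exactly the same method set as http.File, but it is a distinct
// named type declared without referring to the http package.
type file interface {
	io.Closer
	io.Reader
	io.Seeker
	Readdir(count int) ([]os.FileInfo, error)
	Stat() (os.FileInfo, error)
}

// assets is a hypothetical embedded file set whose API avoids net/http.
type assets struct{}

func (assets) Open(name string) (file, error) {
	return nil, errors.New("open " + name + ": not implemented in this sketch")
}

// adapter wraps assets so that Open has the signature http.FileSystem wants.
type adapter struct{ fs assets }

func (a adapter) Open(name string) (http.File, error) {
	f, err := a.fs.Open(name)
	if err != nil {
		return nil, err
	}
	return f, nil // a file value is assignable to http.File
}

// implementsFileSystem reports whether v satisfies http.FileSystem.
func implementsFileSystem(v interface{}) bool {
	_, ok := v.(http.FileSystem)
	return ok
}

func main() {
	fmt.Println("assets: ", implementsFileSystem(assets{}))
	fmt.Println("adapter:", implementsFileSystem(adapter{}))
}
```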

@rsc
Contributor

rsc commented Dec 4, 2019

@robpike and I talked through a proposal for doing this years ago (before there was a proposal process) and never got back to doing anything. It's been bugging me for years that we never finished doing that. The idea as I remember it was to just have a special directory name like "static" containing the static data and automatically make them available through an API, with no annotations needed.

I'm not convinced about the complexity of a "compressed vs not" knob. If we do that, then people will want us to add control over which compression, compression level, and so on. All we should need to add is the ability to embed a file of plain bytes. If users want to store compressed data in that file, great, the details are up to them and there's no API needed on Go's side at all.

@jayconrod
Contributor

A couple thoughts:

  • It should not be possible to embed any file outside the module doing the embedding. We need to make sure files are part of module zip files when we create them, so that also means no symbolic links, case conflicts, etc. We can't change the algorithm that produces zip files without breaking sums.
  • I think it's simpler to restrict embedding to be in the same directory (if //go:embed comments are used) or a specific subdirectory (if static is used). This makes it a lot easier to understand the relationship between packages and embedded files.

Either way, this blocks embedding /etc/shadow or .git. Neither can be included in a module zip.

In general, I'm worried about expanding the scope of the go command too much. However, the fact that there are so many solutions to this problem means there probably ought to be one official solution.

I'm familiar with go_embed_data and go-bindata (of which there are several forks), and this seems to cover those use cases. Are there any important problems the others solve that this doesn't cover?

@DeedleFake

DeedleFake commented Dec 4, 2019

Blocking certain files shouldn't be too hard, especially if you use a static or embed directory. Symlinks might complicate that a bit, but you can just prevent it from embedding anything outside of the current module or, if you're on GOPATH, outside of the package containing the directory.

I'm not particularly a fan of a comment that compiles to code, but I also find the pseudo-package that affects compilation to be a bit strange as well. If the directory approach isn't used, maybe it might make a bit more sense to have some kind of embed top-level declaration actually built into the language. It would work similarly to import, but would only support local paths and would require a name for it to be assigned to. For example,

embed ui "./ui/build"

func main() {
  file, err := ui.Open("version.txt")
  if err != nil {
    panic(err)
  }
  version, err := ioutil.ReadAll(file)
  if err != nil {
    panic(err)
  }
  file.Close()

  log.Printf("UI Version: %s\n", bytes.TrimSpace(version))
  http.ListenAndServe(":8080", http.EmbeddedDir(ui))
}

Edit: You beat me to it, @jayconrod.

@josharian
Contributor

To expand on #35950 (comment), there is a puzzle about the exposed API. The obvious ways to expose the data are []byte, string, and Read-ish interfaces.

The typical case is that you want the embedded data to be immutable. However, all interfaces exposing []byte (which includes io.Reader, io.SectionReader, etc.) must either (1) make a copy, (2) allow mutability, or (3) be immutable despite being a []byte. Exposing the data as strings solves that, but at the cost of an API that will often end up requiring copying anyway, since lots of code that consumes embedded files eventually requires byte slices one way or another.

I'd suggest route (3): be immutable despite being a []byte. You can enforce this cheaply by using a readonly symbol for the backing array. This also lets you safely expose the same data as a []byte and a string; attempts to mutate the data will fail. The compiler can't take advantage of the immutability, but that's not too great of a loss. This is something that toolchain support can bring to the table that (as far as I know) none of the existing codegen packages do.

(A third party codegen package could do this by generating a generic assembly file containing DATA symbols that are marked as readonly, and then short arch-specific assembly files exposing those symbols in the form of strings and []bytes. I wrote CL 163747 specifically with this use case in mind, but never got around to integrating it into any codegen packages.)

@DeedleFake

I'm unsure what you're talking about in terms of immutability. io.Reader already enforces immutability. That's the entire point. When you call Read(buf), it copies data into the buffer that you provided. Changing buf after that has zero effect on the internals of the io.Reader.
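A short sketch of the copying behavior described here, using `strings.Reader` as a stand-in for whatever reader an embedding API might return (the helper name `readThenScribble` is invented for this example): overwriting the caller's buffer after a read leaves the underlying data untouched.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// readThenScribble reads src through an io.Reader, overwrites the caller's
// buffer, then rereads from the start to show the source is unchanged.
func readThenScribble(src string) (string, error) {
	r := strings.NewReader(src)

	buf := make([]byte, len(src))
	if _, err := io.ReadFull(r, buf); err != nil {
		return "", err
	}
	for i := range buf {
		buf[i] = 'X' // mutate only the caller-owned copy
	}

	if _, err := r.Seek(0, io.SeekStart); err != nil {
		return "", err
	}
	again, err := io.ReadAll(r)
	return string(again), err
}

func main() {
	s, err := readThenScribble("embedded bytes")
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```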

@bradfitz
Contributor Author

bradfitz commented Dec 4, 2019

I agree with @DeedleFake. I don't want to play games with magic []byte array backings. It's okay to copy from the binary into user-provided buffers.

@gdamore

gdamore commented Dec 4, 2019

Just another wrinkle here -- I have a different project which uses DTrace source code (embedded). This is sensitive to differences between \n and \r\n. (We can argue whether this is a dumb thing in DTrace or not -- that's beside the point and it is the situation today.)

It's super useful that backticked strings treat both as \n regardless of how they appear in source, and I rely on this with a go-generate to embed the DTrace.

So if there is an embed file added to the go command, I would gently suggest that options to change the handling of CR/CRLF might come in very handy, particularly for folks who might be developing on different systems where the default line endings can be a gotcha.

@rsc
Contributor

rsc commented Dec 4, 2019

Like with compression, I'd really like to stop at "copy the file bytes into the binary". CR/CRLF normalization, Unicode normalization, gofmt'ing, all that belongs elsewhere. Check in the files containing the exact bytes you want. (If your version control can't leave them alone, maybe check in gzipped content and gunzip them at runtime.) There are many file munging knobs we could imagine adding. Let's stop at 0.
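The "check in gzipped content and gunzip at runtime" suggestion could look roughly like this. The sketch compresses in memory only so that it is self-contained; in practice the compressed bytes would be the checked-in file, and the function names are assumptions for illustration.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// gzipBytes compresses plain. It stands in for the step a developer would run
// once before checking the compressed file in.
func gzipBytes(plain []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(plain); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// gunzipBytes is what the program would do at runtime with the embedded bytes.
func gunzipBytes(packed []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(packed))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

// roundTrip compresses and decompresses s, returning the recovered text.
func roundTrip(s string) (string, error) {
	packed, err := gzipBytes([]byte(s))
	if err != nil {
		return "", err
	}
	plain, err := gunzipBytes(packed)
	return string(plain), err
}

func main() {
	out, err := roundTrip("line one\r\nline two\r\n")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", out)
}
```

Since the compressed file is opaque binary to version control, line endings inside it survive byte for byte.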

@rsc
Contributor

rsc commented Dec 4, 2019

It may be too late to introduce a new reserved directory name, as much as I'd like to.
(It wasn't too late back in 2014, but it's probably too late now.)
So some kind of opt-in comment may be necessary.

Suppose we define a type runtime.Files. Then you could imagine writing:

//go:embed *.html (or static/* etc)
var files runtime.Files

And then at runtime you just call files.Open to get back an interface { io.ReadSeeker; io.ReaderAt } with the data. Note that the var is unexported, so one package can't go around grubbing in another package's embedded files.

Names TBD but as far as the mechanism it seems like that should be enough, and I don't see how to make it simpler. (Simplifications welcome of course!)
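To make the shape of that API concrete, here is a minimal sketch of a `Files` type whose `Open` returns the interface described above. Everything here is an assumption for illustration: the map-backed `Files` type, the `File` interface name, and the `readAt` helper. The real mechanism would fill the variable from the `//go:embed` comment at build time.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

// File is the read-only view returned by Open.
type File interface {
	io.ReadSeeker
	io.ReaderAt
}

// Files is a stand-in for the proposed runtime.Files type, backed by a map
// of name to contents instead of data placed in the binary by the toolchain.
type Files map[string]string

// Open returns a fresh reader over the named file's bytes.
func (f Files) Open(name string) (File, error) {
	data, ok := f[name]
	if !ok {
		return nil, fmt.Errorf("open %s: %w", name, os.ErrNotExist)
	}
	return strings.NewReader(data), nil
}

// files is unexported, so other packages cannot reach into it.
var files = Files{"index.html": "<html></html>"}

// readAt returns n bytes of the named file starting at offset off.
func readAt(name string, off int64, n int) (string, error) {
	f, err := files.Open(name)
	if err != nil {
		return "", err
	}
	buf := make([]byte, n)
	if _, err := f.ReadAt(buf, off); err != nil {
		return "", err
	}
	return string(buf), nil
}

func main() {
	s, err := readAt("index.html", 1, 4)
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```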

@rsc
Contributor

rsc commented Dec 4, 2019

Whatever we do, it needs to be possible to support with Bazel and Gazelle too. That would mean having Gazelle recognize the comment and write out a Bazel rule saying the globs, and then we'd need to expose a tool (go tool embedgen or whatever) to generate the extra file to include in the build (the go command would do this automatically and never actually show the extra file). That seems straightforward enough.

@gdamore

gdamore commented Dec 4, 2019

If various munging won't do the trick, then that's an argument against using this new facility. It's not a stopper for me -- I can use go generate like I've been doing, but it means I cannot benefit from the new feature.

With respect to munging in general -- I can imagine a solution where someone provides an implementation of an interface (something like a Reader() on one side, and something to receive the file on the other -- maybe instantiated with an io.Reader from the file itself) -- which the go cmd would build and run to prefilter the file before embedding. Then folks can provide whatever filter they want. I imagine some folks would provide quasi-standard filters like a dos2unix implementation, compression, etc. (Maybe they should be chainable even.)

I guess there'd have to be an assumption that whatever the embedded processor is, it must be compilable on ~every build system, as go would be building a temporary native tool for this purpose.

@magical
Contributor

magical commented Dec 4, 2019

It may be too late to introduce a new reserved directory name, as much as I'd like to. [...] some kind of opt-in comment may be necessary.

If the files are only accessible through a special package, say runtime/embed, then importing that package could be the opt-in signal.

@TheMightyGit

TheMightyGit commented Jul 6, 2020

As we already have a way of injecting (albeit limited) data into a build via the existing ldflags link flag -X importpath.name=value, could that code path be adjusted to accept -X importpath.name=@filename to inject external arbitrary data?

I realise this doesn't cover all of the stated goals of the original issue, but as an extension of the existing -X functionality does it seem a reasonable step forward?

(And if that works out then extending the go.mod syntax as a neater way of specifying ldflags -X values is a next reasonable step?)

@earthboundkid
Contributor

That's a very interesting idea, but I'm worried about the security implications.

It's pretty common to do -X 'pkg.BuildVersion=$(git rev-parse HEAD)', but we wouldn't want to let go.mod run arbitrary commands, would we? (I guess go generate does, but that's not something you typically run for downloaded OSS packages.) If go.mod can't handle that, it ends up missing a major use case, so ldflags would still be very common.

Then there's the other issue of making sure @filename is not a symlink to /etc/passwd or whatever.
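For context, the existing `-X` mechanism that these comments build on works as sketched below. The variable name `BuildVersion` follows the example in the comment above; placing it in package `main` and the `versionString` helper are assumptions for this example. The `@filename` form discussed here is only a suggestion from the thread and is not shown.

```go
package main

import "fmt"

// BuildVersion can be overridden at link time, for example:
//
//	go build -ldflags "-X main.BuildVersion=$(git rev-parse HEAD)"
//
// The -X flag only sets string variables; without it the default below is used.
var BuildVersion = "dev"

// versionString formats the version for display.
func versionString() string {
	return "version " + BuildVersion
}

func main() {
	fmt.Println(versionString())
}
```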

@flimzy
Contributor

flimzy commented Jul 6, 2020

Using the linker precludes support for WASM, and possibly other targets that don't use a linker.

@rsc
Contributor

rsc commented Jul 21, 2020

Based on the discussion here, @bradfitz and I worked out a design that sits somewhere in the middle of the two approaches considered above, taking what seems to be the best of each. I've posted a draft design doc, video, and code (links below). Instead of comments on this issue, please use the Reddit Q&A for comments on this specific draft design - Reddit threads and scales discussions better than GitHub does. Thanks!

Video: https://golang.org/s/draft-embed-video
Design: https://golang.org/s/draft-embed-design
Q&A: https://golang.org/s/draft-embed-reddit
Code: https://golang.org/s/draft-embed-code

@ghost

ghost commented Jul 22, 2020

@rsc In my opinion, the go:embed proposal is inferior to providing universal sandboxed Go code execution at compile-time which would include reading files and transforming read data into an optimal format best suitable for consumption at runtime.

@diamondburned

@atomsymbol That sounds like something waaay outside the scope of this issue.

@ghost

ghost commented Jul 22, 2020

@atomsymbol That sounds like something waaay outside the scope of this issue.

I am aware of that.

@kokes

kokes commented Jul 31, 2020

I read through the proposal and scanned the code, but couldn't find an answer to this: Will this embedding scheme contain information about the file on disk (~os.Stat)? Or will these timestamps get reset to build time? Either way, these are useful pieces of information that get used in various places, e.g. we can send a 304 for unchanged assets based on this.

Thanks!

Edit: found it in the reddit thread.

The modification time for all embedded files is the zero time, for exactly the reproducibility concerns you listed. (Modules don't even record modification times, again for the same reason.)

https://old.reddit.com/r/golang/comments/hv96ny/qa_goembed_draft_design/fytj7my/

@thomasf

thomasf commented Jul 31, 2020

Either way, these are useful pieces of information that get used in various places, e.g. we can send a 304 for unchanged assets based on this.

An ETag header based on the file data hash would solve that problem without having to know anything about dates. But that would have to be known by http.HandlerFS or something to be able to work, and to not waste resources it would have to be done only once per file.
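One way the hash-based ETag idea could be sketched, outside of any future `http.HandlerFS`: compute the hash once per asset with `sync.Once`, set the `ETag` header, and let `http.ServeContent` answer conditional requests. The `asset` type, `newAsset`, and `statusFor` are hypothetical names for this example, and SHA-256 is an arbitrary choice of hash.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
	"time"
)

// asset is an immutable blob whose ETag is derived from its contents.
type asset struct {
	name string
	data []byte
	once sync.Once
	etag string
}

func newAsset(name string, data []byte) *asset {
	return &asset{name: name, data: data}
}

// ETag hashes the data on first use only and caches the quoted result.
func (a *asset) ETag() string {
	a.once.Do(func() {
		sum := sha256.Sum256(a.data)
		a.etag = `"` + hex.EncodeToString(sum[:]) + `"`
	})
	return a.etag
}

// ServeHTTP sets the ETag and passes a zero modification time, so
// ServeContent relies on the ETag alone for If-None-Match handling.
func (a *asset) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("ETag", a.ETag())
	http.ServeContent(w, r, a.name, time.Time{}, bytes.NewReader(a.data))
}

// statusFor issues an in-memory GET, optionally with If-None-Match set,
// and returns the response status code.
func statusFor(a *asset, ifNoneMatch string) int {
	req := httptest.NewRequest(http.MethodGet, "/"+a.name, nil)
	if ifNoneMatch != "" {
		req.Header.Set("If-None-Match", ifNoneMatch)
	}
	rec := httptest.NewRecorder()
	a.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	a := newAsset("app.css", []byte("body { margin: 0 }"))
	fmt.Println(statusFor(a, ""), statusFor(a, a.ETag()))
}
```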

@earthboundkid
Contributor

But that would have to be known by http.HandlerFS or something to be able to work, and to not waste resources it would have to be done only once per file.

How would http.HandlerFS know that the fs.FS was immutable? Should there be an IsImmutable() bool optional interface?

@thomasf

thomasf commented Jul 31, 2020

How would http.HandlerFS know that the fs.FS was immutable? Should there be an IsImmutable() bool optional interface?

I don't want to get into implementation details because I'm not the designer of these things, but http.HandlerFS could check if it's an embed.FS type and act upon that as a special case. I don't think anyone wants to expand the FS API right now. There could also be an option argument to HandlerFS specifically to tell it to treat a filesystem as immutable. Also, if this is done on application start-up and all ctime/mtime have zero value, http.HandlerFS could use that info to "know" that the file hasn't changed, but there are also file systems which might not have mtime or have it disabled, so there might be problems there as well.
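The optional-interface idea raised in this exchange could be sketched as below. `IsImmutable` is the hypothetical method name from the question above, not part of `io/fs`; the `frozenFS` and `plainFS` types exist only for this example.

```go
package main

import (
	"fmt"
	"io/fs"
)

// immutable is a hypothetical optional interface that a file system could
// implement to advertise that its contents never change.
type immutable interface {
	IsImmutable() bool
}

// frozenFS stands in for an embedded file system. It holds no files.
type frozenFS struct{}

func (frozenFS) Open(name string) (fs.File, error) {
	return nil, &fs.PathError{Op: "open", Path: name, Err: fs.ErrNotExist}
}

func (frozenFS) IsImmutable() bool { return true }

// plainFS is an ordinary fs.FS that does not implement the optional method.
type plainFS struct{}

func (plainFS) Open(name string) (fs.File, error) {
	return nil, &fs.PathError{Op: "open", Path: name, Err: fs.ErrNotExist}
}

// isImmutable uses a type assertion to probe for the optional method and
// treats file systems without it as mutable.
func isImmutable(fsys fs.FS) bool {
	if im, ok := fsys.(immutable); ok {
		return im.IsImmutable()
	}
	return false
}

func main() {
	fmt.Println(isImmutable(frozenFS{}), isImmutable(plainFS{}))
}
```

A handler could use such a check to decide whether per-file hashes are safe to cache for the life of the process.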

@rsc
Contributor

rsc commented Sep 2, 2020

I wasn't watching the comments on this issue.

@atomsymbol welcome back! It's great to see you commenting here again.
I agree in principle that if we had sandboxing many things would be easier.
On the other hand many things might be harder - builds might never finish.
In any event, we definitely don't have that kind of sandboxing today. :-)

@kokes I am not sure about the details,
but we'll make sure serving an embed.Files over HTTP gets ETags right by default.

@rsc
Contributor

rsc commented Sep 2, 2020

I have filed #41191 for accepting the design draft posted back in July.
I am going to close this issue as superseded by that one.
Thanks for the great preliminary discussion here.
