Preprocessor output files are deleted when hbs renderer is called (from mdBook > 0.3.1) #1087

sytsereitsma · 2019-11-04T12:47:58Z

Hi,

I have a preprocessor (mdbook-plantuml) which creates files in the book/img directory. The changes in issue #985, remove these files again (the destination.exists() if clause in Renderer::render(...) in src/renderer/html_handlebars/hbs_renderer.rs).

This kind of breaks my preprocessor :-(

Any ideas on how this could/should be fixed?

Sytse

ehuss · 2019-11-04T21:50:01Z

@Michael-F-Bryan can you take a look at this?

sytsereitsma · 2019-11-05T07:47:03Z

I think a solution could be extending the Renderer trait with a prepare function, which can be called before running the preprocessor.

Michael-F-Bryan · 2019-11-05T11:40:11Z

I wouldn't expect a preprocessor to be touching anything within another backend's output directory. That could end up being quite brittle because you're making assumptions on how the backend will render things.

Your idea of lifecycle methods (e.g. before_preprocess()) is a good one, although we'd need to update the 3rd party plugin mechanism to take it into account. One way this could be propagated to 3rd party backends would be to append an argument to the executed command, with no arguments meaning "run the normal rendering code". Another idea is to use environment variables (e.g. MDBOOK_PHASE=render or MDBOOK_PHASE=before_preprocess). Both could be implemented in a backwards-compatible way.

A second idea is to add a section to the [build] config to tell mdbook "before running the preprocessors for backend X, delete its output directory":

[build]
delete-output-directory-before-build = ["html"]

The latter would probably be easier to implement, but I'm not sure if it would be scalable/extensible or considered hacky.

Otherwise, instead of generating image files, mdbook-plantuml could use data URIs. That way when you do your transformation, instead of replacing the original text with a a normal <img src="some-file.png"/> you could emit <img src="data:image/png;base64,R0lGXoGAiqh..." />.

@sytsereitsma what are your thoughts? Do any of those solutions sound easier/better to you? I'm happy to make PRs for any mdbook changes.

sytsereitsma · 2019-11-05T13:41:25Z

@Michael-F-Bryan I'll change my preprocessor to use data URIs to avoid the issue altogether. You have a nice point where my preprocessor should be independent of the renderer.

I do think the lifecycle methods can avoid the problem (e.g. for other preprocessors). It is very difficult to troubleshoot for a regular user because it just seems as if the preprocessor does not work properly.

Didn't know about data URIs, thanks for the tip 👍

Michael-F-Bryan · 2019-11-05T14:06:45Z

It is very difficult to troubleshoot for a regular user because it just seems as if the preprocessor does not work properly.

See sytsereitsma/mdbook-plantuml#7 (comment). You can write error messages to STDERR and return a non-zero exit code from the preprocessor to let users know about errors.

I do think the lifecycle methods can avoid the problem (e.g. for other preprocessors).

If we were to go down the lifecycle method path, can you think of any other lifecycle methods we may want to add to either the preprocessor or renderer?

sytsereitsma · 2019-11-06T07:27:28Z

As far as my preprocessor is concerned everything is working fine. It's just that the output gets deleted once it is done :-)

Regarding the lifecycle methods I do not see any other sensible method than before_preprocessor when looking at MDBook::execute_build_process (src/book/mod.rs).

I have one more suggestion. Since deleting a directory is a significant operation, I'd add a log message when performing the deletion (maybe even an info message).

Regarding fixing my preprocessor I think I will copy the files to the source dir of the book to be renderer agnostic. The maximum size of data URIs differs between browsers (according to MDN opera only allows 65k). Next to that it is likely that another renderer does not support data URIs.

dylanowen · 2019-11-06T21:32:06Z

I ran into this issue as well with https://github.com/dylanowen/mdbook-graphviz I "solved" it by putting my output in the src directory which is pretty gross.

As for suggestions of what we could do. I like the model we have now where we're always operating across stdin -> stdout so chaining / modifying markdown works nicely.

What about extending BookItem to include a Resource File or adding Resources to the Book? Preprocessors that wanted to process on resource files would have that option, or they could also generate them and pass them down the line. Then our entire src directory becomes something we're operating on vs just the markdown.

I'm not sure how we'd serialize the resource files across the json boundary, and I'm not sure about backwards compatibility but it would give preprocessors more power to interact with the various files without needing to modify the src or the output directories.

Michael-F-Bryan · 2019-11-07T00:14:25Z

I like the idea of adding a "resource" section to the Book. In theory this should be backwards compatible because the JSON deserializer would interpret a missing "resource" key as an empty Vec<Resource>.

It wouldn't be backwards compatible if we were to add all non-markdown files as Resources to the Book , though (preprocessors/renderers not aware of Resources would silently drop them). That feels like a more general solution because it means we load all files under src/ into memory so files generated by a preprocessor aren't special.

I agree that writing into the src/ or book/html/ directories feels wrong. It'd be like if a build script wrote to target/debug/ or src/ instead of the directory it's told write to ($OUTPUT_DIR, the equivalent of MDBook::build_dir_for()).

sytsereitsma · 2019-11-08T19:45:58Z

Something that came to mind.... Images can get rather large and plentiful, meaning the amount of data to transfer over stdio (and json) may get quite large. How about outputting to a 'preprocessor-resources' directory, which gets copied into the book output dir by the renderer?

sytsereitsma · 2019-12-09T15:09:34Z

Hi Guys,

I'd like to see this resolved. The serve command gets into an infinite regeneration loop because the files in the src dir are modified by the preprocessor (with the workaround I created).

I'm more than happy to create a PR for this, but I'd like to have consensus on the path to follow.

As discussed above we have the following options (so far):

The before_preprocessor lifecycle method (quick hacky fix)
Extending the Book with a resources section and serialize the images using the stdin interface between the preprocessor and mdBook (is there a length limit for JSON strings in serde_json?).
I'd suggest deflated base64 encoded image data,
Use a preprocessor-resources directory for staging (which is ignored by the serve command)

Sytse

dylanowen · 2019-12-13T01:17:42Z

That's a good point about transferring large amounts of data over stdio, I've been unable to find much info on the limits or performance hits for it though.

My rationale for passing data through stdout/in was that it would give preprocessors more power to interact with each other.

For example we could create a "Thumbnail Preprocessor" that would take in images in the src directory and generate a thumbnails page to reference them. In the current model this is possible, but it'd be great if the output of mdbook-plantuml could feed into this Thumbnail preprocessor, without needing to stage anything in the src directory.

sytsereitsma · 2019-12-13T09:48:39Z

I'm not too worried about the stdio overhead compared to the whole sequence of events (i.e. generate a file, load it in memory, zip and encode it transfer it, extract and save it). My main concern is the unneeded data serialization/deserialization, even though that probably is a negligible overhead compared to the graphviz rendering.

So time to put the assumptions to the test and generate some data to base the decisions on. I'll do some spikes on how serde json handles large string values and see if I can collect some metrics on different strategies.

segfaultsourcery · 2020-01-16T15:15:52Z

How about a slightly different approach, where in the first step of the build process, the src tree gets copied into an intermediary work directory (let's call it prep) that pre-processors can write to. Once they're done, the normal build takes over, but instead of building the files found in src, it builds whatever is found in prep. Once built, prep can safely be destroyed.

Should solve the issue for this preprocesssor and create a better user-experience. Also enables other downstream renderers to work agnostically and not only the html renderer. - rust-lang/mdBook#1087

Should solve the issue for this preprocesssor and create a better user-experience. Also enables other downstream renderers to work agnostically and not only the html renderer. Can convert text files to data-uri but will still defer to the original text embedding method to preserve historical functionality - rust-lang/mdBook#1087

* Creating DataURI for image inlining Should solve the issue for this preprocesssor and create a better user-experience. Also enables other downstream renderers to work agnostically and not only the html renderer. Can convert text files to data-uri but will still defer to the original text embedding method to preserve historical functionality - rust-lang/mdBook#1087 * Made the use of data URIs configurable * Cargo fmt and compiler warnings * Do not create data uri images in book src dir Create them in the .mdbook-plantuml-cache dir in the book root instead * Update e2e tests * Use markdown output instead of <img> tags and allow clicking Clicking appears to be blocked for data uris on most browsers Co-authored-by: Paul FREAKN Baker <paul-nelson-baker@users.noreply.github.com>

sytsereitsma mentioned this issue Nov 4, 2019

Allow backends to cache previous results #985

Merged

sytsereitsma added a commit to sytsereitsma/mdBook that referenced this issue Feb 12, 2020

[fixes rust-lang#1087] Stage preprocessor generated in a temp dir

8017083

ehuss added the A-Preprocessor Area: Preprocessors label Apr 21, 2020

ehuss mentioned this issue May 23, 2020

Clean build directory before preprocessors run instead of after. #1235

Closed

dylanowen mentioned this issue Aug 25, 2020

Running mdbook watch with this plugin causes it to continuously trigger dylanowen/mdbook-graphviz#2

Closed

sytsereitsma mentioned this issue Aug 29, 2020

Fix #1087 stage preprocessor generated files in temp dir #1144

Open

sytsereitsma mentioned this issue Sep 28, 2020

Allow preprocessors to pass generated resources (fix issue #1087) #1341

Open

ehuss mentioned this issue Jul 4, 2021

Are we looking for more maintainers? #1589

Closed

niko-dunixi mentioned this issue Feb 22, 2022

Creating DataURI for image inlining sytsereitsma/mdbook-plantuml#31

Closed

niko-dunixi mentioned this issue Feb 22, 2022

Creating DataURI for image inlining sytsereitsma/mdbook-plantuml#32

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Preprocessor output files are deleted when hbs renderer is called (from mdBook > 0.3.1) #1087

Preprocessor output files are deleted when hbs renderer is called (from mdBook > 0.3.1) #1087

sytsereitsma commented Nov 4, 2019

ehuss commented Nov 4, 2019

sytsereitsma commented Nov 5, 2019

Michael-F-Bryan commented Nov 5, 2019

sytsereitsma commented Nov 5, 2019

Michael-F-Bryan commented Nov 5, 2019

sytsereitsma commented Nov 6, 2019

dylanowen commented Nov 6, 2019

Michael-F-Bryan commented Nov 7, 2019

sytsereitsma commented Nov 8, 2019

sytsereitsma commented Dec 9, 2019

dylanowen commented Dec 13, 2019

sytsereitsma commented Dec 13, 2019

segfaultsourcery commented Jan 16, 2020

Preprocessor output files are deleted when hbs renderer is called (from mdBook > 0.3.1) #1087

Preprocessor output files are deleted when hbs renderer is called (from mdBook > 0.3.1) #1087

Comments

sytsereitsma commented Nov 4, 2019

ehuss commented Nov 4, 2019

sytsereitsma commented Nov 5, 2019

Michael-F-Bryan commented Nov 5, 2019

sytsereitsma commented Nov 5, 2019

Michael-F-Bryan commented Nov 5, 2019

sytsereitsma commented Nov 6, 2019

dylanowen commented Nov 6, 2019

Michael-F-Bryan commented Nov 7, 2019

sytsereitsma commented Nov 8, 2019

sytsereitsma commented Dec 9, 2019

dylanowen commented Dec 13, 2019

sytsereitsma commented Dec 13, 2019

segfaultsourcery commented Jan 16, 2020