Epub exporter does not include embedded attachments; proposal for an output agnostic mechanism #473
Comments
@mpacer On further reflection here ... maybe I'm mixing up your concept of epub and markdown. It seems it's possible to embed base64 images within a markdown document using this approach illustrated here. This red-dot test |
+1 to designing a good mechanism for an intermediate step with files stored in a temporary directory. This is what nbconvert to PDF does as well (with LaTeX as the intermediate), and there are problems with that (see #552), so we have at least two cases for it. |
@takluyver ... thanks for your input here. I may be misunderstanding things, but in the case of markdown, the files generated from embedded images will be in a directory that can't just be temporary and thrown away, as with the PDF solution, since the |
For Markdown export, it wouldn't use a temporary directory - we already have a way to create a permanent directory on export, which is used when extracting images from outputs. The temporary directory would be for converting to epub, which makes a markdown intermediate and then converts that to epub. A quick Google suggests that images etc. are contained inside the epub file (which is actually a zip archive), so it doesn't need to reference images in a separate file. |
Thanks for clarifying. |
Images are being embedded in attachments as base64-encoded strings.
Right now the epub exporter seems to receive a link to an attachment-like structure and treats it as some kind of strange file system query, e.g.:
[NbConvertApp] Converting notebook hide_cells_based_on_tags.ipynb to epub
[NbConvertApp] Writing 9663 bytes to notebook.md
[NbConvertApp] Building Epub
pandoc: Could not find media 'attachment://ScreenShot2016-10-12at19.20.34.png', skipping…
[NbConvertApp] Epub successfully created
[NbConvertApp] Writing 7101 bytes to hide_cells_based_on_tags.epub
(NB: in that ↑ I changed the ` to a ' and the ... to a … for better highlighting)
This makes me think that something similar might be happening (or not happening) elsewhere, specifically in #328. Some of the discussion there partly inspired the proposal below.
I think there may be an output-agnostic way to approach this, as a three-step process: gather, tap, and (optionally) clean. First, we gather and organise all of the relevant resources into a single location with a known relative directory structure. Second, we use format-specific mechanisms to include these images. Third, we optionally clean up everything to return it to the state it was in (if we want it to be a single file, per #328 (comment)).
To encapsulate these steps, creating a new directory in which to work will be useful. We can treat everything as happening from the root of that directory and build up the structure from there, which means we can give things canonical, known locations. Because those locations are expressed as relative paths within the canonical structure, the code that stores and finds files can rely on a common file path function. That takes care of step 1. Format-specific handling can then be developed on top of these common locations, which will take a while but will take care of step 2. And making the directory optionally temporary allows for easy cleanup, which takes care of step 3.
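As a rough sketch of the gather step and the optional cleanup, assuming the notebook is available as a plain nbformat 4 style dict in which a cell's `attachments` maps a filename to a `{mime type: base64 data}` bundle. The function name `gather_attachments` and the `attachment` directory constant are hypothetical and are not part of nbconvert:

```python
import base64
import tempfile
from pathlib import Path

# Hypothetical canonical location, relative to the working directory root.
ATTACHMENT_DIR = "attachment"


def gather_attachments(notebook, workdir):
    """Write every cell attachment below workdir/ATTACHMENT_DIR.

    Returns the relative paths (POSIX style) of the files written.
    Assumption: attachment names are unique across cells; a later
    duplicate would overwrite an earlier one.
    """
    target = Path(workdir) / ATTACHMENT_DIR
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for cell in notebook.get("cells", []):
        for name, bundle in cell.get("attachments", {}).items():
            # A bundle maps MIME type -> base64 string; take the first entry.
            data = next(iter(bundle.values()))
            # Keep only the final path component so a name cannot escape target.
            filename = Path(name).name
            (target / filename).write_bytes(base64.b64decode(data))
            written.append(f"{ATTACHMENT_DIR}/{filename}")
    return written


if __name__ == "__main__":
    nb = {"cells": [{"cell_type": "markdown", "source": "",
                     "attachments": {"dot.png": {"image/png": "aGVsbG8="}}}]}
    # A TemporaryDirectory gives the optional "clean" step for free:
    # everything gathered is removed when the block exits.
    with tempfile.TemporaryDirectory() as workdir:
        print(gather_attachments(nb, workdir))
```

Passing a permanent directory instead of a temporary one would cover the Markdown export case, where the files need to outlive the conversion.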
For example, the epub exporter uses the markdown exporter as an intermediate step, producing the file in a temporary directory. This works because the markdown exporter writes out any media files found in the outputs so that they can be referenced. Pandoc's epub writer can find these files and include them in its native format. However, we do not do this for attached files; instead, those are embedded as
![ScreenShot2016-10-12at19.20.34.png](attachment://ScreenShot2016-10-12at19.20.34.png)
, which does not point to a file system location. If we treat input attachments as we do outputs, the markdown exporter will be able to make attachments visible as easily as it does the output images. If we change the link to a more appropriate location such as
![ScreenShot2016-10-12at19.20.34.png](./attachment/ScreenShot2016-10-12at19.20.34.png)
this would be sufficient to find the attached files.
We can likely use similar means for the markdown to HTML conversion, so that it can include these either as embedded images or as separate files. In one case you just include the data URI; in the other you keep the same mechanism as described above for epub. The same machinery can support both versions. Then, instead of trying to figure out how to pass attachments in independently, we create them as separate files and read them back in. Yes, it will be less efficient, but we will have a common mechanism for achieving all of this.
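To illustrate the two variants, here is a minimal sketch that rewrites `attachment://` links in the intermediate markdown either to relative file paths or to data URIs. The function names and the regular expression are hypothetical, it only handles the `attachment://name` form shown above, and it is not how nbconvert currently behaves:

```python
import re

# Matches the "](attachment://NAME)" tail of a markdown image or link.
ATTACHMENT_LINK = re.compile(r"\]\(attachment://([^)\s]+)\)")


def link_to_files(markdown, directory="attachment"):
    """Point attachment links at files in a relative directory."""
    return ATTACHMENT_LINK.sub(
        lambda m: f"](./{directory}/{m.group(1)})", markdown
    )


def link_to_data_uris(markdown, attachments):
    """Inline attachments as data URIs.

    `attachments` is assumed to map a filename to a
    {mime type: base64 data} bundle. Unknown names are left untouched.
    """
    def replace(match):
        bundle = attachments.get(match.group(1))
        if not bundle:
            return match.group(0)
        mime, data = next(iter(bundle.items()))
        # Drop any whitespace or line breaks inside the base64 payload.
        return f"](data:{mime};base64,{''.join(data.split())})"

    return ATTACHMENT_LINK.sub(replace, markdown)


if __name__ == "__main__":
    text = "![dot.png](attachment://dot.png)"
    print(link_to_files(text))
    print(link_to_data_uris(text, {"dot.png": {"image/png": "aGVsbG8="}}))
```

The file-path variant relies on the attachments having already been written to the matching relative directory; the data URI variant needs only the attachment data itself.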
From there we can work backwards and figure out ways to solve the problem more efficiently. This should be doable without a postprocessor, as a standard default option built on reasonably common, output-agnostic machinery.
I'm going to try to make this work for the epub exporter regardless, because we're already using a TemporaryWorkingDirectory there, so it makes for a good test case. My approach is to add a hook for this in the markdown exporter itself: since it already handles the correct file placement for outputs, I figure I can mirror that for attachments.
Tips on how to make it generalisable are extremely welcome. However, I may pursue a local optimum for the epub solution and then try to abstract away from that, rather than turn every piece of advice on proper generalisation into code from the get-go. If that hasn't gone anywhere in a week, I'll know I'm barking up the wrong tree, because as far as I can tell this shouldn't be too hard a modification to make.
Relates to #467.