Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Download converted documents with uploadable configuration #2413

Closed
wants to merge 34 commits into from

Conversation

mpacer
Copy link
Member

@mpacer mpacer commented Apr 13, 2017

At long last! The ability to upload a custom configuration file to customise exports is here.

There are still a lot of issues I need to work out in terms of the UI, but the core functionality is present.

Steps to accomplish this:

  1. click on File → Custom Download
  2. choose your output format from the dropdown menu
  3. click on choose file and add your .json config file
  4. click submit

Looking forward to getting feedback on this.

Note: by doing this, I've actually made the current "Download as" sub-menu somewhat redundant as that is equivalent to this case but where you simply don't give it any config file.

I'm a little concerned about size limitations due to the current anchor + dataURI approach to downloading things, but I figured others would have clearer thoughts on that than I would be able to express. edit: I realised that the size worries stemmed from an earlier attempt of mine and that I had actually addressed that by using the blob object and createObjectURL approach.

Bonus: it allows downloading without needing to open a new window, which I see as a win.

Thank you to everyone who helped bring this to fruition (@minrk, @takluyver and especially @Carreau).

cc: @bollwyvl

@mpacer
Copy link
Member Author

mpacer commented Apr 13, 2017

cc @rgbkrk and @ivanov may also be interested in this.

}
)

print(resources)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forgotten debug

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


@web.authenticated
def get(self, format, path):

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpicking 2 whitelines.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing you meant the one below get and the one above self.finish().

fileformat.append(
$('<option>script</option>')
.attr('value','script')
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you may be able to chain the append, or put that in a loop.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, via loop (which takes a standard structure which will make it easier for linking into by 3rd party exporters)

form.append(fileinput);
body.append(form);

var notebook_path = utils.encode_uri_components(this.notebook.notebook_path);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you've the same line that defines notebook_path above.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gone

};
if (fileinput[0].files.length>0){
filereader.readAsText(fileinput[0].files[0]);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else on same line as }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

xhr.setRequestHeader('X-XSRFToken', xsrf_token);
xhr.onreadystatechange = function(){
if(xhr.readyState === 4){
var blob = xhr.response;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you figured out the unicode thing ? was is "just" not using jquery ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more or less that seemed to do it as bizarre as that is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know exactly what the issue is, but I hope we can work out how to do this with a more familiar API than the low-level XMLHttpRequest interface.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We searched a lot, and many responses on internet were "do not use jquery". It's trying to do fancy stuff to guess encoding.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's possible to turn the encoding detection off? We have Ajax functionality that works throughout the rest of the codebase, I find it hard to believe it's totally incompatible in this one place.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, in our specific case, raw XMLHttpRequest was the only thing that was stated (by others) to work for passing around binary objects unaltered as we wanted. And we tried using other mechanisms to get this to work and none of them worked — figuring out this is probably the part that took longest of this entire PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is precisely the place in which it's totally incompatible. It's a known issue: http://stackoverflow.com/a/23797348/1816995. That answer admittedly is a bit old, but I spent a long time trying to solve it and this works without finagling, so I'd prefer to stick with it.

a.download = fileName;
a.click();
window.URL.revokeObjectURL(url);
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you probably want to remove a from the dom.

@@ -84,6 +84,7 @@
<a href="#">Open...</a></li>
<!-- <hr/> -->
<li class="divider"></li>
<li id="custom_download"><a href="#">Custom Download</a></li>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't forget to put that back in its original place.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think moving it here is an improvement in the design in the menubar. It should not require drilling down into the menu structure to access this feature.

I'm fine moving it down next to "Download as…" but it shouldn't be one option among the various downloads as it is strictly speaking a broader class of functionality.

Re: "Download as…" I'm thinking of actually switching them to use this post method as then they won't open a new window (which I've always disliked).

@Carreau
Copy link
Member

Carreau commented Apr 14, 2017

Thanks that's great. We might be able to make this more readable with Promises at some point. Don't forget to write some documentation as well :-)

Copy link
Member

@takluyver takluyver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also post a screenshot of the new dialog, so it's easy to see what it will look like?

json_config = json.loads(json_upload.get("config",{}))
c.merge(json_config)
nb_content = json.dumps(json_upload["notebook"])
self.call_nbconvert(format, path, config=c, content=nb_content, post=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit confusing - it allows for both specifying a path and passing the notebook in the request body. I think we should stick to one or the other.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's for symmetry with get that was defined before. The "only" difference between get and post is that it gets an extra config object. There is also an efficiency question that if the notebook is on the server already we don't need to send it on the wire (and that's how it works currently). In the end we want to be able to post the notebook content and this is part of the transition effort.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is adding a new handler, so I feel like it should be possible to come up with a neater, clearer way - there's no existing code using POST to this endpoint, so compatibility isn't a concern.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually not adding a new handler (from the perspective of the web endpoints), but merely distinguishing whether it got a GET or a POST request. I was trying to keep the code as similar in the conversion step as possible and that seemed the cleanest way to do it.

I'm going to write something longer in the main PR comment thread based on a conversation with Matthias. But the approach I took is the way to have the cleanest backwards compatibility story. We're going to have to come up with a migration story for the more comprehensive version of this feature.

def get(self, format, path):

exporter = get_exporter(format, config=self.config, log=self.log)
def call_nbconvert(self, format, path, config=None, content=None, post=False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the post parameter used anywhere? I don't see it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that might have been a remnant of an earlier effort, I'll nix it.

notebook: notebook.toJSON(),
config: config
};
return json_to_pass;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same on this side, you're passing both the path to the notebook and the serialised notebook file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same answer that's on purpose: currently you need the path which forces us to save the notebook before converting. Also yo need the "path" as it affects things like the naming of the generated files.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can change the API more fundamentally, but then the name should probably be passed in explicitly as part of the JSON object.

xhr.setRequestHeader('X-XSRFToken', xsrf_token);
xhr.onreadystatechange = function(){
if(xhr.readyState === 4){
var blob = xhr.response;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know exactly what the issue is, but I hope we can work out how to do this with a more familiar API than the low-level XMLHttpRequest interface.

a.style = "display: none";
return function (data, fileName) {
var blob = new Blob([data], {type: "octet/stream"}),
url = window.URL.createObjectURL(blob);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat, I didn't know about object URLs.


var that = this;

var handle_nbconvert_post = function(){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming stuff: 'handling' something usually refers to the code that runs when that something is received. This function is getting values from the UI and sending the post request.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It technically is receiving the file if it has been uploaded when clicking the "Download" button.

But, ok, it's changed to trigger_nbconvert_post.

on_done();
}
};
var on_done = function(){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming stuff again: on what being done? on_config_file_read?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also being called if there is no config file being passed in so that wouldn't be quite right either.

if(xhr.readyState === 4){
var blob = xhr.response;
var content_disposition = xhr.getResponseHeader('Content-Disposition');
var filename = content_disposition.match(/filename="(.+)"/)[1];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point to have a fallback of some kind, in case the regex doesn't match, or the header isn't even there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is, I should just have an error?

I'm pretty sure that the request did not work properly if it doesn't have an attachment and a filename.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably an error, it's just best for it to be a clear one rather than a cryptic one because something you assumed would always be true no longer is.

@mpacer
Copy link
Member Author

mpacer commented Apr 20, 2017

@Carreau and I talked about the choice of continuing to include path and other aspects of the old GET api in this and we realised that there might be issues with backwards compatibility and the way that the endpoints that are being looked for are being defined.

For example a post request handler of a sort exists that does not have a path following it, but the way that the path_regex swallows all paths in the url postceding the format results in us having no way to specify a different API without special casing some of the ways that tornado will handle the current APIs.

As a more general topic, we need to discuss our path toward having nbconvert as a service. I was trying to write this in a way that would allow that fairly easily and I made decisions while keeping that goal in mind.

For example, right now this is transmitting the entire notebook as json when that notebook is already on the server. That introduces higher bandwidth costs but accommodates the case where the notebook is not on the server. But solving that would also require passing along (or embedding) resources to the endpoint that would not otherwise be available for conversion. That seems like a much harder problem, but something we should be thinking about.

@Carreau had specific suggestions about how to move forward with this PR and what issues to open…I don't remember all of them in particular (and need to run and want to post this now) so I'll let him raise those points later.

@takluyver
Copy link
Member

Fair enough about the raw XHR, though I still hope we can make that code a bit neater.

I'm still puzzled why you're talking about backwards compatibility. You're adding a new operation that wasn't possible before, so there's nothing for it to be backwards compatible with. It doesn't matter that the same URLs already work for GET requests; POST requests are entirely separate. If you don't want to pass the path in the URL, make a new endpoint like /nbconvert-config/html (for instance). Or replace the existing nbconvert post handler - I don't think anything uses that.

Backwards compatibility is what forces us to make awkward compromises. You have the liberty of designing a new API that doesn't have to be backwards compatible with anything, so design it to be beautiful and simple and consistent, not involving a load of duplicate information with a 'backwards compatible' label stuck on it.

@mpacer
Copy link
Member Author

mpacer commented Apr 28, 2017

oops…messed up a rebase and forced pushed without thinking about it. I'll try to fix this.

@ellisonbg
Copy link
Contributor

Recently I have been looking at the UX design around RStudio's document conversion stuff. I knmow some folks who use Jupyter and RStudio and from their perspective, the UX of RStudio is so far ahead of us on this point, they wouldn't think of using nbconvert - even though nbconvert does much of what is offered on RStudio from the technical perspective.

What is a remarkable is that from a technical perspective, what RStudio does is much more primitive that what we do in the notebook (custom REST endpoint, menus, custom UI). They have nicer UI, but the result of that UI is to just fire off a command in the terminal. It converts things in place and leaves the converted documents (HTML, pdf, etc.) in that directory.

I think we should step back from the technical details of this and do some UX design work to understand what Jupyter users really need in this area. It may be that we can provide what they need with some small UI improvements and no difficult technical work.

I think studying the RMarkdown/RStudio model would be a good starting point.

@mpacer
Copy link
Member Author

mpacer commented Oct 4, 2017

@ellisonbg I agree with you that we have much to learn from the RMarkdown and RStudio approach to document creation. Fortunately, that is exactly the kind of thing that this PR is trying to do!

nbconvert (as a CLI) works as a simple command that you can just fire off in the terminal, which converts things in place and leaves the file(s†) in the directory. You can configure it by passing in a configuration file with the --config argument.

Unfortunately, the notebook doesn't have any good way to surface this functionality when exporting to different formats. That is even taking into account that nbconvert is being used on the backend to manage the conversion.

The primary problem is that there is no user-friendly way to pass a configuration file to nbconvert from the front end.

It is (strictly speaking) possible to configure your exports in a way that is mostly in the front-end. Specifically, the configuration file that nbconvert uses by default is the same configuration file that the NotebookApp receives at the point of its instantiation. So the only way to currently change the configuration of an export is to shut down the server, and then restart it.

Forcing a user to stop whatever they were doing in order to shut down the server and restart it is about as user hostile an approach to configuration as I can imagine. Solving that was the first task of this PR.

This PR makes it possible to pass in a JSON version of the configuration file to the nbconvert calls, which allows us to get around that limitation. You can just create whatever configuration file you want and pass it in.

The second problem that this addresses has to do with how the download occurs. Currently our downloads occur by redirecting us to a new tab where we pass the information needed for conversion via the url. At best, opening a new tab is an inconvenient and unnecessary context switch.

This PR allows us to trigger the download in the background of the current tab without any interruption to a person's workflow.

A third problem this approach inadvertently solves is a feature that @ian-r-rose was asking for: the ability to pass in a notebook as part of the data attached to a POST request that is then able to be converted by nbconvert and return the resulting file in the response to the POST request.

Perhaps most importantly to your points, this PR solves many problems that will make it easier for others to try other UX deisgns on the basis of the infrastructure built here.

So, I think in fact your arguments are reasons for moving forward with this PR, not stepping away from it. Hopefully with these explanations, you agree. If not, then please respond, and I will do my best to address any remaining concerns.


Footnotes:

†: For example, for document formats that don't support images being directly embedded (e.g., plaintext formats like rst), nbconvert creates a directory that contains the relevant images along side the main file.

@ellisonbg
Copy link
Contributor

I probably didn't flush out enough details. Here is what I am proposing:

  • Use existing Jupyter services for working with the filesystem and editing files.
  • For the configuration file, provide a nice way for the user to create and edit a config file as just a regular text file that would sit next to the notebook to be converted. We have an existing REST endpoint for working with files. It could be as simple as a button that write a sample config file to that directory over the rest endpoint.
  • For the editing of that file, just use our exsiting file editor UI.
  • For the running of nbconvert, don't use a REST end point, just open a terminal in the UI for the user and send the right command to it, something like nbconvert yournotebook.ipynb --config config.pys.

In the classic notebook, all of this is possible, but it will be a bit clunky because the different UI's needed (file browser, notebook, file editor, terminal) are on different pages. But it is still possible. In JupyterLab, we can provide an extremely nice and optimized UX around these abstractions.

This is exactly how RStudio works. The only difference is that RMarkdown allows the conversion config to be put into the markdown file in a YAML block. We could also explore things like that as well if we want, but it is not needed for all of this to work.

@ian-r-rose
Copy link
Member

It seems to me that there are two points of discussion here which are related but distinct:

  1. How do we improve the existing notebook UX for running nbconvert? @mpacer's points about not needing to open a new tab seem like they should be uncontroversial.
  2. In order to get the UX we want, do we need to expand to another REST endpoint? @ellisonbg, it seems like you are suggesting punting on that for custom-configured nbconvert exports and just having people doing that manually through the terminal+file-editor. This obviates the need for restarting the server, as I understand it.
    One place where that would be difficult to make work is the use-case that I ran into with @jupyterlab/google-drive: If the notebook store is not on the server, the current path-based nbconvert REST API does not work (cf jupyterlab/jupyterlab-google-drive Remove genutils from requirements.txt #51). In that case, actually being able to POST a notebook contents+configuration is actually a really nice solution.

@Carreau
Copy link
Member

Carreau commented Oct 4, 2017

I agree with M and Ian, and we had this discussion at the dev meeting a couple of month ago; while the approach that you (@ellisonbg) is proposing has some advantage from the user point of view it is in large part orthogonal to what is developed here. This not only expose a UI to do some conversion; but also expose nbconvert through a rest API. This has been discussed quite a bit in various context and is necessary for other improvement we'd like to do; like an nbconvert CLI that use remote conversion - for example because getting latex installation correct is not easy.

I also want to point out that not everyone is using and developing on top of JupyterLab and that the rest API for nbconvert is extremely useful for custom clients wishing to do conversions.

@mpacer mpacer force-pushed the mmd_export branch 2 times, most recently from 7d1f362 to deaac69 Compare October 6, 2017 23:48
@mpacer
Copy link
Member Author

mpacer commented Oct 7, 2017

I'm at a loss as to why this is suddenly failing when 817a87c didn't fail.

@rgbkrk do you have any thoughts? This looks really similar to the errors in #2884, but I can't figure out what could be triggering race conditions all of a sudden when it worked in the previous commit…

@mpacer
Copy link
Member Author

mpacer commented Oct 7, 2017

ok… now when i go back and run tests locally on 817a87c, the tests now fail. I'm deeply baffled as to why. but I'm going to stop trying to appease the casperjs gods and call this a night.

If someone tries this branch and they can't get this to work, please let me know.

@ellisonbg
Copy link
Contributor

@Carreau thanks for clarifying the intended usage of the different parts of this PR.

I completely understand the general need for a REST API for nbconvert. In particular, the main usage case that I see is a hosted version of nbconvert that is run separately from the rest of the notebook server.

In general, I tend to think that the REST services of the notebook server are one of the project's "open standards" and we should evolve it conservatively and with very clear versioning semantics and documentation. At the same time, I want nbconvert to be able to move quickly and innovate.

Because of this, I would probably feel more comfortable with the server side parts of this being in nbconvert and used in the notebook server as a server extension. That would also make it very easy to develop a separate server that just does nbconvert REST calls. We have taken the exact same approach with the Git server extension that exposes a REST API to git.

For the classic notebook I am fine with the UI additions in this PR, if the server side stuff is moved to a server extension, the UI could remain and add a small amount of logic to see if the server side stuff is installed and enabled. It could also be added as a nbextension, but that is a pain. The only potential issue there is version mismatch between the UI and the server extension.

For JupyterLab usage, the UI would probably be developed as a JupyterLab extension (an npm extension) so any of these options are fine on that front.

I agree these things are mostly separate from the issues I was bringing up about the usability of nbconvert. However, my points about the usability still stand and I don't think that these proposed changes remove the need for us to consider those matters on their own.

var url =
utils.url_path_join(
that.base_url,
"nbconvert",
Copy link
Member

@rgbkrk rgbkrk Feb 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

switch consts to var per
remove trailing commas
@mpacer
Copy link
Member Author

mpacer commented Feb 21, 2018

Working on some tests & changes that came about as a result of writing those tests.

I've introduced a decision point. I need to make either the configuration file that is uploaded the source of truth for the output format or the setting in the application. I'm going to go with the configuration for now, as there would be no way to otherwise point to a non-entrypoint specified output format.

- use Config object
- keep naming consistent with other Handlers
- use export_format instead of output_format, to match
NbConvert.export_format
- search configuration for export format and fallback to the json body
also, change name to NbconvertConfigHandler

var fileformat = $("<select>");

var export_opts = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see slides here... could be the case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well the tests pass after I made the changes @rgbkrk suggested, but I definitely should include slides there. Apologies!

<li id="download_markdown"><a href="#">{% trans %}Markdown (.md){% endtrans %}</a></li>
<li id="download_rst"><a href="#">{% trans %}reST (.rst){% endtrans %}</a></li>
<li id="download_latex"><a href="#">{% trans %}LaTeX (.tex){% endtrans %}</a></li>
<li id="download_pdf"><a href="#">{% trans %}PDF via LaTeX (.pdf){% endtrans %}</a></li>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I ask you why you remove {% trans %} on these ones?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was a while ago that I had made that decision with input from either @takluyver or @Carreau, on the grounds that for some of those, a translation would never make sense (e.g., LaTeX should always be LaTeX). However you’re right that we can always include the translation tag and assume that there won’t be a mistranslation (e.g., treating reST as rest (the verb) with weird capitalisation and translating accordingly). Certainly Notebook, Script and the word “via” make sense to translate. I can add those tags back.

- change the capitalization to match menu options
- add slides
- add translations back
Thanks @damianavila
mpacer added a commit to mpacer/notebook that referenced this pull request Feb 23, 2018
This changes the base-path to be / instead of /api so that in the future
other APIs that return files and do not redirect you to a new address
(e.g., see jupyter#2413).

This also improves the way that the api spec handler responds to a
request, by actually displaying the spec with the appropriate
content-type header (text/x-yaml) instead of triggering a download with
the StaticFileHandler's default application/octet-stream mimetype.
@mpacer
Copy link
Member Author

mpacer commented May 1, 2018

@gnestor @takluyver I just realised I never wrote out the API spec for this, so I definitely need to get that in before this would be merged and now that our spec is cleaner, this should be much easier.

@blink1073
Copy link
Contributor

Closing this PR as stale as part of housekeeping (and because of the merge conflicts). Thanks for exploring this @mpacer!

@blink1073 blink1073 closed this Jun 8, 2020
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 24, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants