Add support for merging notebooks #1

Closed
jbn opened this issue Apr 30, 2017 · 4 comments

Owner

jbn commented Apr 30, 2017

@fperez wrote an nbmerge.py script which "Merge[s]/concatenate[s] multiple IPython notebooks into one." I use it a lot. Evidently, other people do, too. In early 2016, he opened an issue to add the script as an nbconvert tool, but nothing came of it.

As noted in the issue thread, @takluyver's BookBook does a merge/concat to implement a use case very similar to mine.

Briefly skimming his repository made me realize there are a few features to keep in mind. In particular, when converting to LaTeX (the typical case for academic dissertation writers), it helps to preserve internal linking. (I didn't finish reading the code, but I think he does this by injecting LaTeX labels per notebook.)

Also, when looking at @fperez's original issue, he and @Carreau came up with good (i.e. unsurprising) semantics for metadata merging and notebook naming:

metadata = {}
for n in reversed(notebooks):
    metadata.update(n.metadata)
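
A minimal sketch of what those semantics imply, using plain dicts in place of notebook objects (the helper name `merge_metadata` and the sample keys are made up for illustration). Because the loop runs in reverse, the first notebook is applied last, so its values win on conflicting keys. Note that `dict.update` is shallow, so nested dicts are replaced wholesale rather than merged.

```python
def merge_metadata(metadatas):
    """Merge metadata dicts so that earlier notebooks take precedence."""
    merged = {}
    # Apply the last notebook first and the first notebook last,
    # so keys from earlier notebooks overwrite those from later ones.
    for md in reversed(metadatas):
        merged.update(md)
    return merged


# Hypothetical metadata from two notebooks, in merge order.
first = {"kernelspec": {"name": "python3"}, "title": "Intro"}
second = {"title": "Methods", "authors": ["jbn"]}

merged = merge_metadata([first, second])
```

Here `merged["title"]` comes from the first notebook, while `authors`, which only the second notebook defines, is carried through.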

BookBook also uses <number>-<name>.ipynb semantics for sorting. Specifically, it's a glob over *-*.ipynb (latex.py:143), lexicographically sorted. My convention also sorts lexicographically. However, instead of that glob, it aggregates all files ending with .ipynb that do not begin with _. This lets me name files for inclusion as ###_Title_of_Notebook.ipynb. I think this is convenient because:

  1. My eyes don't see the underscores;
  2. You can produce formal titles by name.replace('_', ' '), after stripping the (optional) number prefix;
  3. There are no spaces to worry about when performing shell actions.
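
The naming convention above can be sketched roughly as follows. The function names are hypothetical, and the regular expression assumes the number prefix is a run of digits followed by a single underscore.

```python
import os
import re


def select_notebooks(filenames):
    """Keep names ending in .ipynb that do not begin with '_', sorted lexicographically."""
    return sorted(
        f for f in filenames
        if f.endswith(".ipynb") and not f.startswith("_")
    )


def title_from_filename(filename):
    """Turn '001_Title_of_Notebook.ipynb' into 'Title of Notebook'."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    stem = re.sub(r"^\d+_", "", stem)  # strip the optional number prefix
    return stem.replace("_", " ")


candidates = ["002_Methods.ipynb", "_scratch.ipynb", "001_Intro.ipynb", "notes.txt"]
selected = select_notebooks(candidates)
titles = [title_from_filename(f) for f in selected]
```

In a real directory, `os.listdir(".")` would supply `candidates`; the leading-underscore rule gives a cheap way to keep scratch notebooks out of the merge.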

I don't think it's possible to implement the merger as a preprocessor. Preprocessors are stateless, so you can't implement a reduce operation. So, it does seem like a nbstripoutput-like package makes more sense. It fits in a Makefile script just fine. However, there is the tricky bit of marking file boundaries in the metadata, in case the final phase needs them. If you build the merged notebook first and do no demarcation, that information is gone. I think the most straightforward way is having the merge script add cell metadata to the first cell in each notebook during processing. This leaves the original notebooks untouched and doesn't introduce order-of-operations issues in command execution (think: building bibtex files for your PDF).
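
One way the boundary marking could look, sketched on plain dicts shaped like nbformat v4 JSON. The metadata key `source_notebook` and the function name are assumptions for illustration, not an established convention. For brevity, the top-level fields are simply taken from the first notebook.

```python
import copy

BOUNDARY_KEY = "source_notebook"  # hypothetical key, chosen for this sketch


def concat_with_boundaries(named_notebooks):
    """Concatenate (filename, notebook_dict) pairs into one notebook dict.

    The first cell taken from each notebook is tagged with its filename.
    Inputs are deep-copied, so the original notebooks are left untouched.
    """
    if not named_notebooks:
        raise ValueError("need at least one notebook")

    merged = None
    for filename, nb in named_notebooks:
        nb = copy.deepcopy(nb)
        if nb["cells"]:
            first_cell = nb["cells"][0]
            first_cell.setdefault("metadata", {})[BOUNDARY_KEY] = filename
        if merged is None:
            merged = nb  # top-level fields come from the first notebook
        else:
            merged["cells"].extend(nb["cells"])
    return merged


def cell(source):
    """Build a minimal markdown cell for the example."""
    return {"cell_type": "markdown", "metadata": {}, "source": source}


intro = {"cells": [cell("# Intro"), cell("text")],
         "metadata": {}, "nbformat": 4, "nbformat_minor": 2}
methods = {"cells": [cell("# Methods")],
           "metadata": {}, "nbformat": 4, "nbformat_minor": 2}

book = concat_with_boundaries([("001_Intro.ipynb", intro),
                               ("002_Methods.ipynb", methods)])
```

A later build step can then recover the file boundaries by scanning `book["cells"]` for cells whose metadata carries the key.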

Because of the inability to reduce, this doesn't fit into the nbconvert pipeline. So, I need to look up how to implement a CLI callable as:

jupyter nbconcat

Or, follow @kynan's lead and implement it as an independent package, which seems more polite anyway -- jupyter is not my namespace. (Keep in mind issue #11 from that repo, concerning unicode piping.)
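
For the `jupyter nbconcat` option: as far as I know, the `jupyter` launcher dispatches `jupyter <name>` to an executable named `jupyter-<name>` found on PATH, so a console script called `jupyter-nbconcat` pointing at a function like the one below would probably be enough. This is a rough sketch, not a tested integration. The `main` signature, the `out` parameter, and the choice to write ASCII-escaped JSON (so the output does not depend on the terminal's encoding when piped) are all assumptions.

```python
import argparse
import io
import json
import sys


def main(argv=None, out=None):
    """Concatenate the cells of the given notebooks and write JSON to `out`."""
    parser = argparse.ArgumentParser(
        prog="jupyter-nbconcat",
        description="Concatenate notebooks, in the order given.",
    )
    parser.add_argument("notebooks", nargs="+", help="paths to .ipynb files")
    args = parser.parse_args(argv)

    merged = None
    for path in args.notebooks:
        with io.open(path, encoding="utf-8") as fp:
            nb = json.load(fp)
        if merged is None:
            merged = nb  # top-level fields come from the first notebook
        else:
            merged["cells"].extend(nb["cells"])

    out = sys.stdout if out is None else out
    # ensure_ascii=True escapes non-ASCII characters, so the byte stream
    # is safe regardless of the encoding of stdout.
    json.dump(merged, out, ensure_ascii=True)
    out.write("\n")
    return 0
```

Usage would then be along the lines of `jupyter nbconcat 001_Intro.ipynb 002_Methods.ipynb > book.ipynb`, which also fits a Makefile rule.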

Owner Author

jbn commented May 2, 2017

Decided on the last solution. See: nbmerge

jbn closed this as completed May 2, 2017

kynan commented Jun 15, 2017

Happy to be an inspiration, but that's too much credit :)

Owner Author

jbn commented Jun 15, 2017

@kynan Thanks for nbmerge.

This [dissertate] project is stalled, but certainly not dead. Definitely will be borrowing some githooks from your work.


kynan commented Jun 15, 2017

Yes, please do!
