Add support for merging notebooks #1

jbn · 2017-04-30T18:04:18Z

@fperez wrote an nbmerge.py script which "Merge[s]/concatenate[s] multiple IPython notebooks into one." I use it a lot. Evidently, other people do, too. In early 2016, he opened an issue to add the script as an nbconvert tool, but nothing came of it.

As noted in the issue thread, @takluyver's BookBook does a merge/concat to implement a use case very similar to mine.

Briefly skimming his repository made me aware of a few features to keep in mind. In particular, when translating to LaTeX -- the typical case for academic dissertation writers -- it helps to preserve internal linking. (I didn't finish reading the code, but I think he does this by injecting LaTeX labels per notebook.)

Also, when looking at @fperez's original issue, he and @Carreau came up with good (i.e. unsurprising) semantics for metadata merging and notebook naming:

metadata = {}
for n in reversed(notebooks):
    metadata.update(n.metadata)
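To make the precedence explicit: because the loop walks the notebooks in reverse, the first notebook is applied last and wins any key conflict. Here is a minimal sketch using plain dicts in place of real notebook objects; `merge_metadata` and the sample metadata are made up for illustration and are not part of nbmerge.py or nbconvert.

```python
def merge_metadata(metadatas):
    """Merge notebook-level metadata dicts; earlier notebooks take precedence."""
    merged = {}
    # Walking the list backwards means the first notebook is applied last,
    # so its values overwrite conflicting keys from later notebooks.
    for metadata in reversed(metadatas):
        merged.update(metadata)
    return merged


# Illustrative metadata only; real notebooks carry more keys.
first = {"kernelspec": {"name": "python3"}, "title": "Introduction"}
second = {"kernelspec": {"name": "ir"}, "author": "someone"}
merged = merge_metadata([first, second])
```

Note that `dict.update` is shallow: a nested value such as `kernelspec` is replaced wholesale by the winning notebook's value rather than merged key by key.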

BookBook also uses <number>-<name>.ipynb semantics for sorting. Specifically, it's a glob over *-*.ipynb (latex.py:143), lexicographically sorted. My convention also uses lexicographic sorting. However, instead of that glob, it aggregates all files ending with .ipynb that do not begin with _. This lets me name files for inclusion as ###_Title_of_Notebook.ipynb. I think this is convenient because:

My eyes don't see the underscores;
You can produce formal titles by name.replace('_', ' '), stripping (optional) number prefix;
There are no spaces to worry about when performing shell actions.
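The naming convention above could be sketched roughly as follows. The function names `select_notebooks` and `title_from_filename` are hypothetical, and the regex assumes the optional prefix is a run of digits followed by a single underscore.

```python
import re
from pathlib import Path


def select_notebooks(directory):
    """Return .ipynb files whose names do not start with '_', sorted by name."""
    paths = [
        p for p in Path(directory).glob("*.ipynb")
        if not p.name.startswith("_")
    ]
    # Lexicographic sort on the file name, so '010_...' precedes '020_...'.
    return sorted(paths, key=lambda p: p.name)


def title_from_filename(filename):
    """Turn '010_Title_of_Notebook.ipynb' into 'Title of Notebook'."""
    stem = Path(filename).stem
    # Drop an optional numeric prefix such as '010_' (assumed format).
    stem = re.sub(r"^\d+_", "", stem)
    return stem.replace("_", " ")
```

Since the sort is lexicographic rather than numeric, the prefixes need zero-padding to a fixed width (010, 020, ...) to order as intended.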

I don't think it's possible to implement the merger as a preprocessor. Preprocessors are stateless, so you can't implement a reduce operation. So, it does seem like an nbstripoutput-like package makes more sense. It fits in a Makefile script just fine. However, there is the tricky bit of marking file boundaries in the metadata, in case the final phase needs them. If you build the merged notebook first and do no demarcation, that information is gone. I think the most straightforward way is having the merge script add cell metadata to the first cell of each notebook during processing. This leaves the original notebooks untouched and doesn't introduce order-of-operations issues in command execution (think: building bibtex files for your pdf.)
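A rough sketch of that boundary-marking idea, working on notebooks already loaded as plain dicts (the shape `json.load` gives for an .ipynb file). The `source_notebook` metadata key and the `merge_notebooks` name are assumptions chosen for illustration, not an established convention or the eventual nbmerge behavior.

```python
import copy


def merge_notebooks(named_notebooks, boundary_key="source_notebook"):
    """Concatenate a list of (filename, notebook_dict) pairs into one notebook.

    The first cell taken from each input is tagged with its source filename
    under `boundary_key`, a hypothetical cell-metadata key.
    """
    if not named_notebooks:
        raise ValueError("need at least one notebook")

    # Same precedence as the snippet from the original issue:
    # earlier notebooks win on conflicting metadata keys.
    merged_metadata = {}
    for _, nb in reversed(named_notebooks):
        merged_metadata.update(copy.deepcopy(nb.get("metadata", {})))

    cells = []
    for filename, nb in named_notebooks:
        # Deep-copy so the input notebooks are never mutated.
        nb_cells = copy.deepcopy(nb.get("cells", []))
        if nb_cells:
            nb_cells[0].setdefault("metadata", {})[boundary_key] = filename
        cells.extend(nb_cells)

    first = named_notebooks[0][1]
    return {
        "cells": cells,
        "metadata": merged_metadata,
        "nbformat": first.get("nbformat", 4),
        "nbformat_minor": first.get("nbformat_minor", 0),
    }
```

A later phase can then recover the file boundaries by scanning the merged cells for that key, without needing the original files.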

Because of the inability to reduce, this doesn't fit into the nbconvert pipeline. So, I need to look up how to implement a CLI callable as:

jupyter nbconcat

Or, follow @kynan's lead and implement it as an independent package, which seems more polite anyway -- jupyter is not my namespace. (Keep in mind issue #11 from that repo, concerning unicode piping.)
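For the first option, the jupyter launcher dispatches `jupyter <name>` to an executable called `jupyter-<name>` found on the PATH, so a standalone script installed under that name should be enough. Below is a minimal sketch of such an entry point. The argument names and the `-o/--output` flag are assumptions for illustration, and the actual merging is left out.

```python
import argparse


def build_parser():
    """Argument parser for a hypothetical 'jupyter-nbconcat' executable."""
    parser = argparse.ArgumentParser(
        prog="jupyter-nbconcat",
        description="Concatenate notebooks into one (illustrative sketch).",
    )
    parser.add_argument(
        "notebooks", nargs="+", help="input .ipynb files, in merge order"
    )
    parser.add_argument(
        "-o", "--output", default="-", help="output path, or '-' for stdout"
    )
    return parser


def main(argv=None):
    """Parse arguments; a real tool would load, merge, and write here."""
    args = build_parser().parse_args(argv)
    return args
```

The same script works unchanged as an independent package's console entry point. Only the installed executable name decides whether it shows up under the jupyter namespace.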


jbn · 2017-05-02T11:41:37Z

Decided on the last solution. See: nbmerge

kynan · 2017-06-15T19:34:29Z

Happy to be an inspiration, but that's too much credit :)

jbn · 2017-06-15T20:00:01Z

@kynan Thanks for nbmerge.

This [dissertate] project is stalled, but certainly not dead. Definitely will be borrowing some githooks from your work.

kynan · 2017-06-15T22:28:53Z

Yes, please do!

jbn closed this as completed May 2, 2017
