Is there a list of which formats offer roundtripping ? #426

Closed
stuaxo opened this issue Jan 28, 2020 · 8 comments

stuaxo commented Jan 28, 2020

I initially tried Jupytext with .py, but I noticed that, when round-tripping to ipynb, multiple consecutive newlines can cause code cells to be split up.

Now I'm trying .md, since GitHub can render it, though it turns out not to be ideal, as GitHub shows all the metadata at the top.

Which format is best at handling a round trip?

Owner

mwouts commented Feb 6, 2020

Hello @stuaxo , thanks for asking.

The three main formats, i.e. py:percent, py:light and md, should be robust to the round trip. Now the question is how you define the cells: py:percent is the most explicit, with # %% before every cell, whereas the other two may indeed use single or double blank lines as cell breaks.

These formats are documented here.
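
To illustrate why explicit markers make cell boundaries unambiguous, here is a minimal sketch that splits a percent-style script on `# %%` lines. It is a simplified illustration, not Jupytext's actual parser: the function name `split_percent_cells` is hypothetical, and cell metadata and markdown cells are ignored.

```python
def split_percent_cells(text):
    """Split a percent-style script into cells on lines starting with '# %%'.

    Simplified illustration only (not Jupytext's parser): cell metadata and
    markdown cells are ignored.
    """
    cells = []
    current = None
    for line in text.splitlines():
        if line.startswith("# %%"):
            # An explicit marker always starts a new cell.
            if current is not None:
                cells.append(current)
            current = []
        elif current is None:
            # Text before the first marker is kept as its own cell.
            current = [line]
        else:
            # Blank lines are ordinary content: they never split a cell.
            current.append(line)
    if current is not None:
        cells.append(current)
    # Drop blank lines at the edges of each cell, keep the inner ones.
    return ["\n".join(cell).strip("\n") for cell in cells]


script = "# %%\nx = 1\n\n\ny = 2\n\n# %%\nprint(x + y)\n"
cells = split_percent_cells(script)
print(len(cells))  # 2
```

The two blank lines between `x = 1` and `y = 2` stay inside the first cell, which is the property that makes a marker-based format predictable on a round trip.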

Given a text file, or an .ipynb notebook, you can test the round-trip with

jupytext --to ipynb --test script.py # or --test-strict

as github shows all the metadata at the top

For this, you can try the notebook metadata filter. Uncheck the 'include metadata' entry in the Jupytext menu, or execute:

jupytext --update-metadata '{"jupytext": {"notebook_metadata_filter":"-all"}}' notebook.ipynb

Owner

mwouts commented Feb 16, 2020

@stuaxo , I will add a mention that the main formats support roundtripping in the Jupytext CLI, see #441. Do you think this would have helped in your case? Thanks

Author

stuaxo commented Feb 17, 2020

That looks good. Is there a list that maps extensions to the format names somewhere?

When I started, I was converting notebook -> py. Does that mean I was using the light format?

Owner

mwouts commented Feb 17, 2020

That looks good, is there a list that maps extensions to the format names somewhere ?

That's a good idea! I'll update the PR. Basically,

  • the .md extension corresponds to either the markdown format (the default) or the pandoc format (which calls pandoc, so you should have a recent version of pandoc installed)
  • the .Rmd extension corresponds to the rmarkdown format (the default)
  • the script formats use the language extension, e.g. .py in the context of a Python notebook. The available formats are light (the default), percent (# %% markers), sphinx (if you are using sphinx-gallery; cannot store cell metadata), hydrogen (same as percent, except that magic commands are not commented out), and nomarker (no cell markers or metadata, so obviously not robust to a round trip).
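
The list above can be written down as a small lookup table. This is only a transcription of the comment into Python, not something exposed by Jupytext itself; the names `FORMATS_BY_EXTENSION` and `format_spec` are hypothetical, and the `extension:format` spelling follows the `py:percent` and `py:light` names used earlier in this thread.

```python
# Extension -> format names, transcribed from the list above.
# The first entry of each list is the default named in the comment.
# ".py" stands in for "the language extension" of a Python notebook.
FORMATS_BY_EXTENSION = {
    ".md": ["markdown", "pandoc"],
    ".Rmd": ["rmarkdown"],
    ".py": ["light", "percent", "sphinx", "hydrogen", "nomarker"],
}


def format_spec(extension, format_name=None):
    """Return a spec such as 'py:percent' (hypothetical helper)."""
    names = FORMATS_BY_EXTENSION[extension]
    # Fall back to the default format when none is requested.
    name = format_name or names[0]
    if name not in names:
        raise ValueError(f"{name!r} is not listed for {extension!r}")
    return f"{extension.lstrip('.')}:{name}"


print(format_spec(".py"))             # py:light
print(format_spec(".py", "percent"))  # py:percent
```

So converting a notebook to .py without naming a format corresponds to the light format in this table.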

mwouts closed this as completed Feb 18, 2020
Author

stuaxo commented Feb 18, 2020

Thanks.

cgpu commented Feb 20, 2020

Hi @mwouts ,

Thank you for creating Jupytext, it's proved to be extremely useful for humane git diffs for me and my team.

Re-opening this issue rather than creating a new, redundant one.
After a round trip from ipynb to Rmd and back to ipynb, I get an almost identical notebook. Some metadata does change, however, which makes the ipynb look modified in git diffs.

For example,

I am converting an .ipynb R kernel notebook to .Rmd.

# jupytext installed via conda: `conda install conda-forge::jupytext=1.3.3`
jupytext --to Rmd test.ipynb

Then I am converting back from .Rmd to .ipynb

jupytext --to ipynb test.Rmd

and running git diff to check the fidelity of the ipynb round trip.

git diff test.ipynb

This gives me a diff that is minor in principle: only some metadata changes.

However, this breaks automatic sha256sum checks on the files and also flags the .ipynb as modified in PRs.

❓ Is there a way to set conversion parameters so that the round trip yields an identical .ipynb?
❓ Is there a way to create the initial test.ipynb so that the round trip reproduces it exactly?

My initial test.ipynb (created via Jupyter Lab GUI) looks like this:

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Hello world"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cat(\"Hello world\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "R",
   "language": "R",
   "name": "ir"
  },
  "language_info": {
   "codemirror_mode": "r",
   "file_extension": ".r",
   "mimetype": "text/x-r-source",
   "name": "R",
   "pygments_lexer": "r",
   "version": "3.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}

My jupytext regenerated test.ipynb looks like this:

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Hello world"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cat(\"Hello world\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "R",
   "language": "R",
   "name": "ir"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}

My diff is the following:

},
"language_info": {
"codemirror_mode": "r",
"file_extension": ".r",
"mimetype": "text/x-r-source",
"name": "R",
"pygments_lexer": "r",
"version": "3.6.1"
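
The same comparison can be made programmatically. The sketch below uses only the two notebooks shown above, written as Python dicts, and reports whether the cells match and which top-level metadata keys were dropped. The helper name `compare_notebooks` is hypothetical; in practice the two dicts would come from `json.load` on the files.

```python
def compare_notebooks(original, regenerated):
    """Return (cells_identical, metadata keys present only in the original)."""
    cells_identical = original["cells"] == regenerated["cells"]
    dropped = sorted(set(original["metadata"]) - set(regenerated["metadata"]))
    return cells_identical, dropped


# The two notebooks from this comment, as Python dicts.
cells = [
    {"cell_type": "markdown", "metadata": {}, "source": ["## Hello world"]},
    {"cell_type": "code", "execution_count": None, "metadata": {},
     "outputs": [], "source": ['cat("Hello world")']},
]
kernelspec = {"display_name": "R", "language": "R", "name": "ir"}
language_info = {
    "codemirror_mode": "r",
    "file_extension": ".r",
    "mimetype": "text/x-r-source",
    "name": "R",
    "pygments_lexer": "r",
    "version": "3.6.1",
}
original = {
    "cells": cells,
    "metadata": {"kernelspec": kernelspec, "language_info": language_info},
    "nbformat": 4,
    "nbformat_minor": 4,
}
regenerated = {
    "cells": cells,
    "metadata": {"kernelspec": kernelspec},
    "nbformat": 4,
    "nbformat_minor": 4,
}

result = compare_notebooks(original, regenerated)
print(result)  # (True, ['language_info'])
```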

Let me know if you need more information for investigating this.

Owner

mwouts commented Feb 20, 2020

Hello @cgpu, could you please try the --update flag? That is,

jupytext --update --to ipynb test.Rmd

What you see above is the effect of Jupytext's default filter for the metadata, which excludes the language_info part from the notebook. The --update flag will take care of preserving the metadata that is in the notebook, and not in the text file.

Alternatively, you can also have a look at the metadata filters and set notebook_metadata_filter: all, but I think the --update approach is simpler.
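
As a rough mental model of the --update behaviour described above, metadata that exists in the .ipynb but was filtered out of the text file is carried over to the regenerated notebook. The sketch below is an illustration only, not Jupytext's implementation, and `restore_filtered_metadata` is a hypothetical name.

```python
import copy


def restore_filtered_metadata(existing, regenerated):
    """Copy metadata keys found only in `existing` into a copy of `regenerated`.

    Illustration of the idea only; this is not how Jupytext is implemented.
    """
    merged = copy.deepcopy(regenerated)
    metadata = merged.setdefault("metadata", {})
    for key, value in existing.get("metadata", {}).items():
        # Keys that came from the text file win; only absent keys are restored.
        if key not in metadata:
            metadata[key] = copy.deepcopy(value)
    return merged


existing = {
    "metadata": {
        "kernelspec": {"display_name": "R", "language": "R", "name": "ir"},
        "language_info": {"name": "R", "version": "3.6.1"},
    }
}
regenerated = {
    "metadata": {
        "kernelspec": {"display_name": "R", "language": "R", "name": "ir"},
    }
}

merged = restore_filtered_metadata(existing, regenerated)
print(sorted(merged["metadata"]))  # ['kernelspec', 'language_info']
```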

cgpu commented Mar 3, 2020

Thank you for the heads-up. I ran jupytext --update --to ipynb test.Rmd as suggested, against an .ipynb file with cleared outputs, and it works great.
