mwouts · mwouts · Oct 29, 2018 · Oct 18, 2018 · Oct 22, 2018 · Oct 22, 2018
diff --git a/HISTORY.rst b/HISTORY.rst
@@ -3,6 +3,16 @@
 Release History
 ---------------
 
+0.8.4 (2018-10-29)
+++++++++++++++++++++++
+
+**Improvements**
+
+- Notebook metadata are filtered: only the most common metadata are stored in the text representation (#105)
+- New config option ``freeze_metadata`` on the content manager and on the command line interface (defaults to ``False``). Use this option to avoid creating a YAML header or cell metadata if there was none initially. (#110)
+- Language magic arguments are preserved in R Markdown, and also supported in ``light`` and ``percent`` scripts (#111, #114, #115)
+- First markdown cell exported as a docstring when using the Sphinx format (#107)
+
 0.8.3 (2018-10-19)
 ++++++++++++++++++++++
 

diff --git a/README.md b/README.md
@@ -177,8 +177,9 @@ jupytext --test --update notebook.ipynb -to py:percent
 Note that `jupytext --test` compares the resulting notebooks according to its expectations. If you wish to proceed to a strict comparison of the two notebooks, use `jupytext --test-strict`, and use the flag `-x` to report with more details on the first difference, if any.
 
 Please note that
-- When you associate a Jupyter kernel with your text notebook, that information goes to a YAML header at the top of your script or Markdown document. And Jupytext itself may create a `jupytext` entry in the notebook metadata.
-- Cell metadata are available in `light` and `percent` formats for all cell types. Sphinx Gallery scripts in `sphinx` format do not support cell metadata. R Markdown and R scripts in `spin` format support cell metadata for code cells only. Markdown documents do not support cell metadata. And a few cell metadata (`autoscroll`, `collapsed`, `scrolled`, `trusted`) are never included in the text representation, but are still preserved by the paired notebooks, and the `--update` conversion.
+- When you associate a Jupyter kernel with your text notebook, that information goes to a YAML header at the top of your script or Markdown document. And Jupytext itself may create a `jupytext` entry in the notebook metadata. Have a look at the [`freeze_metadata` option](#cell-and-notebook-metadata-filtering) if you want to avoid this.
+- Cell metadata are available in `light` and `percent` formats for all cell types. Sphinx Gallery scripts in `sphinx` format do not support cell metadata. R Markdown and R scripts in `spin` format support cell metadata for code cells only. Markdown documents do not support cell metadata.
+- By default, a few cell metadata are not included in the text representation of the notebook, and only the most standard notebook metadata are exported. Learn more in the [metadata filtering](#cell-and-notebook-metadata-filtering) section.
 - Representing a Jupyter notebook as a Markdown or R Markdown document has the effect of splitting markdown cells with two consecutive blank lines into multiple cells (as the two blank line pattern is used to separate cells).
 
 ## Format specifications
@@ -253,14 +254,6 @@ The `spin` format implements these [specifications](https://rmarkdown.rstudio.co
 - Markdown cells are commented with `#' `.
 - Code cells are exported verbatim. Cell metadata are signalled with `#+`. Cells end with a blank line, an explicit start of cell marker, or a markdown cell.
 
-## Extending the `light` and `percent` formats to more languages
-
-You want to extend the `light` and `percent` format to another language? Please let us know! In principle that is easy, and you will only have to:
-- document the language extension and comment by adding one line to `_SCRIPT_EXTENSIONS` in `languages.py`.
-- contribute a sample notebook in `tests\notebooks\ipynb_[language]`.
-- add two tests in `test_mirror.py`: one for the `light` format, and another one for the `percent` format.
-- Make sure that the tests pass, and that the text representations of your notebook, found in  `tests\notebooks\mirror\ipynb_to_script` and `tests\notebooks\mirror\ipynb_to_percent`, are valid scripts.
-
 ## Jupyter Notebook or Jupyter Lab?
 
 Jupytext works very well with the Jupyter Notebook editor, and we recommend that you get used to Jupytext within `jupyter notebook` first.
@@ -282,6 +275,37 @@ c.ContentsManager.comment_magics = True # or False
 
 Also, you may want some cells to be active only in the Python, or R Markdown representation. For this, use the `active` cell metadata. Set `"active": "ipynb"` if you want that cell to be active only in the Jupyter notebook. And `"active": "py"` if you want it to be active only in the Python script. And `"active": "ipynb,py"` if you want it to be active in both, but not in the R Markdown representation...
 
+## Cell and notebook metadata filtering
+
+The text representation of the notebook focuses on the part of the notebook that you have written. That is also the part of the notebook that should go under version control. Outputs and metadata that are (re)constructed automatically when the notebook is executed do not need to enter the text representation.
+
+To that end, the cell metadata `autoscroll`, `collapsed`, `scrolled`, `trusted` and `ExecuteTime` are not included in the text representation, and only the required notebook metadata (`kernelspec`, `language_info` and `jupytext`) are saved when a notebook is exported as text.
+
+When a paired notebook is loaded, Jupytext reconstructs the filtered metadata using the `.ipynb` file. Please keep in mind that the `.ipynb` file is typically not distributed across contributors, and that cell metadata may be lost when an input cell changes (cells are matched according to their contents). Thus, if some cell or notebook metadata are important to your notebook, you should preserve them in the text version. Change the default metadata filtering as follows:
+- If you want to preserve all the notebook metadata except `widgets` and `varInspector` in the YAML header, set the notebook metadata `"jupytext": {"metadata_filter": {"notebook": "all,-widgets,-varInspector"}}`
+- If you want to preserve the `toc` section (in addition to the default YAML header), use `"jupytext": {"metadata_filter": {"notebook": "toc"}}`
+- Finally, if you want to modify the default cell filter and allow `ExecuteTime` and `autoscroll`, but not `hide_output`, use `"jupytext": {"metadata_filter": {"cells": "ExecuteTime,autoscroll,-hide_output"}}`
+
+A default value for these filters can be set on Jupytext's content manager using, for instance:
+```python
+c.ContentsManager.default_notebook_metadata_filter = "all,-widgets,-varInspector"
+c.ContentsManager.default_cell_metadata_filter = "ExecuteTime,autoscroll,-hide_output"
+```
+Help us improve the default configuration: if you are aware of a notebook metadata that should not be filtered, or of a cell metadata that should always be filtered, please open an issue and let us know.
+
+Finally, if you prefer that scripts and Markdown files with no YAML header do not get one (nor additional cell metadata) when opened and saved in Jupyter, use the `freeze_metadata` option of the `jupytext` command line, or set the following option on Jupytext's content manager:
+```python
+c.ContentsManager.freeze_metadata = True
+```
+
+## Extending the `light` and `percent` formats to more languages
+
+You want to extend the `light` and `percent` format to another language? Please let us know! In principle that is easy, and you will only have to:
+- document the language extension and comment by adding one line to `_SCRIPT_EXTENSIONS` in `languages.py`.
+- contribute a sample notebook in `tests\notebooks\ipynb_[language]`.
+- add two tests in `test_mirror.py`: one for the `light` format, and another one for the `percent` format.
+- Make sure that the tests pass, and that the text representations of your notebook, found in  `tests\notebooks\mirror\ipynb_to_script` and `tests\notebooks\mirror\ipynb_to_percent`, are valid scripts.
+
 ## Jupytext's releases and backward compatibility
 
 Jupytext will continue to evolve as we collect more feedback, and discover more ways to represent notebooks as text files. When a new release of Jupytext comes out, we make our best to ensure that it will not break your notebooks. Format changes will not happen often, and we try hard not to introduce breaking changes.
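
Note on the new README section "Cell and notebook metadata filtering": the filter strings it documents (`all`, a bare key to include, `-key` to exclude) can be read roughly as in the sketch below. This is only an illustration of the documented behaviour for notebook metadata, not the code in `jupytext/metadata_filter.py`, which this diff does not show. The helper name `apply_notebook_filter` and the constant `DEFAULT_NOTEBOOK_KEYS` are hypothetical. Cell metadata follow a different default (everything is kept except an ignore list), so they are not modelled here.

```python
# Hypothetical sketch of the filter-string semantics described in the README.
# Not Jupytext's implementation.

# Notebook metadata that the README says are always exported.
DEFAULT_NOTEBOOK_KEYS = ('kernelspec', 'language_info', 'jupytext')


def apply_notebook_filter(metadata, user_filter):
    """Return the subset of `metadata` selected by a filter string."""
    entries = [e.strip() for e in user_filter.split(',') if e.strip()]
    excluded = {e[1:] for e in entries if e.startswith('-')}
    included = {e for e in entries if not e.startswith('-')}

    if 'all' in included:
        # "all" keeps every key that is not explicitly excluded.
        kept = set(metadata)
    else:
        # Otherwise keep the defaults plus the explicitly requested keys.
        kept = set(DEFAULT_NOTEBOOK_KEYS) | included

    return {k: v for k, v in metadata.items()
            if k in kept and k not in excluded}


notebook_metadata = {'kernelspec': {'name': 'python3'},
                     'toc': {'nav_menu': {}},
                     'widgets': {},
                     'varInspector': {}}

print(sorted(apply_notebook_filter(notebook_metadata, 'all,-widgets,-varInspector')))
print(sorted(apply_notebook_filter(notebook_metadata, 'toc')))
print(sorted(apply_notebook_filter(notebook_metadata, '')))
```

With these assumptions, the first two calls keep `kernelspec` and `toc`, and the empty filter keeps only `kernelspec`.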

diff --git a/jupytext/cell_metadata.py b/jupytext/cell_metadata.py
@@ -26,14 +26,14 @@
 
 _BOOLEAN_OPTIONS_DICTIONARY = [('hide_input', 'echo', True),
                                ('hide_output', 'include', True)]
-_IGNORE_METADATA = [
+_IGNORE_CELL_METADATA = ','.join('-{}'.format(name) for name in [
     # Frequent cell metadata that should not enter the text representation
     # (these metadata are preserved in the paired Jupyter notebook).
     'autoscroll', 'collapsed', 'scrolled', 'trusted', 'ExecuteTime',
     # Pre-jupytext metadata
     'skipline', 'noskipline',
     # Jupytext metadata
-    'lines_to_next_cell', 'lines_to_end_of_cell_marker']
+    'cell_marker', 'lines_to_next_cell', 'lines_to_end_of_cell_marker'])
 _PERCENT_CELL = re.compile(
     r'(# |#)%%([^\{\[]*)(|\[raw\]|\[markdown\])([^\{\[]*)(|\{.*\})\s*$')
 
@@ -68,7 +68,6 @@ def metadata_to_rmd_options(language, metadata):
     :return:
     """
     options = (language or 'R').lower()
-    metadata = filter_metadata(metadata)
     if 'name' in metadata:
         options += ' ' + metadata['name'] + ','
         del metadata['name']
@@ -237,9 +236,6 @@ def rmd_options_to_metadata(options):
         else:
             if update_metadata_from_rmd_options(name, value, metadata):
                 continue
-            if name == 'active':
-                metadata[name] = value.replace('"', '').replace("'", '')
-                continue
             try:
                 metadata[name] = _py_logical_values(value)
                 continue
@@ -252,7 +248,7 @@ def rmd_options_to_metadata(options):
     if ('active' in metadata or metadata.get('run_control', {}).get('frozen') is True) and 'eval' in metadata:
         del metadata['eval']
 
-    return language, metadata
+    return metadata.get('language') or language, metadata
 
 
 def md_options_to_metadata(options):
@@ -283,7 +279,9 @@ def try_eval_metadata(metadata, name):
     value = metadata[name]
     if not isinstance(value, (str, unicode)):
         return
-    if value.startswith('"') or value.startswith("'"):
+    if (value.startswith('"') and value.endswith('"')) or (value.startswith("'") and value.endswith("'")):
+        if name in ['active', 'magic_args', 'language']:
+            metadata[name] = value[1:-1]
         return
     if value.startswith('c(') and value.endswith(')'):
         value = '[' + value[2:-1] + ']'
@@ -304,11 +302,6 @@ def json_options_to_metadata(options, add_brackets=True):
         return {}
 
 
-def filter_metadata(metadata):
-    """Filter technical metadata"""
-    return {k: metadata[k] for k in metadata if k not in _IGNORE_METADATA}
-
-
 def metadata_to_json_options(metadata):
     """Represent metadata as json text"""
     return json.dumps(metadata)
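
Note on the `cell_metadata.py` changes: `_IGNORE_CELL_METADATA` is now a filter string rather than a list, and `try_eval_metadata` strips quotes only when they match at both ends. The standalone sketch below reproduces both expressions outside Jupytext so that their results are easy to inspect. The function name `strip_matching_quotes` is hypothetical; in the diff the same condition is applied in place, and only for the `active`, `magic_args` and `language` keys.

```python
# Standalone reproduction of two expressions from the diff above.

ignored_names = [
    'autoscroll', 'collapsed', 'scrolled', 'trusted', 'ExecuteTime',
    'skipline', 'noskipline',
    'cell_marker', 'lines_to_next_cell', 'lines_to_end_of_cell_marker']

# Same join as in the diff: each name becomes a "-name" exclusion entry.
ignore_filter = ','.join('-{}'.format(name) for name in ignored_names)
print(ignore_filter)


def strip_matching_quotes(value):
    """Hypothetical helper: drop the quotes only if both ends match."""
    if (value.startswith('"') and value.endswith('"')) or \
            (value.startswith("'") and value.endswith("'")):
        return value[1:-1]
    return value


print(strip_matching_quotes('"ipynb"'))      # quotes removed
print(strip_matching_quotes("'-i x'"))       # quotes removed
print(strip_matching_quotes('"unbalanced'))  # left unchanged
```

The previous test (`value.startswith('"') or value.startswith("'")`) treated a value with only an opening quote as a quoted string; the new condition leaves such a value unchanged.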

diff --git a/jupytext/cell_reader.py b/jupytext/cell_reader.py
@@ -76,7 +76,6 @@ class BaseCellReader(object):
 
     cell_type = None
     language = None
-    default_language = 'python'
     default_comment_magics = None
     metadata = None
     content = []
@@ -95,6 +94,7 @@ class BaseCellReader(object):
     def __init__(self, ext, comment_magics=None):
         """Create a cell reader with empty content"""
         self.ext = ext
+        self.default_language = _SCRIPT_EXTENSIONS.get(ext, {}).get('language', 'python')
         self.comment_magics = comment_magics if comment_magics is not None else self.default_comment_magics
 
     def read(self, lines):
@@ -106,8 +106,7 @@ def read(self, lines):
         self.metadata_and_language_from_option_line(lines[0])
 
         if self.metadata and 'language' in self.metadata:
-            self.language = self.metadata['language']
-            del self.metadata['language']
+            self.language = self.metadata.pop('language')
 
         # Parse cell till its end and set content, lines_to_next_cell
         pos_next_cell = self.find_cell_content(lines)
@@ -202,10 +201,14 @@ def find_cell_content(self, lines):
         # Cell content
         source = lines[cell_start:cell_end_marker]
 
-        self.content = self.uncomment_code_and_magics(source)
+        if not is_active(self.ext, self.metadata) or \
+                ('active' not in self.metadata and self.language and self.language != self.default_language):
+            self.content = uncomment(source, self.comment if self.ext != '.R' else '#')
+        else:
+            self.content = self.uncomment_code_and_magics(source)
 
         # Exactly two empty lines at the end of cell (caused by PEP8)?
-        if (self.ext == '.py' and explicit_eoc and last_two_lines_blank(source)):
+        if self.ext == '.py' and explicit_eoc and last_two_lines_blank(source):
             self.content = source[:-2]
             self.metadata['lines_to_end_of_cell_marker'] = 2
 

diff --git a/jupytext/cell_to_text.py b/jupytext/cell_to_text.py
@@ -3,9 +3,9 @@
 import re
 from copy import copy
 from .languages import cell_language
-from .cell_metadata import filter_metadata, is_active, \
-    metadata_to_rmd_options, metadata_to_json_options, \
-    metadata_to_double_percent_options
+from .cell_metadata import is_active, _IGNORE_CELL_METADATA
+from .cell_metadata import metadata_to_rmd_options, metadata_to_json_options, metadata_to_double_percent_options
+from .metadata_filter import filter_metadata
 from .magics import comment_magic, escape_code_start
 from .cell_reader import LightScriptCellReader
 from .languages import _SCRIPT_EXTENSIONS
@@ -31,13 +31,29 @@ def comment_lines(lines, prefix):
 class BaseCellExporter(object):
     """A class that represent a notebook cell as text"""
     default_comment_magics = None
+    parse_cell_language = True
 
-    def __init__(self, cell, default_language, ext, comment_magics=None):
+    def __init__(self, cell, default_language, ext, comment_magics=None, cell_metadata_filter=None):
         self.ext = ext
         self.cell_type = cell.cell_type
         self.source = cell_source(cell)
-        self.metadata = filter_metadata(cell.metadata)
-        self.language = cell_language(self.source) or default_language
+        self.unfiltered_metadata = cell.metadata
+        self.metadata = filter_metadata(copy(cell.metadata), cell_metadata_filter, _IGNORE_CELL_METADATA)
+        self.language, magic_args = cell_language(self.source) if self.parse_cell_language else (None, None)
+
+        if self.language:
+            if magic_args:
+                if ext.endswith('.Rmd'):
+                    if "'" in magic_args:
+                        magic_args = '"' + magic_args + '"'
+                    else:
+                        magic_args = "'" + magic_args + "'"
+                self.metadata['magic_args'] = magic_args
+
+            if not ext.endswith('.Rmd'):
+                self.metadata['language'] = self.language
+
+        self.language = self.language or default_language
         self.default_language = default_language
         self.comment = _SCRIPT_EXTENSIONS.get(ext, {}).get('comment', '#')
         self.comment_magics = comment_magics if comment_magics is not None else self.default_comment_magics
@@ -96,8 +112,8 @@ class MarkdownCellExporter(BaseCellExporter):
     """A class that represent a notebook cell as Markdown"""
     default_comment_magics = False
 
-    def __init__(self, cell, default_language, ext, comment_magics=None):
-        BaseCellExporter.__init__(self, cell, default_language, ext, comment_magics)
+    def __init__(self, *args, **kwargs):
+        BaseCellExporter.__init__(self, *args, **kwargs)
         self.comment = ''
 
     def code_to_text(self):
@@ -119,8 +135,8 @@ class RMarkdownCellExporter(BaseCellExporter):
     """A class that represent a notebook cell as Markdown"""
     default_comment_magics = True
 
-    def __init__(self, cell, default_language, ext, comment_magics=None):
-        BaseCellExporter.__init__(self, cell, default_language, ext, comment_magics)
+    def __init__(self, *args, **kwargs):
+        BaseCellExporter.__init__(self, *args, **kwargs)
         self.comment = ''
 
     def code_to_text(self):
@@ -158,6 +174,12 @@ class LightScriptCellExporter(BaseCellExporter):
     """A class that represent a notebook cell as a Python or Julia script"""
     default_comment_magics = True
 
+    def __init__(self, *args, **kwargs):
+        BaseCellExporter.__init__(self, *args, **kwargs)
+        for key in ['endofcell']:
+            if key in self.unfiltered_metadata:
+                self.metadata[key] = self.unfiltered_metadata[key]
+
     def is_code(self):
         # Treat markdown cells with metadata as code cells (#66)
         if self.cell_type == 'markdown' and self.metadata:
@@ -169,10 +191,8 @@ def is_code(self):
     def code_to_text(self):
         """Return the text representation of a code cell"""
         active = is_active(self.ext, self.metadata)
-        if active and self.language != self.default_language:
+        if self.language != self.default_language and 'active' not in self.metadata:
             active = False
-            self.metadata['active'] = 'ipynb'
-            self.metadata['language'] = self.language
 
         source = copy(self.source)
         escape_code_start(source, self.ext, self.language)
@@ -232,8 +252,8 @@ class RScriptCellExporter(BaseCellExporter):
     """A class that can represent a notebook cell as a R script"""
     default_comment_magics = True
 
-    def __init__(self, cell, default_language, ext, comment_magics=None):
-        BaseCellExporter.__init__(self, cell, default_language, ext, comment_magics)
+    def __init__(self, *args, **kwargs):
+        BaseCellExporter.__init__(self, *args, **kwargs)
         self.comment = "#'"
 
     def code_to_text(self):
@@ -267,6 +287,7 @@ class DoublePercentCellExporter(BaseCellExporter):
     """A class that can represent a notebook cell as an
     Hydrogen/Spyder/VScode script (#59)"""
     default_comment_magics = False
+    parse_cell_language = False
 
     def code_to_text(self):
         """Not used"""
@@ -303,10 +324,14 @@ class SphinxGalleryCellExporter(BaseCellExporter):
     default_cell_marker = '#' * 79
     default_comment_magics = True
 
-    def __init__(self, cell, default_language, ext, comment_magics=None):
-        BaseCellExporter.__init__(self, cell, default_language, ext, comment_magics)
+    def __init__(self, *args, **kwargs):
+        BaseCellExporter.__init__(self, *args, **kwargs)
         self.comment = '#'
 
+        for key in ['cell_marker']:
+            if key in self.unfiltered_metadata:
+                self.metadata[key] = self.unfiltered_metadata[key]
+
     def code_to_text(self):
         """Not used"""
         pass
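
Note on the `magic_args` handling in `BaseCellExporter.__init__`: for `.Rmd` files the magic arguments are wrapped in quotes, using double quotes when the arguments already contain a single quote and single quotes otherwise. The sketch below isolates that rule so it can be tried on its own. The function name `quote_magic_args` is hypothetical and does not exist in Jupytext.

```python
def quote_magic_args(magic_args):
    """Hypothetical helper mirroring the .Rmd quoting rule in the diff."""
    if "'" in magic_args:
        # Arguments already use single quotes: wrap them in double quotes.
        return '"' + magic_args + '"'
    # Default: wrap in single quotes.
    return "'" + magic_args + "'"


print(quote_magic_args('-i x'))
print(quote_magic_args("--title 'my plot'"))
```

As in the diff, arguments that contain both single and double quotes are wrapped in double quotes without escaping the inner ones, so such values may not round-trip cleanly.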