Skip to content

Cl app / pre-commit hook to clean Jupyter Notebooks metadata, execution_count and optionally output.

License

Notifications You must be signed in to change notification settings

ayasyrev/nbmetaclean

Repository files navigation

nbmetaclean

Collections of python scripts for checking and cleaning Jupyter Notebooks metadata, execution_count and optionally output. Can be used as command line tool or pre-commit hook.

Pure Python, no dependencies.

Can be used as a pre-commit hook or as a command line tool.

PyPI - Python Version PyPI Status Tests
Codecov

nbmetaclean

Clean Jupyter Notebooks metadata, execution_count and optionally output.

nbcheck

Check Jupyter Notebooks for errors and (or) warnings in outputs.

Base usage

Pre-commit hook

Nbmetaclean can be used as a pre-commit hook, with pre-commit You do not need to install nbmetaclean, it will be installed automatically. add to .pre-commit-config.yaml:

repos:
    - repo: https://github.com/ayasyrev/nbmetaclean
        rev: 0.1.1
        hooks:
        - id: nbmetaclean
        - id: nbcheck
          args: [ --ec, --err, --warn ]

Command line tool

Without install:

If you use uv package manager, you can nbmetaclean without install. To clean notebooks:

uvx nbmetaclean

To check notebooks:

uvx --from nbmetaclean nbcheck --ec --err --warn

Install:

pip install nbmetaclean

Usage: run nbmetaclean or nbcheck command with path to notebook or folder with notebooks. If no path is provided, current directory will be used as path.

It is possible to use nbclean command instead of nbmetaclean. nbmetaclean will be used by defaults in favour of usage with uvx

nbmetaclean

nbcheck should be run with flags:

  • --ec for execution_count check
  • --err for check errors in outputs
  • --warn for check warnings in outputs
nbcheck --ec --err --warn

Nbmetaclean

Default settings

By default, the following settings are used:

  • Clean notebook metadata, except authors and language_info / name.
  • Clean cells execution_count.
  • Preserve metadata at cells.
  • Preserve cells outputs.
  • After cleaning notebook, timestamp for file will be set to previous values.

Arguments

Check available arguments:

nbmetaclean -h

usage: nbmetaclean [-h] [-s] [--not_ec] [--not-pt] [--dont_clear_nb_metadata] [--clear_cell_metadata] [--clear_outputs]
[--nb_metadata_preserve_mask NB_METADATA_PRESERVE_MASK [NB_METADATA_PRESERVE_MASK ...]]
[--cell_metadata_preserve_mask CELL_METADATA_PRESERVE_MASK [CELL_METADATA_PRESERVE_MASK ...]] [--dont_merge_masks] [--clean_hidden_nbs] [-D] [-V]
[path ...]

Clean metadata and execution_count from Jupyter notebooks.

positional arguments:
  path                  Path for nb or folder with notebooks.

options:
  -h, --help            show this help message and exit
  -s, --silent          Silent mode.
  --not_ec              Do not clear execution_count.
  --not-pt              Do not preserve timestamp.
  --dont_clear_nb_metadata
                        Do not clear notebook metadata.
  --clear_cell_metadata
                        Clear cell metadata.
  --clear_outputs       Clear outputs.
  --nb_metadata_preserve_mask NB_METADATA_PRESERVE_MASK [NB_METADATA_PRESERVE_MASK ...]
                        Preserve mask for notebook metadata.
  --cell_metadata_preserve_mask CELL_METADATA_PRESERVE_MASK [CELL_METADATA_PRESERVE_MASK ...]
                        Preserve mask for cell metadata.
  --dont_merge_masks    Do not merge masks.
  --clean_hidden_nbs    Clean hidden notebooks.
  -D, --dry_run         perform a trial run, don't write results
  -V, --verbose         Verbose mode. Print extra information.

Execution_count

If you want to leave execution_count add --not_ec flag at command line or args: [--not_ec] line to .pre-commit-config.yaml.

repos:
    - repo: https://github.com/ayasyrev/nbmetaclean
        rev: 0.1.1
        hooks:
        - id: nbmetaclean
          args: [ --not_ec ]
nbmetaclean --not_ec

Clear outputs

If you want to clear outputs, add --clear_outputs at command line or [ --clean_outputs ] line to .pre-commit-config.yaml.

repos:
    - repo: https://github.com/ayasyrev/nbmetaclean
        rev: 0.1.1
        hooks:
        - id: nbmetaclean
          args: [ --clean_outputs ]
nbmetaclean --clean_outputs

Nbcheck

Check Jupyter Notebooks for correct execution_count, errors and (or) warnings in outputs.

Execution_count

Check that all code cells executed one after another.

Strict mode

By default, execution_count check in strict mode. All cells must be executed, one after another.

pre-commit config example:

repos:
    - repo: https://github.com/ayasyrev/nbmetaclean
        rev: 0.1.1
        hooks:
        - id: nbcheck
          args: [ --ec ]

command line example:

nbcheck --ec

Not strict mode

--not_strict flag can be used to check that next cell executed after previous one, but execution number can be more than +1.

pre-commit config example:

repos:
    - repo: https://github.com/ayasyrev/nbmetaclean
        rev: 0.1.1
        hooks:
        - id: nbcheck
          args: [ --ec, --not_strict ]

command line example:

nbcheck --ec --not_strict

Allow notebooks with no execution_count

--no_exec flag allows notebooks with all cells without execution_count. If notebook has cells with execution_count and without execution_count, pre-commit will return error.

pre-commit config example:

repos:
    - repo: https://github.com/ayasyrev/nbmetaclean
        rev: 0.1.1
        - id: nbcheck
          args: [ --ec, --no_exec ]

command line example:

nbcheck --ec --no_exec

Errors and Warnings

--err and --warn flags can be used to check for errors and warnings in outputs.

pre-commit config example:

repos:
    - repo: https://github.com/ayasyrev/nbmetaclean
        rev: 0.1.1
        hooks:
        - id: nbcheck
          args: [ --err, --warn ]

command line example:

nbcheck --err --warn