papermill for CI/CD is a tool for parameterizing, executing, and analyzing Jupyter Notebooks.
Papermill lets you:
- parameterize notebooks
- execute notebooks
This opens up new opportunities for how notebooks can be used. For example:
- Perhaps you have a financial report that you wish to run with different values on the first or last day of a month or at the beginning or end of the year, using parameters makes this task easier.
- Do you want to run a notebook and depending on its results, choose a particular notebook to run next? You can now programmatically execute a workflow without having to copy and paste from notebook to notebook manually.
-
This is forked from papermill
-
This works with run-notebook github action
-
The output from notebook goes to a
stdout
and to a file calledpapermill-nb-runner.out
iflog_output
option is set -
Skips executing any cells that has
debug
tag in them -
Writes errors to
stderr
and other levels tostdout
-
Sets environment variable
papermill-nb-runner
totrue
, so that notebook can check for it's existence and condition cells based on context of where the notebook is runningEg:
import os if "papermill-nb-runner" in os.environ: print("I am getting executed by papermill")
From the command line:
pip install papermill
For all optional io dependencies, you can specify individual bundles
like s3
, or azure
-- or use all
pip install papermill[all]
This library will support python 2.7 and 3.5+ until end-of-life for python 2 in 2020. After which python 2 support will halt and only 3.x version will be maintained.
To parameterize your notebook designate a cell with the tag parameters
.
Papermill looks for the parameters
cell and treats this cell as defaults for the parameters passed in at execution time. Papermill will add a new cell tagged with injected-parameters
with input parameters in order to overwrite the values in parameters
. If no cell is tagged with parameters
the injected cell will be inserted at the top of the notebook.
Additionally, if you rerun notebooks through papermill and it will reuse the injected-parameters
cell from the prior run. In this case Papermill will replace the old injected-parameters
cell with the new run's inputs.
The two ways to execute the notebook with parameters are: (1) through the Python API and (2) through the command line interface.
import papermill as pm
pm.execute_notebook(
'path/to/input.ipynb',
'path/to/output.ipynb',
parameters = dict(alpha=0.6, ratio=0.1)
)
Here's an example of a local notebook being executed and output to an Amazon S3 account:
$ papermill local/input.ipynb s3://bkt/output.ipynb -p alpha 0.6 -p l1_ratio 0.1
NOTE:
If you use multiple AWS accounts, and you have properly configured your AWS credentials, then you can specify which account to use by setting the AWS_PROFILE
environment variable at the command-line. For example:
$ AWS_PROFILE=dev_account papermill local/input.ipynb s3://bkt/output.ipynb -p alpha 0.6 -p l1_ratio 0.1
In the above example, two parameters are set: alpha
and l1_ratio
using -p
(--parameters
also works). Parameter values that look like booleans or numbers will be interpreted as such. Here are the different ways users may set parameters:
$ papermill local/input.ipynb s3://bkt/output.ipynb -r version 1.0
Using -r
or --parameters_raw
, users can set parameters one by one. However, unlike -p
, the parameter will remain a string, even if it may be interpreted as a number or boolean.
$ papermill local/input.ipynb s3://bkt/output.ipynb -f parameters.yaml
Using -f
or --parameters_file
, users can provide a YAML file from which parameter values should be read.
$ papermill local/input.ipynb s3://bkt/output.ipynb -y "
alpha: 0.6
l1_ratio: 0.1"
Using -y
or --parameters_yaml
, users can directly provide a YAML string containing parameter values.
$ papermill local/input.ipynb s3://bkt/output.ipynb -b YWxwaGE6IDAuNgpsMV9yYXRpbzogMC4xCg==
Using -b
or --parameters_base64
, users can provide a YAML string, base64-encoded, containing parameter values.
When using YAML to pass arguments, through -y
, -b
or -f
, parameter values can be arrays or dictionaries:
$ papermill local/input.ipynb s3://bkt/output.ipynb -y "
x:
- 0.0
- 1.0
- 2.0
- 3.0
linear_function:
slope: 3.0
intercept: 1.0"
Papermill supports the following name handlers for input and output paths during execution:
-
Local file system:
local
-
HTTP, HTTPS protocol:
http://, https://
-
Amazon Web Services: AWS S3
s3://
-
Azure: Azure DataLake Store, Azure Blob Store
adl://, abs://
-
Google Cloud: Google Cloud Storage
gs://
Read CONTRIBUTING.md for guidelines on how to setup a local development environment and make code changes back to Papermill.
For development guidelines look in the DEVELOPMENT_GUIDE.md file. This should inform you on how to make particular additions to the code base.
We host the Papermill documentation on ReadTheDocs.