Programmatic execution of notebooks #2031

miguelgfierro · 2023-10-31T11:43:50Z

Description

This PR removes papermill and scrapbook and adds the same functionality

Related Issues

Fixes #2012

References

Checklist:

I have followed the contribution guidelines and code style for this project.
I have added tests covering my contributions.
I have updated the documentation accordingly.
This PR is being made to staging branch and not to main branch.

Signed-off-by: miguelgfierro <miguelgfierro@users.noreply.github.com>

miguelgfierro · 2023-10-31T16:29:27Z

Weird error. The input is 100k, but the regex parser inputs 10k

tests/functional/examples/test_notebooks_gpu.py F                                                                                                                     [100%]

================================================================================= FAILURES ==================================================================================
______________________________________________________ test_ncf_deep_dive_functional[100k-10-512-expected_values0-42] _______________________________________________________

notebooks = {'als_deep_dive': '/home/u/MS/recommenders/examples/02_model_collaborative_filtering/als_deep_dive.ipynb', 'als_pyspar...aseline_deep_dive.ipynb', 'benchmark_movielens': '/home/u/MS/recommenders/examples/06_benchmarks/movielens.ipynb', ...}
output_notebook = 'output.ipynb', kernel_name = 'python3', size = '100k', epochs = 10, batch_size = 512
expected_values = {'map': 0.0435856, 'map2': 0.0510391, 'ndcg': 0.37586, 'ndcg2': 0.202186, ...}, seed = 42

    @pytest.mark.gpu
    @pytest.mark.notebooks
    @pytest.mark.parametrize(
        "size, epochs, batch_size, expected_values, seed",
        [
            (
                "100k",
                10,
                512,
                {
                    "map": 0.0435856,
                    "ndcg": 0.37586,
                    "precision": 0.169353,
                    "recall": 0.0923963,
                    "map2": 0.0510391,
                    "ndcg2": 0.202186,
                    "precision2": 0.179533,
                    "recall2": 0.106434,
                },
                42,
            )
        ],
    )
    def test_ncf_deep_dive_functional(
        notebooks,
        output_notebook,
        kernel_name,
        size,
        epochs,
        batch_size,
        expected_values,
        seed,
    ):
        notebook_path = notebooks["ncf_deep_dive"]
>       execute_notebook(
            notebook_path,
            output_notebook,
            kernel_name=kernel_name,
            parameters=dict(
                TOP_K=10,
                MOVIELENS_DATA_SIZE=size,
                EPOCHS=epochs,
                BATCH_SIZE=batch_size,
                SEED=seed,
            ),
        )

tests/functional/examples/test_notebooks_gpu.py:91:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
recommenders/utils/notebook_utils.py:99: in execute_notebook
    executed_notebook, _ = execute_preprocessor.preprocess(
../../anaconda/envs/recommenders/lib/python3.9/site-packages/nbconvert/preprocessors/execute.py:100: in preprocess
    self.preprocess_cell(cell, resources, index)
../../anaconda/envs/recommenders/lib/python3.9/site-packages/nbconvert/preprocessors/execute.py:121: in preprocess_cell
    cell = self.execute_cell(cell, index, store_history=True)
../../anaconda/envs/recommenders/lib/python3.9/site-packages/jupyter_core/utils/__init__.py:166: in wrapped
    return loop.run_until_complete(inner)
../../anaconda/envs/recommenders/lib/python3.9/asyncio/base_events.py:647: in run_until_complete
    return future.result()
../../anaconda/envs/recommenders/lib/python3.9/site-packages/nbclient/client.py:1058: in async_execute_cell
    await self._check_raise_for_error(cell, cell_index, exec_reply)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <nbconvert.preprocessors.execute.ExecutePreprocessor object at 0x7fc7855dc6d0>
cell = {'cell_type': 'code', 'execution_count': 3, 'metadata': {'execution': {'iopub.status.busy': '2023-10-31T16:26:54.66711...oad_pandas_df(\n    size=MOVIELENS_DATA_SIZE,\n    header=["userID", "itemID", "rating", "timestamp"]\n)\n\ndf.head()'}
cell_index = 9
exec_reply = {'buffers': [], 'content': {'ename': 'ValueError', 'engine_info': {'engine_id': -1, 'engine_uuid': 'e1defabf-6d1f-40f2...e, 'engine': 'e1defabf-6d1f-40f2-a86b-3533e758ecca', 'started': '2023-10-31T16:26:54.667305Z', 'status': 'error'}, ...}

    async def _check_raise_for_error(
        self, cell: NotebookNode, cell_index: int, exec_reply: t.Optional[t.Dict]
    ) -> None:
        if exec_reply is None:
            return None

        exec_reply_content = exec_reply['content']
        if exec_reply_content['status'] != 'error':
            return None

        cell_allows_errors = (not self.force_raise_errors) and (
            self.allow_errors
            or exec_reply_content.get('ename') in self.allow_error_names
            or "raises-exception" in cell.metadata.get("tags", [])
        )
        await run_hook(
            self.on_cell_error, cell=cell, cell_index=cell_index, execute_reply=exec_reply
        )
        if not cell_allows_errors:
>           raise CellExecutionError.from_cell_and_msg(cell, exec_reply_content)
E           nbclient.exceptions.CellExecutionError: An error occurred while executing the following cell:
E           ------------------
E           df = movielens.load_pandas_df(
E               size=MOVIELENS_DATA_SIZE,
E               header=["userID", "itemID", "rating", "timestamp"]
E           )
E
E           df.head()
E           ------------------
E
E
E           ---------------------------------------------------------------------------
E           ValueError                                Traceback (most recent call last)
E           Cell In[3], line 1
E           ----> 1 df = movielens.load_pandas_df(
E                 2     size=MOVIELENS_DATA_SIZE,
E                 3     header=["userID", "itemID", "rating", "timestamp"]
E                 4 )
E                 6 df.head()
E
E           File ~/MS/recommenders/recommenders/datasets/movielens.py:201, in load_pandas_df(size, header, local_cache_path, title_col, genres_col, year_col)
E               199 size = size.lower()
E               200 if size not in DATA_FORMAT and size not in MOCK_DATA_FORMAT:
E           --> 201     raise ValueError(f"Size: {size}. " + ERROR_MOVIE_LENS_SIZE)
E               203 if header is None:
E               204     header = DEFAULT_HEADER
E
E           ValueError: Size: 10k. Invalid data size. Should be one of {100k, 1m, 10m, or 20m, or mock100}

../../anaconda/envs/recommenders/lib/python3.9/site-packages/nbclient/client.py:914: CellExecutionError
============================================================================= warnings summary ==============================================================================
../../anaconda/envs/recommenders/lib/python3.9/site-packages/jupyter_client/connect.py:20
  /home/u/anaconda/envs/recommenders/lib/python3.9/site-packages/jupyter_client/connect.py:20: DeprecationWarning: Jupyter is migrating its paths to use standard platformdirs
  given by the platformdirs library.  To remove this warning and
  see the appropriate new directories, set the environment variable
  `JUPYTER_PLATFORM_DIRS=1` and then run `jupyter --paths`.
  The use of platformdirs will be the default in `jupyter_core` v6
    from jupyter_core.paths import jupyter_data_dir, jupyter_runtime_dir, secure_write

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========================================================================== short test summary info ==========================================================================
FAILED tests/functional/examples/test_notebooks_gpu.py::test_ncf_deep_dive_functional[100k-10-512-expected_values0-42] - nbclient.exceptions.CellExecutionError: An error occurred while executing the following cell:
======================================================================= 1 failed, 1 warning in 5.23s ========================================================================

Similar error but with a different notebook:

    @pytest.mark.notebooks
    @pytest.mark.experimental
    def test_rlrmc_quickstart_runs(notebooks, output_notebook, kernel_name):
        notebook_path = notebooks["rlrmc_quickstart"]
>       execute_notebook(
            notebook_path,
            output_notebook,
            kernel_name=kernel_name,
            parameters=dict(rank_parameter=2, MOVIELENS_DATA_SIZE="mock100"),
        )

tests/unit/examples/test_notebooks_python.py:88: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
recommenders/utils/notebook_utils.py:99: in execute_notebook
    executed_notebook, _ = execute_preprocessor.preprocess(
/azureml-envs/azureml_8854b0bdccc7bb7425b7c3f2145bc96f/lib/python3.9/site-packages/nbconvert/preprocessors/execute.py:102: in preprocess
    self.preprocess_cell(cell, resources, index)
/azureml-envs/azureml_8854b0bdccc7bb7425b7c3f2145bc96f/lib/python3.9/site-packages/nbconvert/preprocessors/execute.py:123: in preprocess_cell
    cell = self.execute_cell(cell, index, store_history=True)
/azureml-envs/azureml_8854b0bdccc7bb7425b7c3f2145bc96f/lib/python3.9/site-packages/jupyter_core/utils/__init__.py:173: in wrapped
    return loop.run_until_complete(inner)
/azureml-envs/azureml_8854b0bdccc7bb7425b7c3f2145bc96f/lib/python3.9/asyncio/base_events.py:647: in run_until_complete
    return future.result()
/azureml-envs/azureml_8854b0bdccc7bb7425b7c3f2145bc96f/lib/python3.9/site-packages/nbclient/client.py:1058: in async_execute_cell
    await self._check_raise_for_error(cell, cell_index, exec_reply)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <nbconvert.preprocessors.execute.ExecutePreprocessor object at 0x14e3f8053940>
cell = ***'cell_type': 'code', 'execution_count': 4, 'metadata': ***'execution': ***'iopub.status.busy': '2023-10-31T16:49:29.42054...= movielens.load_pandas_df(\n    size=MOVIELENS_DATA_SIZE,\n    header=["userID", "itemID", "rating", "timestamp"]\n)'***
cell_index = 7
exec_reply = ***'buffers': [], 'content': ***'ename': 'ValueError', 'engine_info': ***'engine_id': -1, 'engine_uuid': '79883de8-cb38-47df...e, 'engine': '79883de8-cb38-47df-a3b2-e63115050117', 'started': '2023-10-31T16:49:29.420956Z', 'status': 'error'***, ...***

    async def _check_raise_for_error(
        self, cell: NotebookNode, cell_index: int, exec_reply: t.Optional[t.Dict]
    ) -> None:
        if exec_reply is None:
            return None
    
        exec_reply_content = exec_reply['content']
        if exec_reply_content['status'] != 'error':
            return None
    
        cell_allows_errors = (not self.force_raise_errors) and (
            self.allow_errors
            or exec_reply_content.get('ename') in self.allow_error_names
            or "raises-exception" in cell.metadata.get("tags", [])
        )
        await run_hook(
            self.on_cell_error, cell=cell, cell_index=cell_index, execute_reply=exec_reply
        )
        if not cell_allows_errors:
>           raise CellExecutionError.from_cell_and_msg(cell, exec_reply_content)
E           nbclient.exceptions.CellExecutionError: An error occurred while executing the following cell:
E           ------------------
E           
E           df = movielens.load_pandas_df(
E               size=MOVIELENS_DATA_SIZE,
E               header=["userID", "itemID", "rating", "timestamp"]
E           )
E           ------------------
E           
E           
E           ---------------------------------------------------------------------------
E           ValueError                                Traceback (most recent call last)
E           Cell In[4], line 1
E           ----> 1 df = movielens.load_pandas_df(
E                 2 size=MOVIELENS_DATA_SIZE,
E                 3 header=["userID","itemID","rating","timestamp"]
E                 4 )
E           
E           File /mnt/azureml/cr/j/f1d53f64bdb5410196f4cc9b6e069605/exe/wd/recommenders/datasets/movielens.py:201, in load_pandas_df(size, header, local_cache_path, title_col, genres_col, year_col)
E               199 size = size.lower()
E               200 if size not in DATA_FORMAT and size not in MOCK_DATA_FORMAT:
E           --> 201     raise ValueError(f"Size: ***size***. " + ERROR_MOVIE_LENS_SIZE)
E               203 if header is None:
E               204     header = DEFAULT_HEADER
E           
E           ValueError: Size: 2m. Invalid data size. Should be one of ***100k, 1m, 10m, or 20m, or mock100***

Signed-off-by: miguelgfierro <miguelgfierro@users.noreply.github.com>

loomlike · 2023-10-31T20:13:19Z

@miguelgfierro Sorry to ask dumb question as I missed the discussion, but why do we reinvent the wheel here? Couldn't papermill to execute the notebook (which seems to be still actively developed, not like the scrapbook) + other open-sourced recording packages, e.g. mlflow for recording and verifying metrics? Actually mlflow recording codes will show a good example for recording metrics too in our notebooks...

Signed-off-by: miguelgfierro <miguelgfierro@users.noreply.github.com>

miguelgfierro · 2023-11-01T05:43:49Z

It seems that papermill is also not maintained: https://pypi.org/project/papermill/#history. They haven't updated it in over a year. MLFlow for recording is an interesting idea, the only problem would be that we would add another dependency. One of the reasons to do this from scratch is to reduce dependencies.

This code doesn't add any new dependency and it will allow us to do the same functionality we had. If in the future, there is an appetite to change the recording of the data with MLFlow, we can add it.

loomlike

Thank you for the changes. These are huge changes and I really appreciate your hard work! I left few comments that are not critical, so feel free to fix them or leave it for later.

I think at some point, we'll want to split this "notebook util" into a separate project/package because of two reasons: 1) it's not relevant to "recommenders" 2) this utility is super useful for any DS projects that has notebook examples and it will be very beneficial for them to use the utility.

recommenders/utils/notebook_utils.py

tests/unit/recommenders/utils/test_notebook_utils.py

Signed-off-by: miguelgfierro <miguelgfierro@users.noreply.github.com>

Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com>

SimonYansenZhao · 2023-11-13T13:53:01Z

@miguelgfierro I think the pattern matching is incorrect. See the example below that uses the pattern matching in execute_notebook():

>>> import re
>>> pattern = re.compile(rf"\bmy_param\s*=\s*([^#\n]+)(?:#.*$)?", re.MULTILINE)
>>> cell_source = "\"my_param = 'abc'\n\", \"another_param = 'abc'\n\""
>>> matches = re.findall(pattern, "\"my_param = 'abc'\n\", \"another_param = 'abc'\n\"")
>>> matches
["'abc'"]
>>> cell_source.replace(matches[0].strip(), '10')
'"my_param = 10\n", "another_param = 10\n"'

All parameters whose value is 'abc' above are changed.

Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com>

SimonYansenZhao · 2023-11-14T11:16:56Z

@miguelgfierro I fixed the pattern matching bug. Now a new error is catched. I'll take a look the day after tomorrow.

loomlike · 2023-11-14T17:38:24Z

@miguelgfierro I think the pattern matching is incorrect. See the example below that uses the pattern matching in execute_notebook():
>>> import re
>>> pattern = re.compile(rf"\bmy_param\s*=\s*([^#\n]+)(?:#.*$)?", re.MULTILINE)
>>> cell_source = "\"my_param = 'abc'\n\", \"another_param = 'abc'\n\""
>>> matches = re.findall(pattern, "\"my_param = 'abc'\n\", \"another_param = 'abc'\n\"")
>>> matches
["'abc'"]
>>> cell_source.replace(matches[0].strip(), '10')
'"my_param = 10\n", "another_param = 10\n"'
All parameters whose value is 'abc' above are changed.

@SimonYansenZhao can we modularize the parameter pattern matching & replace part to pull out from execute_notebook so that we can unit tests better?

Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com>

miguelgfierro · 2023-11-18T08:01:15Z

examples/00_quick_start/dkn_MIND.ipynb

                "import tensorflow as tf\n",
                "tf.get_logger().setLevel(\"ERROR\") # only show error messages\n",
                "tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)\n",
                "\n",
                "from recommenders.models.deeprec.deeprec_utils import download_deeprec_resources, prepare_hparams\n",
                "from recommenders.models.deeprec.models.dkn import DKN\n",
                "from recommenders.models.deeprec.io.dkn_iterator import DKNTextIterator\n",
+                "from recommenders.utils.notebook_utils import store_metadata\n",


There is a weird error in the DKN notebook. It's a timeout. I have never seen this error.

It might be related to a bad configuration of CUDA (see below)? Let me rerun that test.

@pytest.mark.notebooks @pytest.mark.gpu def test_dkn_quickstart(notebooks, output_notebook, kernel_name): notebook_path = notebooks["dkn_quickstart"] > execute_notebook( notebook_path, output_notebook, kernel_name=kernel_name, parameters=dict(EPOCHS=1, BATCH_SIZE=500), ) tests/unit/examples/test_notebooks_gpu.py:118: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ recommenders/utils/notebook_utils.py:107: in execute_notebook executed_notebook, _ = execute_preprocessor.preprocess( /azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib/python3.8/site-packages/nbconvert/preprocessors/execute.py:102: in preprocess self.preprocess_cell(cell, resources, index) /azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib/python3.8/site-packages/nbconvert/preprocessors/execute.py:123: in preprocess_cell cell = self.execute_cell(cell, index, store_history=True) /azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib/python3.8/site-packages/jupyter_core/utils/__init__.py:173: in wrapped return loop.run_until_complete(inner) /azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib/python3.8/asyncio/base_events.py:616: in run_until_complete return future.result() /azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib/python3.8/site-packages/nbclient/client.py:1005: in async_execute_cell exec_reply = await self.task_poll_for_reply /azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib/python3.8/site-packages/nbclient/client.py:806: in _async_poll_for_reply error_on_timeout_execute_reply = await self._async_handle_timeout(timeout, cell) _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ self = <nbconvert.preprocessors.execute.ExecutePreprocessor object at 0x152370dcb400> timeout = 600 cell = ***'cell_type': 'code', 'execution_count': 7, 'metadata': ***'pycharm': ***'is_executing': False***, 'scrolled': True, 'execut...\x1b[49m\x1b[43m)\x1b[49m\n', '\x1b[0;31mKeyboardInterrupt\x1b[0m: ']***], 'source': 'model.fit(train_file, valid_file)'*** async def _async_handle_timeout( self, timeout: int, cell: NotebookNode | None = None ) -> None | dict[str, t.Any]: self.log.error("Timeout waiting for execute reply (%is)." % timeout) if self.interrupt_on_timeout: self.log.error("Interrupting kernel") assert self.km is not None await ensure_async(self.km.interrupt_kernel()) if self.error_on_timeout: execute_reply = ***"content": *****self.error_on_timeout, "status": "error"*** return execute_reply return None else: assert cell is not None > raise CellTimeoutError.error_from_timeout_and_cell( "Cell execution timed out", timeout, cell ) E nbclient.exceptions.CellTimeoutError: A cell timed out while it was being executed, after 600 seconds. E The message was: Cell execution timed out. E Here is a preview of the cell contents: E ------------------- E model.fit(train_file, valid_file) E ------------------- /azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib/python3.8/site-packages/nbclient/client.py:856: CellTimeoutError ----------------------------- Captured stdout call ----------------------------- ERROR:traitlets:Timeout waiting for execute reply (600s). ----------------------------- Captured stderr call ----------------------------- 2023-11-18 07:19:58.273399: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib: 2023-11-18 07:19:58.273444: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 2023-11-18 07:20:01.260672: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib: 2023-11-18 07:20:01.260819: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib: 2023-11-18 07:20:01.260909: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib: 2023-11-18 07:20:01.260994: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib: 2023-11-18 07:20:01.261077: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcurand.so.10'; dlerror: libcurand.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib: 2023-11-18 07:20:01.261163: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib: 2023-11-18 07:20:01.261246: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib: 2023-11-18 07:20:01.261330: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib: 2023-11-18 07:20:01.261341: W tensorflow/core/common_runtime/gpu/gpu_device.cc:[1850](https://github.com/recommenders-team/recommenders/actions/runs/6912445137/job/18808189826?pr=2031#step:3:1857)] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform. Skipping registering GPU devices... 2023-11-18 07:20:02.067266: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 2023-11-18 07:20:02.068875: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1850] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform. Skipping registering GPU devices... ------------------------------ Captured log call ------------------------------- ERROR traitlets:client.py:845 Timeout waiting for execute reply (600s).

@SimonYansenZhao this is the current error. I believe it is not related to the multiline problem

SimonYansenZhao · 2023-11-18T09:04:24Z

The pattern matching in the notebook utils now cannot extract multiline parameter values. For example the following test

recommenders/tests/unit/examples/test_notebooks_gpu.py

Lines 77 to 94 in b000b78

    
           def test_wide_deep(notebooks, output_notebook, kernel_name, tmp): 
        
               notebook_path = notebooks["wide_deep"] 
        
               # Simple test (train only 1 batch == 1 step) 
        
               model_dir = os.path.join(tmp, "wide_deep_0") 
        
               os.mkdir(model_dir) 
        
               params = { 
        
                   "MOVIELENS_DATA_SIZE": "mock100", 
        
                   "STEPS": 1, 
        
                   "EVALUATE_WHILE_TRAINING": False, 
        
                   "MODEL_DIR": model_dir, 
        
                   "EXPORT_DIR_BASE": model_dir, 
        
                   "RATING_METRICS": ["rmse"], 
        
                   "RANKING_METRICS": ["ndcg_at_k"], 
        
               } 
        
               pm.execute_notebook( 
        
                   notebook_path, output_notebook, kernel_name=kernel_name, parameters=params 
        
               )

when doing the value substitution for RANKING_METRICS in 00_quick_start/wide_deep_movielens.ipynb

RANKING_METRICS = [
    evaluator.ndcg_at_k.__name__,
    evaluator.precision_at_k.__name__,
]

will leads the following result:

RANKING_METRICS = ["ndcg_at_k"]
    evaluator.ndcg_at_k.__name__,
    evaluator.precision_at_k.__name__,
]

So the current solution is to rewrite all multiline parameters into one line. See the commit.

SimonYansenZhao · 2023-11-20T08:47:36Z

@miguelgfierro I think the pattern matching is incorrect. See the example below that uses the pattern matching in execute_notebook():
>>> import re
>>> pattern = re.compile(rf"\bmy_param\s*=\s*([^#\n]+)(?:#.*$)?", re.MULTILINE)
>>> cell_source = "\"my_param = 'abc'\n\", \"another_param = 'abc'\n\""
>>> matches = re.findall(pattern, "\"my_param = 'abc'\n\", \"another_param = 'abc'\n\"")
>>> matches
["'abc'"]
>>> cell_source.replace(matches[0].strip(), '10')
'"my_param = 10\n", "another_param = 10\n"'
All parameters whose value is 'abc' above are changed.
@SimonYansenZhao can we modularize the parameter pattern matching & replace part to pull out from execute_notebook so that we can unit tests better?

@loomlike Sure, but now we need to make all tests passed before refactoring.

miguelgfierro · 2023-11-24T11:59:53Z

there is an error, the system doesn't install cuda 11, but 12:

2023-11-18 08:08:06.357546: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib:
2023-11-18 08:08:06.357591: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2023-11-18 08:08:09.200050: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib:
2023-11-18 08:08:09.200183: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib:
2023-11-18 08:08:09.200272: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib:
2023-11-18 08:08:09.200354: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib:
2023-11-18 08:08:09.200437: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcurand.so.10'; dlerror: libcurand.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib:
2023-11-18 08:08:09.200518: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib:
2023-11-18 08:08:09.200624: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib:
2023-11-18 08:08:09.200711: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib:


This was installed:
INFO:submit_groupwise_azureml_pytest.py: nvidia-cublas-cu12:12.1.3.1
INFO:submit_groupwise_azureml_pytest.py: nvidia-cuda-cupti-cu12:12.1.105
INFO:submit_groupwise_azureml_pytest.py: nvidia-cuda-nvrtc-cu12:12.1.105
INFO:submit_groupwise_azureml_pytest.py: nvidia-cuda-runtime-cu12:12.1.105
INFO:submit_groupwise_azureml_pytest.py: nvidia-cudnn-cu12:8.9.2.26
INFO:submit_groupwise_azureml_pytest.py: nvidia-cufft-cu12:11.0.2.54
INFO:submit_groupwise_azureml_pytest.py: nvidia-curand-cu12:10.3.2.106
INFO:submit_groupwise_azureml_pytest.py: nvidia-cusolver-cu12:11.4.5.107
INFO:submit_groupwise_azureml_pytest.py: nvidia-cusparse-cu12:12.1.0.106
INFO:submit_groupwise_azureml_pytest.py: nvidia-ml-py3:7.352.0
INFO:submit_groupwise_azureml_pytest.py: nvidia-nccl-cu12:2.18.1
INFO:submit_groupwise_azureml_pytest.py: nvidia-nvjitlink-cu12:12.3.101
INFO:submit_groupwise_azureml_pytest.py: nvidia-nvtx-cu12:12.1.105

INFO:submit_groupwise_azureml_pytest.py: tensorboard:2.8.0
INFO:submit_groupwise_azureml_pytest.py: tensorboard-data-server:0.6.1
INFO:submit_groupwise_azureml_pytest.py: tensorboard-plugin-wit:1.8.1
INFO:submit_groupwise_azureml_pytest.py: tensorflow:2.8.4
INFO:submit_groupwise_azureml_pytest.py: tensorflow-estimator:2.8.0
INFO:submit_groupwise_azureml_pytest.py: tensorflow-io-gcs-filesystem:0.34.0

INFO:submit_groupwise_azureml_pytest.py: torch:2.1.1
INFO:submit_groupwise_azureml_pytest.py: torchvision:0.16.1

I tried to nvidia-ml-py3>=7.352.0,<12, but the CUDA is still 12. See: https://github.com/recommenders-team/recommenders/actions/runs/6980488739/job/18995849434

I tried to nvidia-ml-py3>=7.352.0,<11, and removed all test except the GPU and triggered PR gate. -> same error https://github.com/recommenders-team/recommenders/actions/runs/6981304867/job/18998574212

Tried remove nvidia-ml-py3 and comment transformers from base deps -> same error https://github.com/recommenders-team/recommenders/actions/runs/6982265055/job/19001085823 it is not clear who is installing the nvidia packages

Tried to comment pytorch ->still getting installed cuda 12 https://github.com/recommenders-team/recommenders/actions/runs/6987799139/job/19014709919

Tried commenting pytorch, fastai, tfslim and leave only tensorflow==2.8.4 -> I don´t get the cuda 12 here https://github.com/recommenders-team/recommenders/actions/runs/6988238260/job/19015619226 so one of them is installing it.

Tried with TF and torch ->Torch is installing cuda12 like nvidia-cublas-cu12. https://github.com/recommenders-team/recommenders/actions/runs/6991615620/job/19022370652

Trying tensorflow==2.8.4 and torch>=1.13.1,<2 ->It installs some nvidia libs but not all that are needed: These are installed nvidia-cublas-cu11:11.10.3.66, nvidia-cuda-nvrtc-cu11:11.7.99, nvidia-cuda-runtime-cu11:11.7.99, nvidia-cudnn-cu11:8.5.0.96, but some are not like Could not load dynamic library 'libcudart.so.11.0' see https://github.com/recommenders-team/recommenders/actions/runs/6993911589/job/19027085507

Trying tensorflow==2.8.4 and torch>=1.13.1,<2 and add all nvidia-cu11 deps "nvidia-cublas-cu11","nvidia-cuda-cupti-cu11","nvidia-cuda-nvrtc-cu11","nvidia-cuda-runtime-cu11","nvidia-cudnn-cu11","nvidia-cufft-cu11","nvidia-curand-cu11","nvidia-cusolver-cu11","nvidia-cusparse-cu11","nvidia-ml-py3","nvidia-nccl-cu11","nvidia-nvjitlink-cu11","nvidia-nvtx-cu11", -> error https://github.com/recommenders-team/recommenders/actions/runs/6994198154 nvidia-nvjitlink-cu11 doesn't exist. I removed it

Try again without nvidia-nvjitlink-cu11 -> I still get the time out error. See https://github.com/recommenders-team/recommenders/actions/runs/7007905771/job/19063048904 nbclient.exceptions.CellTimeoutError: A cell timed out while it was being executed, after 600 seconds.

Try torch>=1.13.1,<2@https://download.pytorch.org/whl/cu118 -> error: error in recommenders setup command: 'extras_require' must be a dictionary whose values are strings or lists of strings containing valid project/version requirement specifiers

Installed in local with:

        "nvidia-cublas-cu11",
        "nvidia-cuda-cupti-cu11",
        "nvidia-cuda-nvrtc-cu11",
        "nvidia-cuda-runtime-cu11",
        "nvidia-cudnn-cu11",
        "nvidia-cufft-cu11",
        "nvidia-curand-cu11",
        "nvidia-cusolver-cu11",
        "nvidia-cusparse-cu11",
        "nvidia-ml-py3",
        "nvidia-nccl-cu11",
        "nvidia-nvtx-cu11",
        "tensorflow==2.8.4",  # FIXME: Temporarily pinned due to issue with TF version > 2.10.1 See #2018
        "torch>=1.13.1,<2",

Got an the same error:

~/MS/recommenders$ pytest tests/unit/examples/test_notebooks_gpu.py::test_dkn_quickstart
============================= test session starts ==============================platform linux -- Python 3.9.18, pytest-7.4.3, pluggy-1.3.0
rootdir: /home/u/MS/recommenders
configfile: pyproject.toml
plugins: cov-4.1.0, typeguard-4.1.5, anyio-4.1.0, mock-3.12.0, hypothesis-6.91.0collected 1 item

tests/unit/examples/test_notebooks_gpu.py F                              [100%]

=================================== FAILURES ===================================_____________________________ test_dkn_quickstart ______________________________
self = <nbconvert.preprocessors.execute.ExecutePreprocessor object at 0x7fa6432c7130>
msg_id = '5e3667f3-7c5d09fe769d0a7452444f4c_11851_8'
cell = {'cell_type': 'code', 'execution_count': 7, 'metadata': {'pycharm': {'is_executing': False}, 'scrolled': True, 'execut..., 'iopub.execute_input': '2023-11-28T12:00:46.180970Z'}}, 'outputs': [], 'source': 'model.fit(train_file, valid_file)'}
timeout = 600
task_poll_output_msg = <Task pending name='Task-37' coro=<NotebookClient._async_poll_output_msg() running at /home/u/anaconda/envs/test_reco/...da/envs/test_reco/lib/python3.9/site-packages/zmq/_future.py:412, <TaskWakeupMethWrapper object at 0x7fa642661e50>()]>>
task_poll_kernel_alive = <Task cancelled name='Task-36' coro=<NotebookClient._async_poll_kernel_alive() done, defined at /home/u/anaconda/envs/test_reco/lib/python3.9/site-packages/nbclient/client.py:821>>

    async def _async_poll_for_reply(
        self,
        msg_id: str,
        cell: NotebookNode,
        timeout: int | None,
        task_poll_output_msg: asyncio.Future[t.Any],
        task_poll_kernel_alive: asyncio.Future[t.Any],
    ) -> dict[str, t.Any]:
        msg: dict[str, t.Any]
        assert self.kc is not None
        new_timeout: float | None = None
        if timeout is not None:
            deadline = monotonic() + timeout
            new_timeout = float(timeout)
        error_on_timeout_execute_reply = None
        while True:
            try:
                if error_on_timeout_execute_reply:
                    msg = error_on_timeout_execute_reply  # type:ignore[unreachable]
                    msg["parent_header"] = {"msg_id": msg_id}
                else:
>                   msg = await ensure_async(self.kc.shell_channel.get_msg(timeout=new_timeout))

../../anaconda/envs/test_reco/lib/python3.9/site-packages/nbclient/client.py:782:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../anaconda/envs/test_reco/lib/python3.9/site-packages/jupyter_core/utils/__init__.py:189: in ensure_async
    result = await obj
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <jupyter_client.channels.AsyncZMQSocketChannel object at 0x7fa643014c40>
timeout = 600000.0

    async def get_msg(  # type:ignore[override]
        self, timeout: t.Optional[float] = None
    ) -> t.Dict[str, t.Any]:
        """Gets a message if there is one that is ready."""
        assert self.socket is not None
        if timeout is not None:
            timeout *= 1000  # seconds to ms
        ready = await self.socket.poll(timeout)
        if ready:
            res = await self._recv()
            return res
        else:
>           raise Empty
E           _queue.Empty

../../anaconda/envs/test_reco/lib/python3.9/site-packages/jupyter_client/channels.py:315: Empty

During handling of the above exception, another exception occurred:

notebooks = {'als_deep_dive': '/home/u/MS/recommenders/examples/02_model_collaborative_filtering/als_deep_dive.ipynb', 'als_pyspar...aseline_deep_dive.ipynb', 'benchmark_movielens': '/home/u/MS/recommenders/examples/06_benchmarks/movielens.ipynb', ...}
output_notebook = 'output.ipynb', kernel_name = 'python3'

    @pytest.mark.notebooks
    @pytest.mark.gpu
    def test_dkn_quickstart(notebooks, output_notebook, kernel_name):
        notebook_path = notebooks["dkn_quickstart"]
>       execute_notebook(
            notebook_path,
            output_notebook,
            kernel_name=kernel_name,
            parameters=dict(EPOCHS=1, BATCH_SIZE=500),
        )

tests/unit/examples/test_notebooks_gpu.py:118:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
recommenders/utils/notebook_utils.py:107: in execute_notebook
    executed_notebook, _ = execute_preprocessor.preprocess(
../../anaconda/envs/test_reco/lib/python3.9/site-packages/nbconvert/preprocessors/execute.py:102: in preprocess
    self.preprocess_cell(cell, resources, index)
../../anaconda/envs/test_reco/lib/python3.9/site-packages/nbconvert/preprocessors/execute.py:123: in preprocess_cell
    cell = self.execute_cell(cell, index, store_history=True)
../../anaconda/envs/test_reco/lib/python3.9/site-packages/jupyter_core/utils/__init__.py:173: in wrapped
    return loop.run_until_complete(inner)
../../anaconda/envs/test_reco/lib/python3.9/asyncio/base_events.py:647: in run_until_complete
    return future.result()
../../anaconda/envs/test_reco/lib/python3.9/site-packages/nbclient/client.py:1005: in async_execute_cell
    exec_reply = await self.task_poll_for_reply
../../anaconda/envs/test_reco/lib/python3.9/site-packages/nbclient/client.py:806: in _async_poll_for_reply
    error_on_timeout_execute_reply = await self._async_handle_timeout(timeout, cell)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <nbconvert.preprocessors.execute.ExecutePreprocessor object at 0x7fa6432c7130>
timeout = 600
cell = {'cell_type': 'code', 'execution_count': 7, 'metadata': {'pycharm': {'is_executing': False}, 'scrolled': True, 'execut..., 'iopub.execute_input': '2023-11-28T12:00:46.180970Z'}}, 'outputs': [], 'source': 'model.fit(train_file, valid_file)'}

    async def _async_handle_timeout(
        self, timeout: int, cell: NotebookNode | None = None
    ) -> None | dict[str, t.Any]:
        self.log.error("Timeout waiting for execute reply (%is)." % timeout)
        if self.interrupt_on_timeout:
            self.log.error("Interrupting kernel")
            assert self.km is not None
            await ensure_async(self.km.interrupt_kernel())
            if self.error_on_timeout:
                execute_reply = {"content": {**self.error_on_timeout, "status": "error"}}
                return execute_reply
            return None
        else:
            assert cell is not None
>           raise CellTimeoutError.error_from_timeout_and_cell(
                "Cell execution timed out", timeout, cell
            )
E           nbclient.exceptions.CellTimeoutError: A cell timed out while it was being executed, after 600 seconds.
E           The message was: Cell execution timed out.
E           Here is a preview of the cell contents:
E           -------------------
E           model.fit(train_file, valid_file)
E           -------------------

../../anaconda/envs/test_reco/lib/python3.9/site-packages/nbclient/client.py:856: CellTimeoutError
----------------------------- Captured stderr call -----------------------------2023-11-28 13:00:18.703631: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:922] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-11-28 13:00:18.774212: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory
2023-11-28 13:00:18.786748: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1850] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2023-11-28 13:00:19.809460: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-11-28 13:00:19.815773: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:922] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-11-28 13:00:19.815805: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1850] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2023-11-28 13:00:21.593795: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 300000000 exceeds 10% of free system memory.
2023-11-28 13:00:23.448917: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 300000000 exceeds 10% of free system memory.
2023-11-28 13:00:25.435549: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 300000000 exceeds 10% of free system memory.
2023-11-28 13:00:26.060217: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 300000000 exceeds 10% of free system memory.
2023-11-28 13:00:26.659435: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 300000000 exceeds 10% of free system memory.
------------------------------ Captured log call -------------------------------ERROR    traitlets:client.py:845 Timeout waiting for execute reply (600s).
=============================== warnings summary ===============================../../anaconda/envs/test_reco/lib/python3.9/site-packages/jupyter_client/connect.py:22
  /home/u/anaconda/envs/test_reco/lib/python3.9/site-packages/jupyter_client/connect.py:22: DeprecationWarning: Jupyter is migrating its paths to use standard platformdirs
  given by the platformdirs library.  To remove this warning and
  see the appropriate new directories, set the environment variable
  `JUPYTER_PLATFORM_DIRS=1` and then run `jupyter --paths`.
  The use of platformdirs will be the default in `jupyter_core` v6
    from jupyter_core.paths import jupyter_data_dir, jupyter_runtime_dir, secure_write

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================FAILED tests/unit/examples/test_notebooks_gpu.py::test_dkn_quickstart - nbclient.exceptions.CellTimeoutError: A cell timed out while it was being e...

Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com>

miguelgfierro added 10 commits October 30, 2023 22:59

Remove scrapbook and papermill deps

51a2831

Signed-off-by: miguelgfierro <miguelgfierro@users.noreply.github.com>

notebook utils programmatic execution

964bca0

Signed-off-by: miguelgfierro <miguelgfierro@users.noreply.github.com>

Test notebook programmatic

ab63e1c

Signed-off-by: miguelgfierro <miguelgfierro@users.noreply.github.com>

Added test notebook for utils

1ac3023

Signed-off-by: miguelgfierro <miguelgfierro@users.noreply.github.com>

data notebooks

6a03c42

Signed-off-by: miguelgfierro <miguelgfierro@users.noreply.github.com>

Replace papermill and scrapbook for new internal function

e1c2b63

Signed-off-by: miguelgfierro <miguelgfierro@users.noreply.github.com>

Replace papermill and scrapbook for new internal function

70c068e

Signed-off-by: miguelgfierro <miguelgfierro@users.noreply.github.com>

Update new programmatic execution code

54c2278

Signed-off-by: miguelgfierro <miguelgfierro@users.noreply.github.com>

Update new programmatic execution code

055e5f0

Signed-off-by: miguelgfierro <miguelgfierro@users.noreply.github.com>

Update notebooks with new utility

e60008a

Signed-off-by: miguelgfierro <miguelgfierro@users.noreply.github.com>

miguelgfierro requested review from gramhagen, anargyri, loomlike, wutaomsft and SimonYansenZhao as code owners October 31, 2023 11:43

🐛

1174258

Signed-off-by: miguelgfierro <miguelgfierro@users.noreply.github.com>

Issue with xDeepFM WIP

5a4bd35

Signed-off-by: miguelgfierro <miguelgfierro@users.noreply.github.com>

🐛

ee344e3

Signed-off-by: miguelgfierro <miguelgfierro@users.noreply.github.com>

loomlike approved these changes Nov 1, 2023

View reviewed changes

recommenders/utils/notebook_utils.py Outdated Show resolved Hide resolved

recommenders/utils/notebook_utils.py Outdated Show resolved Hide resolved

recommenders/utils/notebook_utils.py Show resolved Hide resolved

tests/unit/recommenders/utils/test_notebook_utils.py Show resolved Hide resolved

miguelgfierro and others added 6 commits November 3, 2023 22:10

🐛

922011c

Signed-off-by: miguelgfierro <miguelgfierro@users.noreply.github.com>

Document the tests in programmatic notebook

f7b5fdf

Signed-off-by: miguelgfierro <miguelgfierro@users.noreply.github.com>

📝

c2d9d13

Signed-off-by: miguelgfierro <miguelgfierro@users.noreply.github.com>

WIP

397555c

Signed-off-by: miguelgfierro <miguelgfierro@users.noreply.github.com>

WIP

58cbcef

Signed-off-by: miguelgfierro <miguelgfierro@users.noreply.github.com>

Import missing store_metadata

ddf1b10

Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com>

loomlike mentioned this pull request Nov 13, 2023

Staging to main: Fix bug in MAP and added new notebook programmatic execution #2035

Merged

4 tasks

Correct pattern matching and substitution

bd9573e

Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com>

Merge multiline parameters into one line

8c6aaed

Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com>

miguelgfierro commented Nov 18, 2023

View reviewed changes

Increase timeout

4992cb6

Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com>

miguelgfierro merged commit b57cec2 into staging Dec 18, 2023
20 checks passed

miguelgfierro deleted the miguel/programmatic_execution_notebook branch December 18, 2023 15:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Programmatic execution of notebooks #2031

Programmatic execution of notebooks #2031

miguelgfierro commented Oct 31, 2023

miguelgfierro commented Oct 31, 2023 •

edited

Loading

loomlike commented Oct 31, 2023 •

edited

Loading

miguelgfierro commented Nov 1, 2023

loomlike left a comment

SimonYansenZhao commented Nov 13, 2023 •

edited

Loading

SimonYansenZhao commented Nov 14, 2023

loomlike commented Nov 14, 2023 •

edited

Loading

miguelgfierro Nov 18, 2023

miguelgfierro Nov 18, 2023

SimonYansenZhao commented Nov 18, 2023

SimonYansenZhao commented Nov 20, 2023

miguelgfierro commented Nov 24, 2023 •

edited

Loading

Programmatic execution of notebooks #2031

Programmatic execution of notebooks #2031

Conversation

miguelgfierro commented Oct 31, 2023

Description

Related Issues

References

Checklist:

miguelgfierro commented Oct 31, 2023 • edited Loading

loomlike commented Oct 31, 2023 • edited Loading

miguelgfierro commented Nov 1, 2023

loomlike left a comment

Choose a reason for hiding this comment

SimonYansenZhao commented Nov 13, 2023 • edited Loading

SimonYansenZhao commented Nov 14, 2023

loomlike commented Nov 14, 2023 • edited Loading

miguelgfierro Nov 18, 2023

Choose a reason for hiding this comment

miguelgfierro Nov 18, 2023

Choose a reason for hiding this comment

SimonYansenZhao commented Nov 18, 2023

SimonYansenZhao commented Nov 20, 2023

miguelgfierro commented Nov 24, 2023 • edited Loading

miguelgfierro commented Oct 31, 2023 •

edited

Loading

loomlike commented Oct 31, 2023 •

edited

Loading

SimonYansenZhao commented Nov 13, 2023 •

edited

Loading

loomlike commented Nov 14, 2023 •

edited

Loading

miguelgfierro commented Nov 24, 2023 •

edited

Loading