Programmatic execution of notebooks #2031

Merged · miguelgfierro merged 22 commits into staging from miguel/programmatic_execution_notebook · Dec 18, 2023

Conversation

miguelgfierro
Collaborator

Description

This PR removes the papermill and scrapbook dependencies and reimplements the same functionality natively in recommenders/utils/notebook_utils.py.

Related Issues

Fixes #2012

References

Checklist:

  • I have followed the contribution guidelines and code style for this project.
  • I have added tests covering my contributions.
  • I have updated the documentation accordingly.
  • This PR is being made to staging branch and not to main branch.

Signed-off-by: miguelgfierro <miguelgfierro@users.noreply.github.com>
@miguelgfierro
Collaborator Author

miguelgfierro commented Oct 31, 2023

Weird error: the input is 100k, but the regex parser injects 10k.

tests/functional/examples/test_notebooks_gpu.py F                                                                                                                     [100%]

================================================================================= FAILURES ==================================================================================
______________________________________________________ test_ncf_deep_dive_functional[100k-10-512-expected_values0-42] _______________________________________________________

notebooks = {'als_deep_dive': '/home/u/MS/recommenders/examples/02_model_collaborative_filtering/als_deep_dive.ipynb', 'als_pyspar...aseline_deep_dive.ipynb', 'benchmark_movielens': '/home/u/MS/recommenders/examples/06_benchmarks/movielens.ipynb', ...}
output_notebook = 'output.ipynb', kernel_name = 'python3', size = '100k', epochs = 10, batch_size = 512
expected_values = {'map': 0.0435856, 'map2': 0.0510391, 'ndcg': 0.37586, 'ndcg2': 0.202186, ...}, seed = 42

    @pytest.mark.gpu
    @pytest.mark.notebooks
    @pytest.mark.parametrize(
        "size, epochs, batch_size, expected_values, seed",
        [
            (
                "100k",
                10,
                512,
                {
                    "map": 0.0435856,
                    "ndcg": 0.37586,
                    "precision": 0.169353,
                    "recall": 0.0923963,
                    "map2": 0.0510391,
                    "ndcg2": 0.202186,
                    "precision2": 0.179533,
                    "recall2": 0.106434,
                },
                42,
            )
        ],
    )
    def test_ncf_deep_dive_functional(
        notebooks,
        output_notebook,
        kernel_name,
        size,
        epochs,
        batch_size,
        expected_values,
        seed,
    ):
        notebook_path = notebooks["ncf_deep_dive"]
>       execute_notebook(
            notebook_path,
            output_notebook,
            kernel_name=kernel_name,
            parameters=dict(
                TOP_K=10,
                MOVIELENS_DATA_SIZE=size,
                EPOCHS=epochs,
                BATCH_SIZE=batch_size,
                SEED=seed,
            ),
        )

tests/functional/examples/test_notebooks_gpu.py:91:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
recommenders/utils/notebook_utils.py:99: in execute_notebook
    executed_notebook, _ = execute_preprocessor.preprocess(
../../anaconda/envs/recommenders/lib/python3.9/site-packages/nbconvert/preprocessors/execute.py:100: in preprocess
    self.preprocess_cell(cell, resources, index)
../../anaconda/envs/recommenders/lib/python3.9/site-packages/nbconvert/preprocessors/execute.py:121: in preprocess_cell
    cell = self.execute_cell(cell, index, store_history=True)
../../anaconda/envs/recommenders/lib/python3.9/site-packages/jupyter_core/utils/__init__.py:166: in wrapped
    return loop.run_until_complete(inner)
../../anaconda/envs/recommenders/lib/python3.9/asyncio/base_events.py:647: in run_until_complete
    return future.result()
../../anaconda/envs/recommenders/lib/python3.9/site-packages/nbclient/client.py:1058: in async_execute_cell
    await self._check_raise_for_error(cell, cell_index, exec_reply)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <nbconvert.preprocessors.execute.ExecutePreprocessor object at 0x7fc7855dc6d0>
cell = {'cell_type': 'code', 'execution_count': 3, 'metadata': {'execution': {'iopub.status.busy': '2023-10-31T16:26:54.66711...oad_pandas_df(\n    size=MOVIELENS_DATA_SIZE,\n    header=["userID", "itemID", "rating", "timestamp"]\n)\n\ndf.head()'}
cell_index = 9
exec_reply = {'buffers': [], 'content': {'ename': 'ValueError', 'engine_info': {'engine_id': -1, 'engine_uuid': 'e1defabf-6d1f-40f2...e, 'engine': 'e1defabf-6d1f-40f2-a86b-3533e758ecca', 'started': '2023-10-31T16:26:54.667305Z', 'status': 'error'}, ...}

    async def _check_raise_for_error(
        self, cell: NotebookNode, cell_index: int, exec_reply: t.Optional[t.Dict]
    ) -> None:
        if exec_reply is None:
            return None

        exec_reply_content = exec_reply['content']
        if exec_reply_content['status'] != 'error':
            return None

        cell_allows_errors = (not self.force_raise_errors) and (
            self.allow_errors
            or exec_reply_content.get('ename') in self.allow_error_names
            or "raises-exception" in cell.metadata.get("tags", [])
        )
        await run_hook(
            self.on_cell_error, cell=cell, cell_index=cell_index, execute_reply=exec_reply
        )
        if not cell_allows_errors:
>           raise CellExecutionError.from_cell_and_msg(cell, exec_reply_content)
E           nbclient.exceptions.CellExecutionError: An error occurred while executing the following cell:
E           ------------------
E           df = movielens.load_pandas_df(
E               size=MOVIELENS_DATA_SIZE,
E               header=["userID", "itemID", "rating", "timestamp"]
E           )
E
E           df.head()
E           ------------------
E
E
E           ---------------------------------------------------------------------------
E           ValueError                                Traceback (most recent call last)
E           Cell In[3], line 1
E           ----> 1 df = movielens.load_pandas_df(
E                 2     size=MOVIELENS_DATA_SIZE,
E                 3     header=["userID", "itemID", "rating", "timestamp"]
E                 4 )
E                 6 df.head()
E
E           File ~/MS/recommenders/recommenders/datasets/movielens.py:201, in load_pandas_df(size, header, local_cache_path, title_col, genres_col, year_col)
E               199 size = size.lower()
E               200 if size not in DATA_FORMAT and size not in MOCK_DATA_FORMAT:
E           --> 201     raise ValueError(f"Size: {size}. " + ERROR_MOVIE_LENS_SIZE)
E               203 if header is None:
E               204     header = DEFAULT_HEADER
E
E           ValueError: Size: 10k. Invalid data size. Should be one of {100k, 1m, 10m, or 20m, or mock100}

../../anaconda/envs/recommenders/lib/python3.9/site-packages/nbclient/client.py:914: CellExecutionError
============================================================================= warnings summary ==============================================================================
../../anaconda/envs/recommenders/lib/python3.9/site-packages/jupyter_client/connect.py:20
  /home/u/anaconda/envs/recommenders/lib/python3.9/site-packages/jupyter_client/connect.py:20: DeprecationWarning: Jupyter is migrating its paths to use standard platformdirs
  given by the platformdirs library.  To remove this warning and
  see the appropriate new directories, set the environment variable
  `JUPYTER_PLATFORM_DIRS=1` and then run `jupyter --paths`.
  The use of platformdirs will be the default in `jupyter_core` v6
    from jupyter_core.paths import jupyter_data_dir, jupyter_runtime_dir, secure_write

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========================================================================== short test summary info ==========================================================================
FAILED tests/functional/examples/test_notebooks_gpu.py::test_ncf_deep_dive_functional[100k-10-512-expected_values0-42] - nbclient.exceptions.CellExecutionError: An error occurred while executing the following cell:
======================================================================= 1 failed, 1 warning in 5.23s ========================================================================

Similar error but with a different notebook:

    @pytest.mark.notebooks
    @pytest.mark.experimental
    def test_rlrmc_quickstart_runs(notebooks, output_notebook, kernel_name):
        notebook_path = notebooks["rlrmc_quickstart"]
>       execute_notebook(
            notebook_path,
            output_notebook,
            kernel_name=kernel_name,
            parameters=dict(rank_parameter=2, MOVIELENS_DATA_SIZE="mock100"),
        )

tests/unit/examples/test_notebooks_python.py:88: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
recommenders/utils/notebook_utils.py:99: in execute_notebook
    executed_notebook, _ = execute_preprocessor.preprocess(
/azureml-envs/azureml_8854b0bdccc7bb7425b7c3f2145bc96f/lib/python3.9/site-packages/nbconvert/preprocessors/execute.py:102: in preprocess
    self.preprocess_cell(cell, resources, index)
/azureml-envs/azureml_8854b0bdccc7bb7425b7c3f2145bc96f/lib/python3.9/site-packages/nbconvert/preprocessors/execute.py:123: in preprocess_cell
    cell = self.execute_cell(cell, index, store_history=True)
/azureml-envs/azureml_8854b0bdccc7bb7425b7c3f2145bc96f/lib/python3.9/site-packages/jupyter_core/utils/__init__.py:173: in wrapped
    return loop.run_until_complete(inner)
/azureml-envs/azureml_8854b0bdccc7bb7425b7c3f2145bc96f/lib/python3.9/asyncio/base_events.py:647: in run_until_complete
    return future.result()
/azureml-envs/azureml_8854b0bdccc7bb7425b7c3f2145bc96f/lib/python3.9/site-packages/nbclient/client.py:1058: in async_execute_cell
    await self._check_raise_for_error(cell, cell_index, exec_reply)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <nbconvert.preprocessors.execute.ExecutePreprocessor object at 0x14e3f8053940>
cell = {'cell_type': 'code', 'execution_count': 4, 'metadata': {'execution': {'iopub.status.busy': '2023-10-31T16:49:29.42054...= movielens.load_pandas_df(\n    size=MOVIELENS_DATA_SIZE,\n    header=["userID", "itemID", "rating", "timestamp"]\n)'}
cell_index = 7
exec_reply = {'buffers': [], 'content': {'ename': 'ValueError', 'engine_info': {'engine_id': -1, 'engine_uuid': '79883de8-cb38-47df...e, 'engine': '79883de8-cb38-47df-a3b2-e63115050117', 'started': '2023-10-31T16:49:29.420956Z', 'status': 'error'}, ...}

    async def _check_raise_for_error(
        self, cell: NotebookNode, cell_index: int, exec_reply: t.Optional[t.Dict]
    ) -> None:
        if exec_reply is None:
            return None
    
        exec_reply_content = exec_reply['content']
        if exec_reply_content['status'] != 'error':
            return None
    
        cell_allows_errors = (not self.force_raise_errors) and (
            self.allow_errors
            or exec_reply_content.get('ename') in self.allow_error_names
            or "raises-exception" in cell.metadata.get("tags", [])
        )
        await run_hook(
            self.on_cell_error, cell=cell, cell_index=cell_index, execute_reply=exec_reply
        )
        if not cell_allows_errors:
>           raise CellExecutionError.from_cell_and_msg(cell, exec_reply_content)
E           nbclient.exceptions.CellExecutionError: An error occurred while executing the following cell:
E           ------------------
E           
E           df = movielens.load_pandas_df(
E               size=MOVIELENS_DATA_SIZE,
E               header=["userID", "itemID", "rating", "timestamp"]
E           )
E           ------------------
E           
E           
E           ---------------------------------------------------------------------------
E           ValueError                                Traceback (most recent call last)
E           Cell In[4], line 1
E           ----> 1 df = movielens.load_pandas_df(
E                 2 size=MOVIELENS_DATA_SIZE,
E                 3 header=["userID","itemID","rating","timestamp"]
E                 4 )
E           
E           File /mnt/azureml/cr/j/f1d53f64bdb5410196f4cc9b6e069605/exe/wd/recommenders/datasets/movielens.py:201, in load_pandas_df(size, header, local_cache_path, title_col, genres_col, year_col)
E               199 size = size.lower()
E               200 if size not in DATA_FORMAT and size not in MOCK_DATA_FORMAT:
E           --> 201     raise ValueError(f"Size: {size}. " + ERROR_MOVIE_LENS_SIZE)
E               203 if header is None:
E               204     header = DEFAULT_HEADER
E           
E           ValueError: Size: 2m. Invalid data size. Should be one of {100k, 1m, 10m, or 20m, or mock100}

Signed-off-by: miguelgfierro <miguelgfierro@users.noreply.github.com>
@loomlike
Collaborator

loomlike commented Oct 31, 2023

@miguelgfierro Sorry to ask a dumb question, as I missed the discussion, but why do we reinvent the wheel here? Couldn't we keep papermill to execute the notebooks (it still seems actively developed, unlike scrapbook) plus another open-source recording package, e.g. mlflow, for recording and verifying metrics? The mlflow recording code would also serve as a good example of metric logging in our notebooks...

Signed-off-by: miguelgfierro <miguelgfierro@users.noreply.github.com>
@miguelgfierro
Collaborator Author

It seems that papermill is also not maintained: https://pypi.org/project/papermill/#history. They haven't updated it in over a year. MLflow for recording is an interesting idea; the only problem is that it would add another dependency, and one of the reasons for doing this from scratch is to reduce dependencies.

This code doesn't add any new dependency, and it gives us the same functionality we had. If in the future there is an appetite to switch the recording of the data to MLflow, we can add it.
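For reference, a minimal sketch of what papermill-free execution can look like with nbformat plus nbconvert. This mirrors the ExecutePreprocessor call visible in the tracebacks in this thread, but it is not the merged implementation: the parameter injection below is a simplified, hypothetical placeholder (the merged utility rewrites assignments with a regex, as discussed later in the thread).

import nbformat
from nbconvert.preprocessors import ExecutePreprocessor


def execute_notebook(input_path, output_path, kernel_name="python3", parameters=None, timeout=600):
    """Execute a notebook end to end and write the executed copy to disk."""
    notebook = nbformat.read(input_path, as_version=4)
    # Simplified, hypothetical parameter injection: overwrite the cell
    # tagged "parameters" with the new assignments.
    if parameters:
        for cell in notebook.cells:
            if "parameters" in cell.get("metadata", {}).get("tags", []):
                cell["source"] = "\n".join(
                    f"{name} = {value!r}" for name, value in parameters.items()
                )
    executor = ExecutePreprocessor(timeout=timeout, kernel_name=kernel_name)
    executed_notebook, _ = executor.preprocess(notebook)
    nbformat.write(executed_notebook, output_path)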

Collaborator

@loomlike loomlike left a comment


Thank you for the changes. These are huge changes and I really appreciate your hard work! I left a few comments that are not critical, so feel free to fix them or leave them for later.

I think at some point we'll want to split this "notebook util" into a separate project/package, for two reasons: 1) it's not specific to "recommenders"; 2) this utility is super useful for any DS project that has notebook examples, and it would be very beneficial for those projects to use it.

recommenders/utils/notebook_utils.py — 3 review threads (resolved)
miguelgfierro and others added 6 commits November 3, 2023 22:10
Signed-off-by: miguelgfierro <miguelgfierro@users.noreply.github.com>
Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com>
@SimonYansenZhao
Collaborator

SimonYansenZhao commented Nov 13, 2023

@miguelgfierro I think the pattern matching is incorrect. See the example below that uses the pattern matching in execute_notebook():

>>> import re
>>> pattern = re.compile(rf"\bmy_param\s*=\s*([^#\n]+)(?:#.*$)?", re.MULTILINE)
>>> cell_source = "\"my_param = 'abc'\n\", \"another_param = 'abc'\n\""
>>> matches = re.findall(pattern, "\"my_param = 'abc'\n\", \"another_param = 'abc'\n\"")
>>> matches
["'abc'"]
>>> cell_source.replace(matches[0].strip(), '10')
'"my_param = 10\n", "another_param = 10\n"'

All parameters whose value is 'abc' above are changed.
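One possible fix, sketched here hypothetically (the actual fix landed in a later commit), is to substitute over the full assignment match so only the named parameter changes, instead of string-replacing the captured value everywhere:

import re

# Replace only the assignment to my_param; another_param keeps its value.
pattern = re.compile(r"\bmy_param\s*=\s*[^#\n]+", re.MULTILINE)
cell_source = "my_param = 'abc'\nanother_param = 'abc'\n"
print(pattern.sub("my_param = 10", cell_source))
# my_param = 10
# another_param = 'abc'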

Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com>
@SimonYansenZhao
Collaborator

@miguelgfierro I fixed the pattern matching bug. Now a new error is caught. I'll take a look the day after tomorrow.

@loomlike
Collaborator

loomlike commented Nov 14, 2023

@miguelgfierro I think the pattern matching is incorrect. See the example below that uses the pattern matching in execute_notebook(): […]

@SimonYansenZhao can we modularize the parameter matching & replacement logic, pulling it out of execute_notebook, so that we can unit test it better?

Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com>
"import tensorflow as tf\n",
"tf.get_logger().setLevel(\"ERROR\") # only show error messages\n",
"tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)\n",
"\n",
"from recommenders.models.deeprec.deeprec_utils import download_deeprec_resources, prepare_hparams\n",
"from recommenders.models.deeprec.models.dkn import DKN\n",
"from recommenders.models.deeprec.io.dkn_iterator import DKNTextIterator\n",
"from recommenders.utils.notebook_utils import store_metadata\n",
Collaborator Author


There is a weird error in the DKN notebook: a timeout. I have never seen this error before.

It might be related to a bad CUDA configuration (see below). Let me rerun that test.

    @pytest.mark.notebooks
    @pytest.mark.gpu
    def test_dkn_quickstart(notebooks, output_notebook, kernel_name):
        notebook_path = notebooks["dkn_quickstart"]
>       execute_notebook(
            notebook_path,
            output_notebook,
            kernel_name=kernel_name,
            parameters=dict(EPOCHS=1, BATCH_SIZE=500),
        )

tests/unit/examples/test_notebooks_gpu.py:118: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
recommenders/utils/notebook_utils.py:107: in execute_notebook
    executed_notebook, _ = execute_preprocessor.preprocess(
/azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib/python3.8/site-packages/nbconvert/preprocessors/execute.py:102: in preprocess
    self.preprocess_cell(cell, resources, index)
/azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib/python3.8/site-packages/nbconvert/preprocessors/execute.py:123: in preprocess_cell
    cell = self.execute_cell(cell, index, store_history=True)
/azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib/python3.8/site-packages/jupyter_core/utils/__init__.py:173: in wrapped
    return loop.run_until_complete(inner)
/azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib/python3.8/asyncio/base_events.py:616: in run_until_complete
    return future.result()
/azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib/python3.8/site-packages/nbclient/client.py:1005: in async_execute_cell
    exec_reply = await self.task_poll_for_reply
/azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib/python3.8/site-packages/nbclient/client.py:806: in _async_poll_for_reply
    error_on_timeout_execute_reply = await self._async_handle_timeout(timeout, cell)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <nbconvert.preprocessors.execute.ExecutePreprocessor object at 0x152370dcb400>
timeout = 600
cell = {'cell_type': 'code', 'execution_count': 7, 'metadata': {'pycharm': {'is_executing': False}, 'scrolled': True, 'execut...\x1b[49m\x1b[43m)\x1b[49m\n', '\x1b[0;31mKeyboardInterrupt\x1b[0m: ']}], 'source': 'model.fit(train_file, valid_file)'}

    async def _async_handle_timeout(
        self, timeout: int, cell: NotebookNode | None = None
    ) -> None | dict[str, t.Any]:
        self.log.error("Timeout waiting for execute reply (%is)." % timeout)
        if self.interrupt_on_timeout:
            self.log.error("Interrupting kernel")
            assert self.km is not None
            await ensure_async(self.km.interrupt_kernel())
            if self.error_on_timeout:
                execute_reply = {"content": {**self.error_on_timeout, "status": "error"}}
                return execute_reply
            return None
        else:
            assert cell is not None
>           raise CellTimeoutError.error_from_timeout_and_cell(
                "Cell execution timed out", timeout, cell
            )
E           nbclient.exceptions.CellTimeoutError: A cell timed out while it was being executed, after 600 seconds.
E           The message was: Cell execution timed out.
E           Here is a preview of the cell contents:
E           -------------------
E           model.fit(train_file, valid_file)
E           -------------------

/azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib/python3.8/site-packages/nbclient/client.py:856: CellTimeoutError
----------------------------- Captured stdout call -----------------------------
ERROR:traitlets:Timeout waiting for execute reply (600s).
----------------------------- Captured stderr call -----------------------------
2023-11-18 07:19:58.273399: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib:
2023-11-18 07:19:58.273444: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2023-11-18 07:20:01.260672: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib:
2023-11-18 07:20:01.260819: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib:
2023-11-18 07:20:01.260909: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib:
2023-11-18 07:20:01.260994: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib:
2023-11-18 07:20:01.261077: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcurand.so.10'; dlerror: libcurand.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib:
2023-11-18 07:20:01.261163: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib:
2023-11-18 07:20:01.261246: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib:
2023-11-18 07:20:01.261330: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib:
2023-11-18 07:20:01.261341: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1850] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2023-11-18 07:20:02.067266: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-11-18 07:20:02.068875: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1850] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
------------------------------ Captured log call -------------------------------
ERROR    traitlets:client.py:845 Timeout waiting for execute reply (600s).
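If the cell is genuinely slow (CPU-only execution after the failed CUDA library loads) rather than hung, one hedged workaround would be to raise the per-cell timeout. This assumes the 600s limit in the traceback comes from the ExecutePreprocessor timeout, which is what the stack frames suggest:

from nbconvert.preprocessors import ExecutePreprocessor

# Assumption: the 600s limit seen above is the preprocessor's per-cell
# timeout (seconds); raising it lets a slow CPU-bound model.fit() finish.
execute_preprocessor = ExecutePreprocessor(timeout=2400, kernel_name="python3")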

Collaborator Author


@SimonYansenZhao this is the current error. I believe it is not related to the multiline problem.

@SimonYansenZhao
Collaborator

The pattern matching in the notebook utils now cannot extract multiline parameter values. For example, the following test

def test_wide_deep(notebooks, output_notebook, kernel_name, tmp):
    notebook_path = notebooks["wide_deep"]
    # Simple test (train only 1 batch == 1 step)
    model_dir = os.path.join(tmp, "wide_deep_0")
    os.mkdir(model_dir)
    params = {
        "MOVIELENS_DATA_SIZE": "mock100",
        "STEPS": 1,
        "EVALUATE_WHILE_TRAINING": False,
        "MODEL_DIR": model_dir,
        "EXPORT_DIR_BASE": model_dir,
        "RATING_METRICS": ["rmse"],
        "RANKING_METRICS": ["ndcg_at_k"],
    }
    pm.execute_notebook(
        notebook_path, output_notebook, kernel_name=kernel_name, parameters=params
    )

when doing the value substitution for RANKING_METRICS in 00_quick_start/wide_deep_movielens.ipynb

RANKING_METRICS = [
    evaluator.ndcg_at_k.__name__,
    evaluator.precision_at_k.__name__,
]

leads to the following result:

RANKING_METRICS = ["ndcg_at_k"]
    evaluator.ndcg_at_k.__name__,
    evaluator.precision_at_k.__name__,
]

So the current solution is to rewrite all multiline parameters into one line. See the commit.
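An alternative sketch, shown here only as a hypothetical (the PR's actual solution, per the comment above, is to flatten multiline parameters to one line), would be to let the pattern span a bracketed multiline value so the whole assignment is replaced rather than just its first line:

import re

source = """RANKING_METRICS = [
    evaluator.ndcg_at_k.__name__,
    evaluator.precision_at_k.__name__,
]"""
# [^\]]* also matches newlines, so the match covers the whole bracketed value.
pattern = re.compile(r"\bRANKING_METRICS\s*=\s*\[[^\]]*\]")
print(pattern.sub('RANKING_METRICS = ["ndcg_at_k"]', source))
# RANKING_METRICS = ["ndcg_at_k"]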

@SimonYansenZhao
Collaborator

@SimonYansenZhao can we modularize the parameter matching & replacement logic, pulling it out of execute_notebook, so that we can unit test it better?

@loomlike Sure, but we need to make all the tests pass before refactoring.
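A sketch of what that modularization might look like (the helper name and behavior are hypothetical, not the code merged in this PR):

import re


def update_parameter(source: str, name: str, value) -> str:
    """Hypothetical helper: rewrite the single-line assignment to `name`
    in a cell's source, pulled out of execute_notebook so the matching
    logic can be unit tested in isolation."""
    pattern = re.compile(rf"\b{re.escape(name)}\s*=\s*[^#\n]+", re.MULTILINE)
    # A callable replacement inserts repr(value) literally, avoiding
    # backslash-escape processing in the replacement string.
    return pattern.sub(lambda match: f"{name} = {value!r}", source)


assert update_parameter("EPOCHS = 50\nBATCH_SIZE = 32", "EPOCHS", 1) == "EPOCHS = 1\nBATCH_SIZE = 32"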

@miguelgfierro
Collaborator Author

miguelgfierro commented Nov 24, 2023

There is an error: the system doesn't install CUDA 11 but CUDA 12:

2023-11-18 08:08:06.357546: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib:
2023-11-18 08:08:06.357591: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2023-11-18 08:08:09.200050: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib:
2023-11-18 08:08:09.200183: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib:
2023-11-18 08:08:09.200272: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib:
2023-11-18 08:08:09.200354: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib:
2023-11-18 08:08:09.200437: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcurand.so.10'; dlerror: libcurand.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib:
2023-11-18 08:08:09.200518: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib:
2023-11-18 08:08:09.200624: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib:
2023-11-18 08:08:09.200711: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /azureml-envs/azureml_34c56b1c46d7f5ae137d78e9a4192235/lib:


This was installed:
INFO:submit_groupwise_azureml_pytest.py: nvidia-cublas-cu12:12.1.3.1
INFO:submit_groupwise_azureml_pytest.py: nvidia-cuda-cupti-cu12:12.1.105
INFO:submit_groupwise_azureml_pytest.py: nvidia-cuda-nvrtc-cu12:12.1.105
INFO:submit_groupwise_azureml_pytest.py: nvidia-cuda-runtime-cu12:12.1.105
INFO:submit_groupwise_azureml_pytest.py: nvidia-cudnn-cu12:8.9.2.26
INFO:submit_groupwise_azureml_pytest.py: nvidia-cufft-cu12:11.0.2.54
INFO:submit_groupwise_azureml_pytest.py: nvidia-curand-cu12:10.3.2.106
INFO:submit_groupwise_azureml_pytest.py: nvidia-cusolver-cu12:11.4.5.107
INFO:submit_groupwise_azureml_pytest.py: nvidia-cusparse-cu12:12.1.0.106
INFO:submit_groupwise_azureml_pytest.py: nvidia-ml-py3:7.352.0
INFO:submit_groupwise_azureml_pytest.py: nvidia-nccl-cu12:2.18.1
INFO:submit_groupwise_azureml_pytest.py: nvidia-nvjitlink-cu12:12.3.101
INFO:submit_groupwise_azureml_pytest.py: nvidia-nvtx-cu12:12.1.105

INFO:submit_groupwise_azureml_pytest.py: tensorboard:2.8.0
INFO:submit_groupwise_azureml_pytest.py: tensorboard-data-server:0.6.1
INFO:submit_groupwise_azureml_pytest.py: tensorboard-plugin-wit:1.8.1
INFO:submit_groupwise_azureml_pytest.py: tensorflow:2.8.4
INFO:submit_groupwise_azureml_pytest.py: tensorflow-estimator:2.8.0
INFO:submit_groupwise_azureml_pytest.py: tensorflow-io-gcs-filesystem:0.34.0

INFO:submit_groupwise_azureml_pytest.py: torch:2.1.1
INFO:submit_groupwise_azureml_pytest.py: torchvision:0.16.1

I tried nvidia-ml-py3>=7.352.0,<12, but CUDA 12 is still installed. See: https://github.com/recommenders-team/recommenders/actions/runs/6980488739/job/18995849434

I tried nvidia-ml-py3>=7.352.0,<11, removed all tests except the GPU ones, and triggered the PR gate -> same error https://github.com/recommenders-team/recommenders/actions/runs/6981304867/job/18998574212

Tried removing nvidia-ml-py3 and commenting out transformers from the base deps -> same error https://github.com/recommenders-team/recommenders/actions/runs/6982265055/job/19001085823 It is not clear what is installing the nvidia packages.

Tried commenting out pytorch -> CUDA 12 still gets installed https://github.com/recommenders-team/recommenders/actions/runs/6987799139/job/19014709919

Tried commenting out pytorch, fastai, and tf-slim, leaving only tensorflow==2.8.4 -> I don't get CUDA 12 here https://github.com/recommenders-team/recommenders/actions/runs/6988238260/job/19015619226 so one of them is installing it.

Tried with TF and torch -> torch is installing CUDA 12 packages such as nvidia-cublas-cu12. https://github.com/recommenders-team/recommenders/actions/runs/6991615620/job/19022370652

Tried tensorflow==2.8.4 and torch>=1.13.1,<2 -> It installs some nvidia libs, but not all that are needed. These are installed: nvidia-cublas-cu11:11.10.3.66, nvidia-cuda-nvrtc-cu11:11.7.99, nvidia-cuda-runtime-cu11:11.7.99, nvidia-cudnn-cu11:8.5.0.96, but some are missing, e.g. Could not load dynamic library 'libcudart.so.11.0'. See https://github.com/recommenders-team/recommenders/actions/runs/6993911589/job/19027085507

Tried tensorflow==2.8.4 and torch>=1.13.1,<2, adding all the nvidia-cu11 deps: "nvidia-cublas-cu11", "nvidia-cuda-cupti-cu11", "nvidia-cuda-nvrtc-cu11", "nvidia-cuda-runtime-cu11", "nvidia-cudnn-cu11", "nvidia-cufft-cu11", "nvidia-curand-cu11", "nvidia-cusolver-cu11", "nvidia-cusparse-cu11", "nvidia-ml-py3", "nvidia-nccl-cu11", "nvidia-nvjitlink-cu11", "nvidia-nvtx-cu11" -> error https://github.com/recommenders-team/recommenders/actions/runs/6994198154 nvidia-nvjitlink-cu11 doesn't exist, so I removed it.

Tried again without nvidia-nvjitlink-cu11 -> I still get the timeout error. See https://github.com/recommenders-team/recommenders/actions/runs/7007905771/job/19063048904 nbclient.exceptions.CellTimeoutError: A cell timed out while it was being executed, after 600 seconds.

Tried torch>=1.13.1,<2@https://download.pytorch.org/whl/cu118 -> error: error in recommenders setup command: 'extras_require' must be a dictionary whose values are strings or lists of strings containing valid project/version requirement specifiers
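For reference, that failure is because extras_require entries must be valid PEP 508 requirement strings, and a PEP 508 direct reference takes the form name @ URL, pinning a single artifact with no version range. A hedged sketch of the accepted syntax (the wheel URL below is a placeholder, not a verified file):

# setup.py sketch: extras_require values must be valid PEP 508 strings.
# A direct reference pins one artifact and takes no version specifier.
extras_require = {
    "gpu": [
        # Hypothetical placeholder wheel URL, shown only for the syntax:
        "torch @ https://download.pytorch.org/whl/cu118/torch-placeholder.whl",
    ],
}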

Installed locally with:

        "nvidia-cublas-cu11",
        "nvidia-cuda-cupti-cu11",
        "nvidia-cuda-nvrtc-cu11",
        "nvidia-cuda-runtime-cu11",
        "nvidia-cudnn-cu11",
        "nvidia-cufft-cu11",
        "nvidia-curand-cu11",
        "nvidia-cusolver-cu11",
        "nvidia-cusparse-cu11",
        "nvidia-ml-py3",
        "nvidia-nccl-cu11",
        "nvidia-nvtx-cu11",
        "tensorflow==2.8.4",  # FIXME: Temporarily pinned due to issue with TF version > 2.10.1 See #2018
        "torch>=1.13.1,<2",

Got the same error:

~/MS/recommenders$ pytest tests/unit/examples/test_notebooks_gpu.py::test_dkn_quickstart
============================= test session starts ==============================
platform linux -- Python 3.9.18, pytest-7.4.3, pluggy-1.3.0
rootdir: /home/u/MS/recommenders
configfile: pyproject.toml
plugins: cov-4.1.0, typeguard-4.1.5, anyio-4.1.0, mock-3.12.0, hypothesis-6.91.0
collected 1 item

tests/unit/examples/test_notebooks_gpu.py F                              [100%]

=================================== FAILURES ===================================
_____________________________ test_dkn_quickstart ______________________________
self = <nbconvert.preprocessors.execute.ExecutePreprocessor object at 0x7fa6432c7130>
msg_id = '5e3667f3-7c5d09fe769d0a7452444f4c_11851_8'
cell = {'cell_type': 'code', 'execution_count': 7, 'metadata': {'pycharm': {'is_executing': False}, 'scrolled': True, 'execut..., 'iopub.execute_input': '2023-11-28T12:00:46.180970Z'}}, 'outputs': [], 'source': 'model.fit(train_file, valid_file)'}
timeout = 600
task_poll_output_msg = <Task pending name='Task-37' coro=<NotebookClient._async_poll_output_msg() running at /home/u/anaconda/envs/test_reco/...da/envs/test_reco/lib/python3.9/site-packages/zmq/_future.py:412, <TaskWakeupMethWrapper object at 0x7fa642661e50>()]>>
task_poll_kernel_alive = <Task cancelled name='Task-36' coro=<NotebookClient._async_poll_kernel_alive() done, defined at /home/u/anaconda/envs/test_reco/lib/python3.9/site-packages/nbclient/client.py:821>>

    async def _async_poll_for_reply(
        self,
        msg_id: str,
        cell: NotebookNode,
        timeout: int | None,
        task_poll_output_msg: asyncio.Future[t.Any],
        task_poll_kernel_alive: asyncio.Future[t.Any],
    ) -> dict[str, t.Any]:
        msg: dict[str, t.Any]
        assert self.kc is not None
        new_timeout: float | None = None
        if timeout is not None:
            deadline = monotonic() + timeout
            new_timeout = float(timeout)
        error_on_timeout_execute_reply = None
        while True:
            try:
                if error_on_timeout_execute_reply:
                    msg = error_on_timeout_execute_reply  # type:ignore[unreachable]
                    msg["parent_header"] = {"msg_id": msg_id}
                else:
>                   msg = await ensure_async(self.kc.shell_channel.get_msg(timeout=new_timeout))

../../anaconda/envs/test_reco/lib/python3.9/site-packages/nbclient/client.py:782:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../anaconda/envs/test_reco/lib/python3.9/site-packages/jupyter_core/utils/__init__.py:189: in ensure_async
    result = await obj
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <jupyter_client.channels.AsyncZMQSocketChannel object at 0x7fa643014c40>
timeout = 600000.0

    async def get_msg(  # type:ignore[override]
        self, timeout: t.Optional[float] = None
    ) -> t.Dict[str, t.Any]:
        """Gets a message if there is one that is ready."""
        assert self.socket is not None
        if timeout is not None:
            timeout *= 1000  # seconds to ms
        ready = await self.socket.poll(timeout)
        if ready:
            res = await self._recv()
            return res
        else:
>           raise Empty
E           _queue.Empty

../../anaconda/envs/test_reco/lib/python3.9/site-packages/jupyter_client/channels.py:315: Empty

During handling of the above exception, another exception occurred:

notebooks = {'als_deep_dive': '/home/u/MS/recommenders/examples/02_model_collaborative_filtering/als_deep_dive.ipynb', 'als_pyspar...aseline_deep_dive.ipynb', 'benchmark_movielens': '/home/u/MS/recommenders/examples/06_benchmarks/movielens.ipynb', ...}
output_notebook = 'output.ipynb', kernel_name = 'python3'

    @pytest.mark.notebooks
    @pytest.mark.gpu
    def test_dkn_quickstart(notebooks, output_notebook, kernel_name):
        notebook_path = notebooks["dkn_quickstart"]
>       execute_notebook(
            notebook_path,
            output_notebook,
            kernel_name=kernel_name,
            parameters=dict(EPOCHS=1, BATCH_SIZE=500),
        )

tests/unit/examples/test_notebooks_gpu.py:118:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
recommenders/utils/notebook_utils.py:107: in execute_notebook
    executed_notebook, _ = execute_preprocessor.preprocess(
../../anaconda/envs/test_reco/lib/python3.9/site-packages/nbconvert/preprocessors/execute.py:102: in preprocess
    self.preprocess_cell(cell, resources, index)
../../anaconda/envs/test_reco/lib/python3.9/site-packages/nbconvert/preprocessors/execute.py:123: in preprocess_cell
    cell = self.execute_cell(cell, index, store_history=True)
../../anaconda/envs/test_reco/lib/python3.9/site-packages/jupyter_core/utils/__init__.py:173: in wrapped
    return loop.run_until_complete(inner)
../../anaconda/envs/test_reco/lib/python3.9/asyncio/base_events.py:647: in run_until_complete
    return future.result()
../../anaconda/envs/test_reco/lib/python3.9/site-packages/nbclient/client.py:1005: in async_execute_cell
    exec_reply = await self.task_poll_for_reply
../../anaconda/envs/test_reco/lib/python3.9/site-packages/nbclient/client.py:806: in _async_poll_for_reply
    error_on_timeout_execute_reply = await self._async_handle_timeout(timeout, cell)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <nbconvert.preprocessors.execute.ExecutePreprocessor object at 0x7fa6432c7130>
timeout = 600
cell = {'cell_type': 'code', 'execution_count': 7, 'metadata': {'pycharm': {'is_executing': False}, 'scrolled': True, 'execut..., 'iopub.execute_input': '2023-11-28T12:00:46.180970Z'}}, 'outputs': [], 'source': 'model.fit(train_file, valid_file)'}

    async def _async_handle_timeout(
        self, timeout: int, cell: NotebookNode | None = None
    ) -> None | dict[str, t.Any]:
        self.log.error("Timeout waiting for execute reply (%is)." % timeout)
        if self.interrupt_on_timeout:
            self.log.error("Interrupting kernel")
            assert self.km is not None
            await ensure_async(self.km.interrupt_kernel())
            if self.error_on_timeout:
                execute_reply = {"content": {**self.error_on_timeout, "status": "error"}}
                return execute_reply
            return None
        else:
            assert cell is not None
>           raise CellTimeoutError.error_from_timeout_and_cell(
                "Cell execution timed out", timeout, cell
            )
E           nbclient.exceptions.CellTimeoutError: A cell timed out while it was being executed, after 600 seconds.
E           The message was: Cell execution timed out.
E           Here is a preview of the cell contents:
E           -------------------
E           model.fit(train_file, valid_file)
E           -------------------

../../anaconda/envs/test_reco/lib/python3.9/site-packages/nbclient/client.py:856: CellTimeoutError
----------------------------- Captured stderr call -----------------------------
2023-11-28 13:00:18.703631: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:922] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-11-28 13:00:18.774212: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory
2023-11-28 13:00:18.786748: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1850] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2023-11-28 13:00:19.809460: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-11-28 13:00:19.815773: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:922] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-11-28 13:00:19.815805: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1850] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2023-11-28 13:00:21.593795: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 300000000 exceeds 10% of free system memory.
2023-11-28 13:00:23.448917: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 300000000 exceeds 10% of free system memory.
2023-11-28 13:00:25.435549: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 300000000 exceeds 10% of free system memory.
2023-11-28 13:00:26.060217: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 300000000 exceeds 10% of free system memory.
2023-11-28 13:00:26.659435: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 300000000 exceeds 10% of free system memory.
------------------------------ Captured log call -------------------------------
ERROR    traitlets:client.py:845 Timeout waiting for execute reply (600s).
=============================== warnings summary ===============================
../../anaconda/envs/test_reco/lib/python3.9/site-packages/jupyter_client/connect.py:22
  /home/u/anaconda/envs/test_reco/lib/python3.9/site-packages/jupyter_client/connect.py:22: DeprecationWarning: Jupyter is migrating its paths to use standard platformdirs
  given by the platformdirs library.  To remove this warning and
  see the appropriate new directories, set the environment variable
  `JUPYTER_PLATFORM_DIRS=1` and then run `jupyter --paths`.
  The use of platformdirs will be the default in `jupyter_core` v6
    from jupyter_core.paths import jupyter_data_dir, jupyter_runtime_dir, secure_write

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/examples/test_notebooks_gpu.py::test_dkn_quickstart - nbclient.exceptions.CellTimeoutError: A cell timed out while it was being e...

Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com>
@miguelgfierro miguelgfierro merged commit b57cec2 into staging Dec 18, 2023
20 checks passed
@miguelgfierro miguelgfierro deleted the miguel/programmatic_execution_notebook branch December 18, 2023 15:04