Test doc review #1777

Merged · 5 commits · Jul 8, 2022
88 changes: 75 additions & 13 deletions tests/README.md
@@ -1,16 +1,24 @@
# Tests

In this document we describe our test infrastructure and how to contribute tests to the repository.

## Types of tests

This project uses unit, smoke and integration tests with Python files and notebooks:

* In the unit tests we just make sure the utilities and notebooks run.

* In the smoke tests, we run them with a small dataset or a small number of epochs to make sure that, apart from running, they provide reasonable metrics.
* In the smoke tests, we run them with a small dataset or a small number of epochs to make sure that, apart from running, they provide reasonable machine learning metrics. These can be run sequentially with the integration tests to quickly detect simple errors, and they should be fast.

* In the integration tests we use a bigger dataset for more epochs and we test that the machine learning metrics are what we expect.

These types of tests are integrated in the repo in two ways: via the PR gate and via the nightly builds.

* In the integration tests we use a bigger dataset for more epochs and we test that the metrics are what we expect.
The PR gate is the set of tests executed after opening a pull request, and they should be quick. Here we include the unit tests, which just check that the code doesn't have any errors.

For more information, see a [quick introduction to unit, smoke and integration tests](https://miguelgfierro.com/blog/2018/a-beginners-guide-to-python-testing/). To manually execute the unit tests in the different environments, first **make sure you are in the correct environment as described in the [SETUP.md](../SETUP.md)**.
The nightly build tests are executed asynchronously and can take longer. Here we include the smoke and integration tests, whose objective is not only to make sure that there are no errors, but also to make sure that the machine learning solutions behave as we expect.

For more information, see a [quick introduction to unit, smoke and integration tests](https://miguelgfierro.com/blog/2018/a-beginners-guide-to-python-testing/).

## Test infrastructure using AzureML

@@ -20,24 +28,48 @@ In the following figure we show a workflow on how the tests are executed via Azu

<img src="https://recodatasets.z20.web.core.windows.net/images/AzureML_tests.svg?sanitize=true">

GitHub workflows `azureml-unit-tests.yml`, `azureml-cpu-nightly.yml`, `azureml-gpu-nightly.yml` and `azureml-spark-nightly` located in `recommenders/.github/workflows/` are used to run the tests on AzureML and parameters to configure AzureML are defined in the workflow yml files. Tests are divided into groups and each workflow triggers execution of these test groups in parallel, which significantly reduces end-to-end execution time. There are three scripts used with each workflow:
GitHub workflows `azureml-unit-tests.yml`, `azureml-cpu-nightly.yml`, `azureml-gpu-nightly.yml` and `azureml-spark-nightly` located in [.github/workflows/](../.github/workflows/) are used to run the tests on AzureML. The parameters to configure AzureML are defined in the workflow yml files. Tests are divided into groups and each workflow triggers execution of these test groups in parallel, which significantly reduces end-to-end execution time.

* `ci/azureml_tests/submit_groupwise_azureml_pytest.py` - this script uses parameters in the workflow yml to set up the AzureML environment for testing using the AzureML SDK .
* `ci/azureml_tests/run_groupwise_pytest.py` - this script uses pytest to run tests on utilities or runs papermill to execute tests on notebooks. This script runs in an AzureML workspace with the environment created by the script above.
* `ci/azureml_tests/test_groups.py` - this script defines groups of tests.
There are three scripts used with each workflow, all of them located in [tests/ci/azureml_tests/](./ci/azureml_tests/) (see the sketch after this list):

* `submit_groupwise_azureml_pytest.py`: this script uses parameters in the workflow yml to set up the AzureML environment for testing using the AzureML SDK.
* `run_groupwise_pytest.py`: this script uses pytest to run the tests of the libraries and notebooks. This script runs in an AzureML workspace with the environment created by the script above.
* `test_groups.py`: this script defines groups of tests. If the tests are part of the unit tests, the total compute time of each group should be less than 15min. If the tests are part of the nightly builds, the total time of each group should be less than 35min.
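
For orientation, the following is a minimal sketch (not the actual script) of how a group of tests can be submitted to an AzureML compute cluster with the AzureML SDK; the workspace, environment, cluster and argument names are illustrative:

```python
from azureml.core import Workspace, Experiment, Environment, ScriptRunConfig

# Connect to the workspace defined by the workflow parameters (illustrative names)
ws = Workspace.get(
    name="recommenders-tests",
    subscription_id="<subscription-id>",
    resource_group="<resource-group>",
)

# Environment with the test dependencies
env = Environment.from_pip_requirements(
    name="reco-test-env", file_path="requirements.txt"
)

# Run the group-wise pytest driver on a remote compute cluster
config = ScriptRunConfig(
    source_directory=".",
    script="tests/ci/azureml_tests/run_groupwise_pytest.py",
    arguments=["--testgroup", "group_cpu_001"],
    compute_target="cpu-cluster",
    environment=env,
)

run = Experiment(workspace=ws, name="unit-tests").submit(config)
run.wait_for_completion(show_output=True)
```

In the real pipeline this submission is done for you by `submit_groupwise_azureml_pytest.py`, so contributors normally only need to add their tests to the right group.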

## How to create tests

### How to add tests to the AzureML pipeline
In this section we show how to create tests and add them to the test pipeline. The steps you need to follow are:

1. Create your code in the library and/or notebooks.
1. Design the unit tests for the code.
1. If you have written a notebook, design the notebook tests and check that the metrics it returns are what you expect.
1. Add the tests to the AzureML pipeline in the corresponding [test group](./ci/azureml_tests/test_groups.py). **Please note that if you don't add your tests to the pipeline, they will not be executed.**

To add a new test to the AzureML pipeline, add the test path to an appropriate test group listed in [test_groups.py](https://github.com/microsoft/recommenders/blob/main/tests/ci/azureml_tests/test_groups.py). Tests in `group_cpu_xxx` groups are executed on a CPU-only AzureML compute cluster node. Tests in `group_gpu_xxx` groups are executed on a GPU-enabled AzureML compute cluster node with GPU related dependencies added to the AzureML run environment. Tests in `group_pyspark_xxx` groups are executed on a CPU-only AzureML compute cluster node, with the PySpark related dependencies added to the AzureML run environment. Another thing to keep in mind while adding a new test is that the runtime of the test group should not exceed the specified threshold in [test_groups.py](tests/ci/azureml_tests/test_groups.py).
### How to create tests for the library code

You want to make sure that all your code works before you submit it to the repository. Here are guidelines for creating the unit tests (a combined example is sketched after this list):

* It is better to create multiple small tests than one large test that checks all the code.
* Use `@pytest.fixture` to create data in your tests.
* Use the mark `@pytest.mark.gpu` if you want the test to be executed in a GPU environment. Use `@pytest.mark.spark` if you want the test to be executed in a Spark environment.
* Use `@pytest.mark.smoke` and `@pytest.mark.integration` to mark the tests as smoke tests and integration tests.
* Use `@pytest.mark.notebooks` if you are testing a notebook.
* Avoid using `is` in the asserts; instead, use the operator `==`.
* Follow the pattern `assert computation == value`, for example:
```python
assert results["precision"] == pytest.approx(0.330753)
```
* Always check the limits of your computations; for example, you may want to check that the RMSE between two equal vectors is 0:
```python
assert rmse(rating_true, rating_true) == 0
assert rmse(rating_true, rating_pred) == pytest.approx(7.254309)
```
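
Putting several of these guidelines together, here is a minimal, self-contained sketch of a unit test (the data and the `rmse` helper are illustrative; in practice you would import the function under test from the library):

```python
import numpy as np
import pytest


@pytest.fixture
def ratings():
    # Small synthetic data created inside a fixture
    return np.array([4.0, 3.0, 5.0]), np.array([3.5, 3.0, 4.5])


def rmse(y_true, y_pred):
    # Illustrative helper; in the repo you would test a library function instead
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def test_rmse_limits(ratings):
    rating_true, rating_pred = ratings
    # Check the limits: the error of a vector against itself is 0
    assert rmse(rating_true, rating_true) == 0
    # Use == (with pytest.approx for floats) rather than `is`
    assert rmse(rating_true, rating_pred) == pytest.approx(0.4082, abs=1e-3)


@pytest.mark.gpu
def test_model_training_on_gpu():
    # Tests marked with @pytest.mark.gpu are executed in the GPU environment
    ...
```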

### How to create tests on notebooks with Papermill and scrapbook
### How to create tests on notebooks with Papermill and Scrapbook

In the notebooks of this repo, we use [Papermill](https://github.com/nteract/papermill) and [scrapbook](https://nteract-scrapbook.readthedocs.io/en/latest/) in unit, smoke and integration tests. Papermill is a tool that enables you to parameterize and execute notebooks. `scrapbook` is a library for recording a notebook’s data values and generated visual content as “scraps”. These recorded scraps can be read at a future time. We use `scrapbook` to collect the metrics in the notebooks.
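
As a rough illustration of how the two libraries fit together (the notebook path, parameters and scrap names here are hypothetical; inside the notebook the values would be recorded with `sb.glue("name", value)`):

```python
import papermill as pm
import pytest
import scrapbook as sb

OUTPUT_NOTEBOOK = "output.ipynb"

# Execute the notebook with a small set of parameters (hypothetical path and parameters)
pm.execute_notebook(
    "examples/00_quick_start/sar_movielens.ipynb",
    OUTPUT_NOTEBOOK,
    kernel_name="python3",
    parameters=dict(MOVIELENS_DATA_SIZE="100k", TOP_K=10),
)

# Read back the values that the notebook recorded as scraps
results = sb.read_notebook(OUTPUT_NOTEBOOK).scraps.dataframe.set_index("name")["data"]
assert results["map"] == pytest.approx(0.11, abs=0.05)
```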

#### Developing unit tests with Papermill and scrapbook
#### Developing unit tests with Papermill and Scrapbook

Executing a notebook with Papermill is easy; this is what we mostly do in the unit tests. Next we show one of the tests that we have in [tests/unit/examples/test_notebooks_python.py](./unit/examples/test_notebooks_python.py).

@@ -107,9 +139,39 @@ For executing this test, first make sure you are in the correct environment as d
```
pytest tests/smoke/test_notebooks_python.py::test_sar_single_node_smoke
```

More details on how to integrate Papermill with notebooks can be found in their [repo](https://github.com/nteract/papermill).
More details on how to integrate Papermill with notebooks can be found in their [repo](https://github.com/nteract/papermill). Also, you can check the [Scrapbook repo](https://github.com/nteract/scrapbook).

### How to add tests to the AzureML pipeline

To add a new test to the AzureML pipeline, add the test path to an appropriate test group listed in [test_groups.py](https://github.com/microsoft/recommenders/blob/main/tests/ci/azureml_tests/test_groups.py).

Tests in `group_cpu_xxx` groups are executed on a CPU-only AzureML compute cluster node. Tests in `group_gpu_xxx` groups are executed on a GPU-enabled AzureML compute cluster node with GPU related dependencies added to the AzureML run environment. Tests in `group_pyspark_xxx` groups are executed on a CPU-only AzureML compute cluster node, with the PySpark related dependencies added to the AzureML run environment.

It's important to keep in mind while adding a new test that the runtime of the test group should not exceed the specified threshold in [test_groups.py](./ci/azureml_tests/test_groups.py).

Example of adding a new test:

1. In the environment in which you are running your code, first see if there is a group whose total runtime is less than the threshold:
```python
"group_spark_001": [ # Total group time: 271.13s
"tests/smoke/recommenders/dataset/test_movielens.py::test_load_spark_df", # 4.33s
"tests/integration/recommenders/datasets/test_movielens.py::test_load_spark_df", # 25.58s + 101.99s + 139.23s
],
```
2. Add the test to the group, add the time it takes to compute, and update the total group time.
```python
"group_spark_001": [ # Total group time: 571.13s
"tests/smoke/recommenders/dataset/test_movielens.py::test_load_spark_df", # 4.33s
"tests/integration/recommenders/datasets/test_movielens.py::test_load_spark_df", # 25.58s + 101.99s + 139.23s
#
"tests/path/to/test_new.py::test_new_function", # 300s
],
```
3. If all the groups of your environment are above the threshold, add a new group, for example:
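
A new group entry could look like this (the group name, test path and time are hypothetical; keep the estimated total time of the group below the threshold):

```python
"group_spark_002": [  # Total group time: 300s
    "tests/path/to/test_new.py::test_new_function",  # 300s
],
```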

## How to execute tests in your local environment

## How to execute tests
To manually execute the tests in the CPU, GPU or Spark environments, first **make sure you are in the correct environment as described in the [SETUP.md](../SETUP.md)**.
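
For reference, typical invocations look like the following (illustrative; the exact marker expressions for each environment are listed in the menus below):

```
# Unit tests for the utilities in the CPU environment
pytest tests/unit -m "not notebooks and not spark and not gpu"

# Smoke tests in the CPU environment
pytest tests/smoke -m "smoke and not spark and not gpu"
```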

*Click on the following menus* to see more details on how to execute the unit, smoke and integration tests:

14 changes: 7 additions & 7 deletions tests/ci/azureml_tests/test_groups.py
@@ -119,7 +119,7 @@
"tests/smoke/examples/test_notebooks_gpu.py::test_npa_smoke", # 366.22s
"tests/integration/examples/test_notebooks_gpu.py::test_npa_quickstart_integration", # 810.92s
],
"group_gpu_007": [ # Total group time:
"group_gpu_007": [ # Total group time: 620.89s
"tests/unit/examples/test_notebooks_gpu.py::test_gpu_vm", # 0.76s (Always the first test to check the GPU works)
"tests/smoke/examples/test_notebooks_gpu.py::test_naml_smoke", # 620.13s
# FIXME: Reduce test time https://github.com/microsoft/recommenders/issues/1731
@@ -178,19 +178,19 @@
"tests/unit/recommenders/evaluation/test_spark_evaluation.py::test_distributional_coverage",
"tests/unit/recommenders/datasets/test_spark_splitter.py::test_min_rating_filter",
],
# TODO: This is a flaky test, skip for now, to be fixed in future iterations.
# Refer to the issue: https://github.com/microsoft/recommenders/issues/1770
# "group_notebooks_pyspark_001": [ # Total group time: 746.53s
# "tests/unit/examples/test_notebooks_pyspark.py::test_spark_tuning", # 212.29s+190.02s+180.13s+164.09s (flaky test, it rerun several times)
# ],
"group_notebooks_pyspark_002": [ # Total group time: 728.43s
"group_notebooks_pyspark_001": [ # Total group time: 728.43s
"tests/unit/examples/test_notebooks_pyspark.py::test_als_deep_dive_runs",
"tests/unit/examples/test_notebooks_pyspark.py::test_data_split_runs",
"tests/unit/examples/test_notebooks_pyspark.py::test_evaluation_runs",
"tests/unit/examples/test_notebooks_pyspark.py::test_als_pyspark_runs",
"tests/unit/examples/test_notebooks_pyspark.py::test_evaluation_diversity_runs",
"tests/unit/examples/test_notebooks_pyspark.py::test_mmlspark_lightgbm_criteo_runs", # 56.55s
],
# TODO: This is a flaky test, skip for now, to be fixed in future iterations.
# Refer to the issue: https://github.com/microsoft/recommenders/issues/1770
# "group_notebooks_pyspark_002": [ # Total group time: 746.53s
# "tests/unit/examples/test_notebooks_pyspark.py::test_spark_tuning", # 212.29s+190.02s+180.13s+164.09s (flaky test, it rerun several times)
# ],
"group_gpu_001": [ # Total group time: 492.62s
"tests/unit/examples/test_notebooks_gpu.py::test_gpu_vm", # 0.76s (Always the first test to check the GPU works)
"tests/unit/recommenders/models/test_deeprec_model.py::test_xdeepfm_component_definition",
14 changes: 14 additions & 0 deletions tests/integration/examples/test_notebooks_gpu.py
Expand Up @@ -25,6 +25,7 @@ def test_gpu_vm():


@pytest.mark.gpu
@pytest.mark.notebooks
@pytest.mark.integration
@pytest.mark.parametrize(
"size, epochs, expected_values, seed",
@@ -64,6 +65,7 @@ def test_ncf_integration(


@pytest.mark.gpu
@pytest.mark.notebooks
@pytest.mark.integration
@pytest.mark.parametrize(
"size, epochs, batch_size, expected_values, seed",
@@ -118,6 +120,7 @@ def test_ncf_deep_dive_integration(


@pytest.mark.gpu
@pytest.mark.notebooks
@pytest.mark.integration
@pytest.mark.parametrize(
"size, epochs, expected_values",
@@ -158,6 +161,7 @@ def test_fastai_integration(


@pytest.mark.gpu
@pytest.mark.notebooks
@pytest.mark.integration
@pytest.mark.parametrize(
"syn_epochs, criteo_epochs, expected_values, seed",
@@ -207,6 +211,7 @@ def test_xdeepfm_integration(


@pytest.mark.gpu
@pytest.mark.notebooks
@pytest.mark.integration
@pytest.mark.parametrize(
"size, steps, expected_values, seed",
@@ -255,6 +260,7 @@ def test_wide_deep_integration(


@pytest.mark.gpu
@pytest.mark.notebooks
@pytest.mark.integration
@pytest.mark.parametrize(
"yaml_file, data_path, epochs, batch_size, expected_values, seed",
@@ -306,6 +312,7 @@ def test_slirec_quickstart_integration(


@pytest.mark.gpu
@pytest.mark.notebooks
@pytest.mark.integration
@pytest.mark.parametrize(
"epochs, batch_size, seed, MIND_type, expected_values",
@@ -367,6 +374,7 @@ def test_nrms_quickstart_integration(


@pytest.mark.gpu
@pytest.mark.notebooks
@pytest.mark.integration
@pytest.mark.parametrize(
"epochs, batch_size, seed, MIND_type, expected_values",
@@ -428,6 +436,7 @@ def test_naml_quickstart_integration(


@pytest.mark.gpu
@pytest.mark.notebooks
@pytest.mark.integration
@pytest.mark.parametrize(
"epochs, batch_size, seed, MIND_type, expected_values",
@@ -489,6 +498,7 @@ def test_lstur_quickstart_integration(


@pytest.mark.gpu
@pytest.mark.notebooks
@pytest.mark.integration
@pytest.mark.parametrize(
"epochs, batch_size, seed, MIND_type, expected_values",
@@ -550,6 +560,7 @@ def test_npa_quickstart_integration(


@pytest.mark.gpu
@pytest.mark.notebooks
@pytest.mark.integration
@pytest.mark.parametrize(
"yaml_file, data_path, size, epochs, batch_size, expected_values, seed",
@@ -607,6 +618,7 @@ def test_lightgcn_deep_dive_integration(


@pytest.mark.gpu
@pytest.mark.notebooks
@pytest.mark.integration
def test_dkn_quickstart_integration(notebooks, output_notebook, kernel_name):
notebook_path = notebooks["dkn_quickstart"]
@@ -627,6 +639,7 @@ def test_dkn_quickstart_integration(notebooks, output_notebook, kernel_name):


@pytest.mark.gpu
@pytest.mark.notebooks
@pytest.mark.integration
@pytest.mark.parametrize(
"size, expected_values",
@@ -654,6 +667,7 @@ def test_cornac_bivae_integration(


@pytest.mark.gpu
@pytest.mark.notebooks
@pytest.mark.integration
@pytest.mark.parametrize(
"data_dir, num_epochs, batch_size, model_name, expected_values, seed",
2 changes: 2 additions & 0 deletions tests/integration/examples/test_notebooks_pyspark.py
@@ -18,6 +18,7 @@
# This is a flaky test that can fail unexpectedly
@pytest.mark.flaky(reruns=5, reruns_delay=2)
@pytest.mark.spark
@pytest.mark.notebooks
@pytest.mark.integration
def test_als_pyspark_integration(notebooks, output_notebook, kernel_name):
notebook_path = notebooks["als_pyspark"]
@@ -44,6 +45,7 @@ def test_als_pyspark_integration(notebooks, output_notebook, kernel_name):
# This is a flaky test that can fail unexpectedly
@pytest.mark.flaky(reruns=5, reruns_delay=2)
@pytest.mark.spark
@pytest.mark.notebooks
@pytest.mark.integration
@pytest.mark.skip(reason="It takes too long in the current test machine")
@pytest.mark.skipif(sys.platform == "win32", reason="Not implemented on Windows")
11 changes: 10 additions & 1 deletion tests/integration/examples/test_notebooks_python.py
@@ -15,6 +15,7 @@
ABS_TOL = 0.05


@pytest.mark.notebooks
@pytest.mark.integration
@pytest.mark.parametrize(
"size, expected_values",
@@ -57,6 +58,7 @@ def test_sar_single_node_integration(
assert results[key] == pytest.approx(value, rel=TOL, abs=ABS_TOL)


@pytest.mark.notebooks
@pytest.mark.integration
@pytest.mark.parametrize(
"size, expected_values",
@@ -91,6 +93,7 @@ def test_baseline_deep_dive_integration(
assert results[key] == pytest.approx(value, rel=TOL, abs=ABS_TOL)


@pytest.mark.notebooks
@pytest.mark.integration
@pytest.mark.parametrize(
"size, expected_values",
@@ -129,6 +132,7 @@ def test_surprise_svd_integration(
assert results[key] == pytest.approx(value, rel=TOL, abs=ABS_TOL)


@pytest.mark.notebooks
@pytest.mark.integration
@pytest.mark.parametrize(
"size, expected_values",
@@ -167,7 +171,7 @@ def test_vw_deep_dive_integration(
assert results[key] == pytest.approx(value, rel=TOL, abs=ABS_TOL)


# @pytest.mark.skipif(sys.platform == "win32", reason="nni not installable on windows")
@pytest.mark.notebooks
@pytest.mark.integration
@pytest.mark.skip(reason="NNI pip package has installation incompatibilities")
def test_nni_tuning_svd(notebooks, output_notebook, kernel_name, tmp):
@@ -188,6 +192,7 @@ def test_nni_tuning_svd(notebooks, output_notebook, kernel_name, tmp):
)


@pytest.mark.notebooks
@pytest.mark.integration
@pytest.mark.skip(reason="Wikidata API is unstable")
def test_wikidata_integration(notebooks, output_notebook, kernel_name, tmp):
@@ -208,6 +213,7 @@ def test_wikidata_integration(notebooks, output_notebook, kernel_name, tmp):
assert results["length_result"] >= 1


@pytest.mark.notebooks
@pytest.mark.integration
@pytest.mark.parametrize(
"size, expected_values",
@@ -234,6 +240,7 @@ def test_cornac_bpr_integration(
assert results[key] == pytest.approx(value, rel=TOL, abs=ABS_TOL)


@pytest.mark.notebooks
@pytest.mark.integration
@pytest.mark.parametrize(
"size, epochs, expected_values",
@@ -268,6 +275,7 @@ def test_lightfm_integration(
assert results[key] == pytest.approx(value, rel=TOL, abs=ABS_TOL)


@pytest.mark.notebooks
@pytest.mark.integration
@pytest.mark.experimental
@pytest.mark.parametrize(
@@ -285,6 +293,7 @@ def test_geoimc_integration(notebooks, output_notebook, kernel_name, expected_va
assert results[key] == pytest.approx(value, rel=TOL, abs=ABS_TOL)


@pytest.mark.notebooks
@pytest.mark.integration
@pytest.mark.experimental
def test_xlearn_fm_integration(notebooks, output_notebook, kernel_name):
2 changes: 2 additions & 0 deletions tests/smoke/examples/test_notebooks_pyspark.py
@@ -19,6 +19,7 @@
@pytest.mark.flaky(reruns=5, reruns_delay=2)
@pytest.mark.smoke
@pytest.mark.spark
@pytest.mark.notebooks
def test_als_pyspark_smoke(notebooks, output_notebook, kernel_name):
notebook_path = notebooks["als_pyspark"]
pm.execute_notebook(
@@ -46,6 +47,7 @@ def test_als_pyspark_smoke(notebooks, output_notebook, kernel_name):
@pytest.mark.flaky(reruns=5, reruns_delay=2)
@pytest.mark.smoke
@pytest.mark.spark
@pytest.mark.notebooks
@pytest.mark.skipif(sys.platform == "win32", reason="Not implemented on Windows")
def test_mmlspark_lightgbm_criteo_smoke(notebooks, output_notebook, kernel_name):
notebook_path = notebooks["mmlspark_lightgbm_criteo"]