[FEATURE] Use AzureML compute target to execute test #995

Closed
6 tasks
miguelgfierro opened this issue Dec 3, 2019 · 5 comments
Labels
enhancement New feature or request


@miguelgfierro
Collaborator

Description

@bethz created a pipeline that uses AzureML to programmatically start a virtual machine, execute the tests, gather the results in Azure DevOps and shut down the machine.

At the moment, the pipeline is inactive, but the code is in the repo here: https://github.com/microsoft/recommenders/tree/7c0a6a3e23dee047b9afbf8564bd236a8300454e/tests/ci
The files are:

  • cpu_unit_tests.yml
  • env-setup.yml
  • gpu_unit_test.yml
  • nightly_cpu.yml
  • nightly_gpu.yml
  • notebooks_gpu_unit_tests.yml
  • notebooks_unit_tests.yml
  • run_pytest.py
  • submit_azureml_pytest.py
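The contents of run_pytest.py are not shown in this issue. As a hypothetical sketch only, a script with that role could run pytest on the compute target and write a JUnit XML report, which is a format Azure DevOps can publish as test results. The helper names, the default test path and the marker expression below are assumptions, not the repo's actual code.

```python
from typing import List


def build_pytest_args(test_path: str, markers: str, junit_path: str) -> List[str]:
    """Build pytest command-line arguments (hypothetical helper).

    --durations=0 reports the duration of every test, --junitxml writes a
    JUnit-style XML report for the CI to publish, and -m filters tests by a
    marker expression.
    """
    args = [test_path, "--durations=0", f"--junitxml={junit_path}"]
    if markers:
        args += ["-m", markers]
    return args


def run_tests(
    test_path: str = "tests/unit",
    markers: str = "not gpu and not spark",  # assumed marker names
    junit_path: str = "reports/test-results.xml",
) -> int:
    """Run pytest in-process and return its exit code."""
    # Imported lazily so build_pytest_args can be used without pytest installed.
    import pytest

    return int(pytest.main(build_pytest_args(test_path, markers, junit_path)))
```

On the compute target, the script's entry point would then be something like `sys.exit(run_tests())`, so a failing test run also fails the remote job.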

Expected behavior with the suggested feature

The steps that are missing to activate this again are:

  • move all the code under a new folder azureml_compute_target
  • rename each file following the convention we use in azure_pipeline_test: there we have dsvm_nightly_linux_cpu.yml, so for these tests it would be compute_target_nightly_linux_cpu.yml
  • activate the pipelines from DevOps
  • make sure that all the tests are working correctly
  • update the readmes
  • do a comparative benchmark of tests running on a DSVM vs. tests running on an AzureML compute target, then check whether there is a benefit to using AzureML or not
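A minimal sketch of the renaming convention from the list above, assuming each new file is named after its DSVM counterpart by swapping the dsvm_ prefix for compute_target_. The helper name is hypothetical.

```python
from pathlib import Path


def to_compute_target_name(dsvm_filename: str) -> str:
    """Map a DSVM pipeline file name to the proposed compute-target name.

    dsvm_nightly_linux_cpu.yml -> compute_target_nightly_linux_cpu.yml
    """
    prefix = "dsvm_"
    name = Path(dsvm_filename).name  # drop any leading directories
    if not name.startswith(prefix):
        raise ValueError(f"expected a file name starting with {prefix!r}: {name}")
    return "compute_target_" + name[len(prefix):]
```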

Other Comments

Related to #450

@miguelgfierro
Collaborator Author

miguelgfierro commented Feb 1, 2022

We are resuming this work item, now with @pradnyeshjoshi.

  • Create a simple project with trivial tests, such as assert 1==1 and assert 1==0, and execute it via AzureML
  • Make sure that the AzureML logs can be retrieved in GitHub so people who open a PR can analyze them
  • Try the complete process: install recommenders with pip install .[dev] and execute one of the CPU tests
  • Repeat and execute all the CPU tests
  • Create a GitHub pipeline that executes the tests with AzureML every time there is a PR to the staging branch
  • Make sure that the status of the test can be visualized in the PR checks
  • Try the complete process installing the GPU dependencies
  • Try the complete process installing the Spark dependencies
  • Understand how to set up async tests that execute every night (with a CRON task or similar)
  • Set up the nightly builds of CPU
  • Set up the nightly builds of GPU
  • Set up the nightly builds of Spark
  • Decide how to divide the 300 different tests to maximize machine utilization (check with @anargyri and @miguelgfierro)
  • Repeat the whole process with Python 3.6, 3.8 and 3.9
  • Report the gains: how long the tests took before and how long they take now
  • Replace the release pipeline that we currently have in ADO https://github.com/microsoft/recommenders/blob/main/tests/ci/azure_pipeline_test/release_pipeline.yml
  • After making sure that the new tests have been running for a while, remove the other tests
  • Remove the old code from the repo that we are no longer using
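For the first items of the checklist (submit a run to AzureML, stream its logs into the GitHub job, install recommenders with the dev, GPU or Spark extras), a rough sketch with the AzureML Python SDK v1 (azureml-core) could look like the following. It is not the code in submit_azureml_pytest.py: the cluster name and size, the experiment names, the --testfolder argument passed to run_pytest.py and the extras mapping are all assumptions. It also installs the published package from PyPI rather than running pip install .[dev] on the PR branch, and GPU runs would additionally need a GPU VM size and a CUDA base image, which this sketch does not configure.

```python
from typing import Dict

# Assumed mapping from test flavor to recommenders pip extras.
EXTRAS: Dict[str, str] = {"cpu": "dev", "gpu": "dev,gpu", "spark": "dev,spark"}


def pip_requirement(flavor: str, package: str = "recommenders") -> str:
    """Return a pip requirement such as 'recommenders[dev,gpu]' for a test flavor."""
    if flavor not in EXTRAS:
        raise ValueError(f"unknown flavor {flavor!r}, expected one of {sorted(EXTRAS)}")
    return f"{package}[{EXTRAS[flavor]}]"


def submit_tests(
    flavor: str,
    test_path: str,
    cluster_name: str = "cpu-cluster",  # hypothetical cluster name
    vm_size: str = "STANDARD_D3_V2",
    max_nodes: int = 4,
) -> str:
    """Run the tests on an AzureML cluster and return the final run status.

    Needs azureml-core (SDK v1) and a workspace config.json; the imports are
    local so pip_requirement() works without the SDK installed.
    """
    from azureml.core import Environment, Experiment, ScriptRunConfig, Workspace
    from azureml.core.compute import AmlCompute, ComputeTarget
    from azureml.core.compute_target import ComputeTargetException
    from azureml.core.conda_dependencies import CondaDependencies

    ws = Workspace.from_config()

    # Reuse the cluster if it exists; otherwise create one with min_nodes=0 so
    # AzureML scales it back down to zero nodes after it has been idle.
    try:
        target = ComputeTarget(workspace=ws, name=cluster_name)
    except ComputeTargetException:
        provisioning = AmlCompute.provisioning_configuration(
            vm_size=vm_size, min_nodes=0, max_nodes=max_nodes
        )
        target = ComputeTarget.create(ws, cluster_name, provisioning)
        target.wait_for_completion(show_output=True)

    env = Environment(name=f"recommenders-{flavor}")
    env.python.conda_dependencies = CondaDependencies.create(
        python_version="3.7", pip_packages=[pip_requirement(flavor), "pytest"]
    )

    # "--testfolder" is a hypothetical argument of the entry script.
    run_config = ScriptRunConfig(
        source_directory=".",
        script="tests/ci/run_pytest.py",
        arguments=["--testfolder", test_path],
        compute_target=target,
        environment=env,
    )
    run = Experiment(workspace=ws, name=f"pytest-{flavor}").submit(run_config)
    # Stream the remote logs to stdout so they show up in the CI job log.
    run.wait_for_completion(show_output=True, raise_on_error=False)
    return run.get_status()
```

A GitHub workflow step could then call something like submit_tests("cpu", "tests/unit") and fail the job when the returned status is not "Completed".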

@miguelgfierro
Collaborator Author

@simonzhao we are seeing an error with PySpark. Right now, we have "pyspark>=2.4.5,<4.0.0" (see setup.py). Are we using Spark 2 in SAR+?
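The constraint "pyspark>=2.4.5,<4.0.0" admits both Spark 2.4.x and Spark 3.x, so which major version ends up installed depends on what pip resolves. A small stdlib sketch (Python 3.8+ for importlib.metadata) to check the resolved version against that range; the helper names are hypothetical, and it only compares the numeric release part, ignoring pre-release semantics.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

# Bounds copied from the setup.py constraint "pyspark>=2.4.5,<4.0.0".
LOWER: Tuple[int, int, int] = (2, 4, 5)
UPPER: Tuple[int, int, int] = (4, 0, 0)


def release_tuple(v: str) -> Tuple[int, int, int]:
    """Return (major, minor, patch), ignoring suffixes such as 'rc1' or '.post1'."""
    match = re.match(r"\d+(?:\.\d+)*", v)
    if match is None:
        raise ValueError(f"not a version string: {v!r}")
    nums = [int(x) for x in match.group(0).split(".")][:3]
    nums += [0] * (3 - len(nums))  # pad "3.1" to (3, 1, 0) so comparisons are fair
    return (nums[0], nums[1], nums[2])


def in_supported_range(v: str) -> bool:
    """True if v satisfies >=2.4.5,<4.0.0 (numeric release part only)."""
    return LOWER <= release_tuple(v) < UPPER


def installed_spark_major() -> Optional[int]:
    """Major version of the installed pyspark package, or None if it is missing."""
    try:
        return release_tuple(version("pyspark"))[0]
    except PackageNotFoundError:
        return None
```

Running installed_spark_major() on the test machine would show which Spark major version the failing environment actually has; it does not tell us which version SAR+ requires.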

@miguelgfierro
Collaborator Author

miguelgfierro commented Mar 1, 2022

GPU nightly tests on a DSVM Standard NC6s v2 (6 vCPUs, 112 GiB memory) with Python 3.7.12:

============================== slowest durations ===============================
2432.76s call     tests/integration/examples/test_notebooks_gpu.py::test_naml_quickstart_integration[6-42-demo-expected_values0]
1854.21s call     tests/integration/examples/test_notebooks_gpu.py::test_wide_deep_integration[1m-50000-expected_values0-42]
1470.45s call     tests/integration/examples/test_notebooks_gpu.py::test_nrms_quickstart_integration[8-42-demo-expected_values0]
1168.31s call     tests/integration/examples/test_notebooks_gpu.py::test_dkn_quickstart_integration
1138.31s call     tests/integration/examples/test_notebooks_gpu.py::test_npa_quickstart_integration[6-42-demo-expected_values0]
1060.53s call     tests/integration/examples/test_notebooks_gpu.py::test_ncf_integration[1m-10-expected_values0-42]
990.63s call     tests/integration/examples/test_notebooks_gpu.py::test_lstur_quickstart_integration[5-40-demo-expected_values0]
668.17s call     tests/integration/examples/test_notebooks_gpu.py::test_fastai_integration[1m-10-expected_values0]
561.53s call     tests/integration/examples/test_notebooks_gpu.py::test_sasrec_quickstart_integration[tests/recsys_data/RecSys/SASRec-tf2/data-1-128-expected_values0-42]
481.27s call     tests/integration/examples/test_notebooks_gpu.py::test_xdeepfm_integration[15-10-expected_values0-42]
399.52s call     tests/integration/examples/test_notebooks_gpu.py::test_cornac_bivae_integration[1m-expected_values0]
351.76s call     tests/integration/examples/test_notebooks_gpu.py::test_ncf_deep_dive_integration[100k-10-512-expected_values0-42]
176.33s call     tests/integration/examples/test_notebooks_gpu.py::test_slirec_quickstart_integration[recommenders/models/deeprec/config/sli_rec.yaml-tests/resources/deeprec/slirec-10-400-expected_values0-42]
20.66s call     tests/integration/examples/test_notebooks_gpu.py::test_lightgcn_deep_dive_integration[recommenders/models/deeprec/config/lightgcn.yaml-tests/resources/deeprec/lightgcn-100k-5-1024-expected_values0-42]
0.75s call     tests/integration/examples/test_notebooks_gpu.py::test_gpu_vm
0.11s teardown tests/integration/examples/test_notebooks_gpu.py::test_wide_deep_integration[1m-50000-expected_values0-42]
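Durations like these could feed the "divide the tests to maximize machine utilization" item. One standard option is the greedy longest-processing-time-first (LPT) heuristic: sort tests by duration, longest first, and assign each one to the group with the smallest total so far. This heuristic is our suggestion, not something decided in this issue. The sketch below parses the "call" lines of a pytest --durations report and splits the GPU durations above into a hypothetical 3 groups; it ignores per-machine overhead such as environment setup, so the group totals are an optimistic estimate.

```python
import heapq
import re
from typing import Dict, List, Tuple

# Matches e.g. "2432.76s call     tests/...::test_x[params]"; setup/teardown lines are skipped.
CALL_LINE = re.compile(r"^\s*([\d.]+)s\s+call\s+(\S+)")


def parse_durations(report: str) -> Dict[str, float]:
    """Map test id -> seconds from the 'call' lines of a pytest --durations report."""
    durations = {}
    for line in report.splitlines():
        match = CALL_LINE.match(line)
        if match:
            durations[match.group(2)] = float(match.group(1))
    return durations


def lpt_groups(durations: Dict[str, float], n_groups: int) -> List[Tuple[float, List[str]]]:
    """Greedy LPT: assign each test, longest first, to the currently least-loaded group."""
    heap = [(0.0, i) for i in range(n_groups)]  # (total seconds, group index)
    groups: List[List[str]] = [[] for _ in range(n_groups)]
    totals = [0.0] * n_groups
    for test, secs in sorted(durations.items(), key=lambda kv: kv[1], reverse=True):
        load, i = heapq.heappop(heap)
        groups[i].append(test)
        totals[i] = load + secs
        heapq.heappush(heap, (totals[i], i))
    return list(zip(totals, groups))


# Call durations from the GPU nightly report above, keyed by test function name.
GPU_NIGHTLY = {
    "test_naml_quickstart_integration": 2432.76,
    "test_wide_deep_integration": 1854.21,
    "test_nrms_quickstart_integration": 1470.45,
    "test_dkn_quickstart_integration": 1168.31,
    "test_npa_quickstart_integration": 1138.31,
    "test_ncf_integration": 1060.53,
    "test_lstur_quickstart_integration": 990.63,
    "test_fastai_integration": 668.17,
    "test_sasrec_quickstart_integration": 561.53,
    "test_xdeepfm_integration": 481.27,
    "test_cornac_bivae_integration": 399.52,
    "test_ncf_deep_dive_integration": 351.76,
    "test_slirec_quickstart_integration": 176.33,
    "test_lightgcn_deep_dive_integration": 20.66,
    "test_gpu_vm": 0.75,
}

if __name__ == "__main__":
    for total, tests in lpt_groups(GPU_NIGHTLY, n_groups=3):
        print(f"{total:8.1f}s  {tests}")
```

With these numbers, the three group totals come out at roughly 4407 s, 4237 s and 4132 s, against about 12775 s for the whole list run sequentially on one machine.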


@miguelgfierro
Collaborator Author

After this feature is added, we need to update the wiki: https://github.com/Microsoft/Recommenders/wiki/Test-Strategy

@pradnyeshjoshi, I'm adding this info here so we don't forget.

@miguelgfierro
Collaborator Author

miguelgfierro commented Apr 21, 2022

Related to #1706, we added a new test for LightFM. The notebook's author used a very inefficient transformation, and currently the test takes 38 min.

2320.18s call     tests/integration/examples/test_notebooks_python.py::test_lightfm_integration[100k-10-expected_values0]
====================================== 1 passed, 2 warnings in 2322.77s (0:38:42) =

We need to add this test to the AzureML groups after #1706 is merged.

The main reason is that the function prepare_all_predictions is very slow. I opened an issue so we can fix it whenever we have time: #1707
