Add nightly build #1503
Conversation
.github/workflows/nightly.yml
Outdated
test-kind: ['unit', 'smoke']
# pytest markers configured in tox.ini. See https://docs.pytest.org/en/6.2.x/example/markers.html
test-marker: ['not spark and not gpu']
include:
  # Integration tests needs a more powermachine with more memory. GitHub-hosted agents only have 7GB memory.
  - os: self-hosted
    python: 3.6
    test-kind: 'integration'
One question about this: is this pipeline running unit, smoke, and integration tests?
Yup! I know this extra `include` clause looks unusual. At the moment, the integration tests cannot run on the smaller SKUs (Standard_DS2_v2: 7GB) provided by GitHub because there isn't enough memory to run the training on the full dataset.
Ideally, all our tests shouldn't be that compute intensive and they could run on the default SKUs provided by the CI (unless we need to test for compatibility with GPU-enabled machines). Thus the matrix should ideally look like:
test-kind: ['unit', 'smoke', 'integration']
test-marker: [....]
....
I think we can have a discussion over the objective of the different unit/smoke/integration tests and whether doing training on the full dataset achieves those goals. Added a new issue for the discussion: #1507.
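For illustration, here is a minimal sketch of what that flattened matrix could look like as a standalone workflow; the trigger, runner label, Python version, and step are placeholders, not the actual nightly.yml:

```yaml
# Hypothetical sketch only: every test kind runs on the default GitHub-hosted SKU.
name: nightly-sketch
on:
  workflow_dispatch: {}
jobs:
  test:
    runs-on: ubuntu-latest            # default GitHub-hosted agent (7GB memory)
    strategy:
      matrix:
        python: [3.6]
        test-kind: ['unit', 'smoke', 'integration']
        # pytest markers configured in tox.ini
        test-marker: ['not spark and not gpu']
    steps:
      - uses: actions/checkout@v2
      - run: echo "Would run ${{ matrix.test-kind }} tests with marker '${{ matrix.test-marker }}'"
```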
Sorry, I don't understand: you mentioned that this pipeline runs unit, smoke, and integration tests, but later that the integration tests cannot run on the smaller SKUs. Some questions:
1. Where are the integration tests running?
2. Is there a UI where I can see an example of how the tests are running?
3. There are 3 test kinds running for CPU (unit, smoke, and integration); are they running in parallel or sequentially?
4. Do we know if GitHub has plans to support bigger SKUs?
5. Related to 4, Azure Pipelines also has SKUs for testing; are they the same as GitHub's, or are they bigger? (If they are bigger, then GitHub might catch up eventually.)
No worries!
1. The integration tests are currently running on the self-hosted GPU agent. This is configured with the `self-hosted` keyword:
   include:
     - os: self-hosted
2. Please find all of the GitHub Action runs here, especially the `pr-gate` and `nightly` pipelines.
3. Currently all of them run in parallel. However, because we have set up only 2 self-hosted machines while there are about 5 nightly steps that need to run on a self-hosted machine, we haven't scaled up the resources to run them in true parallel.
4. As far as I know, there is no such plan. Maybe the SKU offerings are different with GitHub Enterprise, but I haven't gotten much info on that.
5. Azure Pipelines and GitHub Actions both use the same SKU: Standard_DS2_v2.
I hope that helps.
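As a rough sketch of the pattern described in point 1 (placeholder values, not the real workflow), the job takes its runner label from the matrix, so the `include` row sends only the integration tests to the self-hosted agent:

```yaml
# Hypothetical sketch: `runs-on` resolves per matrix row, so the `include`
# entry routes the integration tests to the self-hosted GPU agent while the
# other rows stay on GitHub-hosted VMs.
name: routing-sketch
on:
  workflow_dispatch: {}
jobs:
  test:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest]           # GitHub-hosted (Standard_DS2_v2, 7GB)
        test-kind: ['unit', 'smoke']
        include:
          - os: self-hosted           # our own agent
            test-kind: 'integration'
    steps:
      - run: echo "${{ matrix.test-kind }} tests on ${{ matrix.os }}"
```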
Got it, thanks for the clarification.
- Would you mind explaining in the file that some tests run on the hosted machine and some on the self-hosted one?
- For nightly, in the current pipeline we are running only smoke and integration tests, see https://github.com/microsoft/recommenders/blob/main/tests/ci/azure_pipeline_test/dsvm_nightly_linux_cpu.yml#L30, and these two are sequential. The reason they are sequential is that the smoke tests are just a small version of the integration tests. Is it possible to replicate this behavior in GitHub?
We are making so many changes to the tests that I think we also need to review the test readme; there might be info that is not up to date: https://github.com/microsoft/recommenders/blob/main/tests/README.md
Thanks for the great feedback!
- I already added a comment on these mixed runs for the integration tests and a link to a new issue (Tests runnable on default CI agent #1507) on whether we should optimize/trim these integration tests and make them fit to run on hosted VMs:
  include:
    # Integration tests need a powerful machine with more memory. GitHub-hosted agents only have 7GB memory.
    # TODO: Optimization/refactoring should be done to ensure that these tests can run on the default CI agent (#1507)
    - os: self-hosted
  @miguelgfierro is there anywhere else you would like me to comment?
- If I understand correctly, in the existing pipeline the smoke tests run before the integration tests so we can "fail fast" and avoid running the full integration tests, which are destined to fail. In a similar manner, GitHub Actions has the `fail-fast` keyword, which cancels the running jobs in the matrix if any of them fails (see the sketch below). If this isn't what you are going after and we need to run them sequentially, I think we would need to refactor the CI structure, because it is designed to run everything in parallel as much as possible.

> We are making so many changes in test, I think we need also to review the test readme

This was a great document when I was onboarding :). All of the CI changes are compatible with what's written in the document in terms of test execution. One thing we can change is that we no longer need the `--durations 0` flag from, for example, `pytest tests/integration -m "integration and not spark and not gpu" --durations 0`, because these flags are stored in tox.ini and pytest will pick them up without you specifying them. Of course, it doesn't hurt to add them when running. 😃
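To make the `fail-fast` point concrete, here is a minimal sketch (placeholder trigger, job, and step, not the actual pipeline) showing where the keyword sits; once any matrix job fails, GitHub Actions cancels the other in-progress jobs of that matrix:

```yaml
# Hypothetical sketch: fail-fast cancels in-progress matrix jobs once one fails.
name: fail-fast-sketch
on:
  workflow_dispatch: {}
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: true                 # also the default behaviour
      matrix:
        test-kind: ['unit', 'smoke', 'integration']
    steps:
      - run: echo "Running ${{ matrix.test-kind }} tests"
```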
> @miguelgfierro is there anywhere else you would like me to comment?

For new users, particularly when the pipeline is complex like this one, a short summary at the top of the file helps save them time. Here is an example: https://github.com/microsoft/recommenders/blob/main/tests/ci/azure_artifact_feed.yaml
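For instance, a header comment along these lines (hypothetical wording, drawn only from this conversation) could sit at the top of nightly.yml:

```yaml
# ---------------------------------------------------------------------------
# Nightly build pipeline (hypothetical summary, not the actual file header)
#
# - Runs the full test suites: unit, smoke and integration tests.
# - Unit and smoke tests run on GitHub-hosted agents (Standard_DS2_v2, 7GB).
# - Integration tests run on the self-hosted GPU agent via the matrix
#   `include` override, because they need more memory than the hosted SKU
#   provides (see #1507).
# - pytest markers and flags are configured in tox.ini.
# ---------------------------------------------------------------------------
```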
@laserprec do you have the time savings between the old pipeline vs this one?
@miguelgfierro, please take the numbers with a grain of salt; it's not an apples-to-apples comparison. This new nightly build on GitHub Actions takes about 4hr 40min to run the 3 builds and the full test suites (unit/smoke/integration) partially in parallel. As you can see, the GitHub Actions run is bottlenecked by its longest-running thread, the GPU integration tests (4hr 40min), and is considerably longer than the same run in ADO. This is because we are not using the same Azure SKUs to run them: on ADO we are using NC6s_v2 (which has a P100 GPU, but we no longer have any quota to allocate more), and in GitHub we are using NC12_promo (which has two P40 GPUs). So at this point we don't have much immediate gain, but with the new CIs in GitHub Actions we are on better ground to scale up the infra for distributed loads when we have more environments to support, and we have a centralized interface to interpret the runs and optimize the individual jobs in the next step.
I can try to fix that and get you more quota; for the comparison we should have the same SKU, so please ping me privately and I'll try to unblock you.
Another point I see is that, as you said, this is not apples to apples: in the old ADO pipeline we are not running the unit tests. Do we want to have the unit tests in the nightly build?
The unit tests for Spark take 30min, whereas in the ADO pipeline they take around 14, see this and this. Do you know why?
I also noticed that this PR is triggering the nightly builds with GitHub; they shouldn't be triggered in the PR but scheduled. Can you review?
One way to scale would be to use AzureML VMs to execute the tests; this was an initiative that was tried in the past but we didn't put it in production. We have some code here that might help you to understand what we did. Can you estimate how much work and how difficult it would be to have AzureML VMs instead of a hosted VM on GitHub?
@miguelgfierro I think it's typical to run the full test suites on a nightly build. This way we can get the full code coverage report. Relatively speaking, the time to run the full unit tests is probably a small fraction of that of the integration tests. I don't think it hurts to run them 😄.
This is because the unit tests are running on the default SKU, with only 7GB of memory. They took longer because we were running with less compute.
Yup, the manual trigger will not be registered in the UI until these changes are checked into the main branch. I am just adding a custom PR trigger to make sure the nightly build runs. Now that we have confirmed it runs fine, I can remove it (TODO).
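For context, here is a sketch of how these triggers might be wired; the cron expression, branch name, and placeholder job are illustrative assumptions, not the actual configuration:

```yaml
# Hypothetical sketch of the trigger section of a nightly workflow.
name: nightly
on:
  schedule:
    - cron: '0 0 * * *'        # scheduled nightly run (example time, UTC)
  workflow_dispatch: {}        # manual trigger from the Actions UI
  # pull_request:              # temporary trigger used to validate the workflow
  #   branches: [staging]      # in a PR; can be removed once verified
jobs:
  nightly-tests:
    runs-on: ubuntu-latest
    steps:
      - run: echo "placeholder for the nightly test matrix"
```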
I think that's a good idea for managed compute pools for model training. At the moment, I am not aware of any good integration plugins for AzureML and GitHub Actions. I think this is not a trivial change in terms of secret management, the complexity introduced by involving AML in the pipeline, etc. The gains, I assume, are relieving the team from managing the self-hosted VMs and potentially introducing AML as the training scenario in our notebooks? I think we can investigate this direction. @miguelgfierro, what would be the best way to backlog this item for us to come back to?
Got it, makes sense. Really good suggestions and thought process.
We have an open issue about this, #995; we can either reuse that one, or create a new one and close it, whatever you prefer.
Awesome! We can reuse this issue, and I have already assigned it to myself. I think we can look into this after we have finished the migration. Btw, @miguelgfierro, is there anything else I haven't addressed? I would love to get your approval before merging this PR 😄.
There are still some questions that it would be good to have clarity on:
I hope I gave a better explanation :p. If not, I can hop on a call.
I feel like the support for 3.7 can be its own PR; I'm trying to keep this PR modular. I have created a new issue for this: #1511. In upcoming changes I can create a shadow CI job that tries the build on 3.7 (but does not fail the build when its related jobs fail).
I updated the comments with more details in commit 97faf6341d. Let me know if we need more context.
Codecov Report
@@            Coverage Diff             @@
##           staging    #1503      +/-   ##
===========================================
- Coverage    61.96%    61.95%    -0.01%
===========================================
  Files           84        84
  Lines         8368      8369        +1
===========================================
  Hits          5185      5185
- Misses        3183      3184        +1
Continue to review full report at Codecov.
This is a really good job @laserprec
Description
To improve the developer experience of working with this repo, we will begin migrating our DevOps pipelines to GitHub Actions, leveraging popular DevOps tools like tox and flake8. This will be a sequence of updates to our DevOps infrastructure:
- Propose and draft the CI pipeline in GitHub Actions
- Set up self-hosted machines to run our GPU workloads

In this PR:
- Add a nightly build pipeline that runs the full test suites: unit, integration and smoke tests.
Related Issues
#1498
Checklist:
- This PR is being merged into the staging branch and not to the main branch.