Fix crash in podgroup when runLauncherAsWorker is true #669
base: master
[APPROVALNOTIFIER] This PR is NOT APPROVED.
This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
lgtm
Thank you for fixing this! Could you add a unit test case to TestCalculatePGMinResources in mpi-operator/pkg/controller/podgroup_test.go (line 369 in c738a83)?
@GonzaloSaez Could you sign the DCO (https://github.com/kubeflow/mpi-operator/pull/669/checks?check_run_id=32329697546) and address the CI errors?
Sure, will do. Are these tests flaky? It seems that only the IntelMPI tests are failing, and I don't see why my changes would make them fail.
Hmm, interesting. Let me restart the jobs.
@GonzaloSaez I restarted the CI three times, and all runs failed. So this PR seems to introduce some additional bug. Could you fix it?
@tenzen-y I ran these tests locally and they pass. I'm still looking into why they fail in CI.
Interesting. Let's check whether the master branch is healthy here: #671
It seems the E2E tests on the master branch have been broken...
Signed-off-by: GonzaloSaez <11050889+GonzaloSaez@users.noreply.github.com>
When runLauncherAsWorker is true and there are no workers, the MPI controller crashes in calPGMinResource.
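The thread does not show the fix itself, so the Go sketch below is only a hypothetical illustration of one plausible failure mode, which this PR does not confirm: a min-resource calculation that sorts replica types by priority and then unconditionally reads a second (worker) entry will panic with "index out of range" when the job has only a launcher. The type replica and the function minMemberReplicas are invented for this example and are not part of mpi-operator's API; the length guard shows one way such a crash could be avoided.

```go
package main

import (
	"fmt"
	"sort"
)

// replica is a HYPOTHETICAL, simplified stand-in for an MPIJob replica spec.
// Only the fields needed to illustrate the crash are kept.
type replica struct {
	name     string // "Launcher" or "Worker"
	priority int32  // priority assumed to be resolved from a PriorityClass
	replicas int32  // number of pods for this replica type
}

// minMemberReplicas is a HYPOTHETICAL helper that mimics the general shape of a
// PodGroup min-resource calculation: sort replica types by priority (highest
// first) and, if the first two together exceed minMember, shrink the
// lower-priority entry so the pair fits within minMember.
//
// Reading order[1] without a length check would panic with "index out of
// range" for a launcher-only job (e.g. runLauncherAsWorker with no Worker spec).
func minMemberReplicas(specs []replica, minMember int32) []replica {
	// Copy so the caller's slice is not reordered or mutated.
	order := append([]replica(nil), specs...)
	sort.SliceStable(order, func(i, j int) bool {
		return order[i].priority > order[j].priority
	})

	// Guard: a launcher-only job has no second entry to read or trim.
	if len(order) < 2 {
		return order
	}

	if order[0].replicas+order[1].replicas > minMember {
		// Trim the lower-priority entry, never going below zero.
		trimmed := minMember - order[0].replicas
		if trimmed < 0 {
			trimmed = 0
		}
		order[1].replicas = trimmed
	}
	return order
}

func main() {
	// Launcher-only job: the case that would crash without the guard.
	launcherOnly := []replica{{name: "Launcher", priority: 0, replicas: 1}}
	fmt.Println(minMemberReplicas(launcherOnly, 1))

	// Launcher plus 4 workers with minMember 3: workers are trimmed to 2.
	withWorkers := []replica{
		{name: "Launcher", priority: 1, replicas: 1},
		{name: "Worker", priority: 0, replicas: 4},
	}
	fmt.Println(minMemberReplicas(withWorkers, 3))
}
```

A launcher-only case like the first call in main is the kind of input a new row in TestCalculatePGMinResources could cover, though the real test would exercise mpi-operator's own types rather than these hypothetical ones.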