Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Verify that Trials were successfully deleted #1288

Merged

Conversation

andreyvelich
Copy link
Member

When I tried to update parallelTrialCount to less value, I got errors in controller.

In deleteTrials we should verify that Trials were successfully deleted before running reconcile loop again and update suggestion requests and status.suggestions.

Otherwise, updateTrialsSummary is resynchronized with Trial list and addCount for ReconcileSuggestion is incorrect.

/assign @gaocegege @johnugeorge

@kubeflow-bot
Copy link

This change is Reviewable

@andreyvelich andreyvelich force-pushed the change-status-when-delete-trials branch from 04d78c2 to 3af762f Compare August 3, 2020 18:44
@k8s-ci-robot k8s-ci-robot added size/L and removed size/M labels Aug 3, 2020
@andreyvelich
Copy link
Member Author

I updated test, unit tests locally works good, I don't know why it fails sometime in CI.

@gaocegege @johnugeorge @sperlingxx Do you have any ideas why we have 2 Travis in our pre-submits and post-submits?
First is on travis-ci.org https://travis-ci.org/github/kubeflow/katib/builds.
Second is on travis-ci.com https://travis-ci.com/github/kubeflow/katib/builds.

@andreyvelich
Copy link
Member Author

@gaocegege
Copy link
Member

Should we disable Katib on travis-ci.org since it is outdated: https://blog.travis-ci.com/2018-05-02-open-source-projects-on-travis-ci-com-with-github-apps ?

SGTM

BTW, I am not sure if I understand the PR, can you give me an example about why we need it?

@andreyvelich
Copy link
Member Author

@gaocegege If you tried to edit Experiment, e.g change parallelTrialCount from 4 to 2, 2 Trial has to be deleted.
Here we execute Delete:

if err := r.Delete(context.TODO(), &trialSlice[i]); err != nil {

After that, k8s objet will not be deleted immediately, finalizers run after Trial deletion, because of that I added check before finish Experiment reconciler.
If we don't do it, on the next Experiment reconciler loop List:
if err := r.List(context.TODO(), lo, trials); err != nil {

returns Trial list with 2 deleting Trials.
Then, UpdateExperimentStatus overrides Experiment status incorrect:
if err := util.UpdateExperimentStatus(r.collector, instance, trials); err != nil {
.
And addCount will be not correct:
addCount := requiredActiveCount - activeCount
.

Suggestion status also needs to be updated since we store Trials assignment there.

@andreyvelich
Copy link
Member Author

andreyvelich commented Aug 4, 2020

@jlewi Could you help us to disable travis-ci.org for Katib, please?
I don't have permission to disable travis-ci.org from here: https://travis-ci.org/organizations/kubeflow/repositories, or from Katib repo settings.

@andreyvelich
Copy link
Member Author

/retest

@gaocegege
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot removed the lgtm label Aug 4, 2020
Copy link
Member

@gaocegege gaocegege left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Copy link
Member Author

@andreyvelich andreyvelich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@k8s-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreyvelich

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@andreyvelich
Copy link
Member Author

/retest

1 similar comment
@andreyvelich
Copy link
Member Author

/retest

@k8s-ci-robot k8s-ci-robot merged commit 33832a7 into kubeflow:master Aug 6, 2020
@andreyvelich andreyvelich deleted the change-status-when-delete-trials branch October 6, 2021 00:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants