Trial pods complete but are never marked successful or reused; metrics are not shown #1949
Comments
Can you check #1795 (comment)?
Dear @johnugeorge, thank you very much for your reply. I have checked the two points of your comment:
So even if the first pings failed, I understand everything is fine with
Btw, when logging the pod
I think my error originates in the way the certificates are generated, but I am not sure how to solve it either.
Sorry for the late reply. Is it a fresh installation? Could it be caused by stale webhook configurations? /cc @tenzen-y
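One way to look for stale webhook configurations is to dump them with `kubectl get mutatingwebhookconfigurations -o json` (and the validating equivalent) and check whether each webhook's `clientConfig.service` still points at a Service that exists. The sketch below runs that check on already-parsed JSON. The `"katib"` name filter, the `katib-controller` Service in the `kubeflow` namespace, and the sample webhook name are assumptions for illustration, not confirmed Katib resource names.

```python
from __future__ import annotations

from typing import Iterable


def find_stale_webhooks(
    webhook_configs: Iterable[dict],
    existing_services: set[tuple[str, str]],
    name_filter: str = "katib",  # assumption: Katib's configs contain "katib" in their name
) -> list[str]:
    """Return "config/webhook" names whose backing Service no longer exists.

    webhook_configs: the "items" list from
        `kubectl get mutatingwebhookconfigurations -o json` (or the validating one).
    existing_services: (namespace, name) pairs of Services present in the cluster.
    """
    stale = []
    for config in webhook_configs:
        config_name = config.get("metadata", {}).get("name", "")
        if name_filter not in config_name:
            continue
        for webhook in config.get("webhooks", []):
            service = webhook.get("clientConfig", {}).get("service")
            if service is None:
                continue  # URL-based webhooks have no Service reference to check
            key = (service.get("namespace", ""), service.get("name", ""))
            if key not in existing_services:
                stale.append(f"{config_name}/{webhook.get('name', '')}")
    return stale


# Hypothetical, trimmed webhook configuration for illustration.
configs = [
    {
        "metadata": {"name": "katib.kubeflow.org"},
        "webhooks": [
            {
                "name": "defaulter.experiment.katib.kubeflow.org",
                "clientConfig": {"service": {"namespace": "kubeflow", "name": "katib-controller"}},
            }
        ],
    }
]
print(find_stale_webhooks(configs, {("kubeflow", "katib-controller")}))  # []
print(find_stale_webhooks(configs, set()))  # ['katib.kubeflow.org/defaulter.experiment.katib.kubeflow.org']
```

Dangling Service references are only one kind of staleness: an outdated `caBundle`, which would fit the certificate suspicion raised earlier in the thread, is not detected by this check.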
Maybe it was caused by the old WebhookConfigurations or by the certs embedded in the Secret. As @johnugeorge says, can you reinstall Katib after running the commands below to clean up the old Katib installation?
`kubectl delete -k "github.com/kubeflow/katib.git/manifests/v1beta1/installs/katib-standalone?ref=v0.13.0"`
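After reinstalling, a quick check is whether new Trial pods actually receive the metrics collector sidecar that Katib's webhook injects; if the webhook call fails (for example on a certificate problem), the pod may run without it. The sketch below assumes the pod JSON comes from `kubectl get pod <trial-pod> -o json` and that the sidecar container is named `metrics-logger-and-collector`. Treat that name as an assumption and confirm it for your Katib version. The sample pod is hypothetical.

```python
from __future__ import annotations

import json

# Assumption: container name of the sidecar Katib injects; verify for your version.
METRICS_COLLECTOR_CONTAINER = "metrics-logger-and-collector"


def container_names(pod: dict) -> list[str]:
    """Names of the regular containers in a Pod object (spec.containers)."""
    return [c.get("name", "") for c in pod.get("spec", {}).get("containers", [])]


def has_metrics_collector(pod: dict, sidecar: str = METRICS_COLLECTOR_CONTAINER) -> bool:
    """True if the metrics collector sidecar appears among the pod's containers."""
    return sidecar in container_names(pod)


# Hypothetical pod JSON, trimmed to the fields used above.
pod_json = """
{
  "metadata": {"name": "random-exp-abc123-xyz"},
  "spec": {"containers": [
    {"name": "training-container"},
    {"name": "metrics-logger-and-collector"}
  ]}
}
"""
pod = json.loads(pod_json)
print(container_names(pod))  # ['training-container', 'metrics-logger-and-collector']
print(has_metrics_collector(pod))  # True
```

If the sidecar is missing from freshly created Trial pods, the pod was most likely not mutated by the webhook, which is one plausible reason no metrics reach the UI.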
I see the same behaviour. However, in my case it is not a katib-controller problem: everything ran normally and the job completed normally, but the data does not accumulate in the DB. For your information, I am connecting to a MySQL server other than the one provided by Katib. The connection to that DB works, and the `observation_logs` table is created as expected. However, no data is written to it. What could be the problem? Is there a solution?
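To see whether anything reaches the external database, counting rows per Trial is a useful first check. The sketch below uses Python's built-in `sqlite3` as a stand-in for MySQL so it runs anywhere. The column layout (`trial_name`, `id`, `time`, `metric_name`, `value`) is an assumption modelled on Katib's `observation_logs` table, so compare it with your actual schema; the trial names and rows are hypothetical.

```python
from __future__ import annotations

import sqlite3

# sqlite3 stands in for MySQL; the columns approximate Katib's observation_logs table.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE observation_logs (
        trial_name  TEXT NOT NULL,
        id          INTEGER PRIMARY KEY AUTOINCREMENT,
        time        TEXT,
        metric_name TEXT NOT NULL,
        value       TEXT NOT NULL
    )
    """
)

# Hypothetical rows, as if a collector had reported two metrics for one trial.
conn.executemany(
    "INSERT INTO observation_logs (trial_name, time, metric_name, value) VALUES (?, ?, ?, ?)",
    [
        ("random-exp-trial-1", "2022-09-01 10:00:00", "Validation-accuracy", "0.91"),
        ("random-exp-trial-1", "2022-09-01 10:00:00", "Train-accuracy", "0.95"),
    ],
)


def rows_per_trial(connection: sqlite3.Connection, trial_names: list[str]) -> dict[str, int]:
    """Count observation rows for each trial; 0 means nothing was stored for it."""
    counts = {}
    for name in trial_names:
        (count,) = connection.execute(
            "SELECT COUNT(*) FROM observation_logs WHERE trial_name = ?", (name,)
        ).fetchone()
        counts[name] = count
    return counts


print(rows_per_trial(conn, ["random-exp-trial-1", "random-exp-trial-2"]))
# {'random-exp-trial-1': 2, 'random-exp-trial-2': 0}
```

Against MySQL the same query works with most Python drivers if the `?` placeholder is replaced by `%s`. A count of 0 for a Trial whose logs clearly print metrics suggests the metrics never left the pod, rather than a problem with the database itself.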
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.
/kind bug
What steps did you take and what happened:
I have tried to run the Hyperparameter Tuning v1beta1 examples from the official Katib GitHub repository: https://github.com/kubeflow/katib/tree/master/examples/v1beta1/hp-tuning. The only thing I changed was the repository name (from kubeflow to joaquin-garcia). I have tried with Istio sidecar injection both enabled and disabled (our cluster uses Istio), as described in Step 3 of https://www.kubeflow.org/docs/components/katib/hyperparameter/.
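Disabling Istio sidecar injection for Trial pods is typically done with the standard Istio annotation `sidecar.istio.io/inject: "false"` on the pod template inside the Trial template. The sketch below applies it to an Experiment manifest already loaded as a dict. It assumes a `batch/v1` Job trial spec, so the pod template sits at `trialTemplate.trialSpec.spec.template`; the trimmed manifest is hypothetical and other trial kinds would need a different path.

```python
from __future__ import annotations

import copy

ISTIO_INJECT_ANNOTATION = "sidecar.istio.io/inject"


def disable_istio_injection(experiment: dict) -> dict:
    """Return a copy of an Experiment manifest whose Job-based trial pods opt out of Istio.

    Assumes trialTemplate.trialSpec is a batch/v1 Job, so the pod template
    lives at trialSpec.spec.template.
    """
    patched = copy.deepcopy(experiment)  # leave the caller's manifest untouched
    pod_template = (
        patched.setdefault("spec", {})
        .setdefault("trialTemplate", {})
        .setdefault("trialSpec", {})
        .setdefault("spec", {})
        .setdefault("template", {})
    )
    annotations = pod_template.setdefault("metadata", {}).setdefault("annotations", {})
    # Kubernetes annotation values must be strings, hence "false" rather than False.
    annotations[ISTIO_INJECT_ANNOTATION] = "false"
    return patched


# Hypothetical, trimmed Experiment manifest.
experiment = {
    "apiVersion": "kubeflow.org/v1beta1",
    "kind": "Experiment",
    "metadata": {"name": "random"},
    "spec": {"trialTemplate": {"trialSpec": {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "spec": {"template": {"spec": {"containers": [{"name": "training-container"}]}}},
    }}},
}
patched = disable_istio_injection(experiment)
print(patched["spec"]["trialTemplate"]["trialSpec"]["spec"]["template"]["metadata"]["annotations"])
# {'sidecar.istio.io/inject': 'false'}
```

Copying the manifest first keeps the function side-effect free, so the same base manifest can be reused with and without the annotation when comparing both setups.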
The problem is that each pod executes one Trial (one combination of parameters), and the Trial is marked as completed but never as succeeded (neither in the terminal nor in the UI), so the experiment never reaches its goal. I have verified that the training runs in each pod, since the epochs and metrics are printed in the terminal, but nothing is shown in the UI.
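When metrics appear in the pod logs but not in the UI, it is worth checking that the printed lines match what the StdOut metrics collector parses. By default it looks for `name=value` pairs; the regex below is a simplified approximation of that default filter, not Katib's exact pattern, and the format can be customized in the Experiment's `metricsCollectorSpec`. The log lines and metric names are hypothetical.

```python
from __future__ import annotations

import re

# Simplified approximation of a "name=value" metrics filter; not Katib's exact regex.
METRIC_PATTERN = re.compile(r"([\w-]+)\s*=\s*([+-]?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)")


def parse_metrics(log_lines: list[str]) -> dict[str, list[float]]:
    """Collect every name=value pair found in the given log lines."""
    found: dict[str, list[float]] = {}
    for line in log_lines:
        for name, value in METRIC_PATTERN.findall(line):
            found.setdefault(name, []).append(float(value))
    return found


def objective_reported(log_lines: list[str], objective: str) -> bool:
    """True if the objective metric appears at least once in a parseable form."""
    return objective in parse_metrics(log_lines)


logs = [
    "Epoch 1/2",
    "Validation-accuracy=0.8123",
    "loss: 0.45 - accuracy: 0.87",  # Keras-style output: no '=' so nothing is parsed
    "Validation-accuracy=0.8456",
]
print(parse_metrics(logs))  # {'Validation-accuracy': [0.8123, 0.8456]}
print(objective_reported(logs, "Validation-accuracy"))  # True
print(objective_reported(logs, "accuracy"))  # False
```

If the objective metric named in the Experiment never matches, Katib has no objective value to record for the Trial, which is one plausible reason a Trial completes without being marked as succeeded.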
What did you expect to happen:
I expected each pod to be rerun with a different combination of values for the parameters being tuned.
Anything else you would like to add:
Environment:
- `kubectl version`: Client v1.25.0, Server v1.21.13
- `uname -a`: Linux microsoft-standard-WSL2 x86_64 x86_64 x86_64 GNU/Linux