-
Notifications
You must be signed in to change notification settings - Fork 448
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Check error in OpenSQLConnection #588
Check error in OpenSQLConnection #588
Conversation
pkg/db/v1alpha1/interface.go
Outdated
@@ -111,6 +111,9 @@ func openSQLConn(driverName string, dataSourceName string, interval time.Duratio | |||
if err = db.Ping(); err == nil { | |||
return db, nil | |||
} | |||
klog.Infof("Ping to Katib db failed: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use klog.Errorf
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, fixed.
pkg/db/v1alpha2/interface.go
Outdated
@@ -95,6 +95,9 @@ func openSQLConn(driverName string, dataSourceName string, interval time.Duratio | |||
if err = db.Ping(); err == nil { | |||
return db, nil | |||
} | |||
klog.Infof("Ping to Katib db failed: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we return nil, err here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gaocegege This is in a for-select loop. Shouldn't it wait till timeout?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andreyvelich Just wondering. Have you seen this log https://github.com/kubeflow/katib/blob/master/cmd/manager/v1alpha2/main.go#L199 after the timeout?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh gotcha, yeah we should wait instead of returning an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@johnugeorge I didn't check it, because after the timeout my manager container was automatically restarting.
/retest |
/lgtm |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: johnugeorge The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
I faced with the same error as here: #340.
Problem was with my Coredns in Kubernetes cluster. After I changed pod network from
Flannel
toCalico
, it was fixed.I didn't see any logs from the katib-manager and it was hard to found this problem.
I think, we should print these error messages to the user, then it will be easier to find the problem.
/cc @gaocegege @johnugeorge @hougangliu
This change is