-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[errors] Add distinct error codes for docker not running #4914
[errors] Add distinct error codes for docker not running #4914
Conversation
Codecov Report
@@ Coverage Diff @@
## master #4914 +/- ##
==========================================
+ Coverage 71.96% 72.17% +0.20%
==========================================
Files 357 358 +1
Lines 12327 12397 +70
==========================================
+ Hits 8871 8947 +76
+ Misses 2799 2796 -3
+ Partials 657 654 -3
Continue to review full report at Codecov.
|
Testing
|
8992c54
to
4decbeb
Compare
regexp *regexp.Regexp | ||
description string | ||
description func(error) string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we're doing the same thing in two different ways: we match regexp
against error to filter down to the problem
but now we match error again to filter down to a description.
We can leave this as is and rather split DockerConnectionFailedPrefix
into DockerConnectionFailed
and DockerConnectionFailedWithUnknownHost
for the case with known and unknown host address respectively.
That'll keep problem
struct at lowest granularity and easily referenceable.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The above approach still won't solve the problem. The reason why i added the description
as a function was to extract the "address" where docker daemon should be running from the error message.
e.g. if the error is
exiting dev mode because first build failed: docker build: Cannot connect to the Docker daemon at tcp://127.0.0.1:32770. Is the docker daemon running?
I want to extract the "tcp://127.0.0.1:32770." and trim the error message as
Build failed. Cannot connect to the Docker daemon at tcp://127.0.0.1:32770. Is the docker daemon running?
When deploy context is not minikube, we could just return the error we receive as is.
When deploy context is minikube, the error we receive is
creating runner: creating builder: getting docker client: getting minikube env: running [/Users/tejaldesai/Downloads/google-cloud-sdk2/bin/minikube docker-env --shell none -p minikube]
- stdout: "\n\n"
- stderr: "! Executing \"docker container inspect minikube --format={{.State.Status}}\" took an unusually long time: 7.36540945s\n* Restarting the docker service may improve performance.\nX Exiting due to GUEST_STATUS: state: unknown state \"minikube\": docker container inspect minikube --format=: exit status 1\nstdout:\n\n\nstderr:\nCannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?\n\n* \n* If the above advice does not help, please let us know: \n - https://github.com/kubernetes/minikube/issues/new/choose\n"
- cause: exit status 80
which needs to be translated to
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Another way to do this, if we use parse the "stdout" when we get this error from minikube and raise the error as "Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?"
maybe this is long term approach we want.
Or, else we use a static string for both cases.
Cannot connect to the Docker daemon. Is the docker daemon running?
I hope it's clear.
In general, i feel we need a function to massage/trim the error we received to get rid of all the text added due to errors.Wrap
.
Or, we could add another field to struct problem
type problem struct {
regexp *regexp.Regexp
description func(error) string
errCode proto.StatusCode
suggestion func(opts config.SkaffoldOptions) []*proto.Suggestion
userMessage func(error) string
}
pkg/skaffold/event/event.go
Outdated
@@ -310,7 +310,7 @@ func BuildCanceled(imageName string) { | |||
func BuildFailed(imageName string, err error) { | |||
aiErr := sErrors.ActionableErr(sErrors.Build, err) | |||
handler.stateLock.Lock() | |||
handler.state.BuildState.StatusCode = aiErr.ErrCode | |||
handler.state.BuildState.StatusCode = ignoreCancelled(handler.state.BuildState.StatusCode, aiErr.ErrCode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ignoreCancelled(handler.state.BuildState.StatusCode, aiErr.ErrCode)
This makes me think that we're updating the BuildState.StatusCode
in the wrong place since it seems to denote the status of the build sequence of several artifacts as a whole whereas BuildFailed
is called for every artifact build failure.
Can we split out:
// BuildFailed notifies that a build has failed.
func BuildFailed(imageName string, err error) {
aiErr := sErrors.ActionableErr(sErrors.Build, err)
handler.handleBuildEvent(&proto.BuildEvent{
Artifact: imageName,
Status: Failed,
Err: err.Error(),
ErrCode: aiErr.ErrCode,
ActionableErr: aiErr})
}
// BuildSequenceFailed notifies that the build sequence has failed.
func BuildSequenceFailed(err error) {
aiErr := sErrors.ActionableErr(sErrors.Build, err)
handler.stateLock.Lock()
handler.state.BuildState.StatusCode = aiErr.ErrCode
handler.stateLock.Unlock()
}
and call BuildSequenceFailed
in scheduler.go
line 70
if err := g.Wait(); err != nil {
event.BuildSequenceFailed(err)
return nil, err
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. i think with the error group approach it would work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PTAL at the comments. Looks good otherwise.
Testing Notes: As per @gsquared94 suggestion, moving updating build status code in
|
TODO: |
Thanks a lot @gsquared94 I addressed all your comment except for the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
I didn't see this TODO and merged on all green. Can you please add the tests in another PR? |
will do. :) |
will do. I am thinking if it should be an integration test or just a unit test. Let me know if you have any thoughts or if you think you have already covered this case in |
I only said it because you had a comment above, otherwise I don't have any thoughts about it 🙂. |
Relates to #4692
This PR, tries to reduce BUILD_UNKNOWN for
docker
builder type when docker is not running.Description of changes:
BuildState
is represent right now, there is just one or most recent error code stored inhandler.state.BuildState.StatusCode
. From the event log, you can see the actual build error happens in the first build. The second build is cancelled as first build is failed. TheDevLoopEventFailed
uses this build state to propagate the error.Ignoring cancellation errors, also make sure the build state has the error code with last legit build failure.
Before this PR, on master
skaffold dev
With minikube deploy context skaffold fails when creating runner.
with GKE deploy context
With this PR
Minikube Deploy context
GKE deploy context
Event API testing: