Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds error code for context deadline exceeded #2008

Merged
merged 6 commits into from
May 19, 2021

Conversation

vishalkuo
Copy link
Contributor

I came across this issue and wanted to take a stab adding a code for context deadline exceeded errors since http timeouts are near and dear to my heart. I know the task is tagged as refactor so I'm not sure if a larger overhaul is due, but I figure this is a small change to cover a frequent error code and might be worth getting in sooner.

A few notes / questions about this:

  • How do we pick error codes? I have arbitrarily selected 1800 but don't feel strongly.
  • Is there a way to test for the context.DeadlineExceeded error inside the switch case? Unfortunately, the go stdlib defines it as follows (doesn't export the struct) so I'm not sure:
// DeadlineExceeded is the error returned by Context.Err when the context's
// deadline passes.
var DeadlineExceeded error = deadlineExceededError{}

type deadlineExceededError struct{}

@imiric
Copy link
Contributor

imiric commented May 7, 2021

Hi, thanks for proposing this.

I think this might need a deeper refactor though. context deadline exceeded is a very generic error and could be returned by any process that uses WithTimeout() contexts, not just request timeouts. Instead, I think we should return a specific internal error for request timeouts and then catch it in the switch to return an associated error code for it. And probably also do the same for other cases where timeouts are used, as we shouldn't work with DeadlineExceeded errors across internal packages anyway. But that might take some work to track down, of course.

As for your questions:

How do we pick error codes?

I think it's mostly arbitrary, with some distance between each to give us room to expand if needed. 1800 sounds good to me for this case.

Is there a way to test for the context.DeadlineExceeded error inside the switch case?

Apparently not 😞 But like I mentioned above, we should probably wrap it in our own internal errors and use those in the switch, so that we don't have to deal with it directly.

As you know from your other PR, I'll have to discuss this with the other team members, and we surely won't merge this for this upcoming release, so we have some time to figure out the details.

Thanks again!

@vishalkuo
Copy link
Contributor Author

vishalkuo commented May 7, 2021

I understand. If that's the case, how do we feel about extending this PR a little bit to also create a requestTimeout error to handle and wrap the context deadline exceeded case? A longer term refactor makes sense but generally passing around more descriptive errors is probably something that's useful in the short term?

My gut says we can handle that with a small change here but let me know if that doesn't make sense.

@imiric
Copy link
Contributor

imiric commented May 7, 2021

I think that would be OK, and that location seems right to me. But you should probably wait for feedback from @na-- or @mstoykov sometime next week before starting, they might have a better idea.

BTW, I was wrong about error codes being arbitrary 🤦‍♂️ There's actually some logic to the grouping I forgot about: https://k6.io/docs/javascript-api/error-codes/

Though I'm not sure where to place this request timeout error, since it's not an HTTP or TCP error, just a context timeout... Maybe under General, so 1050?

@na-- na-- added this to the v0.33.0 milestone May 10, 2021
@na--
Copy link
Member

na-- commented May 10, 2021

1050 seems better than 1800 👍

My gut says we can handle that with a small change here but let me know if that doesn't make sense.

Yeah, that seems about right. You should also be able to directly use the K6Error type for this easily enough: https://github.com/k6io/k6/blob/baa5a8af1591ef3c8aa5d5cb1540bceb4a550b6e/lib/netext/httpext/error_codes.go#L222-L224

@@ -243,6 +243,9 @@ func (t *transport) RoundTrip(req *http.Request) (*http.Response, error) {
reqWithTracer := req.WithContext(httptrace.WithClientTrace(ctx, tracer.Trace()))
resp, err := t.state.Transport.RoundTrip(reqWithTracer)

if typErr, ok := err.(net.Error); ok && typErr.Timeout() {
err = NewK6Error(requestTimeoutErrorCode, requestTimeoutErrorCodeMsg, err)
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up having to put this in transport instead of request.go. Since transport actually takes care of reporting the error here we need to wrap it goes any further - I tested this with a built CLI to confirm that the error code does propagate all the way to the JS response object.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm we already have something like this in here, but it seems only for dial timeouts: https://github.com/k6io/k6/blob/baa5a8af1591ef3c8aa5d5cb1540bceb4a550b6e/lib/netext/httpext/error_codes.go#L143-L146

And it seems likely that your current fix masks that branch, i.e. we'd no longer be able to distinguish between a dial timeout and a timeout after the TCP connection was established. If you move your check in this if body, it will probably also work, without masking dial timeout detection, and while keeping the error detection code in the same place: https://github.com/k6io/k6/blob/baa5a8af1591ef3c8aa5d5cb1540bceb4a550b6e/lib/netext/httpext/error_codes.go#L132

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be misunderstanding here but I think the err type returned from RoundTrip is a context.deadlineExceededError. This means that it'll never end up making it into errorCodeForNetOpError here. This is also shown by the fact that context deadline exceeded errors return a code of 1000 which is not returned anywhere from errorCodeForNetOpError.

I think that given that these errors will be completely separate times, it should prevent us from masking the dial related errors.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error you create here and then get set in the struct below is the one that then goes through the error_code matching code and net.OpError (the one that is matched in there) does implement net.Error so it will definitely mask it ;).

On the other hand, I am not really certain it is needed, the original code (written by me) was just me trying to match as many errors as we have seen "somehow" which was over a year ago and so the current niceties in the errors package weren't there.

I probably should've had just one timeout error as it is here and had it do everything.

I am not even certain we have had dial timeout error at any point in time or have I just by reading the net code found that this will match something "hopefully" and added it, I certainly tried this for a lot of errors that turned out later weren't matched correctly, because I didn't get the exact wrapping correct and there wasn't errors.As :(.

So I am fine with just dropping the dial timeout error code (and code) and having the rest of the change as it is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok - if we think that creating a generic "timeout" error code is more useful then I've removed the dial conditional of the netOp error branch. Let me know if this isn't what you had in mind.

@codecov-commenter
Copy link

codecov-commenter commented May 18, 2021

Codecov Report

Merging #2008 (90fcbe7) into master (6b635e9) will increase coverage by 0.01%.
The diff coverage is 100.00%.

❗ Current head 90fcbe7 differs from pull request most recent head 1675211. Consider uploading reports for the commit 1675211 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2008      +/-   ##
==========================================
+ Coverage   71.72%   71.73%   +0.01%     
==========================================
  Files         179      179              
  Lines       14085    14086       +1     
==========================================
+ Hits        10102    10105       +3     
+ Misses       3347     3345       -2     
  Partials      636      636              
Flag Coverage Δ
ubuntu 71.67% <100.00%> (+<0.01%) ⬆️
windows 71.37% <100.00%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
lib/netext/httpext/error_codes.go 98.63% <ø> (-0.04%) ⬇️
lib/netext/httpext/transport.go 97.22% <100.00%> (+0.07%) ⬆️
js/runner.go 81.76% <0.00%> (+0.56%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6b635e9...1675211. Read the comment docs.

Copy link
Contributor

@mstoykov mstoykov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for the PR 🎉

After looking at some more data it seems that dial timeout does happen quite often, and it's probably going to be better if we keep it so we can differentiate between them. But also it will be nice to have a test so I am for just merging this PR as is and I will try to add it back in #2025 and write a test for it

WDYT @na-- @imiric

Copy link
Member

@na-- na-- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm yeah, the current PR is better than before, Request timeout is a much better error message than context deadline exceeded.

That said, it would be even better if we could differentiate between dial timeouts (i.e. k6 times out before it even could establish a network connection to the remote host) and between timeouts (the connection was established, but the request didn't finish before the specified timeout duration).

So I am fine with merging this PR as it is and trying to fix that in #2025

Copy link
Contributor

@imiric imiric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for this! Let's just squash it when we merge/rebase.

One tiny nitpick is that I would make "Request timeout" lowercase to match the other errors, but we can fix that later.

@mstoykov mstoykov merged commit 9fca276 into grafana:master May 19, 2021
mstoykov added a commit that referenced this pull request May 19, 2021
This was dropped as part of #2008 but it will be better if we
differentiate between different timeouts
mstoykov added a commit that referenced this pull request May 19, 2021
This was dropped as part of #2008 but it will be better if we
differentiate between different timeouts
@vishalkuo vishalkuo deleted the feature/error-code branch May 19, 2021 16:40
mstoykov added a commit that referenced this pull request May 20, 2021
This was dropped as part of #2008 but it will be better if we
differentiate between different timeouts

Also add test for the `request timeout` error code in `httpext`
mstoykov added a commit that referenced this pull request May 20, 2021
This was dropped as part of #2008 but it will be better if we
differentiate between different timeouts

Also add test for the `request timeout` error code in `httpext`
mstoykov added a commit that referenced this pull request May 20, 2021
This was dropped as part of #2008 but it will be better if we
differentiate between different timeouts

Also add test for the `request timeout` error code in `httpext`
mstoykov added a commit that referenced this pull request May 21, 2021
This was dropped as part of #2008 but it will be better if we
differentiate between different timeouts

Also add test for the `request timeout` error code in `httpext`
imiric pushed a commit to grafana/k6-docs that referenced this pull request Jun 29, 2021
imiric pushed a commit to grafana/k6-docs that referenced this pull request Jun 29, 2021
imiric pushed a commit to grafana/k6-docs that referenced this pull request Jun 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants