Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http/httputil: ReverseProxy no longer uses io.ReaderFrom/WriterTo #21814

Closed
Zariel opened this issue Sep 8, 2017 · 18 comments
Closed

net/http/httputil: ReverseProxy no longer uses io.ReaderFrom/WriterTo #21814

Zariel opened this issue Sep 8, 2017 · 18 comments
Labels
FrozenDueToAge WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.

Comments

@Zariel
Copy link

Zariel commented Sep 8, 2017

Before go 1.8 httputil.ReverseProxy would use io.CopyBuffer which would use io.ReaderFrom or io.WriterTo if possible to avoid allocation. In fixing #16659 it removed using io.CopyBuffer and implements its own without the optimisations of io.CopyBuffer to add an extra log.

Please answer these questions before submitting your issue. Thanks!

What did you do?

https://play.golang.org/p/4GjT-hwI3t

What did you expect to see?

WriteTo

What did you see instead?

Read

System details

go version go1.9 darwin/amd64
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/chris/dev/go"
GORACE=""
GOROOT="/usr/local/opt/go/libexec"
GOTOOLDIR="/usr/local/opt/go/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -gno-record-gcc-switches -fno-common"
CXX="clang++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOROOT/bin/go version: go version go1.9 darwin/amd64
GOROOT/bin/go tool compile -V: compile version go1.9
uname -v: Darwin Kernel Version 15.3.0: Thu Dec 10 18:40:58 PST 2015; root:xnu-3248.30.4~1/RELEASE_X86_64
ProductName:	Mac OS X
ProductVersion:	10.11.3
BuildVersion:	15D21
lldb --version: lldb-340.4.110.1
gdb --version: GNU gdb (GDB) 8.0
@Zariel
Copy link
Author

Zariel commented Sep 8, 2017

I suggest either reverting to calling io.CopyBuffer and logging the error returned from that or adding the interface checks to copyResponse. The former makes more sense to me as would also log write errors which adds more symmetry. I have a patch available if its an acceptable change.

@Zariel Zariel changed the title net/http/httputil: ReverseProxy no longer users io.ReaderFrom/WriterTo net/http/httputil: ReverseProxy no longer uses io.ReaderFrom/WriterTo Sep 11, 2017
@tombergan tombergan self-assigned this Sep 11, 2017
@tomwans
Copy link
Contributor

tomwans commented Sep 19, 2017

The former makes more sense to me as would also log write errors which adds more symmetry.

Looking at the CL that introduced this, it seems that the intent was to log only read errors: https://go-review.googlesource.com/c/go/+/30692/5/src/net/http/httputil/reverseproxy.go#264. That is why io.CopyBuffer was copied in. Whatever solution we use here should probably retain that property.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/65690 mentions this issue: net/http/httputil: respect WriterTo and ReadFrom by using io.CopyBuffer

@tombergan
Copy link
Contributor

tombergan commented Sep 27, 2017

I'm not sure how to resolve this without breaking #16659 or making errors spammy, which @bradfitz objected to. The issue: when WriteTo/ReadFrom returns an error, it's impossible to know if the error came from reading from src or writing to dst. That detail is hidden inside the WriteTo/ReadFrom implementation. Options are:

  1. Leave the code as-is (don't support WriteTo/ReadFrom).
  2. Add WriteTo/ReadFrom and log all errors from those methods, possibly making logs spammy.
  3. Add WriteTo/ReadFrom but don't log any errors from those methods, possibly breaking net/http/httputil: ReverseProxy does not log errors during CopyBuffer #16659.

My inclination is to go with option 1. I don't believe that WriteTo or ReadFrom can use sendfile because resp.Body is never a simple file descriptor (not even when the incoming response is HTTP/1 without chunked encoding). At best WriteTo/ReadFrom might provide slightly more optimal buffering. I would want to see evidence of improved performance from WriteTo/ReadFrom over Read/Write before actually moving forward.

/cc @fraenkel, @Zariel

@tombergan tombergan removed their assignment Sep 27, 2017
@tombergan tombergan added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Sep 27, 2017
@fraenkel
Copy link
Contributor

@tombergan Its more or less how we ended up with the current implementation. Since the http package doesn't provide any optimized ReadFrom/WriteTo, I would be ok with option 3.

@tombergan
Copy link
Contributor

tombergan commented Sep 27, 2017

Actually, net/http's ResponseWriter implements ReaderFrom, so I think option 3 would break #16659.

To clarify: the WaitinfForInfo state is waiting for some justification that WriteTo/ReadFrom provide an actual (measurable) performance gain.

@gopherbot
Copy link
Contributor

Timed out in state WaitingForInfo. Closing.

(I am just a bot, though. Please speak up if this is a mistake or you have the requested information.)

@Zariel
Copy link
Author

Zariel commented May 7, 2018

Now that #10948 has landed Go's ReverseProxy will not be able to use splice due to #16659

@jameshartig
Copy link
Contributor

Now that we have splice being used it's unfortunate that it's not used in ReverseProxy. I vote for using io.CopyBuffer here so there isn't a need to consistantly keep copying functionality over and any future improvements in the io package will benefit the ReverseProxy.

We currently log over 16,000 "http: proxy error: context canceled" errors per hour but I haven't seen any of the "read error" logs. Now that we have ErrorHandler if a user wants to filter out specific spammy errors (like the one above) they can do so in their own ErrorHandler.

@fraenkel, what are your thoughts?

@fraenkel
Copy link
Contributor

@fastest963 I understand that you have never seen the issue but the reason why the code was changed was because we did. It was far too common to have downstream clients disappear and have no knowledge in the proxy as to who was at fault.
We can't just switch to io.Copy or io.CopyBuffer because we now lose this detail.

Do you have any data that shows using splice provides value? Which is what @tombergan was asking for above.

@jameshartig
Copy link
Contributor

@fraenkel I was saying that if you added your own ErrorHandler you'd be able to capture both writing and reading errors and handle them however you like. You can ignore write errors and log reader errors. So it sounds like 2 is the best option since by default it will log all errors but you can handle it however you like if you want.

@fraenkel
Copy link
Contributor

@fastest963 That is the problem, you cannot distinguish between read and write errors, its all errors. In #21814 (comment), the point was to ignore write errors. If I could have done that without splitting CopyBuffer apart I would have.
If you can tell me how to detect read errors and ignore write errors, we can easily fix this.

@jameshartig
Copy link
Contributor

jameshartig commented Aug 25, 2018

@fraenkel I would argue that you might want to care about some write errors. I don't know the history of why you only care about read errors but you could do something like:

func(err error) {
  if nerr, ok := err.(*net.OpError); ok && nerr.Op == "write" {
    return
  }
  // do some logging of err
}

What I'm proposing (#2 in @tombergan's comment) is that using io.CopyBuffer would end up logging all errors but if that is too noisy for you (which was @bradfitz's concern) then you can just define your own ErrorHandler method and ignore whatever errors are too noisy. We're already logging 10s of thousands of errors right now and I can't imagine that adding write errors to that would be an order of magnitude more. We already need to add filtering for context canceled errors so we'll just add whatever other ones come up that are too annoying for us. But I'd rather the default be to log more than less especially since you can always define your own handler to customize it.

Additionally, I have not tested splice yet but the fact that ReverseProxy has its own "gimped" copy of io.CopyBuffer just means it's going to get further and further behind on any improvements, bugfixes, changes, etc made to the io package method. It seems better to have it use the io package so any changes made to CopyBuffer don't need to be remembered to be copied over to the httputil package.

@fraenkel
Copy link
Contributor

I did add a benchmark to see what the improvement might be. Here are the results based on fixed size responses of 100, 1000 and 8000 bytes. But as expected there is no way to detect read errors.
So the question is whether there is value in offering a way to enable a potentially more optimal path where you will lose on read errors. Just so you are aware, on any short read, you will get no logged error. The error returned from the io.CopyBuffer() is "unexpected EOF" as a string. That gets dropped in the ReverseProxy.ServeHTTP.

name         old time/op    new time/op    delta
ServeHTTP-8     117µs ± 0%     111µs ± 1%   -5.60%  (p=0.000 n=10+10)

name         old alloc/op   new alloc/op   delta
ServeHTTP-8    43.1kB ± 0%    10.6kB ± 0%  -75.27%  (p=0.000 n=9+9)

name         old allocs/op  new allocs/op  delta
ServeHTTP-8       133 ± 0%       133 ± 0%     ~     (all equal)
name         old time/op    new time/op    delta
ServeHTTP-8     120µs ± 1%     113µs ± 1%   -5.59%  (p=0.000 n=10+10)

name         old alloc/op   new alloc/op   delta
ServeHTTP-8    45.5kB ± 0%    13.2kB ± 0%  -70.91%  (p=0.000 n=10+9)

name         old allocs/op  new allocs/op  delta
ServeHTTP-8       134 ± 1%       134 ± 0%     ~     (p=0.211 n=10+10)
name         old time/op    new time/op    delta
ServeHTTP-8     145µs ± 1%     139µs ± 1%   -3.94%  (p=0.000 n=10+10)
name         old alloc/op   new alloc/op   delta
ServeHTTP-8    81.7kB ± 0%    51.0kB ± 0%  -37.55%  (p=0.000 n=9+10)
name         old allocs/op  new allocs/op  delta
ServeHTTP-8       144 ± 0%       144 ± 0%     ~     (all equal)

@jameshartig
Copy link
Contributor

@fraenkel thanks for the benchmarks those definitely are useful for context. What do you mean by:

The error returned from the io.CopyBuffer() is "unexpected EOF" as a string.

It has to be an error type, right? Can you print out the type? Is it a *net.OpError?

When you say:

Just so you are aware, on any short read, you will get no logged error.

What was the change you made? Did you take the error and call getErrorHandler if there was an error from copyResponse?

@fraenkel
Copy link
Contributor

@fastest963 You are correct, I forgot to call logf. But here is the actual error that comes back from io.CopyBuffer, &errors.errorString{s:"unexpected EOF"}. Is that a read or write error?

@jameshartig
Copy link
Contributor

@fraenkel looks like that's an io.ErrUnexpectedEOF which I would assume is from a read. The docs:

EOF is the error returned by Read when no more input is available. Functions should return EOF only to signal a graceful end of input. If the EOF occurs unexpectedly in a structured data stream, the appropriate error is either ErrUnexpectedEOF or some other error giving more detail.

Though I'm not sure how you got that considering:

func (c *TCPConn) ReadFrom(r io.Reader) (int64, error) {
	if !c.ok() {
		return 0, syscall.EINVAL
	}
	n, err := c.readFrom(r)
	if err != nil && err != io.EOF {
		err = &OpError{Op: "readfrom", Net: c.fd.net, Source: c.fd.laddr, Addr: c.fd.raddr, Err: err}
	}
	return n, err
}

If the error is not io.EOF it wraps it in net.OpError so idk where that ErrUnexpectedEOF is coming from.

@fraenkel
Copy link
Contributor

Digging a bit deeper. Splice will never be used in reverseproxy. Splice requires a TCP conn on both sides which never happens. At best you see ReadFrom or WriteTo come into play which is what is happening. Again this is part of the issue, there are 3 possible paths, all which provide different results.
Using io.CopyBuffer, content is transferring between a *http.response and *http.bodyEOFSignal.
The error comes from transfer.go:823

@golang golang locked and limited conversation to collaborators Aug 26, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.
Projects
None yet
Development

No branches or pull requests

6 participants