encoding/binary: fast-paths in Read and Write allocate despite attempted optimization #27403
I noticed in an `alloc_objects` heap profile on a 3-node cluster restoring tpcc that more than 46% of all allocations were made in `computeChecksumPostApply`. Specifically, these allocations were all made in `Replica.sha512`. 28% of allocations were due to protobuf marshaling of `hlc.LegacyTimestamp`. The other 18% was in `encoding/binary.Write`.

This removes both of these sources of per-key allocations. The first allocation was avoided by sharing a byte buffer across protobuf marshals. The second was avoided by removing the call to `binary.Write` (see golang/go#27403). I confirmed that this is no longer an issue by looking at heap profiles from before and after in a test that performed a consistency check.

I plan to follow up on golang/go#27403 and search for any other offenders in our codebase. I already see a few.

Release note: None
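For illustration, the shape of that second fix looks roughly like this (the key type and names below are invented; this is not the actual CockroachDB change). The point is to encode each integer with `PutUint64` into a buffer reused across keys instead of calling `binary.Write` once per key:

```go
package main

import (
	"crypto/sha512"
	"encoding/binary"
	"fmt"
)

// Hypothetical illustration only: the key type and field names are invented.
type key struct{ wallTime int64 }

func checksum(keys []key) []byte {
	h := sha512.New()
	var buf [8]byte // reused for every key; no per-key allocation for encoding
	for _, k := range keys {
		binary.LittleEndian.PutUint64(buf[:], uint64(k.wallTime))
		h.Write(buf[:])
	}
	return h.Sum(nil)
}

func main() {
	fmt.Printf("%x\n", checksum([]key{{1}, {2}, {3}}))
}
```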
@nvanbenschoten - For the record, it would be good to get your Go version and `go env` output.
AFAICT, Go interfaces as designed in Go 1 don't play well with escape analysis. Any object passed through them always escapes, period, and there is no way for the Go compiler to fix this without whole-program compilation, which is unlikely to ever happen. In the specific case of `encoding/binary`, the `ByteOrder` interface is what forces the buffer to escape. So I think this is a good use case for seeing whether Go 2 generics can help design a new byteorder package that uses fewer runtime constructs (reflection and interfaces) and more compile-time constructs, to avoid allocations and achieve more speed. /cc @ianlancetaylor @rsc
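For a rough idea, something along these lines could work, written here in type-parameter syntax for concreteness (every name below is hypothetical, not an actual proposal from this thread). The order is a compile-time type rather than an interface value, so with compile-time specialization the encoding call can be resolved statically:

```go
package main

import "fmt"

// Hypothetical sketch of a generics-based byteorder API (names invented).
type LittleEndian struct{}
type BigEndian struct{}

func (LittleEndian) PutUint32(b []byte, v uint32) {
	b[0], b[1], b[2], b[3] = byte(v), byte(v>>8), byte(v>>16), byte(v>>24)
}

func (BigEndian) PutUint32(b []byte, v uint32) {
	b[0], b[1], b[2], b[3] = byte(v>>24), byte(v>>16), byte(v>>8), byte(v)
}

// AppendUint32 never takes a ByteOrder interface value, so nothing about the
// buffer has to escape through an interface call if the instantiation is
// specialized at compile time.
func AppendUint32[O interface{ PutUint32([]byte, uint32) }](dst []byte, ord O, v uint32) []byte {
	var buf [4]byte
	ord.PutUint32(buf[:], v)
	return append(dst, buf[:]...)
}

func main() {
	fmt.Printf("% x\n", AppendUint32(nil, BigEndian{}, 0xdeadbeef))
}
```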
29419: storage: avoid per-kv allocations during consistency checks r=nvanbenschoten a=nvanbenschoten

I noticed in an `alloc_objects` heap profile on a 3-node cluster restoring tpcc that more than 46% of all allocations were made in `computeChecksumPostApply`. Specifically, these allocations were all made in `Replica.sha512`. 28% of allocations were due to protobuf marshaling of `hlc.LegacyTimestamp`. The other 18% was in `encoding/binary.Write`.

This removes both of these sources of per-key allocations. The first allocation was avoided by sharing a byte buffer across protobuf marshals. The second was avoided by removing the call to `binary.Write` (see golang/go#27403). I confirmed that this is no longer an issue by looking at heap profiles from before and after in a test that performed a consistency check.

I plan to follow up on golang/go#27403 and search for any other offenders in our codebase. I already see a few.

Release note: None

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Thanks for the responses. This was in go1.10, but as @rasky mentioned, it's a general problem with Go 1 and the interaction between interfaces and escape analysis. With the current design of the `ByteOrder` interface, I don't see a way to avoid the allocation. With that in mind, would you be open to me sending a pull request to rip out this attempted optimization?
I completely agree, and the most recent draft of generics is looking promising. Note, though, that this will require generics to be implemented at least in part using compile-time specialization, which it sounds like is still an open question.
Either that, or we must make sure the compiler is smart enough, when instantiating a generic, to specialize a type parameter that is used in a type switch (is that possible at all?) or in a good subset of reflect calls that could be intrinsified.
git blame suggests to me that this was not an optimization attempt but merely an oversight while the code grew: https://golang.org/cl/12694048

If simplifying the code doesn't cause a performance regression (I don't see why it would, but I am AFK), that would be a welcome change.
Marking as suggested: the simplification appears straightforward, and the only other thing to do is confirm that there is no performance regression.
I did a little testing on this and found that the use of
Change https://golang.org/cl/133135 mentions this issue:
@iand I'm not sure I follow. I tried it (CL 133135) and it appeared to work nicely. What am I missing?
@josharian CL 133135 is what I was expecting to be the outcome of this issue. Thanks for beating me to it 😃. I assume @iand was attempting to specialize by endianness, as originally suggested?
@josharian I was looking into the original suggestion of specialising by Endian to reduce allocations. I didn't look at the fallback of simplifying and accepting that it allocates.
Ah, I see. Thanks, @iand. And yes, if there's any performance value in specialization it is more likely to come from Writers (particularly Writers that buffer).
`encoding/binary.Read` and `encoding/binary.Write` both include an attempt to avoid heap allocations by pointing a slice at a stack-allocated array with sufficient capacity to fit most basic types:

go/src/encoding/binary/binary.go, lines 161 to 170 at 58e970e (the fast path in `Read`)

go/src/encoding/binary/binary.go, lines 260 to 269 at 58e970e (the fast path in `Write`)
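The fast path in both functions follows roughly this shape (a paraphrased illustration, not the exact source; `writeFixed` is an invented stand-in):

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// Paraphrased sketch of the fast-path pattern: point a slice at a small
// fixed-size array when the value fits, falling back to make for larger values.
func writeFixed(w *bytes.Buffer, order binary.ByteOrder, v uint32) {
	n := 4 // size in bytes of the value being encoded
	var b [8]byte
	var bs []byte
	if n > len(b) {
		bs = make([]byte, n)
	} else {
		bs = b[:n] // intended to keep the buffer on the stack...
	}
	order.PutUint32(bs, v) // ...but this interface call forces b to escape anyway
	w.Write(bs)
}

func main() {
	var buf bytes.Buffer
	writeFixed(&buf, binary.LittleEndian, 0xdeadbeef)
	fmt.Printf("% x\n", buf.Bytes())
}
```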
The "optimization" is fooling itself because the array escapes from the stack and is moved onto the heap anyway. This means that at best the optimization does nothing and at worst the optimization actually results in either an oversized allocation or an entirely unnecessary second allocation.
The reason the array is moved to the heap is that escape analysis cannot prove that the slice pointing at it doesn't itself escape. This is the case because the slice is passed to the provided `ByteOrder` interface:

go/src/encoding/binary/binary.go, lines 33 to 39 at 58e970e (the `ByteOrder` interface)
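For reference, that interface is:

```go
type ByteOrder interface {
	Uint16([]byte) uint16
	Uint32([]byte) uint32
	Uint64([]byte) uint64
	PutUint16([]byte, uint16)
	PutUint32([]byte, uint32)
	PutUint64([]byte, uint64)
	String() string
}
```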
Escape analysis can't prove that the slice won't escape when passed to an arbitrary interface implementation, even though it can and does know that the slice won't escape if passed to a `littleEndian` or `bigEndian` directly.
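A minimal standalone illustration of that distinction (my own example, not code from the package; the decisions can be inspected with `go build -gcflags=-m`, and the behavior described is for compilers of that era):

```go
package main

import "encoding/binary"

func concrete() uint64 {
	var b [8]byte
	// binary.LittleEndian is a concrete type, and the compiler can see that
	// Uint64 does not retain the slice, so b can stay on the stack.
	return binary.LittleEndian.Uint64(b[:])
}

func throughInterface(order binary.ByteOrder) uint64 {
	var b [8]byte
	// order could be any implementation, so escape analysis must assume the
	// slice may be retained; the backing array is moved to the heap.
	return order.Uint64(b[:])
}

func main() {
	_ = concrete()
	_ = throughInterface(binary.BigEndian)
}
```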
We could get tricky here and specialize the code for the `LittleEndian` and `BigEndian` `ByteOrder`s so that when they are provided as the desired `order` the allocation is avoided, as the optimization intended. However, this would result in a significant amount of code duplication in order to convince escape analysis that the array is safe to leave on the stack. In lieu of that, we should remove the "optimization" entirely because it's not doing any good.
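A rough sketch of what that specialization could look like for a single integer width (illustrative only, not proposed stdlib code); repeating this for every width in both `Read` and `Write` is the duplication referred to above:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Branch on the two well-known ByteOrder implementations so the common paths
// only make concrete calls that escape analysis can see through.
func putUint64(b []byte, order binary.ByteOrder, v uint64) {
	switch order {
	case binary.LittleEndian:
		binary.LittleEndian.PutUint64(b, v) // concrete, inlinable call
	case binary.BigEndian:
		binary.BigEndian.PutUint64(b, v) // concrete, inlinable call
	default:
		order.PutUint64(b, v) // arbitrary implementation; b is assumed to escape
	}
}

func main() {
	var b [8]byte
	putUint64(b[:], binary.BigEndian, 0x0102030405060708)
	fmt.Printf("% x\n", b[:])
}
```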