Switch Thrift with Jaeger's fork #3050
Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
@@ -73,4 +73,4 @@ require (
	honnef.co/go/tools v0.1.4
)

replace github.com/gogo/protobuf => github.com/gogo/protobuf v1.3.2
The protobuf replace directive has been removed, as it pointed to the same version that is already used in the main require section.
@@ -225,7 +226,7 @@ func TestFormatBadBody(t *testing.T) {
	statusCode, resBodyStr, err := postBytes(server.URL+`/api/v1/spans`, []byte("not good"), createHeader("application/x-thrift"))
	assert.NoError(t, err)
	assert.EqualValues(t, http.StatusBadRequest, statusCode)
	assert.EqualValues(t, "Unable to process request body: Unknown data type 111\n", resBodyStr)
This change was interesting, as the new error is: "Unable to process request body: size exceeded max allowed: 1869881447". This seems to indicate that the fix is being picked up, but it did raise a warning in my head: are we really allocating more than 1 GiB for this? Apparently not. I used runtime.MemStats to measure the memory consumption before and after the HTTP call, and the usage was minimal (2 Alloc vs. 3 Alloc).
This is indeed interesting.
The old error message "Unknown data type" comes from the Skip function, which is called when either the field id is not defined in the struct, or the type of the field doesn't match what's defined in the struct.
The new error message "size exceeded max allowed" comes from size sanity checks, which can be triggered by many TProtocol functions (for example, ReadString, ReadBinary, ReadMapBegin, ReadListBegin, ReadSetBegin). This means the new code actually passed the field type check and the field is no longer skipped (or maybe previously the first field happened to match and it was the second field that failed the field type check and caused the error, while in the new code the first field passed the field type check but failed the size sanity check).
runtime.MemStats might be misleading. I believe it's the same as benchmark tests' ReportAllocs, and this benchmark test reported 0 allocs/0 bytes, which is certainly a lie (I had to use a much smaller size because the original one is too slow):
package main

import (
	"testing"
)

// const size = 1869881447
const size = 1869

func BenchmarkAlloc(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		_ = make([]int, size)
	}
}
$ go test -bench . -benchmem
goos: linux
goarch: amd64
pkg: foo
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
BenchmarkAlloc-12 1000000000 0.2487 ns/op 0 B/op 0 allocs/op
PASS
ok foo 0.278s
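One possible explanation for the 0 allocs/op figure, which is an assumption and not something the thread confirms, is that the Go compiler can keep a small, constant-size slice that never escapes off the heap entirely, so the allocation counters have nothing to see. The sketch below uses testing.AllocsPerRun with a hypothetical package-level sink variable to force the slice to escape, in which case the allocation should be counted.

```go
package main

import (
	"fmt"
	"testing"
)

const size = 1869

// sink is a hypothetical package-level variable; storing the slice here
// makes it escape, so it has to be allocated on the heap.
var sink []int

// heapAllocsPerRun reports the average number of heap allocations made by
// one call that stores a fresh slice in sink.
func heapAllocsPerRun() float64 {
	return testing.AllocsPerRun(100, func() {
		sink = make([]int, size)
	})
}

func main() {
	fmt.Printf("allocs per run with sink: %v\n", heapAllocsPerRun())
}
```

The same sink assignment could be used inside BenchmarkAlloc in place of the blank identifier to check whether the reported numbers change.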
@fishy, do you think this change raises flags? I wouldn't expect the payload for this test to cause such a huge message.
do you think this change raises flags?
Not necessarily.
If you dig into the code to see what this does, the very first read from the thrift payload is a call to ReadListBegin, which is one of the functions that can trigger the new error.
Let's first convert the payload ([]byte("not good")) into raw bytes: [6e 6f 74 20 67 6f 6f 64].
In the ReadListBegin implementation, it first reads a byte, 6e, then it tries to read an int32 as the size of the list. To read the int32, it reads the next 4 bytes (6f 74 20 67) and decodes them as big-endian, which results in 1869881447.
So this is totally expected behavior.
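The arithmetic above can be reproduced with a short sketch. It does not call the Thrift library; decodeListHeader is a hypothetical helper that only mirrors the steps described in this comment (one byte for the element type, then a big-endian int32 for the size).

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// decodeListHeader is a hypothetical stand-in for the first steps of
// ReadListBegin as described above: one byte for the element type,
// followed by a big-endian int32 holding the list size.
func decodeListHeader(payload []byte) (elemType byte, size int32, err error) {
	if len(payload) < 5 {
		return 0, 0, fmt.Errorf("need at least 5 bytes, got %d", len(payload))
	}
	elemType = payload[0]
	size = int32(binary.BigEndian.Uint32(payload[1:5]))
	return elemType, size, nil
}

func main() {
	elemType, size, err := decodeListHeader([]byte("not good"))
	if err != nil {
		panic(err)
	}
	// Prints: element type: 0x6e, size: 1869881447
	fmt.Printf("element type: 0x%02x, size: %d\n", elemType, size)
}
```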
And following ReadListBegin, Jaeger's code doesn't actually do the pre-allocation. It just appends spans to the slice and relies on append to do the allocation and grow the slice.
y'all even already have a comment on that :)
jaeger/model/converter/thrift/zipkin/deserialize.go
Lines 51 to 52 in ff32436
// We don't depend on the size returned by ReadListBegin to preallocate the array because it
// sometimes returns a nil error on bad input and provides an unreasonably large int for size
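The pattern described in that comment can be sketched as follows. This is not Jaeger's deserialization code: readItems is a hypothetical function, and two-byte items stand in for spans. The point is only that the declared size bounds the loop but is never passed to make, so memory grows with the data that is actually present.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// readItems is a hypothetical reader that treats the declared size as an
// upper bound for the loop, not as a capacity hint. The slice grows through
// append, so a bogus size cannot trigger a huge up-front allocation.
func readItems(data []byte, declared int32) ([]uint16, error) {
	r := bytes.NewReader(data)
	var items []uint16 // deliberately not make([]uint16, 0, declared)
	for i := int32(0); i < declared; i++ {
		var v uint16
		if err := binary.Read(r, binary.BigEndian, &v); err != nil {
			return items, fmt.Errorf("item %d of %d: %w", i, declared, err)
		}
		items = append(items, v)
	}
	return items, nil
}

func main() {
	// "ood" is what remains of "not good" after the 5-byte list header.
	items, err := readItems([]byte("ood"), 1869881447)
	fmt.Printf("read %d item(s), err: %v\n", len(items), err)
}
```

With the three leftover bytes, the loop reads one complete item and then stops with an error as soon as the input runs out, long before the declared size is reached.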
Codecov Report
@@ Coverage Diff @@
## master #3050 +/- ##
==========================================
+ Coverage 96.00% 96.03% +0.03%
==========================================
Files 229 229
Lines 9937 9937
==========================================
+ Hits 9540 9543 +3
+ Misses 327 325 -2
+ Partials 70 69 -1
Closes #2638 by using the Jaeger fork of Thrift, which contains the unreleased fix to the memory consumption issue that affects the Jaeger Agent.
Once Apache Thrift 0.15.0 or 0.14.2 is released, the replace directive should be removed.
Signed-off-by: Juraci Paixão Kröhling juraci@kroehling.de