Improve performance of trace id generation #977

ankon · 2024-10-21T18:31:37Z

Links

Related-to: #767

Details

I tried out various improvements for the linked issue. This PR improves the performance of trace ID generation by doing the hex encoding in place, which avoids an allocation. We see this function showing up in a high-throughput system, and hope that making it faster will also help us understand our own performance better.

benchstat for allocation avoidance:

```
goos: linux
goarch: amd64
pkg: github.com/newrelic/go-agent/v3/internal
cpu: AMD Ryzen 9 7940HS w/ Radeon 780M Graphics
                            │ before.txt  │              after.txt              │
                            │   sec/op    │   sec/op     vs base                │
TraceIDGenerator-16           43.33n ± 3%   30.85n ± 1%  -28.81% (p=0.000 n=10)
TraceIDGeneratorParallel-16   71.23n ± 4%   60.40n ± 2%  -15.22% (p=0.000 n=10)
geomean                       55.56n        43.16n       -22.31%

                            │ before.txt │             after.txt              │
                            │    B/op    │    B/op     vs base                │
TraceIDGenerator-16           32.00 ± 0%   16.00 ± 0%  -50.00% (p=0.000 n=10)
TraceIDGeneratorParallel-16   32.00 ± 0%   16.00 ± 0%  -50.00% (p=0.000 n=10)
geomean                       32.00        16.00       -50.00%

                            │ before.txt │             after.txt              │
                            │ allocs/op  │ allocs/op   vs base                │
TraceIDGenerator-16           2.000 ± 0%   1.000 ± 0%  -50.00% (p=0.000 n=10)
TraceIDGeneratorParallel-16   2.000 ± 0%   1.000 ± 0%  -50.00% (p=0.000 n=10)
geomean                       2.000        1.000       -50.00%
```
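For readers who want to see the idea, here is a minimal sketch of in-place hex encoding. It is not the code merged in this PR: `encodeHexInPlace`, `generateID`, the constants, and the fixed seed are illustrative assumptions. The raw random bytes are read into the front of a buffer twice their size and then expanded to hex from the last byte to the first, so the only remaining allocation is the final conversion to `string`.

```go
package main

import (
	"fmt"
	"math/rand"
)

const (
	spanIDByteLen  = 8  // assumed raw length of a span ID
	traceIDByteLen = 16 // assumed raw length of a trace ID
	maxIDByteLen   = 16
	hexDigits      = "0123456789abcdef"
)

// encodeHexInPlace hex-encodes the first n bytes of buf into buf[:2*n].
// It walks backwards so that no raw byte is overwritten before it is read.
// buf must have room for at least 2*n bytes.
func encodeHexInPlace(buf []byte, n int) string {
	for i := n - 1; i >= 0; i-- {
		b := buf[i]
		buf[2*i] = hexDigits[b>>4]
		buf[2*i+1] = hexDigits[b&0x0f]
	}
	return string(buf[:2*n]) // the one remaining allocation
}

// generateID is a hypothetical stand-in for the generator method.
// Locking is omitted here; the real generator guards its source with a mutex.
func generateID(rnd *rand.Rand, n int) string {
	var buf [maxIDByteLen * 2]byte // fixed-size scratch space, no make()
	rnd.Read(buf[:n])
	return encodeHexInPlace(buf[:], n)
}

func main() {
	rnd := rand.New(rand.NewSource(1)) // fixed seed only for the example
	fmt.Println(generateID(rnd, spanIDByteLen))
	fmt.Println(generateID(rnd, traceIDByteLen))
}
```

The manual loop avoids relying on how `hex.Encode` behaves when its source and destination overlap, which its documentation does not specify.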

CLAassistant · 2024-10-21T18:31:43Z

All committers have signed the CLA.

iamemilio · 2024-10-22T14:04:48Z

This is really helpful, thanks for putting so much effort into this! Out of curiosity, I wonder if just using the default seed and the global Rand source would be more efficient, since it's already synchronized: https://cs.opensource.google/go/go/+/refs/tags/go1.23.2:src/math/rand/rand.go;l=378. We don't need to control the seed or the way the random values get generated, and then the locking is localized to exactly where we need it: https://cs.opensource.google/go/go/+/refs/tags/go1.23.2:src/math/rand/rand.go;l=543
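A rough sketch of what relying on the global source could look like. This is only an illustration of the suggestion, not code from the agent: `generateID` and its restriction to lengths of 8 or 16 bytes are assumptions. The top-level `math/rand` functions are documented as safe for concurrent use, and since Go 1.20 the global source is seeded randomly at startup unless `rand.Seed` is called.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"math/rand"
)

// generateID returns n random bytes as a lowercase hex string.
// In this sketch n must be 8 or 16 (a multiple of 8, at most 16).
func generateID(n int) string {
	var raw [16]byte
	for i := 0; i < n; i += 8 {
		// rand.Uint64 draws from the synchronized global source,
		// so no generator-level mutex is needed here.
		binary.BigEndian.PutUint64(raw[i:], rand.Uint64())
	}
	return hex.EncodeToString(raw[:n])
}

func main() {
	fmt.Println(generateID(8))  // span-sized ID: 16 hex characters
	fmt.Println(generateID(16)) // trace-sized ID: 32 hex characters
}
```

Whether this is faster than a dedicated source behind a mutex would need a benchmark; it also gives up the ability to inject a fixed seed in tests.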

ankon · 2024-10-23T08:44:23Z

I'm actually wondering whether we need the seeding at all. It is used for tests (fine); otherwise the seed is just the current timestamp.

If that could be removed, one could start looking at other approaches here, for instance a pool of rand sources, or a ring buffer that gets filled independently, with the generation just pulling some bits out of it ...

iamemilio · 2024-10-23T16:26:41Z

I think what you have proposed is a good incremental improvement. There are a few oddities about the design and implementation of this object IMO, and I think you're right about the seeding. If anything, seeding based on timestamps may actually be decreasing the randomness. Luckily, the odds of a duplicate are incredibly low. I don't see the point of testing a specific seed. This function needs to return a unique hex string encoding either 8 or 16 random bytes.

If you wanted to keep playing with this, I don't mind reviewing another PR. Otherwise, we can create a risk item to address this. I think a buffered channel of pre-generated Transaction_IDs, and another for Span_IDs may do the trick. Creating a pool of generators may increase the risk of duplication depending on how they are seeded, and I am not sure the possible performance upside would be worth the added complexity of mitigating that.
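To make the buffered-channel idea concrete, here is a hedged sketch. Everything in it (`idBuffer`, `newIDBuffer`, `next`, `close`, the capacities and the seeds) is hypothetical and not part of the agent. A single goroutine owns the random source, so the source itself needs no lock, and callers receive pre-generated IDs from the channel.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"math/rand"
)

// idBuffer hands out pre-generated hex IDs from a buffered channel.
type idBuffer struct {
	ids  chan string
	done chan struct{}
}

// newIDBuffer starts one producer goroutine that keeps the channel filled
// with IDs of byteLen random bytes each.
func newIDBuffer(byteLen, capacity int, seed int64) *idBuffer {
	b := &idBuffer{
		ids:  make(chan string, capacity),
		done: make(chan struct{}),
	}
	go func() {
		rnd := rand.New(rand.NewSource(seed)) // used only by this goroutine
		raw := make([]byte, byteLen)
		for {
			rnd.Read(raw)
			select {
			case b.ids <- hex.EncodeToString(raw): // blocks while the buffer is full
			case <-b.done:
				return
			}
		}
	}()
	return b
}

// next returns the next pre-generated ID, waiting if the buffer is empty.
func (b *idBuffer) next() string { return <-b.ids }

// close stops the producer goroutine.
func (b *idBuffer) close() { close(b.done) }

func main() {
	spans := newIDBuffer(8, 64, 1)   // one buffer for span IDs
	traces := newIDBuffer(16, 64, 2) // another for transaction/trace IDs
	defer spans.close()
	defer traces.close()
	fmt.Println(spans.next(), traces.next())
}
```

A channel is itself synchronized, so this moves the contention rather than removing it; whether it beats the current mutex under parallel load would have to be measured with the existing benchmarks.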

Add a benchmark for invoking the TraceIDGenerator.GenerateSpanID concurrently

40efd0b

ankon marked this pull request as draft October 21, 2024 18:31

Avoid an additional allocation for the encoding

ce4def8

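We can pre-allocate a large-enough buffer, and then encode the bits in place to avoid the allocation done by hex.EncodeToString(). In the benchstat results from the PR description this shaves roughly 11 to 12 ns off each generation (43.33 ns to 30.85 ns sequential, 71.23 ns to 60.40 ns parallel).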

ankon force-pushed the fix/trace-id-lock-contention branch from 68e61a0 to ce4def8 Compare October 21, 2024 19:25

ankon mentioned this pull request Oct 22, 2024

Heavily contended lock in TraceIDGenerator.generateID #767

Open

ankon changed the title ~~Fix/trace id lock contention~~ Improve performance of trace id generation Oct 22, 2024

ankon marked this pull request as ready for review October 22, 2024 11:46

iamemilio approved these changes Oct 23, 2024

iamemilio merged commit a2f8e83 into newrelic:master Oct 23, 2024
56 checks passed
