Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of trace id generation #977

Merged
merged 2 commits into from
Oct 23, 2024

Conversation

ankon
Copy link
Contributor

@ankon ankon commented Oct 21, 2024

Links

Related-to: #767

Details

Tried out various improvements for the linked issue, this PR improves the performance of the trace ID generation by doing the hex encoding in-place and avoiding an allocation. We see this function appearing in a high-throughput system, and hope that making it faster will also help us understand our own performance better.

benchstat for allocation avoidance:

goos: linux
goarch: amd64
pkg: github.com/newrelic/go-agent/v3/internal
cpu: AMD Ryzen 9 7940HS w/ Radeon 780M Graphics     
                            │ before.txt  │              after.txt              │
                            │   sec/op    │   sec/op     vs base                │
TraceIDGenerator-16           43.33n ± 3%   30.85n ± 1%  -28.81% (p=0.000 n=10)
TraceIDGeneratorParallel-16   71.23n ± 4%   60.40n ± 2%  -15.22% (p=0.000 n=10)
geomean                       55.56n        43.16n       -22.31%

                            │ before.txt │             after.txt              │
                            │    B/op    │    B/op     vs base                │
TraceIDGenerator-16           32.00 ± 0%   16.00 ± 0%  -50.00% (p=0.000 n=10)
TraceIDGeneratorParallel-16   32.00 ± 0%   16.00 ± 0%  -50.00% (p=0.000 n=10)
geomean                       32.00        16.00       -50.00%

                            │ before.txt │             after.txt              │
                            │ allocs/op  │ allocs/op   vs base                │
TraceIDGenerator-16           2.000 ± 0%   1.000 ± 0%  -50.00% (p=0.000 n=10)
TraceIDGeneratorParallel-16   2.000 ± 0%   1.000 ± 0%  -50.00% (p=0.000 n=10)
geomean                       2.000        1.000       -50.00%

@ankon ankon marked this pull request as draft October 21, 2024 18:31
@CLAassistant
Copy link

CLAassistant commented Oct 21, 2024

CLA assistant check
All committers have signed the CLA.

We can pre-allocate a large-enough buffer, and then in-place encode the bits to avoid
the allocation done by hex.EncodeToString().

Effectively this shaves a couple of ns off the generation:
@ankon ankon force-pushed the fix/trace-id-lock-contention branch from 68e61a0 to ce4def8 Compare October 21, 2024 19:25
@ankon ankon changed the title Fix/trace id lock contention Improve performance of trace id generation Oct 22, 2024
@ankon ankon marked this pull request as ready for review October 22, 2024 11:46
@iamemilio
Copy link
Contributor

iamemilio commented Oct 22, 2024

This is really helpful, thanks for putting so much effort into this! Out of curiosity, I wonder if just using the default seed and Rand source would be more efficient, since its already synchronized: https://cs.opensource.google/go/go/+/refs/tags/go1.23.2:src/math/rand/rand.go;l=378. It's not like we need to do anything to the seed or the way the rand elements get generated, and then the locking is localized to exactly where we need it: https://cs.opensource.google/go/go/+/refs/tags/go1.23.2:src/math/rand/rand.go;l=543

@ankon
Copy link
Contributor Author

ankon commented Oct 23, 2024

I'm actually wondering whether we need the seeding at all. It is used for tests (fine), and otherwise just given the current timestamp.

If that could be removed, one could start looking at other approaches here, for instance a pool of rand's, or a ringbuffer that gets filled independently and the generation just pulling out some bits ...

@iamemilio
Copy link
Contributor

iamemilio commented Oct 23, 2024

I think what you have proposed is a good incremental improvement. There are a few oddities about the design and implementation of this object IMO, and I think you're right about the seeding. If anything, seeding based on timestamps may actually be decreasing the randomness. Luckily, the odds of a duplicate are incredibly low. I don't see the point of testing a specific seed. This function needs to return a unique hex slice of either 8 or 16 bits in length.

If you wanted to keep playing with this, I don't mind reviewing another PR. Otherwise, we can create a risk item to address this. I think a buffered channel of pre-generated Transaction_IDs, and another for Span_IDs may do the trick. Creating a pool of generators may increase the risk of duplication depending on how they are seeded, and I am not sure the possible performance upside would be worth the added complexity of mitigating that.

@iamemilio iamemilio merged commit a2f8e83 into newrelic:master Oct 23, 2024
56 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants