-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Slight improvement to walk package #287
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
emcfarlane
approved these changes
Apr 22, 2024
jhump
added a commit
that referenced
this pull request
Apr 22, 2024
After measuring the impact of #286, #287, and #290 and seeing it to be too modest. I decided to use a memory profiler, and it found "the good stuff". These changes had the largest impact on allocations and performance. When linking inputs that come from descriptor protos (as opposed to inputs that are compiled from sources and have ASTs), this resulted in a 23% reduction in latency and 70% reduction in allocations. This change features the following improvements: 1. `ast.NoSourceNode` now has a pointer receiver, so wrapping one in an `ast.Node` interface value doesn't incur an allocation to put the value on the heap. This also updates `parser.ParseResult` to refer to a single `*ast.NoSourceNode` when it has no AST, instead of allocating one in each call to get a node value. The `NoSourceNode`'s underlying type is now `ast.FileInfo` so that it can be allocation-free, even for the `NodeInfo` method (which previously was allocating a new `FileInfo` each time). 3. Don't allocate a slice to hold the set of checked files for each element being resolved. Instead, we allocate a single slice up front, and re-use that throughout. 4. Don't pro-actively allocate strings that only are used for error messages; instead defer construction of the change to the construction of the error.
kralicky
pushed a commit
to kralicky/protocompile
that referenced
this pull request
May 19, 2024
I added benchmarks, in the hopes of finding a more efficient way to traverse descriptor protos, from `walk.DescriptorProtos`. The main cost is the way the fully-qualified names are computed/allocated as it traverses the descriptor hierarchy. While I did not come up with any meaningful improvements there, I was able to improve the other walk function (`walk.Descriptors`), by making fewer interface method calls, memoizing the results of the various accessors. This improves throughput, consistently taking about 15% less time per operation. (cherry picked from commit 63736ac)
kralicky
pushed a commit
to kralicky/protocompile
that referenced
this pull request
May 19, 2024
After measuring the impact of bufbuild#286, bufbuild#287, and bufbuild#290 and seeing it to be too modest. I decided to use a memory profiler, and it found "the good stuff". These changes had the largest impact on allocations and performance. When linking inputs that come from descriptor protos (as opposed to inputs that are compiled from sources and have ASTs), this resulted in a 23% reduction in latency and 70% reduction in allocations. This change features the following improvements: 1. `ast.NoSourceNode` now has a pointer receiver, so wrapping one in an `ast.Node` interface value doesn't incur an allocation to put the value on the heap. This also updates `parser.ParseResult` to refer to a single `*ast.NoSourceNode` when it has no AST, instead of allocating one in each call to get a node value. The `NoSourceNode`'s underlying type is now `ast.FileInfo` so that it can be allocation-free, even for the `NodeInfo` method (which previously was allocating a new `FileInfo` each time). 3. Don't allocate a slice to hold the set of checked files for each element being resolved. Instead, we allocate a single slice up front, and re-use that throughout. 4. Don't pro-actively allocate strings that only are used for error messages; instead defer construction of the change to the construction of the error. (cherry picked from commit 016b009)
kralicky
pushed a commit
to kralicky/protocompile
that referenced
this pull request
Jun 8, 2024
I added benchmarks, in the hopes of finding a more efficient way to traverse descriptor protos, from `walk.DescriptorProtos`. The main cost is the way the fully-qualified names are computed/allocated as it traverses the descriptor hierarchy. While I did not come up with any meaningful improvements there, I was able to improve the other walk function (`walk.Descriptors`), by making fewer interface method calls, memoizing the results of the various accessors. This improves throughput, consistently taking about 15% less time per operation. (cherry picked from commit 63736ac)
kralicky
pushed a commit
to kralicky/protocompile
that referenced
this pull request
Jun 8, 2024
After measuring the impact of bufbuild#286, bufbuild#287, and bufbuild#290 and seeing it to be too modest. I decided to use a memory profiler, and it found "the good stuff". These changes had the largest impact on allocations and performance. When linking inputs that come from descriptor protos (as opposed to inputs that are compiled from sources and have ASTs), this resulted in a 23% reduction in latency and 70% reduction in allocations. This change features the following improvements: 1. `ast.NoSourceNode` now has a pointer receiver, so wrapping one in an `ast.Node` interface value doesn't incur an allocation to put the value on the heap. This also updates `parser.ParseResult` to refer to a single `*ast.NoSourceNode` when it has no AST, instead of allocating one in each call to get a node value. The `NoSourceNode`'s underlying type is now `ast.FileInfo` so that it can be allocation-free, even for the `NodeInfo` method (which previously was allocating a new `FileInfo` each time). 3. Don't allocate a slice to hold the set of checked files for each element being resolved. Instead, we allocate a single slice up front, and re-use that throughout. 4. Don't pro-actively allocate strings that only are used for error messages; instead defer construction of the change to the construction of the error. (cherry picked from commit 016b009)
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
I added the benchmarks, in the hopes of find a more efficient way to traverse descriptor protos, from
walk.DescriptorProtos
. The main cost is the way the fully-qualified names are computed/allocated as it traverses the descriptor hierarchy.Ultimately, I did not come up with any better formulation. While I was able to reduce allocations slightly (from 316 allocs/op to 293 allocs/op), that approach actually yielded slightly worse throughput (consistently needing 5-7% more time per operation than the approach w/ slightly more allocations).
But surprisingly, the very simple changes to other walk function (
walk.Descriptors
), which just memoize the results of the various accessors, did actually result in a throughput improvement, consistently taking about 15% less time per operation.Benchmark results below.
before:
after: