Slight improvement to walk package #287

Merged
merged 3 commits into from
Apr 22, 2024

Conversation

@jhump jhump commented Apr 22, 2024

I added benchmarks in the hope of finding a more efficient way for walk.DescriptorProtos to traverse descriptor protos. The main cost is the way the fully-qualified names are computed and allocated as it traverses the descriptor hierarchy.

Ultimately, I did not come up with a better formulation. While I was able to reduce allocations slightly (from 316 allocs/op to 293 allocs/op), that approach actually yielded slightly worse throughput (consistently needing 5-7% more time per operation than the approach with slightly more allocations).
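To illustrate where that cost comes from, here is a minimal sketch of a traversal that builds fully-qualified names as it descends. The `node` type and the `walk` and `collect` functions are hypothetical stand-ins, not the actual `walk.DescriptorProtos` code; the point is only that each concatenation produces a new string, so allocations tend to scale with the number of named elements.

```go
package main

import "fmt"

// node is a hypothetical, simplified stand-in for a descriptor proto with
// nested elements (for example, a message containing nested messages).
type node struct {
	name     string
	children []*node
}

// walk visits every node and passes its fully-qualified name to fn.
// Each concatenation builds a new string, so a traversal generally
// allocates at least once per named element.
func walk(prefix string, n *node, fn func(fqn string)) {
	fqn := n.name
	if prefix != "" {
		fqn = prefix + "." + n.name
	}
	fn(fqn)
	for _, child := range n.children {
		walk(fqn, child, fn)
	}
}

// collect gathers the fully-qualified names in traversal order.
func collect(pkg string, root *node) []string {
	var names []string
	walk(pkg, root, func(fqn string) { names = append(names, fqn) })
	return names
}

func main() {
	root := &node{name: "Outer", children: []*node{
		{name: "Inner", children: []*node{{name: "Leaf"}}},
		{name: "Other"},
	}}
	for _, fqn := range collect("pkg", root) {
		fmt.Println(fqn)
	}
}
```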

But surprisingly, the very simple changes to the other walk function (walk.Descriptors), which just memoize the results of the various accessors, did result in a throughput improvement, consistently taking about 15% less time per operation.
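As a rough sketch of what memoizing accessor results can look like, the example below compares a loop that calls the accessors on every iteration with one that calls each accessor once. The `list` and `parent` interfaces, `countingParent`, and both walk functions are hypothetical illustrations, not the real descriptor interfaces or the code in this PR.

```go
package main

import "fmt"

// list is a hypothetical stand-in for the value returned by a descriptor
// accessor (a collection of child elements).
type list interface {
	Len() int
	Get(i int) string
}

// parent is a hypothetical stand-in for a descriptor with child elements.
type parent interface {
	Children() list
}

type sliceList []string

func (s sliceList) Len() int         { return len(s) }
func (s sliceList) Get(i int) string { return s[i] }

// countingParent counts how many times the Children accessor is called.
type countingParent struct {
	children sliceList
	calls    int
}

func (p *countingParent) Children() list {
	p.calls++
	return p.children
}

// walkNaive calls the accessors again on every iteration.
func walkNaive(p parent, fn func(string)) {
	for i := 0; i < p.Children().Len(); i++ {
		fn(p.Children().Get(i))
	}
}

// walkMemoized calls each accessor once and reuses the results.
func walkMemoized(p parent, fn func(string)) {
	children := p.Children()
	for i, length := 0, children.Len(); i < length; i++ {
		fn(children.Get(i))
	}
}

func main() {
	naive := &countingParent{children: sliceList{"a", "b", "c"}}
	walkNaive(naive, func(string) {})

	memo := &countingParent{children: sliceList{"a", "b", "c"}}
	walkMemoized(memo, func(string) {})

	fmt.Println(naive.calls, memo.calls) // prints "7 1"
}
```

With three children, the naive loop calls `Children` four times in the loop condition and three times in the body, while the memoized loop calls it once.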

Benchmark results below.

before:

BenchmarkDescriptors-10         	  352,322	      3,414 ns/op	       0 B/op	       0 allocs/op
BenchmarkDescriptorProtos-10    	  118,027	     10,055 ns/op	   16,232 B/op	     316 allocs/op

after:

BenchmarkDescriptors-10         	  409,730	      2,928 ns/op	       0 B/op	       0 allocs/op
BenchmarkDescriptorProtos-10    	  118,323	     10,008 ns/op	   16,232 B/op	     316 allocs/op

@jhump jhump requested a review from emcfarlane April 22, 2024 15:39
@jhump jhump enabled auto-merge (squash) April 22, 2024 16:31
@jhump jhump merged commit 63736ac into main Apr 22, 2024
8 checks passed
@jhump jhump deleted the jh/walk-updates branch April 22, 2024 16:34
jhump added a commit that referenced this pull request Apr 22, 2024
After measuring the impact of #286, #287, and #290 and seeing it to be
too modest, I decided to use a memory profiler, and it found "the good
stuff".

These changes had the largest impact on allocations and performance.
When linking inputs that come from descriptor protos (as opposed to
inputs that are compiled from sources and have ASTs), this resulted in
a 23% reduction in latency and 70% reduction in allocations.

This change features the following improvements:
1. `ast.NoSourceNode` now has a pointer receiver, so wrapping one in an
`ast.Node` interface value doesn't incur an allocation to put the value
on the heap. This also updates `parser.ParseResult` to refer to a single
`*ast.NoSourceNode` when it has no AST, instead of allocating one in
each call to get a node value. The `NoSourceNode`'s underlying type is
now `ast.FileInfo` so that it can be allocation-free, even for the
`NodeInfo` method (which previously was allocating a new `FileInfo` each
time).
3. Don't allocate a slice to hold the set of checked files for each
element being resolved. Instead, we allocate a single slice up front,
and re-use that throughout.
4. Don't proactively allocate strings that are used only for error
messages; instead defer construction of the string to the construction
of the error.
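
The first two ideas in the list above can be sketched roughly as follows. All type and function names here (`node`, `valueNode`, `pointerNode`, `resolver`) are hypothetical stand-ins rather than the actual `ast` or linker code, and the exact allocation counts reported for the value-receiver case may vary with compiler version and optimizations.

```go
package main

import (
	"fmt"
	"testing"
)

// node is a hypothetical stand-in for an interface such as ast.Node.
type node interface {
	Filename() string
}

// valueNode uses a value receiver; storing one in a node interface
// typically copies the struct to the heap.
type valueNode struct{ filename string }

func (n valueNode) Filename() string { return n.filename }

// pointerNode uses a pointer receiver; storing an existing *pointerNode
// in a node interface only copies the pointer.
type pointerNode struct{ filename string }

func (n *pointerNode) Filename() string { return n.filename }

// sink forces interface values to escape so the conversions are not
// optimized away.
var sink node

// resolver keeps one slice and reuses it, instead of allocating a new
// slice for each element being resolved.
type resolver struct{ checked []string }

// resolve records the visited files in the reused slice and returns how
// many were recorded.
func (r *resolver) resolve(files []string) int {
	r.checked = r.checked[:0] // keep capacity, drop old contents
	for _, f := range files {
		r.checked = append(r.checked, f)
	}
	return len(r.checked)
}

func main() {
	name := fmt.Sprint("test", ".proto") // built at run time, not a constant
	shared := &pointerNode{filename: name}

	perValue := testing.AllocsPerRun(1000, func() { sink = valueNode{filename: name} })
	perPointer := testing.AllocsPerRun(1000, func() { sink = shared })
	fmt.Println("value receiver allocs/op:", perValue)
	fmt.Println("shared pointer allocs/op:", perPointer)

	r := &resolver{checked: make([]string, 0, 8)}
	files := []string{"a.proto", "b.proto"}
	perResolve := testing.AllocsPerRun(1000, func() { r.resolve(files) })
	fmt.Println("resolve allocs/op:", perResolve)
}
```

Storing the shared pointer in the interface and appending within the slice's existing capacity should both report zero allocations per run.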
kralicky pushed a commit to kralicky/protocompile that referenced this pull request May 19, 2024
I added benchmarks, in the hopes of finding a more efficient way to
traverse descriptor protos, from `walk.DescriptorProtos`. The main cost
is the way the fully-qualified names are computed/allocated as it
traverses the descriptor hierarchy.

While I did not come up with any meaningful improvements there, I was
able to improve the other walk function (`walk.Descriptors`), by making
fewer interface method calls, memoizing the results of the various
accessors. This improves throughput, consistently taking about 15%
less time per operation.

(cherry picked from commit 63736ac)
kralicky pushed a commit to kralicky/protocompile that referenced this pull request May 19, 2024
(cherry picked from commit 016b009)
kralicky pushed a commit to kralicky/protocompile that referenced this pull request Jun 8, 2024
(cherry picked from commit 63736ac)
kralicky pushed a commit to kralicky/protocompile that referenced this pull request Jun 8, 2024
(cherry picked from commit 016b009)