Slight improvement to walk package #287

jhump · 2024-04-22T14:52:22Z

I added the benchmarks in the hope of finding a more efficient way to traverse descriptor protos with walk.DescriptorProtos. The main cost is the way the fully-qualified names are computed and allocated as it traverses the descriptor hierarchy.

Ultimately, I did not come up with a better formulation. While I was able to reduce allocations slightly (from 316 allocs/op to 293 allocs/op), that approach actually yielded slightly worse throughput (consistently needing 5-7% more time per operation than the approach with slightly more allocations).
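For readers unfamiliar with where that cost comes from, here is a minimal sketch of name construction during a walk. It is not the code in the walk package: the `message` type and `walk` function are hypothetical stand-ins, and the real traversal covers more element kinds than nested messages.

```go
package main

import "fmt"

// message is a hypothetical, simplified stand-in for a message
// descriptor proto: a simple name plus nested messages.
type message struct {
	name   string
	nested []*message
}

// walk visits each message depth-first and passes its fully-qualified
// name to fn. The name is built by concatenating the parent's name with
// the element's simple name, which typically allocates a new string for
// every visited element.
func walk(prefix string, msgs []*message, fn func(fqn string, m *message)) {
	for _, m := range msgs {
		fqn := m.name
		if prefix != "" {
			fqn = prefix + "." + m.name
		}
		fn(fqn, m)
		walk(fqn, m.nested, fn)
	}
}

// collect returns the fully-qualified names of all messages under pkg.
func collect(pkg string, msgs []*message) []string {
	var names []string
	walk(pkg, msgs, func(fqn string, _ *message) {
		names = append(names, fqn)
	})
	return names
}

func main() {
	msgs := []*message{
		{name: "Outer", nested: []*message{{name: "Inner"}}},
	}
	for _, n := range collect("foo.bar", msgs) {
		fmt.Println(n)
	}
}
```

Because each child name depends on its parent's name, there is no obvious way to skip the per-element string without changing what the callback receives, which is consistent with the modest results described above.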

But surprisingly, the very simple changes to the other walk function (walk.Descriptors), which just memoize the results of the various accessors, did result in a throughput improvement, consistently taking about 15% less time per operation.
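The general shape of that memoization can be sketched as follows. This is not the actual diff: `list`, `container`, and `counting` are hypothetical stand-ins for descriptor interfaces whose accessors are invoked through interface method calls, and the counter exists only to make the difference visible.

```go
package main

import "fmt"

// list is a hypothetical stand-in for a list-like descriptor interface
// that exposes Len and Get.
type list interface {
	Len() int
	Get(i int) string
}

// container is a hypothetical stand-in for a descriptor whose accessor
// returns a list.
type container interface {
	Items() list
}

type sliceList []string

func (s sliceList) Len() int         { return len(s) }
func (s sliceList) Get(i int) string { return s[i] }

// counting records how many times its accessor is called.
type counting struct {
	items sliceList
	calls int
}

func (c *counting) Items() list {
	c.calls++
	return c.items
}

// visitNaive calls the accessor on every loop test and element access.
func visitNaive(c container, fn func(string)) {
	for i := 0; i < c.Items().Len(); i++ {
		fn(c.Items().Get(i))
	}
}

// visitMemoized calls the accessor and Len once, then reuses the results.
func visitMemoized(c container, fn func(string)) {
	items := c.Items()
	for i, n := 0, items.Len(); i < n; i++ {
		fn(items.Get(i))
	}
}

func main() {
	c := &counting{items: sliceList{"a", "b", "c"}}

	visitNaive(c, func(string) {})
	naive := c.calls

	c.calls = 0
	visitMemoized(c, func(string) {})

	fmt.Printf("naive=%d memoized=%d\n", naive, c.calls) // prints "naive=7 memoized=1"
}
```

Neither version allocates more than the other in this sketch; the saving is in the number of dynamically dispatched calls, which matches the before/after numbers showing 0 allocs/op in both runs.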

Benchmark results below.

before:

BenchmarkDescriptors-10         	  352,322	      3,414 ns/op	       0 B/op	       0 allocs/op
BenchmarkDescriptorProtos-10    	  118,027	     10,055 ns/op	   16,232 B/op	     316 allocs/op

after:

BenchmarkDescriptors-10         	  409,730	      2,928 ns/op	       0 B/op	       0 allocs/op
BenchmarkDescriptorProtos-10    	  118,323	     10,008 ns/op	   16,232 B/op	     316 allocs/op

After measuring the impact of #286, #287, and #290 and seeing it to be too modest, I decided to use a memory profiler, and it found "the good stuff". These changes had the largest impact on allocations and performance. When linking inputs that come from descriptor protos (as opposed to inputs that are compiled from sources and have ASTs), this resulted in a 23% reduction in latency and a 70% reduction in allocations.

This change features the following improvements:

1. `ast.NoSourceNode` now has a pointer receiver, so wrapping one in an `ast.Node` interface value doesn't incur an allocation to put the value on the heap. This also updates `parser.ParseResult` to refer to a single `*ast.NoSourceNode` when it has no AST, instead of allocating one in each call to get a node value. The `NoSourceNode`'s underlying type is now `ast.FileInfo` so that it can be allocation-free, even for the `NodeInfo` method (which previously was allocating a new `FileInfo` each time).
3. Don't allocate a slice to hold the set of checked files for each element being resolved. Instead, we allocate a single slice up front and re-use it throughout.
4. Don't proactively allocate strings that are only used for error messages; instead, defer construction of the string until the error is constructed.

I added benchmarks, in the hopes of finding a more efficient way to traverse descriptor protos, from `walk.DescriptorProtos`. The main cost is the way the fully-qualified names are computed/allocated as it traverses the descriptor hierarchy. While I did not come up with any meaningful improvements there, I was able to improve the other walk function (`walk.Descriptors`), by making fewer interface method calls, memoizing the results of the various accessors. This improves throughput, consistently taking about 15% less time per operation. (cherry picked from commit 63736ac)

After measuring the impact of bufbuild#286, bufbuild#287, and bufbuild#290 and seeing it to be too modest, I decided to use a memory profiler, and it found "the good stuff". These changes had the largest impact on allocations and performance. When linking inputs that come from descriptor protos (as opposed to inputs that are compiled from sources and have ASTs), this resulted in a 23% reduction in latency and a 70% reduction in allocations.

This change features the following improvements:

1. `ast.NoSourceNode` now has a pointer receiver, so wrapping one in an `ast.Node` interface value doesn't incur an allocation to put the value on the heap. This also updates `parser.ParseResult` to refer to a single `*ast.NoSourceNode` when it has no AST, instead of allocating one in each call to get a node value. The `NoSourceNode`'s underlying type is now `ast.FileInfo` so that it can be allocation-free, even for the `NodeInfo` method (which previously was allocating a new `FileInfo` each time).
3. Don't allocate a slice to hold the set of checked files for each element being resolved. Instead, we allocate a single slice up front and re-use it throughout.
4. Don't proactively allocate strings that are only used for error messages; instead, defer construction of the string until the error is constructed.

(cherry picked from commit 016b009)

jhump added 3 commits April 22, 2024 10:46

add test and benchmarks

5a6a64f

slight cleanup in walk...

0a90285

Merge branch 'main' into jh/walk-updates

73dd4c3

jhump requested a review from emcfarlane April 22, 2024 15:39

emcfarlane approved these changes Apr 22, 2024

jhump enabled auto-merge (squash) April 22, 2024 16:31

jhump merged commit 63736ac into main Apr 22, 2024
8 checks passed

jhump deleted the jh/walk-updates branch April 22, 2024 16:34

jhump mentioned this pull request Apr 22, 2024

Use a profiler to improve linker performance #291

Merged
