Add lookup methods to linker.Symbols; improve performance/reduce allocations #286

jhump · 2024-04-22T14:33:16Z

This adds a lookup methods, which can be used to improve the efficiency of a resolver that is based on a linker.Files: instead of O(n) to query each file for an item, we can look the item up from the symbol table and directly determine the file it is in.

To further aid efficiency, this adds some benchmarks and then optimizes the Symbols.getPackage function so both symbol table construction and lookup are more efficient. In particular, instead of strings.Split, which must allocate a slice, we now use a new nameEnumerator type, which can be used to iterate through name components without allocation.

Benchmark improvements below.

before

BenchmarkSymbols-10    	 2,073,348	       568.7 ns/op	     240 B/op	       7 allocs/op

after

BenchmarkSymbols-10    	 3,570,336	       324.1 ns/op	       0 B/op	       0 allocs/op

…ages to not allocate slice and names

linker/symbols.go

jhump · 2024-04-22T15:34:31Z

Could the lookup be reversed? Is it needed to go from a -> a.b -> a.b.c and not just lookup a.b.c directly?

Maybe, but certainly not with the existing data structure: it is a trie, so you must traverse the path to find the node. There is no entry in the ancestor children maps for a.b.c; it only appears in its parent package, a.b.

So to do something like that, we'd need to change the structure and add some alternate index. I can look into it and see how that changes performance. It makes locking a little more complicated: the trie is thread-safe because the compiler is inherently multi-threaded. It's structured how it is to allow concurrent writers to the symbol table (as long as they are recording symbols for different packages). An alternate index would require separate synchronization.

After measuring the impact of #286, #287, and #290 and seeing it to be too modest. I decided to use a memory profiler, and it found "the good stuff". These changes had the largest impact on allocations and performance. When linking inputs that come from descriptor protos (as opposed to inputs that are compiled from sources and have ASTs), this resulted in a 23% reduction in latency and 70% reduction in allocations. This change features the following improvements: 1. `ast.NoSourceNode` now has a pointer receiver, so wrapping one in an `ast.Node` interface value doesn't incur an allocation to put the value on the heap. This also updates `parser.ParseResult` to refer to a single `*ast.NoSourceNode` when it has no AST, instead of allocating one in each call to get a node value. The `NoSourceNode`'s underlying type is now `ast.FileInfo` so that it can be allocation-free, even for the `NodeInfo` method (which previously was allocating a new `FileInfo` each time). 3. Don't allocate a slice to hold the set of checked files for each element being resolved. Instead, we allocate a single slice up front, and re-use that throughout. 4. Don't pro-actively allocate strings that only are used for error messages; instead defer construction of the change to the construction of the error.

…cations (bufbuild#286) This adds a lookup methods, which can be used to improve the efficiency of a resolver that is based on a `linker.Files`: instead of O(n) to query each file for an item, we can look the item up from the symbol table and directly determine the file it is in. To further aid efficiency, this adds some benchmarks and then optimizes the `Symbols.getPackage` function so both symbol table construction and lookup are more efficient. In particular, instead of `strings.Split`, which must allocate a slice, we now use a new `nameEnumerator` type, which can be used to iterate through name components without allocation. (cherry picked from commit 792268a)

After measuring the impact of bufbuild#286, bufbuild#287, and bufbuild#290 and seeing it to be too modest. I decided to use a memory profiler, and it found "the good stuff". These changes had the largest impact on allocations and performance. When linking inputs that come from descriptor protos (as opposed to inputs that are compiled from sources and have ASTs), this resulted in a 23% reduction in latency and 70% reduction in allocations. This change features the following improvements: 1. `ast.NoSourceNode` now has a pointer receiver, so wrapping one in an `ast.Node` interface value doesn't incur an allocation to put the value on the heap. This also updates `parser.ParseResult` to refer to a single `*ast.NoSourceNode` when it has no AST, instead of allocating one in each call to get a node value. The `NoSourceNode`'s underlying type is now `ast.FileInfo` so that it can be allocation-free, even for the `NodeInfo` method (which previously was allocating a new `FileInfo` each time). 3. Don't allocate a slice to hold the set of checked files for each element being resolved. Instead, we allocate a single slice up front, and re-use that throughout. 4. Don't pro-actively allocate strings that only are used for error messages; instead defer construction of the change to the construction of the error. (cherry picked from commit 016b009)

…cations (bufbuild#286) This adds a lookup methods, which can be used to improve the efficiency of a resolver that is based on a `linker.Files`: instead of O(n) to query each file for an item, we can look the item up from the symbol table and directly determine the file it is in. To further aid efficiency, this adds some benchmarks and then optimizes the `Symbols.getPackage` function so both symbol table construction and lookup are more efficient. In particular, instead of `strings.Split`, which must allocate a slice, we now use a new `nameEnumerator` type, which can be used to iterate through name components without allocation. (cherry picked from commit 792268a)

After measuring the impact of bufbuild#286, bufbuild#287, and bufbuild#290 and seeing it to be too modest. I decided to use a memory profiler, and it found "the good stuff". These changes had the largest impact on allocations and performance. When linking inputs that come from descriptor protos (as opposed to inputs that are compiled from sources and have ASTs), this resulted in a 23% reduction in latency and 70% reduction in allocations. This change features the following improvements: 1. `ast.NoSourceNode` now has a pointer receiver, so wrapping one in an `ast.Node` interface value doesn't incur an allocation to put the value on the heap. This also updates `parser.ParseResult` to refer to a single `*ast.NoSourceNode` when it has no AST, instead of allocating one in each call to get a node value. The `NoSourceNode`'s underlying type is now `ast.FileInfo` so that it can be allocation-free, even for the `NodeInfo` method (which previously was allocating a new `FileInfo` each time). 3. Don't allocate a slice to hold the set of checked files for each element being resolved. Instead, we allocate a single slice up front, and re-use that throughout. 4. Don't pro-actively allocate strings that only are used for error messages; instead defer construction of the change to the construction of the error. (cherry picked from commit 016b009)

jhump added 2 commits April 22, 2024 10:36

add benchmark

c4cc0aa

add lookup operations to *Symbols; improve getPackages and importPack…

b347ffd

…ages to not allocate slice and names

jhump force-pushed the jh/symbol-lookup branch from 2846adc to b347ffd Compare April 22, 2024 14:37

jhump requested a review from emcfarlane April 22, 2024 14:38

emcfarlane approved these changes Apr 22, 2024

View reviewed changes

linker/symbols.go Outdated Show resolved Hide resolved

linker/symbols.go Show resolved Hide resolved

don't need a loop since getPackage(..., false) never returns nil

dd909c4

emcfarlane approved these changes Apr 22, 2024

View reviewed changes

jhump merged commit 792268a into main Apr 22, 2024
8 checks passed

jhump deleted the jh/symbol-lookup branch April 22, 2024 15:38

jhump mentioned this pull request Apr 22, 2024

Use a profiler to improve linker performance #291

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add lookup methods to linker.Symbols; improve performance/reduce allocations #286

Add lookup methods to linker.Symbols; improve performance/reduce allocations #286

jhump commented Apr 22, 2024

jhump commented Apr 22, 2024

Add lookup methods to linker.Symbols; improve performance/reduce allocations #286

Add lookup methods to linker.Symbols; improve performance/reduce allocations #286

Conversation

jhump commented Apr 22, 2024

jhump commented Apr 22, 2024