Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a profiler to improve linker performance #291

Merged
merged 4 commits into from
Apr 22, 2024

Conversation

jhump
Copy link
Member

@jhump jhump commented Apr 22, 2024

After measuring the impact of #286, #287, and #290 and seeing it to be too modest. I decided to use a memory profiler, and it found "the good stuff".

These changes had the largest impact on allocations and performance. When linking inputs that come from descriptor protos (as opposed to inputs that are compiled from sources and have ASTs), this reduced the time per operation by 23% and the allocations by 70%. (Each operation consisted of linking all of the descriptors in the full googleapis module, which has 1000s of files. It reduced the runtime from 260ms to 200ms and reduced allocations from 3.4 million down to 1 million).

before:

BenchmarkNewResolver-10    	      45	 259,486,931 ns/op	254,811,876 B/op	 3,399,879 allocs/op

after:

BenchmarkNewResolver-10    	      55	 202,292,920 ns/op	170,419,086 B/op	 1,012,080 allocs/op

This change features the following improvements:

  1. ast.NoSourceNode now has a pointer receiver, so wrapping one in an ast.Node interface value doesn't incur an allocation to put the value on the heap. This also updates parser.ParseResult to refer to a single *ast.NoSourceNode when it has no AST, instead of allocating one in each call to get a node value. The NoSourceNode's underlying type is now ast.FileInfo so that it can be allocation-free, even for the NodeInfo method (which previously was allocating a new FileInfo each time).
    • Note that this doesn't help typical compile operations. For those, an AST is available and NoSourceNode is not used. But it does help when the input to the linker is a descriptor proto instead of source (i.e. a parser.ResultWithoutAST).
    • For just linking descriptor protos to descriptors, this change was by far the biggest/most impactful.
  2. Don't allocate a slice to hold the set of checked files for each element being resolved. Instead, we allocate a single slice up front, and use that throughout.
  3. Don't pro-actively allocate strings that only are used for error messages; instead defer construction of the change to the construction of the error.

Below are the latest benchmarks for this repo, before and after this branch and the previous three PRs. Like with the first bullet above, the net effect of these changes is quite small in the context of a normal, complete compile operation (which does a lot more than just linking) -- the difference is measurable but almost inconsequential. However, it is significant when just linking descriptor protos into descriptors (see stats in first paragraph).

before:

BenchmarkGoogleapisProtocompile-10                  	       2	  988,248,480 ns/op	2,134,934,592 B/op	32,956,803 allocs/op
BenchmarkGoogleapisProtocompileNoSourceInfo-10      	       2	  765,902,208 ns/op	1,470,990,984 B/op	22,673,097 allocs/op
BenchmarkGoogleapisProtocompileSingleThreaded-10    	       1	3,769,661,708 ns/op	2,135,339,152 B/op	32,959,866 allocs/op

after:

BenchmarkGoogleapisProtocompile-10                  	       2	  989,081,229 ns/op	2,109,230,420 B/op	32,399,672 allocs/op
BenchmarkGoogleapisProtocompileNoSourceInfo-10      	       2	  750,857,771 ns/op	1,445,389,236 B/op	22,116,776 allocs/op
BenchmarkGoogleapisProtocompileSingleThreaded-10    	       1	3,721,172,375 ns/op	2,109,203,136 B/op	32,398,941 allocs/op

1. ast.NoSourceNode has a pointer receiver, so wrapping one in an ast.Node interface
   value doesn't incur an allocation. Also updates parser.ParseResult to refer to a
   single *ast.NoSourceNode, instead of allocating one in each call to get a node
   value. The NoSourceNode's underlying type is now FileInfo so that it can be
   allocation-free, even for the NodeInfo method (which previously was allocating a
   new FileInfo each time).
2. Don't allocate a slice to hold the set of checked files for each element being
   resolved. Instead, we allocate a single slice up front, and use that throughout.
3. Don't pro-actively allocate strings that only are used for error messages; instead
   defer construction of the change to the construction of the error.
Copy link
Contributor

@emcfarlane emcfarlane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

linker/resolve.go Show resolved Hide resolved
@jhump jhump enabled auto-merge (squash) April 22, 2024 19:23
@jhump jhump merged commit 016b009 into main Apr 22, 2024
8 checks passed
@jhump jhump deleted the jh/linker-descriptor-optimizations branch April 22, 2024 19:24
kralicky pushed a commit to kralicky/protocompile that referenced this pull request May 19, 2024
After measuring the impact of bufbuild#286, bufbuild#287, and bufbuild#290 and seeing it to be
too modest. I decided to use a memory profiler, and it found "the good
stuff".

These changes had the largest impact on allocations and performance.
When linking inputs that come from descriptor protos (as opposed to
inputs that are compiled from sources and have ASTs), this resulted in
a 23% reduction in latency and 70% reduction in allocations.

This change features the following improvements:
1. `ast.NoSourceNode` now has a pointer receiver, so wrapping one in an
`ast.Node` interface value doesn't incur an allocation to put the value
on the heap. This also updates `parser.ParseResult` to refer to a single
`*ast.NoSourceNode` when it has no AST, instead of allocating one in
each call to get a node value. The `NoSourceNode`'s underlying type is
now `ast.FileInfo` so that it can be allocation-free, even for the
`NodeInfo` method (which previously was allocating a new `FileInfo` each
time).
3. Don't allocate a slice to hold the set of checked files for each
element being resolved. Instead, we allocate a single slice up front,
and re-use that throughout.
4. Don't pro-actively allocate strings that only are used for error
messages; instead defer construction of the change to the construction
of the error.

(cherry picked from commit 016b009)
kralicky pushed a commit to kralicky/protocompile that referenced this pull request Jun 8, 2024
After measuring the impact of bufbuild#286, bufbuild#287, and bufbuild#290 and seeing it to be
too modest. I decided to use a memory profiler, and it found "the good
stuff".

These changes had the largest impact on allocations and performance.
When linking inputs that come from descriptor protos (as opposed to
inputs that are compiled from sources and have ASTs), this resulted in
a 23% reduction in latency and 70% reduction in allocations.

This change features the following improvements:
1. `ast.NoSourceNode` now has a pointer receiver, so wrapping one in an
`ast.Node` interface value doesn't incur an allocation to put the value
on the heap. This also updates `parser.ParseResult` to refer to a single
`*ast.NoSourceNode` when it has no AST, instead of allocating one in
each call to get a node value. The `NoSourceNode`'s underlying type is
now `ast.FileInfo` so that it can be allocation-free, even for the
`NodeInfo` method (which previously was allocating a new `FileInfo` each
time).
3. Don't allocate a slice to hold the set of checked files for each
element being resolved. Instead, we allocate a single slice up front,
and re-use that throughout.
4. Don't pro-actively allocate strings that only are used for error
messages; instead defer construction of the change to the construction
of the error.

(cherry picked from commit 016b009)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants