Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use fewer allocations when building descriptors in the linker package #290

Merged
merged 2 commits into from
Apr 22, 2024

Conversation

jhump
Copy link
Member

@jhump jhump commented Apr 22, 2024

The idea here is to use fewer, larger allocations. This change also eliminates many allocations related to computing fully-qualified-names.

Previously, when adding the file's elements to the symbol table, we had to traverse the hierarchy of descriptor protos, because the wrapper descriptor implementations hadn't yet been created. But this incurred redundantly computing the full name for every element an extra two times -- once for the first pass which checked for collisions and again in the second pass which commits the items to the symbol table.

The new flow instead instantiates the descriptor wrappers first. So we compute the fully-qualified names for that, but no more. Adding the names to the symbol table can then traverse those descriptor wrappers, which does not incur the cost of allocating/computing full names again. It wasn't originally implemented this way simply because the cost of iterating the descriptor protos wasn't clear -- it's turned out to be a leaky abstraction.

Another significant part of this branch is that when we create the descriptor wrappers, we allocate them in bulk using large, flattened slices. This means many fewer calls to the allocator.

I don't have benchmark results just for this change. I was working on benchmarks in the bufbuild/buf repo, and this was an optimization I happened to spot. While it did provide measurable improvements, they were quite small (both in terms of total number of allocations and in terms of throughput). I was hoping for a more dramatic improvement, but despite only minor gains this still seems like a keeper to me.

  create hierarchy descriptor first, so we only need a single traversal
    where fully-qualified names must be computed/allocated
  subsequent traversals to add to symbol table can use descriptors
    instead of descriptor protos (which is allocation-free)
  when creating hierarchy, first count all items and allocate slices
    in bulk instead of single allocation per element.
@jhump jhump requested a review from emcfarlane April 22, 2024 17:11
linker/resolve.go Outdated Show resolved Hide resolved
linker/pool.go Show resolved Hide resolved
@jhump jhump mentioned this pull request Apr 22, 2024
@jhump jhump merged commit 93923d2 into main Apr 22, 2024
8 checks passed
@jhump jhump deleted the jh/flattened-slices branch April 22, 2024 19:07
jhump added a commit that referenced this pull request Apr 22, 2024
After measuring the impact of #286, #287, and #290 and seeing it to be
too modest. I decided to use a memory profiler, and it found "the good
stuff".

These changes had the largest impact on allocations and performance.
When linking inputs that come from descriptor protos (as opposed to
inputs that are compiled from sources and have ASTs), this resulted in
a 23% reduction in latency and 70% reduction in allocations.

This change features the following improvements:
1. `ast.NoSourceNode` now has a pointer receiver, so wrapping one in an
`ast.Node` interface value doesn't incur an allocation to put the value
on the heap. This also updates `parser.ParseResult` to refer to a single
`*ast.NoSourceNode` when it has no AST, instead of allocating one in
each call to get a node value. The `NoSourceNode`'s underlying type is
now `ast.FileInfo` so that it can be allocation-free, even for the
`NodeInfo` method (which previously was allocating a new `FileInfo` each
time).
3. Don't allocate a slice to hold the set of checked files for each
element being resolved. Instead, we allocate a single slice up front,
and re-use that throughout.
4. Don't pro-actively allocate strings that only are used for error
messages; instead defer construction of the change to the construction
of the error.
kralicky pushed a commit to kralicky/protocompile that referenced this pull request May 19, 2024
…bufbuild#290)

This uses many fewer, larger allocations, to allocate all of the descendant
descriptors in a file in as few chunks as possible (by allocating slices of
flattened structs).

This change also eliminates many allocations related to computing
fully-qualified-names by moving the construction of the descriptor hierarchy
(which includes computing and storing full names) to *before* we import
the names into the symbol table (which previously had to compute/allocate
the full names twice: once when checking for collisions, and again in a
second pass to store the names when no collisions were found).

(cherry picked from commit 93923d2)
kralicky pushed a commit to kralicky/protocompile that referenced this pull request May 19, 2024
After measuring the impact of bufbuild#286, bufbuild#287, and bufbuild#290 and seeing it to be
too modest. I decided to use a memory profiler, and it found "the good
stuff".

These changes had the largest impact on allocations and performance.
When linking inputs that come from descriptor protos (as opposed to
inputs that are compiled from sources and have ASTs), this resulted in
a 23% reduction in latency and 70% reduction in allocations.

This change features the following improvements:
1. `ast.NoSourceNode` now has a pointer receiver, so wrapping one in an
`ast.Node` interface value doesn't incur an allocation to put the value
on the heap. This also updates `parser.ParseResult` to refer to a single
`*ast.NoSourceNode` when it has no AST, instead of allocating one in
each call to get a node value. The `NoSourceNode`'s underlying type is
now `ast.FileInfo` so that it can be allocation-free, even for the
`NodeInfo` method (which previously was allocating a new `FileInfo` each
time).
3. Don't allocate a slice to hold the set of checked files for each
element being resolved. Instead, we allocate a single slice up front,
and re-use that throughout.
4. Don't pro-actively allocate strings that only are used for error
messages; instead defer construction of the change to the construction
of the error.

(cherry picked from commit 016b009)
kralicky pushed a commit to kralicky/protocompile that referenced this pull request Jun 8, 2024
…bufbuild#290)

This uses many fewer, larger allocations, to allocate all of the descendant
descriptors in a file in as few chunks as possible (by allocating slices of
flattened structs).

This change also eliminates many allocations related to computing
fully-qualified-names by moving the construction of the descriptor hierarchy
(which includes computing and storing full names) to *before* we import
the names into the symbol table (which previously had to compute/allocate
the full names twice: once when checking for collisions, and again in a
second pass to store the names when no collisions were found).

(cherry picked from commit 93923d2)
kralicky pushed a commit to kralicky/protocompile that referenced this pull request Jun 8, 2024
After measuring the impact of bufbuild#286, bufbuild#287, and bufbuild#290 and seeing it to be
too modest. I decided to use a memory profiler, and it found "the good
stuff".

These changes had the largest impact on allocations and performance.
When linking inputs that come from descriptor protos (as opposed to
inputs that are compiled from sources and have ASTs), this resulted in
a 23% reduction in latency and 70% reduction in allocations.

This change features the following improvements:
1. `ast.NoSourceNode` now has a pointer receiver, so wrapping one in an
`ast.Node` interface value doesn't incur an allocation to put the value
on the heap. This also updates `parser.ParseResult` to refer to a single
`*ast.NoSourceNode` when it has no AST, instead of allocating one in
each call to get a node value. The `NoSourceNode`'s underlying type is
now `ast.FileInfo` so that it can be allocation-free, even for the
`NodeInfo` method (which previously was allocating a new `FileInfo` each
time).
3. Don't allocate a slice to hold the set of checked files for each
element being resolved. Instead, we allocate a single slice up front,
and re-use that throughout.
4. Don't pro-actively allocate strings that only are used for error
messages; instead defer construction of the change to the construction
of the error.

(cherry picked from commit 016b009)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants