-
Notifications
You must be signed in to change notification settings - Fork 291
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Memoized type matching #5026
Memoized type matching #5026
Conversation
ca96d6d
to
7361cd5
Compare
60c3b7b
to
9e12d28
Compare
BenchmarksParameters
See matching parameters
SummaryFound 3 performance improvements and 1 performance regressions! Performance is the same for 18 cases.
See unchanged results
|
179e420
to
aa41c87
Compare
b1bad12
to
ce7f127
Compare
…edly walk the type hierarchy
…, while keeping memory usage low
…-seed it when the application restarts
…f memoization when we know type-names are unique.
ce7f127
to
67715b3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good (as far as I can tell). Have a few questions.
|
||
// memoize whether the type is a class | ||
static final ElementMatcher.Junction<TypeDescription> isClass = | ||
prepare(MatcherKind.CLASS, ElementMatchers.any(), true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After rereading this a couple of times I wasn't completely sure why the ElementMatchers.any()
was ok here. It looks like the memoization is driven from the other end, and we will only use the classMatcherIds
bit set on types that are not primitive and not an interface, which makes this ok. Is this somewhat correct @mcculls?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct MatcherKind.CLASS
means this matcher will only be applied against not-primitive + not-interface types, so the matcher here can be simplified to any()
} | ||
|
||
private static int rehash(int oldHash) { | ||
return Integer.reverseBytes(oldHash * 0x9e3775cd) * 0x9e3775cd; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good old Bagwell byte swapping hash.
* Computes a 32-bit 'class-code' that includes the length of the package prefix and simple name, | ||
* plus the first and last characters of the simple name (each truncated to fit into 8-bits.) | ||
*/ | ||
private static int classCode(String name) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds reasonable to make collisions very unlikely combined with the normal hash code.
|
||
@Override | ||
protected boolean doMatch(TypeDescription target) { | ||
return ExcludeFilter.exclude(excludeType, target.getName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a comment on this PR, but more of an idea. I assume that this isn't hit that often now with memoization, but would it make sense to let the exclude filter build up a trie and use that instead of the hash set lookup and string comparisons?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, worth looking into
...ent-tooling/src/main/java/datadog/trace/agent/tooling/bytebuddy/memoize/HasContextField.java
Outdated
Show resolved
Hide resolved
final class HasContextField extends ElementMatcher.Junction.ForNonNullValues<TypeDescription> { | ||
private final ElementMatcher<TypeDescription> storeMatcher; | ||
private final ElementMatcher<TypeDescription> skipMatcher; | ||
private final BitSet skippableStoreIds = new BitSet(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems less exact than the weak store id bit sets in ShouldInjectFieldsState
because of the maybeSkip
, if I read the code correctly. Is there a possibility that we use a weak store unnecessary, or are could we just add redundant code that never gets used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this is potentially less exact - let me add some tests to confirm when it would make a difference, I can then make it more exact based on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out the java-completablefuture
tests already demonstrate the difference - and the new approach is actually more deterministic. For each context class that can be skipped/excluded we collect store ids associated with that context class (ie. the value side of all the context-value associations) and therefore might need to be 'skipped' when that excluded class is a super-class - by delegating to the weak map instead.
When we do find a context class that should be excluded and it's a super-class of the class we're injecting then we notify the field-injection code that super-class requests involving those stores should delegate to the weak map. That part of the injected code will only be used if someone tries to push/get values to those stores. If a store isn't used then it just means we injected an unused case block.
The old approach was more lazy in that it only recorded a store as needing weak map delegation at the time the match was made. Unfortunately variations in class-loading order can make this non-deterministic - for example with the java-completablefuture
tests the UniCompletion
class is processed before its superclasses (since both are not yet loaded at matching time) so it doesn't realize that it should delegate Runnable
/ ForkJoinTask
super-class requests to the weak map. This changes if the Completion
class is forcibly loaded first, because it now sees the skipped class before processing UniCompletion
.
In practice this doesn't make a difference because we add a last-resort catch
block to handle unexpected requests and this catch
block ends up being used by the old approach. Whereas with the new approach we always add those weak map case blocks (and avoid triggering the exceptional catch
block)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mcculls for the detailed explanation. Determinism FTW!
cd89962
to
88a4f4a
Compare
…d cache, we just need to avoid adding them to the simple 'no-match' filter
09f7b8f
to
a2e7761
Compare
What Does This Do
Enables memoization of hierarchical type matchers to reduce the number of times we need to sweep a given type hierarchy.
To turn off memoization add this JVM option:
or set this environment variable:
Motivation
For applications with deep type inheritance each sweep of a hierarchy might result in a complete churn of the type cache. When this happens types cached at the start of each sweep get replaced by others by the end, leading to repeated parsing of the same types for each sweep. This can lead to long startup times when resource lookup is relatively expensive, such as certain OSGi configurations.
Memoization Approach
Each hierarchy matcher is assigned a unique id as it is requested from
HierarchyMatchers
and a replacement matcher that triggers memoization is returned in its place. We do some limited de-duplication of requests for the same matcher using identity equivalence and a tiny lookup cache. (We cannot rely onequals
being implemented properly for every matcher.)When memoization is triggered for a type we first check for previously shared results. If none exist then memoization is triggered recursively for the super-class and any interfaces. Inherited matches are copied into the current result. Next we record matches for members declared in the type: annotations, fields, and methods. Finally we record matches for the type itself, before sharing the result.
Since all hierarchy matchers are requested before any are used we can store match results in a simple
BitSet
. The type of each match and whether the match can be inherited are spread over otherBitSets
, shared across all results like a schema."No Match" Filter
Memoization has an additional benefit: we can now easily tell when a type is not interesting to any hierarchical matcher. In most applications these "no match" results heavily outweigh any matches. Unfortunately recording "no matches" has the potential to use a lot of memory, especially if we need to take the class location/class-loader into account.
However if we assume that when a type is not interesting, all types with the same name are also not interesting then we can simplify the "no match" cache down to a simple
long
array. The first 32 bits of each entry are the hash of the class-name while the remaining 32-bits record extra characteristics of the name, such as the length of the package and simple names. These are used to further reduce the probability of collisions, compared to just using the name hash.Persisting "No Match" Results
Basing the "No Match" filter on just the class-name also means we can persist it between runs of the same application, because the
String.hashCode()
algorithm is well-defined and stable. This can provide a further performance boost because it can prune the matching space without needing to load/parse any types.To enable persistence of the "No Match" filter between runs add this JVM option:
or set this environment variable:
The Java tracer will then write a file at shutdown based on the following name pattern:
where the UUID is based on the configured service name and version, so each unique service+version has its own file.
If a
nomatch.filter
file already exists for a particular service+version it will be used to re-seed the "No Match" filter at startup. It won't then be updated at shutdown; to regenerate anomatch.filter
file, delete it and restart the application.Additional Notes
Here's a short history of the total transformation time for a sizeable, modular web application. Transformation time covers the accumulated time spent inside our
ClassFileTransformer
for both initial transformations and any retransformations during warmup of the application. I've included some of the more notable changes related to matching and transforming.LARGE
here refers to thedd.resolver.cache.config
preset, for example:-Ddd.resolver.cache.config=LARGE
The smallest overall transformation time is 6.3 seconds, by using the
LARGE
preset with a pre-seeded "No Match" filter. Less than half of that time is matching - the rest is actually transforming the bytecode. Of the 2.5s matching time, muzzle and class-loader checks take roughly a second as does hierarchy matching. The remaining 0.5s is composed of context-store matching and structural checks.