
Memoized type matching #5026

Merged: 8 commits merged into master from mcculls/memoized-type-matching on May 11, 2023

Conversation

@mcculls (Contributor) commented Apr 4, 2023

What Does This Do

Enables memoization of hierarchical type matchers to reduce the number of times we need to sweep a given type hierarchy.

To turn off memoization add this JVM option:

-Ddd.resolver.cache.config=NO_MEMOS

or set this environment variable:

DD_RESOLVER_CACHE_CONFIG=NO_MEMOS

Motivation

For applications with deep type inheritance, each sweep of a hierarchy can churn the entire type cache: types cached at the start of a sweep are replaced by others before the sweep ends, so the same types are parsed again on every sweep. This can lead to long startup times when resource lookup is relatively expensive, as in certain OSGi configurations.

Memoization Approach

Each hierarchy matcher is assigned a unique id as it is requested from HierarchyMatchers, and a replacement matcher that triggers memoization is returned in its place. We do some limited de-duplication of requests for the same matcher, using identity equivalence and a tiny lookup cache. (We cannot rely on equals being implemented properly for every matcher.)
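
For illustration, the id-assignment and identity-based de-duplication step might look roughly like the sketch below; all class and method names here are assumptions, not the tracer's actual API:

import java.util.IdentityHashMap;
import java.util.Map;

import net.bytebuddy.description.type.TypeDescription;
import net.bytebuddy.matcher.ElementMatcher;

// Illustrative sketch only - not the tracer's real registry/matcher classes.
final class MatcherRegistry {
  private int nextMatcherId = 0;

  // tiny identity-based lookup: repeated requests for the same matcher instance reuse
  // the same wrapper, because equals() cannot be trusted for arbitrary matchers
  private final Map<ElementMatcher<?>, MemoizingMatcher> requested = new IdentityHashMap<>();

  synchronized ElementMatcher.Junction<TypeDescription> memoize(
      ElementMatcher<? super TypeDescription> matcher) {
    return requested.computeIfAbsent(matcher, key -> new MemoizingMatcher(nextMatcherId++, matcher));
  }

  // the replacement matcher handed back to instrumentation code in place of the original
  static final class MemoizingMatcher
      extends ElementMatcher.Junction.AbstractBase<TypeDescription> {
    final int matcherId;
    final ElementMatcher<? super TypeDescription> delegate;

    MemoizingMatcher(int matcherId, ElementMatcher<? super TypeDescription> delegate) {
      this.matcherId = matcherId;
      this.delegate = delegate;
    }

    @Override
    public boolean matches(TypeDescription target) {
      // placeholder: the real matcher answers from the shared per-type result
      // produced by the memoization step sketched below
      return delegate.matches(target);
    }
  }
}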

When memoization is triggered for a type we first check for previously shared results. If none exist then memoization is triggered recursively for the super-class and any interfaces. Inherited matches are copied into the current result. Next we record matches for members declared in the type: annotations, fields, and methods. Finally we record matches for the type itself, before sharing the result.

Since all hierarchy matchers are requested before any are used we can store match results in a simple BitSet. The type of each match and whether the match can be inherited are spread over other BitSets, shared across all results like a schema.
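
Under those assumptions, the recursive memoization and BitSet bookkeeping could be sketched roughly as follows. This is simplified and illustrative: the real code keeps separate matcher-id schemas for annotations, fields, methods, and types, and uses a bounded cache rather than a WeakHashMap.

import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;

import net.bytebuddy.description.type.TypeDescription;
import net.bytebuddy.matcher.ElementMatcher;

// Illustrative sketch only - a simplified view of recursive, BitSet-based memoization.
final class TypeMemos {
  // all matchers are requested before any are used, so ids index straight into lists/BitSets
  static final List<ElementMatcher<? super TypeDescription>> matchers = new ArrayList<>();

  // schema shared across all results: which matcher ids represent inheritable matches
  static final BitSet inheritedMatcherIds = new BitSet();

  // previously shared results (the real cache is bounded and keyed more carefully)
  static final Map<TypeDescription, BitSet> shared = new WeakHashMap<>();

  static BitSet memoize(TypeDescription type) {
    BitSet memo = shared.get(type);
    if (memo != null) {
      return memo; // previously shared result
    }
    memo = new BitSet(matchers.size());

    // memoize the super-class and interfaces first, copying inheritable matches down
    TypeDescription.Generic superClass = type.getSuperClass();
    if (superClass != null) {
      inherit(memo, memoize(superClass.asErasure()));
    }
    for (TypeDescription.Generic iface : type.getInterfaces()) {
      inherit(memo, memoize(iface.asErasure()));
    }

    // record matches for the type itself (the real code also records matches for
    // annotations, fields, and methods declared in the type at this point)
    for (int id = 0; id < matchers.size(); id++) {
      if (matchers.get(id).matches(type)) {
        memo.set(id);
      }
    }

    shared.put(type, memo); // share the result
    return memo;
  }

  private static void inherit(BitSet memo, BitSet parentMemo) {
    BitSet inheritable = (BitSet) parentMemo.clone();
    inheritable.and(inheritedMatcherIds); // only inheritable matches are copied down
    memo.or(inheritable);
  }
}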

"No Match" Filter

Memoization has an additional benefit: we can now easily tell when a type is not interesting to any hierarchical matcher. In most applications these "no match" results heavily outweigh any matches. Unfortunately recording "no matches" has the potential to use a lot of memory, especially if we need to take the class location/class-loader into account.

However, if we assume that when a type is not interesting, all types with the same name are also not interesting, then we can simplify the "no match" cache down to a simple long array. The first 32 bits of each entry are the hash of the class name, while the remaining 32 bits record extra characteristics of the name, such as the lengths of the package and simple names. These extra characteristics further reduce the probability of collisions compared to using the name hash alone.
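
For illustration, packing one entry could look roughly like this; the exact bit layout and the packing helper below are assumptions, not the tracer's actual code:

// Illustrative sketch only - packs a class name into a single 64-bit "no match" entry.
final class NoMatchEntry {

  static long entryFor(String className) {
    long nameHash = className.hashCode();   // well-defined, stable String hash
    long classCode = classCode(className);  // extra characteristics of the name
    return (nameHash << 32) | (classCode & 0xFFFFFFFFL);
  }

  // packs the package-prefix length, the simple-name length, and the first and last
  // characters of the simple name into 32 bits (each field truncated to 8 bits)
  private static int classCode(String className) {
    int packageLength = className.lastIndexOf('.') + 1;
    String simpleName = className.substring(packageLength);
    return (Math.min(packageLength, 0xFF) << 24)
        | (Math.min(simpleName.length(), 0xFF) << 16)
        | ((simpleName.charAt(0) & 0xFF) << 8)
        | (simpleName.charAt(simpleName.length() - 1) & 0xFF);
  }
}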

Persisting "No Match" Results

Basing the "No Match" filter on just the class-name also means we can persist it between runs of the same application, because the String.hashCode() algorithm is well-defined and stable. This can provide a further performance boost because it can prune the matching space without needing to load/parse any types.

To enable persistence of the "No Match" filter between runs add this JVM option:

-Ddd.resolver.cache.dir=<writable-directory>

or set this environment variable:

DD_RESOLVER_CACHE_DIR=<writable-directory>

The Java tracer will then write a file at shutdown based on the following name pattern:

<writable-directory>/<uuid>-nomatch.filter

where the UUID is based on the configured service name and version, so each unique service+version has its own file.

If a nomatch.filter file already exists for a particular service+version it will be used to re-seed the "No Match" filter at startup. It won't then be updated at shutdown; to regenerate a nomatch.filter file, delete it and restart the application.
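
Conceptually the persisted filter is just those packed long entries written out at shutdown and read back at startup, along the lines of the sketch below. The file layout and class names here are assumptions; only the option names above reflect the actual configuration.

import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch only - persisting a long[] "no match" filter between runs.
final class NoMatchFilterStore {

  static void save(Path file, long[] entries) throws IOException {
    try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(file))) {
      out.writeInt(entries.length);
      for (long entry : entries) {
        out.writeLong(entry); // entries depend only on class names, so they stay valid across runs
      }
    }
  }

  static long[] load(Path file) throws IOException {
    try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
      long[] entries = new long[in.readInt()];
      for (int i = 0; i < entries.length; i++) {
        entries[i] = in.readLong();
      }
      return entries; // re-seeds the filter before any types are loaded or parsed
    }
  }
}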

Additional Notes

Here's a short history of the total transformation time for a sizeable, modular web application. Transformation time covers the accumulated time spent inside our ClassFileTransformer for both initial transformations and any retransformations during warmup of the application. I've included some of the more notable changes related to matching and transforming.

Version    Transformation time   Notable changes
0.81.0     32.7s
0.100.0    29.8s                 global ignores trie
0.104.0    22.4s                 outline type parsing
0.109.1    17.2s                 class-loader masks + rework context-store matchers
0.115.1    16.7s                 reduce allocations when matching
1.2.0      16.3s                 reduce need for full-type parsing
1.3.0      16.0s                 use outline types in more situations
1.4.0      15.5s                 re-use bytecode during type resolution
1.6.0      13.3s                 single combining matcher + splitting transformer
1.11.0     12.7s                 replace map-based WeakCache with fixed-sized array approach
---------  --------------------  ------------------------------------------------------------
1.14.0     9.6s                  memoization
1.14.0     8.9s                  memoization (LARGE)
1.14.0     8.5s                  memoization, previously persisted no-match filter
1.14.0     6.3s                  memoization, previously persisted no-match filter (LARGE)

LARGE here refers to the dd.resolver.cache.config preset, for example: -Ddd.resolver.cache.config=LARGE

The smallest overall transformation time is 6.3 seconds, using the LARGE preset with a pre-seeded "No Match" filter. Less than half of that time is matching; the rest is spent actually transforming bytecode. Of the 2.5s of matching time, muzzle and class-loader checks take roughly a second, as does hierarchy matching; the remaining 0.5s goes to context-store matching and structural checks.

@mcculls added the tag: do not merge, comp: core, tag: performance, and type: refactoring labels on Apr 4, 2023
@mcculls force-pushed the mcculls/memoized-type-matching branch 3 times, most recently from ca96d6d to 7361cd5 on April 18, 2023 15:04
@mcculls force-pushed the mcculls/memoized-type-matching branch 5 times, most recently from 60c3b7b to 9e12d28 on April 24, 2023 23:04
pr-commenter bot commented Apr 24, 2023

Benchmarks

Parameters

         Baseline                     Candidate
commit   1.14.0-SNAPSHOT~041ca59785   1.14.0-SNAPSHOT~a2e7761be6
config   baseline                     candidate
module   Agent                        Agent
parent   None                         None
Summary

Found 3 performance improvements and 1 performance regression. Performance is the same for 18 cases.

scenario                      Δ mean execution_time
Startup-base-Telemetry        better  [-0.288ms; -0.191ms] or [-4.770%; -3.165%]
Startup-iast-IAST             better  [-0.743ms; -0.291ms] or [-6.276%; -2.457%]
Startup-iast-Telemetry        better  [-0.319ms; -0.144ms] or [-5.189%; -2.348%]
Startup-waf-Remote Config     worse   [+156.813µs; +340.914µs] or [+20.632%; +44.855%]

Unchanged results

scenario                      Δ mean execution_time
Startup-base-Agent            same
Startup-base-Agent.start      unsure  [+2.536ms; +12.172ms] or [+0.266%; +1.278%]
Startup-base-BytebuddyAgent   unsure  [+3.633ms; +10.070ms] or [+0.619%; +1.716%]
Startup-base-GlobalTracer     same
Startup-base-AppSec           same
Startup-base-Remote Config    unsure  [+4.793µs; +17.239µs] or [+0.769%; +2.765%]
Startup-iast-Agent            unsure  [-0.166s; -0.028s] or [-1.713%; -0.289%]
Startup-iast-Agent.start      same
Startup-iast-BytebuddyAgent   same
Startup-iast-GlobalTracer     same
Startup-iast-AppSec           same
Startup-iast-Remote Config    same
Startup-waf-Agent             same
Startup-waf-Agent.start       unsure  [+0.002s; +0.009s] or [+0.146%; +0.858%]
Startup-waf-BytebuddyAgent    unsure  [+3.801ms; +7.708ms] or [+0.647%; +1.313%]
Startup-waf-GlobalTracer      same
Startup-waf-AppSec            unsure  [-2.292ms; -0.583ms] or [-1.252%; -0.319%]
Startup-waf-Telemetry         same

@mcculls force-pushed the mcculls/memoized-type-matching branch 5 times, most recently from 179e420 to aa41c87 on April 28, 2023 10:13
@mcculls force-pushed the mcculls/memoized-type-matching branch 4 times, most recently from b1bad12 to ce7f127 on May 3, 2023 11:03
@mcculls removed the tag: do not merge label on May 3, 2023
@mcculls force-pushed the mcculls/memoized-type-matching branch from ce7f127 to 67715b3 on May 3, 2023 11:56
@mcculls changed the title from "[WIP] memoized type matching" to "Memoized type matching" on May 3, 2023
@mcculls marked this pull request as ready for review on May 3, 2023 16:56
@mcculls requested a review from a team as a code owner on May 3, 2023 16:56
@bantonsson (Contributor) left a comment:

Looks good (as far as I can tell). Have a few questions.


  // memoize whether the type is a class
  static final ElementMatcher.Junction<TypeDescription> isClass =
      prepare(MatcherKind.CLASS, ElementMatchers.any(), true);

@bantonsson (Contributor) commented:

After rereading this a couple of times I wasn't completely sure why the ElementMatchers.any() was ok here. It looks like the memoization is driven from the other end, and we will only use the classMatcherIds bit set on types that are not primitive and not an interface, which makes this ok. Is this somewhat correct @mcculls?

@mcculls (Contributor Author) replied:

Correct - MatcherKind.CLASS means this matcher will only be applied to non-primitive, non-interface types, so the matcher here can be simplified to any().

  // mixes the hash by multiplying and byte-swapping, spreading entropy across all bits
  private static int rehash(int oldHash) {
    return Integer.reverseBytes(oldHash * 0x9e3775cd) * 0x9e3775cd;
  }

@bantonsson (Contributor) commented:

Good old Bagwell byte swapping hash.

  /**
   * Computes a 32-bit 'class-code' that includes the length of the package prefix and simple name,
   * plus the first and last characters of the simple name (each truncated to fit into 8-bits.)
   */
  private static int classCode(String name) {

@bantonsson (Contributor) commented:

Sounds reasonable - combined with the normal hash code this should make collisions very unlikely.


  @Override
  protected boolean doMatch(TypeDescription target) {
    return ExcludeFilter.exclude(excludeType, target.getName());
  }

@bantonsson (Contributor) commented:

This is not a comment on this PR, but more of an idea. I assume that this isn't hit that often now with memoization, but would it make sense to let the exclude filter build up a trie and use that instead of the hash set lookup and string comparisons?

@mcculls (Contributor Author) replied:

Good idea, worth looking into
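
For reference, the idea being floated is roughly a character trie over excluded class names, so an exclusion check walks the name once instead of hashing it and comparing full strings. A minimal sketch of that idea (not part of this PR, names illustrative) might look like:

import java.util.HashMap;
import java.util.Map;

// Rough sketch of the suggested data structure: a character trie over excluded class names.
final class ExcludeTrie {
  private final Map<Character, ExcludeTrie> children = new HashMap<>();
  private boolean excluded;

  void add(String className) {
    ExcludeTrie node = this;
    for (int i = 0; i < className.length(); i++) {
      node = node.children.computeIfAbsent(className.charAt(i), c -> new ExcludeTrie());
    }
    node.excluded = true; // mark the end of an excluded name
  }

  boolean isExcluded(String className) {
    ExcludeTrie node = this;
    for (int i = 0; i < className.length() && node != null; i++) {
      node = node.children.get(className.charAt(i)); // walk the name character by character
    }
    return node != null && node.excluded;
  }
}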

final class HasContextField extends ElementMatcher.Junction.ForNonNullValues<TypeDescription> {
  private final ElementMatcher<TypeDescription> storeMatcher;
  private final ElementMatcher<TypeDescription> skipMatcher;
  private final BitSet skippableStoreIds = new BitSet();

@bantonsson (Contributor) commented:

This seems less exact than the weak store id bit sets in ShouldInjectFieldsState because of the maybeSkip, if I read the code correctly. Is there a possibility that we use a weak store unnecessarily, or could we just end up adding redundant code that never gets used?

@mcculls (Contributor Author) replied:

Yes, this is potentially less exact - let me add some tests to confirm when it would make a difference, and I can then make it more exact based on that.

@mcculls (Contributor Author) added:

It turns out the java-completablefuture tests already demonstrate the difference - and the new approach is actually more deterministic. For each context class that can be skipped/excluded, we collect the store ids associated with that context class (i.e. the value side of all its context-value associations) that might therefore need to be 'skipped' - by delegating to the weak map instead - when that excluded class is a super-class.

When we do find a context class that should be excluded and it is a super-class of the class we're injecting into, we notify the field-injection code that super-class requests involving those stores should delegate to the weak map. That part of the injected code is only used if someone tries to push/get values to those stores; if a store isn't used, it just means we injected an unused case block.

The old approach was lazier, in that it only recorded a store as needing weak-map delegation at the time the match was made. Unfortunately, variations in class-loading order can make this non-deterministic - for example, in the java-completablefuture tests the UniCompletion class is processed before its superclasses (since both are not yet loaded at matching time), so it doesn't realize that it should delegate Runnable / ForkJoinTask super-class requests to the weak map. This changes if the Completion class is forcibly loaded first, because it then sees the skipped class before processing UniCompletion.

In practice this doesn't make a difference, because we add a last-resort catch block to handle unexpected requests and that catch block ends up being used by the old approach, whereas the new approach always adds those weak-map case blocks (and avoids triggering the exceptional catch block).

@bantonsson (Contributor) replied:

Thanks @mcculls for the detailed explanation. Determinism FTW!

@ygree mentioned this pull request on May 5, 2023
@mcculls force-pushed the mcculls/memoized-type-matching branch from cd89962 to 88a4f4a on May 10, 2023 14:37
@mcculls force-pushed the mcculls/memoized-type-matching branch from 09f7b8f to a2e7761 on May 11, 2023 09:25
@mcculls merged commit 553f2f4 into master on May 11, 2023
@mcculls deleted the mcculls/memoized-type-matching branch on May 11, 2023 12:06
@github-actions bot added this to the 1.14.0 milestone on May 11, 2023
@mcculls mentioned this pull request on Aug 31, 2023