
Memoized type matching #5026

Merged: 8 commits merged into master from mcculls/memoized-type-matching on May 11, 2023

Conversation

@mcculls (Contributor) commented Apr 4, 2023

What Does This Do

Enables memoization of hierarchical type matchers to reduce the number of times we need to sweep a given type hierarchy.

To turn off memoization add this JVM option:

-Ddd.resolver.cache.config=NO_MEMOS

or set this environment variable:

DD_RESOLVER_CACHE_CONFIG=NO_MEMOS

Motivation

For applications with deep type inheritance, each sweep of a hierarchy can churn the entire type cache: types cached at the start of a sweep are replaced by others before the sweep ends, so the same types are parsed again on every sweep. This can lead to long startup times when resource lookup is relatively expensive, as in certain OSGi configurations.

Memoization Approach

Each hierarchy matcher is assigned a unique id as it is requested from HierarchyMatchers, and a replacement matcher that triggers memoization is returned in its place. We do some limited de-duplication of requests for the same matcher, using identity equivalence and a tiny lookup cache. (We cannot rely on equals being implemented properly for every matcher.)
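
For illustration, the id-assignment and identity-based de-duplication step might look roughly like the sketch below; all class and method names here are assumptions, not the tracer's actual API:

import java.util.IdentityHashMap;
import java.util.Map;

import net.bytebuddy.description.type.TypeDescription;
import net.bytebuddy.matcher.ElementMatcher;

// Illustrative sketch only - not the tracer's real registry/matcher classes.
final class MatcherRegistry {
  private int nextMatcherId = 0;

  // tiny identity-based lookup: repeated requests for the same matcher instance reuse
  // the same wrapper, because equals() cannot be trusted for arbitrary matchers
  private final Map<ElementMatcher<?>, MemoizingMatcher> requested = new IdentityHashMap<>();

  synchronized ElementMatcher.Junction<TypeDescription> memoize(
      ElementMatcher<? super TypeDescription> matcher) {
    return requested.computeIfAbsent(matcher, key -> new MemoizingMatcher(nextMatcherId++, matcher));
  }

  // the replacement matcher handed back to instrumentation code in place of the original
  static final class MemoizingMatcher
      extends ElementMatcher.Junction.AbstractBase<TypeDescription> {
    final int matcherId;
    final ElementMatcher<? super TypeDescription> delegate;

    MemoizingMatcher(int matcherId, ElementMatcher<? super TypeDescription> delegate) {
      this.matcherId = matcherId;
      this.delegate = delegate;
    }

    @Override
    public boolean matches(TypeDescription target) {
      // placeholder: the real matcher answers from the shared per-type result
      // produced by the memoization step sketched below
      return delegate.matches(target);
    }
  }
}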

When memoization is triggered for a type we first check for previously shared results. If none exist then memoization is triggered recursively for the super-class and any interfaces. Inherited matches are copied into the current result. Next we record matches for members declared in the type: annotations, fields, and methods. Finally we record matches for the type itself, before sharing the result.

Since all hierarchy matchers are requested before any are used we can store match results in a simple BitSet. The type of each match and whether the match can be inherited are spread over other BitSets, shared across all results like a schema.
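
Under those assumptions, the recursive memoization and BitSet bookkeeping could be sketched roughly as follows. This is simplified and illustrative: the real code keeps separate matcher-id schemas for annotations, fields, methods, and types, and uses a bounded cache rather than a WeakHashMap.

import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;

import net.bytebuddy.description.type.TypeDescription;
import net.bytebuddy.matcher.ElementMatcher;

// Illustrative sketch only - a simplified view of recursive, BitSet-based memoization.
final class TypeMemos {
  // all matchers are requested before any are used, so ids index straight into lists/BitSets
  static final List<ElementMatcher<? super TypeDescription>> matchers = new ArrayList<>();

  // schema shared across all results: which matcher ids represent inheritable matches
  static final BitSet inheritedMatcherIds = new BitSet();

  // previously shared results (the real cache is bounded and keyed more carefully)
  static final Map<TypeDescription, BitSet> shared = new WeakHashMap<>();

  static BitSet memoize(TypeDescription type) {
    BitSet memo = shared.get(type);
    if (memo != null) {
      return memo; // previously shared result
    }
    memo = new BitSet(matchers.size());

    // memoize the super-class and interfaces first, copying inheritable matches down
    TypeDescription.Generic superClass = type.getSuperClass();
    if (superClass != null) {
      inherit(memo, memoize(superClass.asErasure()));
    }
    for (TypeDescription.Generic iface : type.getInterfaces()) {
      inherit(memo, memoize(iface.asErasure()));
    }

    // record matches for the type itself (the real code also records matches for
    // annotations, fields, and methods declared in the type at this point)
    for (int id = 0; id < matchers.size(); id++) {
      if (matchers.get(id).matches(type)) {
        memo.set(id);
      }
    }

    shared.put(type, memo); // share the result
    return memo;
  }

  private static void inherit(BitSet memo, BitSet parentMemo) {
    BitSet inheritable = (BitSet) parentMemo.clone();
    inheritable.and(inheritedMatcherIds); // only inheritable matches are copied down
    memo.or(inheritable);
  }
}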

"No Match" Filter

Memoization has an additional benefit: we can now easily tell when a type is not interesting to any hierarchical matcher. In most applications these "no match" results heavily outweigh any matches. Unfortunately recording "no matches" has the potential to use a lot of memory, especially if we need to take the class location/class-loader into account.

However, if we assume that when a type is not interesting, all types with the same name are also not interesting, then we can simplify the "no match" cache down to a simple long array. The first 32 bits of each entry are the hash of the class name, while the remaining 32 bits record extra characteristics of the name, such as the lengths of the package and simple names. These extra characteristics further reduce the probability of collisions compared to using the name hash alone.
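
For illustration, packing one entry could look roughly like this; the exact bit layout and the packing helper below are assumptions, not the tracer's actual code:

// Illustrative sketch only - packs a class name into a single 64-bit "no match" entry.
final class NoMatchEntry {

  static long entryFor(String className) {
    long nameHash = className.hashCode();   // well-defined, stable String hash
    long classCode = classCode(className);  // extra characteristics of the name
    return (nameHash << 32) | (classCode & 0xFFFFFFFFL);
  }

  // packs the package-prefix length, the simple-name length, and the first and last
  // characters of the simple name into 32 bits (each field truncated to 8 bits)
  private static int classCode(String className) {
    int packageLength = className.lastIndexOf('.') + 1;
    String simpleName = className.substring(packageLength);
    return (Math.min(packageLength, 0xFF) << 24)
        | (Math.min(simpleName.length(), 0xFF) << 16)
        | ((simpleName.charAt(0) & 0xFF) << 8)
        | (simpleName.charAt(simpleName.length() - 1) & 0xFF);
  }
}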

Persisting "No Match" Results

Basing the "No Match" filter on just the class-name also means we can persist it between runs of the same application, because the String.hashCode() algorithm is well-defined and stable. This can provide a further performance boost because it can prune the matching space without needing to load/parse any types.

To enable persistence of the "No Match" filter between runs add this JVM option:

-Ddd.resolver.cache.dir=<writable-directory>

or set this environment variable:

DD_RESOLVER_CACHE_DIR=<writable-directory>

The Java tracer will then write a file at shutdown based on the following name pattern:

<writable-directory>/<uuid>-nomatch.filter

where the UUID is based on the configured service name and version, so each unique service+version has its own file.

If a nomatch.filter file already exists for a particular service+version it will be used to re-seed the "No Match" filter at startup. It won't then be updated at shutdown; to regenerate a nomatch.filter file, delete it and restart the application.
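
Conceptually the persisted filter is just those packed long entries written out at shutdown and read back at startup, along the lines of the sketch below. The file layout and class names here are assumptions; only the option names above reflect the actual configuration.

import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch only - persisting a long[] "no match" filter between runs.
final class NoMatchFilterStore {

  static void save(Path file, long[] entries) throws IOException {
    try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(file))) {
      out.writeInt(entries.length);
      for (long entry : entries) {
        out.writeLong(entry); // entries depend only on class names, so they stay valid across runs
      }
    }
  }

  static long[] load(Path file) throws IOException {
    try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
      long[] entries = new long[in.readInt()];
      for (int i = 0; i < entries.length; i++) {
        entries[i] = in.readLong();
      }
      return entries; // re-seeds the filter before any types are loaded or parsed
    }
  }
}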

Additional Notes

Here's a short history of the total transformation time for a sizeable, modular web application. Transformation time covers the accumulated time spent inside our ClassFileTransformer for both initial transformations and any retransformations during warmup of the application. I've included some of the more notable changes related to matching and transforming.

Version    Transformation time   Notable changes
0.81.0     32.7s
0.100.0    29.8s                 global ignores trie
0.104.0    22.4s                 outline type parsing
0.109.1    17.2s                 class-loader masks + rework context-store matchers
0.115.1    16.7s                 reduce allocations when matching
1.2.0      16.3s                 reduce need for full-type parsing
1.3.0      16.0s                 use outline types in more situations
1.4.0      15.5s                 re-use bytecode during type resolution
1.6.0      13.3s                 single combining matcher + splitting transformer
1.11.0     12.7s                 replace map-based WeakCache with fixed-sized array approach
---------  --------------------  ------------------------------------------------------------
1.14.0     9.6s                  memoization
1.14.0     8.9s                  memoization (LARGE)
1.14.0     8.5s                  memoization, previously persisted no-match filter
1.14.0     6.3s                  memoization, previously persisted no-match filter (LARGE)

LARGE here refers to the dd.resolver.cache.config preset, for example: -Ddd.resolver.cache.config=LARGE

The smallest overall transformation time is 6.3 seconds, using the LARGE preset with a pre-seeded "No Match" filter. Less than half of that time is matching; the rest is spent actually transforming bytecode. Of the 2.5s of matching time, muzzle and class-loader checks take roughly a second, as does hierarchy matching; the remaining 0.5s goes to context-store matching and structural checks.

@mcculls added the tag: do not merge, comp: core, tag: performance, and type: refactoring labels on Apr 4, 2023
@mcculls force-pushed the mcculls/memoized-type-matching branch 3 times, most recently from ca96d6d to 7361cd5 on April 18, 2023 15:04
@mcculls force-pushed the mcculls/memoized-type-matching branch 5 times, most recently from 60c3b7b to 9e12d28 on April 24, 2023 23:04
pr-commenter bot commented Apr 24, 2023

Benchmarks

Parameters

         Baseline                     Candidate
commit   1.14.0-SNAPSHOT~041ca59785   1.14.0-SNAPSHOT~a2e7761be6
config   baseline                     candidate
module   Agent                        Agent
parent   None                         None
Summary

Found 3 performance improvements and 1 performance regression. Performance is the same for 18 cases.

scenario                      Δ mean execution_time
Startup-base-Telemetry        better  [-0.288ms; -0.191ms] or [-4.770%; -3.165%]
Startup-iast-IAST             better  [-0.743ms; -0.291ms] or [-6.276%; -2.457%]
Startup-iast-Telemetry        better  [-0.319ms; -0.144ms] or [-5.189%; -2.348%]
Startup-waf-Remote Config     worse   [+156.813µs; +340.914µs] or [+20.632%; +44.855%]

Unchanged results

scenario                      Δ mean execution_time
Startup-base-Agent            same
Startup-base-Agent.start      unsure  [+2.536ms; +12.172ms] or [+0.266%; +1.278%]
Startup-base-BytebuddyAgent   unsure  [+3.633ms; +10.070ms] or [+0.619%; +1.716%]
Startup-base-GlobalTracer     same
Startup-base-AppSec           same
Startup-base-Remote Config    unsure  [+4.793µs; +17.239µs] or [+0.769%; +2.765%]
Startup-iast-Agent            unsure  [-0.166s; -0.028s] or [-1.713%; -0.289%]
Startup-iast-Agent.start      same
Startup-iast-BytebuddyAgent   same
Startup-iast-GlobalTracer     same
Startup-iast-AppSec           same
Startup-iast-Remote Config    same
Startup-waf-Agent             same
Startup-waf-Agent.start       unsure  [+0.002s; +0.009s] or [+0.146%; +0.858%]
Startup-waf-BytebuddyAgent    unsure  [+3.801ms; +7.708ms] or [+0.647%; +1.313%]
Startup-waf-GlobalTracer      same
Startup-waf-AppSec            unsure  [-2.292ms; -0.583ms] or [-1.252%; -0.319%]
Startup-waf-Telemetry         same

@mcculls force-pushed the mcculls/memoized-type-matching branch 5 times, most recently from 179e420 to aa41c87 on April 28, 2023 10:13
@mcculls force-pushed the mcculls/memoized-type-matching branch 4 times, most recently from b1bad12 to ce7f127 on May 3, 2023 11:03
@mcculls removed the tag: do not merge label on May 3, 2023
@mcculls force-pushed the mcculls/memoized-type-matching branch from ce7f127 to 67715b3 on May 3, 2023 11:56
@mcculls changed the title from "[WIP] memoized type matching" to "Memoized type matching" on May 3, 2023
@mcculls marked this pull request as ready for review on May 3, 2023 16:56
@mcculls requested a review from a team as a code owner on May 3, 2023 16:56
@bantonsson (Contributor) left a comment:

Looks good (as far as I can tell). Have a few questions.


  // memoize whether the type is a class
  static final ElementMatcher.Junction<TypeDescription> isClass =
      prepare(MatcherKind.CLASS, ElementMatchers.any(), true);

@bantonsson (Contributor) commented:

After rereading this a couple of times I wasn't completely sure why the ElementMatchers.any() was ok here. It looks like the memoization is driven from the other end, and we will only use the classMatcherIds bit set on types that are not primitive and not an interface, which makes this ok. Is this somewhat correct @mcculls?

@mcculls (Contributor Author) replied:

Correct - MatcherKind.CLASS means this matcher will only be applied to non-primitive, non-interface types, so the matcher here can be simplified to any().

  // mixes the hash by multiplying and byte-swapping, spreading entropy across all bits
  private static int rehash(int oldHash) {
    return Integer.reverseBytes(oldHash * 0x9e3775cd) * 0x9e3775cd;
  }

@bantonsson (Contributor) commented:

Good old Bagwell byte swapping hash.

  /**
   * Computes a 32-bit 'class-code' that includes the length of the package prefix and simple name,
   * plus the first and last characters of the simple name (each truncated to fit into 8-bits.)
   */
  private static int classCode(String name) {

@bantonsson (Contributor) commented:

Sounds reasonable - combined with the normal hash code this should make collisions very unlikely.


  @Override
  protected boolean doMatch(TypeDescription target) {
    return ExcludeFilter.exclude(excludeType, target.getName());
  }

@bantonsson (Contributor) commented:

This is not a comment on this PR, but more of an idea. I assume that this isn't hit that often now with memoization, but would it make sense to let the exclude filter build up a trie and use that instead of the hash set lookup and string comparisons?

@mcculls (Contributor Author) replied:

Good idea, worth looking into
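
For reference, the idea being floated is roughly a character trie over excluded class names, so an exclusion check walks the name once instead of hashing it and comparing full strings. A minimal sketch of that idea (not part of this PR, names illustrative) might look like:

import java.util.HashMap;
import java.util.Map;

// Rough sketch of the suggested data structure: a character trie over excluded class names.
final class ExcludeTrie {
  private final Map<Character, ExcludeTrie> children = new HashMap<>();
  private boolean excluded;

  void add(String className) {
    ExcludeTrie node = this;
    for (int i = 0; i < className.length(); i++) {
      node = node.children.computeIfAbsent(className.charAt(i), c -> new ExcludeTrie());
    }
    node.excluded = true; // mark the end of an excluded name
  }

  boolean isExcluded(String className) {
    ExcludeTrie node = this;
    for (int i = 0; i < className.length() && node != null; i++) {
      node = node.children.get(className.charAt(i)); // walk the name character by character
    }
    return node != null && node.excluded;
  }
}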

final class HasContextField extends ElementMatcher.Junction.ForNonNullValues<TypeDescription> {
  private final ElementMatcher<TypeDescription> storeMatcher;
  private final ElementMatcher<TypeDescription> skipMatcher;
  private final BitSet skippableStoreIds = new BitSet();

@bantonsson (Contributor) commented:

This seems less exact than the weak store id bit sets in ShouldInjectFieldsState because of the maybeSkip, if I read the code correctly. Is there a possibility that we use a weak store unnecessarily, or could we just end up adding redundant code that never gets used?

@mcculls (Contributor Author) replied:

Yes, this is potentially less exact - let me add some tests to confirm when it would make a difference, and I can then make it more exact based on that.

@mcculls (Contributor Author) added:

It turns out the java-completablefuture tests already demonstrate the difference - and the new approach is actually more deterministic. For each context class that can be skipped/excluded, we collect the store ids associated with that context class (i.e. the value side of all its context-value associations) that might therefore need to be 'skipped' - by delegating to the weak map instead - when that excluded class is a super-class.

When we do find a context class that should be excluded and it is a super-class of the class we're injecting into, we notify the field-injection code that super-class requests involving those stores should delegate to the weak map. That part of the injected code is only used if someone tries to push/get values to those stores; if a store isn't used, it just means we injected an unused case block.

The old approach was lazier, in that it only recorded a store as needing weak-map delegation at the time the match was made. Unfortunately, variations in class-loading order can make this non-deterministic - for example, in the java-completablefuture tests the UniCompletion class is processed before its superclasses (since both are not yet loaded at matching time), so it doesn't realize that it should delegate Runnable / ForkJoinTask super-class requests to the weak map. This changes if the Completion class is forcibly loaded first, because it then sees the skipped class before processing UniCompletion.

In practice this doesn't make a difference, because we add a last-resort catch block to handle unexpected requests and that catch block ends up being used by the old approach, whereas the new approach always adds those weak-map case blocks (and avoids triggering the exceptional catch block).

@bantonsson (Contributor) replied:

Thanks @mcculls for the detailed explanation. Determinism FTW!

@ygree mentioned this pull request on May 5, 2023
@mcculls force-pushed the mcculls/memoized-type-matching branch from cd89962 to 88a4f4a on May 10, 2023 14:37
@mcculls force-pushed the mcculls/memoized-type-matching branch from 09f7b8f to a2e7761 on May 11, 2023 09:25
@mcculls merged commit 553f2f4 into master on May 11, 2023
@mcculls deleted the mcculls/memoized-type-matching branch on May 11, 2023 12:06
@github-actions bot added this to the 1.14.0 milestone on May 11, 2023
@mcculls mentioned this pull request on Aug 31, 2023