-
Notifications
You must be signed in to change notification settings - Fork 218
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make various perf improvements #805
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
kstich
approved these changes
May 17, 2021
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lots of bits that look like they add up, nice!
smithy-model/src/main/java/software/amazon/smithy/model/selector/InternalSelector.java
Outdated
Show resolved
Hide resolved
This commit makes various performance improvements that particularly help when validating models with 100k+ shapes. * Fixed JMH benchmarks so they work with the updated Gradle version. * Model now uses a synchronized IdentityHashMap now to back the blackboard cache rather than a ConcurrentHashMap. This will actually now prevent duplicate work in creating KnowledgeIndexes. An IdentityHashMap was used because it works well for the classes used to cache knowledge indexes. * HttpBindingIndex now uses a WeakReference to Model. This was previously an unnecessary cyclical reference. * HttpBindingIndex no longer throws when attempting to access the HTTP bindings of shapes that don't exist or aren't operations. This prevents it from having to store a List entry for every single operation. * Model#getShapesWithTraits was used everywhere possible rather than streaming over all shapes and looking for traits. * Made optimizations to NullableIndex to no longer need to traverse every shape. * Removed unnecessary streaming from OperationIndex. * Removed unnecessary streaming from PaginatedIndex. * Removed unnecessary streaming from NeighborVisitor. * Updated Node expect methods to also accept a Supplier to create error messages if the expectation fails. This prevents needing to evaluate String.format even for valid node assertions. * AttributeSelector no longer uses a BiFunction key supplier, and instead the attribute path is just passed in. This allows for the selector to also perform optimizations when determining if a shape has a trait by leveraging Model#getShapesWithTraits. * InternalSelectors used to implement Selectors now support more general optimizations. This was previously hardcoded to only support an optimization for selecting shapes by type, but now selecting shapes by trait is optimized too. * Minor optimization to structure and union loading so that when validating that members have a shape ID compatible with the container, an intermediate shape ID no longer is constructed. * The ShapeId cache was increased from 1024 to 8192. This helps significantly with large models. The ShapeId cache was also updated to implement the LRA cache inside of the computeIfAbsent method. * NodeValidationVisitor now has a Context object that supports caching selectors evaluated against a model. This helps significantly with IdRef validation. To make this caching reusable, the visitor is now mutable after it is constructed. * NodeValidationVisitor idRef now special cases "*" and uses the context cache. * TraitTargetValidator has been simplified, special cases "*", and now uses a cache to speed up evaluating traits that use the same selectors. * TraitValueValidator now reuses the same NodeValidationVisitor in order to reuse the same selector cache. * The `Model#toSet(Class)` method is now public, so getting a set of shapes of a specific type no longer has to always go through a `Stream`.
This commit deprecates Trait#flatMapStream since it results in harder to read code that relies on unnamed tuples. Instead, Model#getShapesWithTraits, Shape#hasTrait, and Shape#expectTrait should be used.
Model validation is disabled when loading the prelude because the prelude is validated during unit tests and the prelude is immutable. However, if the prelude is broken for whatever reason, ERROR events encountered when performing model validation that uses the prelude will still cause an error, meaning the prelude is still validated when actually loading and using other models.
This commit updates the Selector implementation to run each shape through the selector in parallel when working with larger models. This adds around a ~6-10% performance improvement when loading models with 100k+ shapes. I unscientifically chose the threshold to be 10,000 shapes.
kstich
approved these changes
May 18, 2021
Model now contains methods for every shape type to get a Set of shapes or that type or a set of shapes of a type that have a specific trait. These methods are simpler to use than `shapes(Class)` and `toSet(Class)`, and they hopefully encourage the use of the caches that Model uses. These methods also eliminated a lot of boilerplat that getShapesWithTrait previously required when only shapes of a certain type were needed.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This commit makes various performance improvements that particularly help
when validating models with 100k+ shapes.
blackboard cache rather than a ConcurrentHashMap. This will actually
now prevent duplicate work in creating KnowledgeIndexes. An
IdentityHashMap was used because it works well for the classes
used to cache knowledge indexes.
previously an unnecessary cyclical reference.
HTTP bindings of shapes that don't exist or aren't operations.
This prevents it from having to store a List entry for every
single operation.
than streaming over all shapes and looking for traits.
traverse every shape.
error messages if the expectation fails. This prevents needing to
evaluate String.format even for valid node assertions.
the attribute path is just passed in. This allows for the selector
to also perform optimizations when determining if a shape has a trait
by leveraging Model#getShapesWithTraits.
optimizations. This was previously hardcoded to only support an optimization
for selecting shapes by type, but now selecting shapes by trait is
optimized too.
that members have a shape ID compatible with the container, an intermediate
shape ID no longer is constructed.
with large models. The ShapeId cache was also updated to implement the
LRA cache inside of the computeIfAbsent method.
selectors evaluated against a model. This helps significantly with IdRef
validation. To make this caching reusable, the visitor is now mutable
after it is constructed.
cache.
uses a cache to speed up evaluating traits that use the same selectors.
to reuse the same selector cache.
Issue #, if available:
Description of changes:
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.