-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use lazy indexing repeated expression caching to optimize predicates #715
Conversation
This only works for predicates of the form: ``` <path expression> = <path expression> ``` It's very likely there are untested edge cases that this implementation will break.
As discussed, JavaRosa should expose a way to add a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't done a deep review yet but I think it would be helpful to get this in beta so we can get feedback on any crashes or issues that users notice.
passed.addAll(predicateCache.get(nodeSetRef, xpe, () -> { | ||
List<TreeReference> predicatePassed = new ArrayList<>(treeReferences.size()); | ||
for (int i = 0; i < treeReferences.size(); ++i) { | ||
//if there are predicates then we need to see if e.nextElement meets the standard of the predicate | ||
TreeReference treeRef = treeReferences.get(i); | ||
|
||
//test the predicate on the treeElement | ||
EvaluationContext evalContext = rescope(treeRef, (firstTimeCapture ? treeRef.getMultLast() : i)); | ||
EvaluationContext evalContext = rescope(treeRef, i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm having trouble convincing myself that the whole condition is ok to delete. Can you explain the reasoning please? Or is it just that there are no tests for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It had no tests and I couldn't explain the reasoning for it. Agreed it feels scary to delete, but it's also completely unexplained code. Do you feel like we need to add the logic back in? I couldn't come up with an example at the time, but I was heavily down in the weeds. Maybe we'd have more success reverse engineering a test now.
context = new EvaluationContext(evalContext, new IdempotentInMemPredicateCache()); | ||
List<PredicateFilter> filters = Stream.concat( | ||
customPredicateFilters.stream(), | ||
Stream.of(indexPredicateFilter, cachingPredicateFilter, new IdempotentPredicateCache()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first two caches are instantiated at form load time and the third is instantiated at the start of an evaluation cascade. This is really fundamental to how they're intended to work and it feels scary to me that there's no explanation of this distinction anywhere in the code. When I first saw this line, my reaction was that all three should be constructed in the same way.
The good news is that repeatsUsedInCalculatesStayUpToDate
and similarCmpAndEqExpressionsDoNotGetConfused
do fail if the last cache is instantiated globally rather than just for a single cascade.
Not sure there's action to take here but did want to flag it. Maybe the last cache type's header comment should include that it's only intended for use in a single evaluation cascade. Another idea would be to build a single cache and explicitly reset it at the beginning or end of each cascade evaluation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Side note, why does similarCmpAndEqExpressionsDoNotGetConfused
fail in that case? Are there cases where the expressions could get confused within a single cascade?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure there's action to take here but did want to flag it. Maybe the last cache type's header comment should include that it's only intended for use in a single evaluation cascade. Another idea would be to build a single cache and explicitly reset it at the beginning or end of each cascade evaluation.
There is currently a comment on IdempotentPredicateCache
:
Caches down stream evaluations (in the {@link PredicateFilter} chain) for "idempotent" (with respect to current form state) predicates. Can only be used for static instances or in cases where form state won't change - will cause clashes otherwise. Repeated evaluations are fetched in O(1) time.
Does that feel like enough or do you think it needs more details?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Side note, why does similarCmpAndEqExpressionsDoNotGetConfused fail in that case? Are there cases where the expressions could get confused within a single cascade?
Not sure! The cache definitely isn't designed to work between evaluation runs, so I'd imagine that's messing with something. I believe the repeats test was one that drove out the cache only being used for single evaluations.
return filtered; | ||
} | ||
} | ||
return filterWithPredicate(sourceInstance, treeReference, predicate, children, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The iterative approach felt simper to me! I'd be interested to briefly discuss why this feels clearer to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's been a while now since I made the change, but I guess I was anticipating that we needed to "fall through" and then let values "bubble" back up through these filters. In CompareChildToAbsoluteExpressionFilter
for example, you need to let the lower parts of the chain evaluate and then use that to build the cache. For me, that felt easiest to model recursively. Does that make sense?
if (absoluteSide instanceof XPathPathExpr) { | ||
return ((XPathPathExpr) getAbsoluteSide()).eval(sourceInstance, evaluationContext).unpack(); | ||
} else { | ||
return absoluteSide.eval(sourceInstance, evaluationContext); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This accounts for literals? Yes, confirmed through debugging tests.
Closes #689
Closes #714
This adds two extra levels of caching for predicates. Both take advantage of expressions with the structure (in either direction):
The "absolute" expression can currently be a path expression, a numeric literal or a string literal. It's likely we could extend this to any "idempotent" (where
XPathExpression#isIdempotent
istrue
) expression though.For the first level of caching (
CompareChildToAbsoluteExpressionFilter
), we support these expressions with the operators=
,!=
,<
,>
,<=
and>=
. These expressions can be broken up into a "relative" part and an "absolute" part, the absolute part can then be evaluated and use a key (along with the predicate and some other variables) to cache the results for that expression when evaluated with for that particular evaluation. For example an expression likename = /data/choice
can be partly evaluated toname = "Bill"
and then we have enough information to cache the children wherename
is"Bill"
. This means that we evaluate repeated predicates of this form with a previously evaluated answer in O(1) rather than O(n) wheren
is the number children we are filtering.The second level of caching (
IndexPredicateFilter
) only works on=
. This uses a similar approach toPredicateCachingFilter
, but lazily builds an index using the for the "relative" side. From the example before, this would mean we end up creating an index for all the possiblename
values in a set of children we're filtering meaning when we evaluate the absolute side to"Bill"
, we can just grab the matching children out of the index. This means that we evaluate repeated predicates of this form where we've previously filtered on that relative path in O(1) rather than O(n) wheren
is the number children we are filtering. It's worth noting that this will probably increase loading times for forms as indexing will most likely happen then. I think we can probably extend this indexing approach for the other operators, but I'd need to consult a grimoire to work those out, and it may just not be worth it.There are a few caveats/edge cases here. I don't think any of these block us here, but it would be good to discuss whether these are things we should create issues for, or just leave for a potential future where we feel like they're worth it:
No caching is currently used in cases where there are multiple predicates likeWe now use caching when evaluating the first in a chain of predicates, but don't for any following ones as we discussed./data/children[<predicate>][<another predicate]
. This fixes Cascade-level filtered expression caching results in bad evaluations with multiple predicates #714. We'll most likely want to do extra work to support these expressions or add support for&&
predicates in the future. We might be able to add a simple compromise where we only use caching with the first predicate in a set of multiple predicates./data/children[<predicate>]/grandChild[<another predicate>]
. This causes problems for indexing as we need to index the whole possible nodeset rather than the nested one. Similar to the above caveat, there might be a nice compromise where just use caching with the first predicate in the path.name = nickname
. If we feel like this would be impactful, we can probably add it down the road, but the implementation would differ from what we have so far (as both sides need to be evaluated for every child).I've also added an API to allow clients to add a
PredicateFilter
:This should allow Collect to add caching that works between form sessions (speeding up form loading times).