-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent deep recursion in doc values #60318
Prevent deep recursion in doc values #60318
Conversation
This prevents loops and deep recursion in doc values. This is important because the doc values for runtime fields can depend on one another. We expect that mostly we can catch deep recursion and loops when we prepare the doc values, but we're not going to rely on that. This adds the test when looking up the doc values. It adds it to `SearchLookup` which means that it is automatically provided to all plugins that implement runtime fields. This is important because we mean for it to be a "last resort" to prevent stack overflows.
Pinging @elastic/es-search (:Search/Search) |
run elasticsearch-ci/1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some comments, thanks for working on this!
this.fieldChain = fieldChain; | ||
this.mapperService = mapperService;; | ||
this.indexFieldDataService = indexFieldDataService; | ||
this.docMap = new DocLookup(mapperService, ft -> this.indexFieldDataService.apply(ft, () -> forField(ft.name()))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that the check is performed on DocLookup, would it make sense to move the tracking to DocLookup?
if (newFieldChain.size() > MAX_FIELD_CHAIN) { | ||
throw new IllegalArgumentException("field definition too deep: " + newFieldChain); | ||
} | ||
return new SearchLookup(unmodifiableList(newFieldChain), mapperService, indexFieldDataService, sourceLookup, fieldsLookup); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could avoide rebuilding DocLookup every time?
* that field looks up the values of a field named {@code bar} then | ||
* {@code bar}'s lookup will contain {@code ["foo", "bar"]}. | ||
*/ | ||
private final List<String> fieldChain; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a list only for faster lookup right? would it make sense to use a Set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about using a set but I figured we only check "contains" very few times so it isn't worth it. I also wanted the ordered output. I know I could get it with a LinkedHashSet
or something, but that just seems like overkill. I guess we could check it a bunch of times if you a very big number of sub-fields.....
* | ||
* @throws IllegalArgumentException if it detects a loop in the fields required to build doc values | ||
*/ | ||
public SearchLookup forField(String name) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This goes along my other comments: I wonder if it is necessary to create a new SearchLookup for every field. Maybe we could add references tracking at a lower level so that callers don't need to be adapted, especially given that it's applied only to DocLookup?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll have a look at pushing the tracking into doc lookup. I'm not sure there is a way to avoid adapting the callers though. I'll check!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at pushing it down and I don't really like how it turned out, mostly because I need a reference to the SearchLookup
in the DocLookup
then which feels pretty circular.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I see it? curious how it turns out.
* Builds a new {@linkplain LeafSearchLookup} for the caller to handle | ||
* a {@linkplain LeafReaderContext}. This lookup shares the {@code _source} | ||
* lookup but doesn't share doc values or stored fields. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I don't understand what you meant here?
/** | ||
* Test a loop that only happens on <strong>some</strong> documents. | ||
*/ | ||
public void testDocValuesSometimesLoop() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this test really needed? could you think of reasons why you may have cycles only for some docs that could not be detected?
This is now part of #61430. |
This prevents loops and deep recursion in doc values. This is important
because the doc values for runtime fields can depend on one another. We
expect that mostly we can catch deep recursion and loops when we prepare
the doc values, but we're not going to rely on that. This adds the test
when looking up the doc values. It adds it to
SearchLookup
which meansthat it is automatically provided to all plugins that implement runtime
fields. This is important because we mean for it to be a "last resort"
to prevent stack overflows.