-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEATURE][ML] Fetch from source when fields are more then docvalue limit #43204
[FEATURE][ML] Fetch from source when fields are more then docvalue limit #43204
Conversation
Pinging @elastic/ml-core |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this adds bugs for TimeField
and GeoPointField
extractors.
@@ -86,6 +88,10 @@ public ExtractedFields detect() { | |||
if (extractedFields.getAllFields().isEmpty()) { | |||
throw ExceptionsHelper.badRequestException("No compatible fields could be detected in index [{}]", index); | |||
} | |||
if (extractedFields.getAllFields().size() > docValueFieldsLimit) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be extractedFields.getDocValueFields().size()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few lines above we filter only doc_value fields, so it makes no difference. But I see how it'd be clearer if I made this change. I shall!
@@ -86,6 +88,10 @@ public ExtractedFields detect() { | |||
if (extractedFields.getAllFields().isEmpty()) { | |||
throw ExceptionsHelper.badRequestException("No compatible fields could be detected in index [{}]", index); | |||
} | |||
if (extractedFields.getAllFields().size() > docValueFieldsLimit) { | |||
extractedFields = new ExtractedFields(extractedFields.getAllFields().stream().map(ExtractedField::newFromSource) | |||
.collect(Collectors.toList())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me that we only really need to transform as many as necessary to drop below the docValueFieldsLimit
, and then only those that that supportsFromSource()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we only really need to transform as many as necessary to drop below the docValueFieldsLimit
As long as we have to touch the source, we might as well fetch them all from there. Performance-wise it's going to be better.
and then only those that that supportsFromSource()
But this is a very good poing. In case we have fields that don't support from-source, we should insist taking them from doc_values. Then we'll also need a check to see if we're still over the limit.
|
||
@Override | ||
protected boolean supportsFromSource() { | ||
return getExtractionMethod() == ExtractionMethod.DOC_VALUE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, we are assuming that anything that is FromFields
that has ExtractionMethod.DOC_VALUE
also supports extraction via _source
? This seems OK to me for the most part, except for GeoPointField
, which may need to override supportsFromSource()
to always return false
.
I think we may want to do something similar with TimeField
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Will adjust.
02c056e
to
a3f79e0
Compare
a3f79e0
to
5c5819e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🏑 🌱
There is an index level setting called
index.max_docvalue_fields_search
which limits the number of doc_value fields that can be returned from a search. This commit changes behaviour of the extractor so that it's now reading the value of that setting and if there are more fields we switch to fetching the fields from_source
.