[FEATURE][ML] Fetch from source when fields are more then docvalue limit #43204

dimitris-athanasiou · 2019-06-13T15:25:56Z

There is an index-level setting, index.max_docvalue_fields_search, which limits the number of doc_value fields that a search can return. This commit changes the extractor so that it reads the value of that setting and, if the number of fields to extract exceeds it, switches to fetching the fields from _source.

elasticmachine · 2019-06-13T15:25:58Z

Pinging @elastic/ml-core

benwtrent

I think this adds bugs for TimeField and GeoPointField extractors.

benwtrent · 2019-06-13T20:19:53Z

...ml/src/main/java/org/elasticsearch/xpack/ml/dataframe/extractor/ExtractedFieldsDetector.java

@@ -86,6 +88,10 @@ public ExtractedFields detect() {
        if (extractedFields.getAllFields().isEmpty()) {
            throw ExceptionsHelper.badRequestException("No compatible fields could be detected in index [{}]", index);
        }
+        if (extractedFields.getAllFields().size() > docValueFieldsLimit) {


Shouldn't this be extractedFields.getDocValueFields().size()?

A few lines above we filter only doc_value fields, so it makes no difference. But I see how it'd be clearer if I made this change. I shall!

benwtrent · 2019-06-13T20:20:46Z

...ml/src/main/java/org/elasticsearch/xpack/ml/dataframe/extractor/ExtractedFieldsDetector.java

@@ -86,6 +88,10 @@ public ExtractedFields detect() {
        if (extractedFields.getAllFields().isEmpty()) {
            throw ExceptionsHelper.badRequestException("No compatible fields could be detected in index [{}]", index);
        }
+        if (extractedFields.getAllFields().size() > docValueFieldsLimit) {
+            extractedFields = new ExtractedFields(extractedFields.getAllFields().stream().map(ExtractedField::newFromSource)
+                .collect(Collectors.toList()));


It seems to me that we only really need to transform as many as necessary to drop below the docValueFieldsLimit, and then only those that supportsFromSource().

we only really need to transform as many as necessary to drop below the docValueFieldsLimit

As long as we have to touch the source, we might as well fetch them all from there. Performance-wise it's going to be better.

and then only those that supportsFromSource()

But this is a very good point. If we have fields that don't support from-source, we should insist on taking them from doc_values. Then we'll also need a check to see if we're still over the limit.
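
For illustration, the approach agreed on in this thread could be sketched roughly as follows. This is a minimal, standalone sketch and not the code from the PR: the `Field` class, the `Method` enum, and `applyLimit` are hypothetical stand-ins for `ExtractedField`, `ExtractionMethod`, and the logic in `ExtractedFieldsDetector.detect()`. It assumes that once the limit is exceeded, every field that supports it is moved to _source, and that the remaining doc_value fields are then re-checked against the limit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in types; the real ExtractedField API may differ.
class SourceFallbackSketch {

    enum Method { DOC_VALUE, SOURCE }

    static final class Field {
        final String name;
        final Method method;
        final boolean supportsFromSource;

        Field(String name, Method method, boolean supportsFromSource) {
            this.name = name;
            this.method = method;
            this.supportsFromSource = supportsFromSource;
        }
    }

    // Returns the input list unchanged when the doc_value fields fit within the limit.
    // Otherwise moves every doc_value field that supports it to _source and verifies
    // that the fields which must stay on doc_values still fit.
    static List<Field> applyLimit(List<Field> fields, int docValueFieldsLimit) {
        long docValueCount = fields.stream().filter(f -> f.method == Method.DOC_VALUE).count();
        if (docValueCount <= docValueFieldsLimit) {
            return fields;
        }
        List<Field> result = new ArrayList<>();
        int remainingDocValues = 0;
        for (Field f : fields) {
            if (f.method == Method.DOC_VALUE && f.supportsFromSource) {
                // We have to touch _source anyway, so fetch everything we can from it.
                result.add(new Field(f.name, Method.SOURCE, true));
            } else {
                result.add(f);
                if (f.method == Method.DOC_VALUE) {
                    remainingDocValues++;
                }
            }
        }
        if (remainingDocValues > docValueFieldsLimit) {
            throw new IllegalArgumentException("Still over the doc_value fields limit ["
                + docValueFieldsLimit + "] after switching to _source");
        }
        return result;
    }
}
```

With a limit of 1 and two doc_value fields, of which only one supports _source, the sketch moves that one field and leaves the other on doc_values.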

benwtrent · 2019-06-13T20:25:40Z

...in/ml/src/main/java/org/elasticsearch/xpack/ml/datafeed/extractor/fields/ExtractedField.java

+
+        @Override
+        protected boolean supportsFromSource() {
+            return getExtractionMethod() == ExtractionMethod.DOC_VALUE;


So, we are assuming that anything that is FromFields that has ExtractionMethod.DOC_VALUE also supports extraction via _source? This seems OK to me for the most part, except for GeoPointField, which may need to override supportsFromSource() to always return false.

I think we may want to do something similar with TimeField

Good catch! Will adjust.
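
To make the suggestion concrete, here is a rough sketch of what overriding supportsFromSource() could look like. The class names `GeoPointLike` and `TimeLike` are hypothetical, and the behaviour of `newFromSource()` when a field does not support _source is an assumption for illustration; the actual ExtractedField hierarchy in the PR may be structured differently.

```java
// Hypothetical, simplified hierarchy; not the actual ExtractedField classes.
class SupportsFromSourceSketch {

    enum ExtractionMethod { DOC_VALUE, SOURCE }

    static class Field {
        final String name;
        final ExtractionMethod method;

        Field(String name, ExtractionMethod method) {
            this.name = name;
            this.method = method;
        }

        // Default: any doc_value field is assumed to be extractable from _source too.
        boolean supportsFromSource() {
            return method == ExtractionMethod.DOC_VALUE;
        }

        // Assumption: refusing the conversion is signalled with an exception.
        Field newFromSource() {
            if (!supportsFromSource()) {
                throw new IllegalStateException("Field [" + name + "] cannot be extracted from _source");
            }
            return new Field(name, ExtractionMethod.SOURCE);
        }
    }

    // Geo points are parsed from their doc_value representation, so opt out.
    static class GeoPointLike extends Field {
        GeoPointLike(String name) {
            super(name, ExtractionMethod.DOC_VALUE);
        }

        @Override
        boolean supportsFromSource() {
            return false;
        }
    }

    // The time field likewise stays on doc_values in this sketch.
    static class TimeLike extends Field {
        TimeLike(String name) {
            super(name, ExtractionMethod.DOC_VALUE);
        }

        @Override
        boolean supportsFromSource() {
            return false;
        }
    }
}
```

Callers would check supportsFromSource() before converting, so the geo point and time fields are left untouched when the rest are switched to _source.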

benwtrent

🏑 🌱

dimitris-athanasiou added the :ml Machine learning label Jun 13, 2019

benwtrent reviewed Jun 13, 2019

dimitris-athanasiou force-pushed the fetch-from-source-when-too-many-fields branch from 02c056e to a3f79e0 Compare June 14, 2019 11:38

dimitris-athanasiou added 4 commits June 14, 2019 15:19

[FEATURE][ML] Fetch from source when fields are more then docvalue limit

ad1b02e

Remove unused import

b9b84d9

Catch IndexNotFoundException at first callback

0000a2d

Address review comments

5c5819e

dimitris-athanasiou force-pushed the fetch-from-source-when-too-many-fields branch from a3f79e0 to 5c5819e Compare June 14, 2019 12:19

benwtrent approved these changes Jun 14, 2019

dimitris-athanasiou merged commit eced353 into elastic:feature-ml-data-frame-analytics Jun 14, 2019

dimitris-athanasiou deleted the fetch-from-source-when-too-many-fields branch June 14, 2019 14:45
