[ML][Inference] adding tree model #47044

benwtrent · 2019-09-24T18:52:23Z

This adds the base tree model that can be used for our future ensemble model.

elasticmachine · 2019-09-24T18:52:25Z

Pinging @elastic/ml-core

benwtrent · 2019-09-24T18:54:26Z

@valeriy42 Let me know what you think.

Additionally, I cannot find a use for BOTH split_feature and split_index. It seems to me we just need ONE of those because they will always point to the same thing, no?

hendrikmuhs

LGTM

hendrikmuhs · 2019-09-25T06:12:59Z

x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/inference/model/tree/Tree.java

+                if (treeNode.getRightChild() != null) {
+                    toVisit.add(treeNode.getRightChild());
+                }
+            }


you could add an easy check for disconnected nodes by checking visited.size() == nodes.size()
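A minimal sketch of the suggested check, not the PR's actual code. It assumes the root sits at position 0 and that a missing child is `null`, as in the hunk above; `SketchNode` and `TreeConnectivity` are hypothetical names introduced only for illustration.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the PR's tree node: children are positions in the
// node list, and null means "no child" (mirroring getRightChild() != null).
final class SketchNode {
    final Integer leftChild;
    final Integer rightChild;

    SketchNode(Integer leftChild, Integer rightChild) {
        this.leftChild = leftChild;
        this.rightChild = rightChild;
    }
}

final class TreeConnectivity {
    // Walks the tree from the root (assumed to be position 0) and reports
    // whether every node in the list was reached.
    static boolean isFullyConnected(List<SketchNode> nodes) {
        if (nodes.isEmpty()) {
            return true;
        }
        Set<Integer> visited = new HashSet<>();
        Deque<Integer> toVisit = new ArrayDeque<>();
        toVisit.add(0);
        while (!toVisit.isEmpty()) {
            int index = toVisit.remove();
            if (index < 0 || index >= nodes.size()) {
                throw new IllegalArgumentException("child index out of range: " + index);
            }
            if (!visited.add(index)) {
                continue; // already seen; a stricter validator might reject this as a cycle
            }
            SketchNode node = nodes.get(index);
            if (node.leftChild != null) {
                toVisit.add(node.leftChild);
            }
            if (node.rightChild != null) {
                toVisit.add(node.rightChild);
            }
        }
        // The check from the review comment: any unvisited node is disconnected.
        return visited.size() == nodes.size();
    }

    public static void main(String[] args) {
        List<SketchNode> orphaned = Arrays.asList(
            new SketchNode(1, null), new SketchNode(null, null), new SketchNode(null, null));
        System.out.println(isFullyConnected(orphaned));
    }
}
```

Because the traversal already tracks visited nodes, the size comparison adds no extra pass over the tree.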

valeriy42 · 2019-09-25T11:51:07Z

> Additionally, I cannot find a use for BOTH split_feature and split_index. It seems to me we just need ONE of those because they will always point to the same thing, no?

The difference is that split_index is the index of the tree node, which is referenced by, e.g., left_child or right_child, while split_feature refers to the index of the feature (after pre-processing) with respect to which we are splitting.
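To make the distinction concrete, here is a rough sketch under stated assumptions: `SplitNode` and `TreeInferenceSketch` are hypothetical names, the comparison direction (`<` goes left) is an assumption, and default-left handling for missing values is omitted. Child fields hold node indices; `splitFeature` holds a position in the feature vector.

```java
// Hypothetical node type: nodeIndex and child fields address the node array,
// while splitFeature addresses the (pre-processed) feature vector.
final class SplitNode {
    final int nodeIndex;
    final int splitFeature; // unused for leaves
    final double threshold; // unused for leaves
    final int leftChild;    // node index, -1 for leaves
    final int rightChild;   // node index, -1 for leaves
    final double leafValue; // unused for junctions

    private SplitNode(int nodeIndex, int splitFeature, double threshold,
                      int leftChild, int rightChild, double leafValue) {
        this.nodeIndex = nodeIndex;
        this.splitFeature = splitFeature;
        this.threshold = threshold;
        this.leftChild = leftChild;
        this.rightChild = rightChild;
        this.leafValue = leafValue;
    }

    static SplitNode junction(int nodeIndex, int splitFeature, double threshold,
                              int leftChild, int rightChild) {
        return new SplitNode(nodeIndex, splitFeature, threshold, leftChild, rightChild, 0.0);
    }

    static SplitNode leaf(int nodeIndex, double leafValue) {
        return new SplitNode(nodeIndex, -1, 0.0, -1, -1, leafValue);
    }

    boolean isLeaf() {
        return leftChild < 0 && rightChild < 0;
    }
}

final class TreeInferenceSketch {
    // Follows child node indices from the root until a leaf is reached.
    static double infer(SplitNode[] nodes, double[] features) {
        SplitNode node = nodes[0];
        while (!node.isLeaf()) {
            // splitFeature selects the feature; the child fields select the next node.
            boolean goLeft = features[node.splitFeature] < node.threshold;
            node = nodes[goLeft ? node.leftChild : node.rightChild];
        }
        return node.leafValue;
    }

    public static void main(String[] args) {
        SplitNode[] nodes = {
            SplitNode.junction(0, 2, 0.5, 1, 2), // node 0 splits on feature 2
            SplitNode.leaf(1, 10.0),
            SplitNode.leaf(2, 20.0)
        };
        System.out.println(infer(nodes, new double[] {9.0, 9.0, 0.1}));
    }
}
```

In this example the root has node index 0 but splits on feature index 2, so the two values do not coincide.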

valeriy42

Looks good. There is a mix-up between "nodeIndex" and "splitIndex". Also "model" is called "evaluation" in JSON schema. We need to synchronize the definitions.

valeriy42 · 2019-09-25T11:57:20Z

client/rest-high-level/src/main/java/org/elasticsearch/client/ml/inference/model/tree/Tree.java

+         * @param decisionThreshold The decision threshold
+         * @return The created node
+         */
+        public TreeNode.Builder addJunction(int nodeIndex, int featureIndex, boolean isDefaultLeft, double decisionThreshold) {


So the nodeIndex variable here is what is called split_index in JSON. We should homogenize the names.

I think node_index is a more friendly name. Honestly, the value will probably be passed down from the tree as the node array order needs to be guaranteed for serialization to work well.
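One way the "passed down from the tree" idea could look, as a hedged sketch rather than the builder in this PR: a hypothetical `OrderedTreeBuilder` assigns each node's index from its position in the node list, so the serialized array order and the indices cannot drift apart. The method signatures here are simplified assumptions and do not match `addJunction` above.

```java
import java.util.ArrayList;
import java.util.List;

final class OrderedTreeBuilder {
    // Hypothetical node record with only the fields needed to show index assignment.
    static final class Node {
        final int nodeIndex;
        final int featureIndex; // -1 for leaves
        final double value;     // threshold for junctions, prediction for leaves

        Node(int nodeIndex, int featureIndex, double value) {
            this.nodeIndex = nodeIndex;
            this.featureIndex = featureIndex;
            this.value = value;
        }
    }

    private final List<Node> nodes = new ArrayList<>();

    // The caller never supplies nodeIndex; it is derived from the list position.
    int addJunction(int featureIndex, double decisionThreshold) {
        int nodeIndex = nodes.size();
        nodes.add(new Node(nodeIndex, featureIndex, decisionThreshold));
        return nodeIndex;
    }

    int addLeaf(double prediction) {
        int nodeIndex = nodes.size();
        nodes.add(new Node(nodeIndex, -1, prediction));
        return nodeIndex;
    }

    // Returns a copy so later builder calls do not affect the built list.
    List<Node> build() {
        return new ArrayList<>(nodes);
    }
}
```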

valeriy42 · 2019-09-25T12:01:50Z

x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/XPackClientPlugin.java

                new NamedWriteableRegistry.Entry(PreProcessor.class, FrequencyEncoding.NAME.getPreferredName(), FrequencyEncoding::new),
                new NamedWriteableRegistry.Entry(PreProcessor.class, OneHotEncoding.NAME.getPreferredName(), OneHotEncoding::new),
                new NamedWriteableRegistry.Entry(PreProcessor.class, TargetMeanEncoding.NAME.getPreferredName(), TargetMeanEncoding::new),
+                // ML - Inference models


I tried to avoid using "models" since there is a long tradition of overloading this term. In JSON schema the section is called evaluation. I am not particularly invested in the term, but we should stick to the same terminology everywhere.

This also may mean that we rename "evaluation" to "model" in the JSON schema.

I really like using model as it is a unified name with the ensemble and its nested models object.

It seems logical to me that a model that is an ensemble will have many models. If we choose to use evaluation I think ensemble should have an evaluations field.

[ML][Inference] adding tree model

06ab2ce

benwtrent added >non-issue :ml Machine learning v8.0.0 v7.5.0 labels Sep 24, 2019

hendrikmuhs approved these changes Sep 25, 2019

valeriy42 reviewed Sep 25, 2019

renaming features for updated schema

23472a5

benwtrent merged commit 85f1272 into elastic:master Sep 25, 2019

benwtrent deleted the feature/ml-inference-model-parsing branch September 25, 2019 20:02

benwtrent mentioned this pull request Sep 25, 2019

[7.x] [ML][Inference] adding tree model (#47044) #47141

Merged

benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request Sep 25, 2019

[ML][Inference] adding tree model (elastic#47044)

bb4e206

* [ML][Inference] adding tree model
* renaming features for updated schema

benwtrent added a commit that referenced this pull request Sep 25, 2019

[7.x] [ML][Inference] adding tree model (#47044) (#47141)

fcddaa9

* [ML][Inference] adding tree model (#47044)
* [ML][Inference] adding tree model
* renaming features for updated schema
* fixing 7.x compilation

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
