[ML] Remove noisy 'Could not find trained model' message #100760
Conversation
Hi @davidkyle, I've created a changelog YAML for you.
Pinging @elastic/ml-core (Team:ML)
```java
@@ -745,14 +750,13 @@ private void cacheEvictionListener(RemovalNotification<String, ModelAndConsumer>

    @Override
    public void clusterChanged(ClusterChangedEvent event) {
        final boolean prefetchModels = event.state().nodes().getLocalNode().isIngestNode();
        // If we are not prefetching models and there were no model alias changes, don't bother handling the changes
        if ((prefetchModels == false)
```
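The early-exit check in the diff can be sketched as follows. This is a simplified illustration with hypothetical names (`shouldHandle`, `changedAliases`), not the actual Elasticsearch types: the idea is that a non-ingest node with no model alias changes skips the event entirely.

```java
import java.util.Set;

final class ModelLoadingSketch {
    // Hypothetical distillation of the condition added in clusterChanged():
    // skip all processing when the local node is not an ingest node (so it
    // will not prefetch models) and no model aliases changed in this event.
    static boolean shouldHandle(boolean isIngestNode, Set<String> changedAliases) {
        if (isIngestNode == false && changedAliases.isEmpty()) {
            return false; // nothing to do for this cluster state update
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(shouldHandle(false, Set.of()));        // non-ingest node, no alias changes
        System.out.println(shouldHandle(true, Set.of()));         // ingest node still prefetches
        System.out.println(shouldHandle(false, Set.of("alias"))); // alias change must be handled
    }
}
```

The design choice is the usual cluster-state-listener pattern: bail out as early and as cheaply as possible, since the listener runs on every cluster state update.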
Is this changing the behaviour beyond what the release note says?
Previously in the case where an ingest pipeline referenced a model that didn't exist, that model would be prefetched on ingest nodes as soon as it was created. But now that doesn't happen? Or am I misreading this change?
Maybe the old behaviour was unacceptably resource-intensive, because it was doing a lot of processing on ingest nodes on every cluster state update, and changing it is justified. But in that case I think the release note should say this rather than pretend the change is only about logging.
I'll make those changes in another PR
LGTM
💚 Backport successful
) A 'Could not find trained model' message was logged when a new ingest pipeline using an inference processor was created but the referenced model could not be found. This isn't unusual, as it is common to create a pipeline before the model is loaded. This change stops logging the error message in this situation.
…100775) A 'Could not find trained model' message was logged when a new ingest pipeline using an inference processor was created but the referenced model could not be found. This isn't unusual, as it is common to create a pipeline before the model is loaded. This change stops logging the error message in this situation.
ModelLoadingService watches for the creation of ingest pipelines that use the inference processor and pre-loads the model referenced in the new pipeline. If the model does not exist because the pipeline was created before the model, an error message is logged. This can happen quite often, producing spurious error messages; frequently it is because a solution creates an ingest pipeline referencing the ELSER model before ELSER has been downloaded.

ModelLoadingService only handles DFA/boosted tree models, which are loaded on ingest nodes; it does not load PyTorch/NLP models, but it first has to read the model config to know what type the model is. It is very easy to cause the error message to be logged: just create an ingest pipeline with an inference processor that references an unknown model id.
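For example, a pipeline definition along these lines reproduces it (`my-pipeline` and `unknown-model-id` are placeholder names chosen for illustration):

```
PUT _ingest/pipeline/my-pipeline
{
  "processors": [
    {
      "inference": {
        "model_id": "unknown-model-id"
      }
    }
  ]
}
```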
And the following is logged.