Enrich documents with inference results at Fetch #53230
Conversation
Pinging @elastic/es-search (:Search/Search)
Pinging @elastic/ml-core (:ml)
I think it would be beneficial to have a way to "deploy" models onto nodes. The downside is that deployment vs. access race conditions would probably still result in synchronous loading or something similar.
I haven't taken a close look at the code yet, but have some high-level comments first.
I actually don't find this awkward. Building on this observation, I wanted to share another option for the API: we could model the inference as a new section of the search request, similar to script fields, with its own top-level name (sketched below).
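A hedged sketch of what such a section might look like, modeled on script_fields; the section name inference_fields and its parameters are illustrative guesses, not names proposed in the thread:

```json
GET my-index/_search
{
  "query": {
    "match_all": {}
  },
  "inference_fields": {
    "my_inference_result": {
      "model_id": "my-model"
    }
  }
}
```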
Note that the
Ack, will give this some thought. I think my API suggestion above would also look much more natural without being nested under ext.
We've been quite unsure whether it's better to run inference on the coordinating vs. the data nodes. Is there a known set of investigations/discussions we need to complete to reach clarity on this decision? Perhaps this would involve determining the types of models we want to support in a v1, and thinking through @benwtrent's idea about model deployment? It would be nice to have this list somewhere (I'm happy to move this conversation to an issue/design doc so as not to make the PR too noisy).
I like this syntax; it looks like aggregations and fits better with the query DSL. The only problem is that the inference processor is configured differently, and it would be confusing if the config could not be copied and pasted between the two. But we should consider the option.
```java
public interface InferenceResults extends NamedWriteable {

    void writeResult(IngestDocument document, String parentResultField);

    Map<String, Object> writeResultToMap(String parentResultField);
}
```
I wonder if we should implement ToXContentObject instead of having a bespoke method that creates a map.
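A minimal sketch of what that suggestion would look like; the class and field names are illustrative, not the PR's actual code (a real implementation would also implement InferenceResults):

```java
import java.io.IOException;

import org.elasticsearch.common.xcontent.ToXContentObject;
import org.elasticsearch.common.xcontent.XContentBuilder;

// Hypothetical inference result that renders itself as XContent instead of
// exposing a bespoke writeResultToMap method.
public class ExampleInferenceResults implements ToXContentObject {

    private final Object predictedValue;

    public ExampleInferenceResults(Object predictedValue) {
        this.predictedValue = predictedValue;
    }

    @Override
    public XContentBuilder toXContent(XContentBuilder builder, Params params) throws IOException {
        builder.startObject();
        builder.field("predicted_value", predictedValue);
        builder.endObject();
        return builder;
    }
}
```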
The map is converted to DocumentFields; we couldn't do that with ToXContentObject.
Cool :D
```java
try {
    listener.onResponse(trainedModelDefinition.infer(fields, config));
} catch (Exception e) {
    listener.onFailure(e);
}

public InferenceResults infer(Map<String, Object> fields, InferenceConfig config) {
```
I think this is backwards. We should have the synchronous method call the asynchronous method; this is the prevalent pattern everywhere else. Also, it is possible to make something asynchronous synchronous, but not really the other way around.
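A minimal sketch of that pattern, assuming an asynchronous overload of infer that takes an ActionListener (illustrative, not the PR's actual code):

```java
import java.util.Map;

import org.elasticsearch.action.support.PlainActionFuture;

// The synchronous variant delegates to the asynchronous one and blocks for
// the result, rather than the async variant wrapping a synchronous call.
public InferenceResults infer(Map<String, Object> fields, InferenceConfig config) {
    PlainActionFuture<InferenceResults> future = PlainActionFuture.newFuture();
    infer(fields, config, future); // assumed async overload taking an ActionListener
    return future.actionGet();     // block the calling thread until the result arrives
}
```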
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is backwards. The function called by LocalModel::infer is TrainedModelDefinition::infer, which does not have an async version. In this case we want the work to be done in the calling thread because the model is local to the call; for single-threaded models I can't think of a situation where we would want to spawn another thread to do the work, as we know inference is cheap. For models that could be parallelised, with the work split over multiple threads, then yes, you would want to make it async.

Do we even need the async method right now? It is only called by TransportInternalInferModelAction and could easily be changed.

I removed the default method because it is backwards and wrong, then implemented it in LocalModel.
> Do we even need the async method right now?

Maybe not, but it is much more difficult to make things asynchronous once they are synchronous. Assumptions are made about where the model lives when it is synchronous. What if this was a natively loaded model? Would we pause the calling thread for the data to be serialized down to the native process? I am not sure about this, but I did not want to paint us into a corner.
```java
modelLoadingService.get().getModel(infBuilder.getModelId(), listener);

try {
    // Eeek blocking on a latch we can't be doing that
```
Agreed :D. We may want this call to fail if the model is not deployed in the provided model service, especially since there is no way to load it just in time :/
Adds a FetchSubPhase which adds a new field to the search hits, containing the result of model inference performed on the hit. There isn't a direct way of configuring FetchSubPhases, so SearchExtSpec is used for the purpose.
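A rough sketch of the shape such a sub phase might take; the class name InferencePhase appears in this PR, but the ext name "inference" and the body details here are illustrative assumptions:

```java
import java.io.IOException;

import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.fetch.FetchSubPhase;
import org.elasticsearch.search.internal.SearchContext;

// Hypothetical fetch sub phase running model inference on each hit.
public class InferencePhase implements FetchSubPhase {

    @Override
    public void hitsExecute(SearchContext context, SearchHit[] hits) throws IOException {
        // The sub phase is configured through the request's "ext" section;
        // "inference" is an assumed name for the SearchExtSpec.
        if (context.getSearchExt("inference") == null) {
            return; // no inference requested for this search
        }
        for (SearchHit hit : hits) {
            // ... extract features from the hit, run the model, and attach
            // the inference result to the hit as a document field
        }
    }
}
```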
Why Here
Search hits can be modified at fetch time, with new fields added. Fetch sub phases run on the data node, so additional features used by the model can be extracted from Lucene.
Configuration
There isn't a direct way of configuring FetchSubPhases, so I have commandeered SearchExtSpec for the purpose. The ext spec is accessible via the SearchContext passed to the fetch sub phase. Parsed SearchExtSpecs come under the "ext" field of the search request, forcing a rather clunky nested config upon us.
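A sketch of the shape, with illustrative option names (the real parameter names may differ):

```json
GET my-index/_search
{
  "query": {
    "match_all": {}
  },
  "ext": {
    "inference": {
      "model_id": "my-model",
      "target_field": "my_inference_result"
    }
  }
}
```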
The usual config options apply.
Modifying the Search Hit
The goal is to append a field to each search hit with the inference result. I see 2 options for doing so:

1. Modify the hit's _source, adding the inference result to it.
2. Add the result as a DocumentField. The new field will appear under the fields section of the search hit as if it had been asked for in the search request via docvalue_fields.

I've opted for the 2nd choice, as modifying the source seems a little underhand. Again this is awkward, putting the result where we would expect doc value fields; depending on the outcome of #49028 the future may offer another way to add fields to the search hit. A sketch of a resulting hit is below.
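For illustration, a hit might then come back looking something like this (field names assumed, matching the hypothetical config above):

```json
{
  "_index": "my-index",
  "_id": "1",
  "_score": 1.0,
  "_source": {
    "message": "some document text"
  },
  "fields": {
    "my_inference_result": [
      {
        "predicted_value": "class_a"
      }
    ]
  }
}
```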
The Problem
The InferencePhase class has access to the ModelLoadingService, which neatly deals with the model caching problem, but there is still a blocking call to load the model (which may or may not be cached) the first time InferencePhase.hitsExecute(SearchContext, SearchHit[]) is called.
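The blocking pattern in question looks roughly like this; a sketch only, assuming a getModel(modelId, listener) method as in the diff above, a modelLoadingService field, and Model standing in for the loaded model type:

```java
import java.util.concurrent.CountDownLatch;

import org.apache.lucene.util.SetOnce;
import org.elasticsearch.action.ActionListener;
import org.elasticsearch.action.LatchedActionListener;

// Sketch of a synchronous model load inside a fetch sub phase: the async
// load is wrapped in a latch that the search thread then waits on. This is
// the blocking behaviour called out above as a problem.
private Model loadModelBlocking(String modelId) throws Exception {
    SetOnce<Model> model = new SetOnce<>();
    SetOnce<Exception> failure = new SetOnce<>();
    CountDownLatch latch = new CountDownLatch(1);
    modelLoadingService.getModel(modelId, new LatchedActionListener<>(
        ActionListener.wrap(model::set, failure::set), latch));
    latch.await(); // a search thread blocked on model loading
    if (failure.get() != null) {
        throw failure.get();
    }
    return model.get();
}
```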
Wish List
Why Here (Reprise)
Executing locally on the data node has the advantage of being close to any shard-level features we want to extract and use in inference. But it now occurs to me that those features could be extracted in a fetch sub phase and returned with the hit; inference would then run on the coordinating node and the blocking call to load the model could be dropped.
This PR is raised against the feature branch feature/search-inference.