[ML] Add prefix strings option to trained models #101978
Hi @davidkyle, I've created a changelog YAML for you.
Pinging @elastic/ml-core (Team:ML)
docs/reference/ml/trained-models/apis/put-trained-models.asciidoc
LGTM, but if we're going to have packaged models on GCS that use this feature, then the new section will also need to go in this class:
Line 33 in 8d6ded3
public class ModelPackageConfig implements ToXContentObject, Writeable {
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
Docs LGTM, thanks!
@elasticmachine update branch
merge conflict between base and head
The initial state of the desired-balance allocator has `lastConvergedIndex` set to `-1`. This is not important to represent in the stats, so with this commit we map it to zero.
…er (elastic#101912) This commit addresses an issue in the passage formatter of the unified highlighter, where overlapping terms were not correctly expanded to be highlighted as a single object. The fix in this commit involves adjusting the expansion logic to consider the maximum end offset during the process, as matches are initially sorted by ascending start offset and then by ascending end offset.
It's in the title: these two tests are failing nearly 100% of the time. For elastic#102000
Muting this one that keeps failing for elastic#102010
The title says the whole story
…ividual nodes (elastic#100230) (elastic#101599)" (elastic#102042) Reverting because the new action is not properly handled in a mixed cluster.
This fixes caching issues when running StandaloneRestIntegTest tasks, an aftermath of elastic#101923. This should fix elastic#102015.
This reverts commit e9e948d.
Just like the other ones.
Part of the broader work covered in elastic#102030
Updates tests in:
- HighlighterWithAnalyzersTests.java
- TokenCountFieldMapperIntegrationIT
- GeoIpDownloaderIT.java
- DataStreamIT.java
We encountered a bad third-party S3 repository implementation which incorrectly rejects empty multipart uploads. This anomaly is detected by some repository analysis runs, but not by all of them. This commit adds a specific check for this incompatibility so that it can be reported reliably.
Adds the `?register_operation_count` parameter, which controls the number of register operations separately from the number of regular blob operations.
* Add inference counts by NLP model to the machine learning usage stats.
* Update docs/changelog/101915.yaml
* Add inference_counts_by_model to yamlRestTest.
* Strip leading dot from internal model IDs.
* Add last access and task type to the stats by model.
* Change stats_by_model for map to list
* Simplify code.
* Fix style
…2029) The title says the whole story part 2
Tests covered in this PR:
* `org.elasticsearch.percolator.PercolatorQuerySearchIT`
Tests covered in this PR:
* `org.elasticsearch.xpack.enrich.EnrichPolicyRunnerTests`
* `org.elasticsearch.index.engine.frozen.FrozenIndexIT`
* `org.elasticsearch.index.engine.frozen.FrozenIndexTests`
Closing as I've made a mess with git. #102089 is raised as a replacement
Certain NLP models, such as multilingual-e5-large, require a prefix string to be applied to the input text. For asymmetric tasks such as information retrieval, the prefix can differ between ingesting the data and searching it. For example, a text embedding model can have one prefix applied when the model is evaluated as part of a kNN search and a different prefix when ingesting documents.
An example model configuration with prefix strings is:
Many files have been touched by this change, but the bulk of the work is quite simple: define the configuration object and pass the prefix type (context) parameter down through the inference calls.
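To illustrate the idea of a configuration object plus a prefix type passed down to the inference call, here is a minimal Java sketch. It is not the code from this PR: the class, enum, record, and method names (`PrefixStringsSketch`, `PrefixType`, `PrefixStrings`, `applyPrefix`) are all hypothetical, and the `"passage: "` and `"query: "` values are assumed here as typical E5-style prefixes for illustration only.

```java
import java.util.Objects;

// Hypothetical sketch; none of these names are taken from the Elasticsearch code base.
final class PrefixStringsSketch {

    /** The context in which the model is evaluated (assumed set of values). */
    enum PrefixType { INGEST, SEARCH, NONE }

    /** Configuration object holding one optional prefix per context. */
    record PrefixStrings(String ingest, String search) {
        /** Returns the prefix configured for the given context, or null if none. */
        String forType(PrefixType type) {
            return switch (type) {
                case INGEST -> ingest;
                case SEARCH -> search;
                case NONE -> null;
            };
        }
    }

    /** Prepends the context-specific prefix, if one is configured, to the input text. */
    static String applyPrefix(PrefixStrings config, PrefixType type, String input) {
        Objects.requireNonNull(input, "input");
        if (config == null) {
            return input; // model has no prefix configuration at all
        }
        String prefix = config.forType(type);
        return (prefix == null || prefix.isEmpty()) ? input : prefix + input;
    }

    public static void main(String[] args) {
        // Assumed example prefixes; a real model defines its own.
        PrefixStrings config = new PrefixStrings("passage: ", "query: ");
        System.out.println(applyPrefix(config, PrefixType.INGEST, "Elasticsearch is a search engine"));
        System.out.println(applyPrefix(config, PrefixType.SEARCH, "what is elasticsearch"));
    }
}
```

In this sketch the caller decides the context (ingest pipeline versus search request) and the inference layer only looks up the matching prefix, so a model without a configured prefix behaves exactly as before.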