-
Notifications
You must be signed in to change notification settings - Fork 24.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] Add embedding_size to text embedding config #95176
Conversation
Documentation preview: |
Hi @davidkyle, I've created a changelog YAML for you. |
Pinging @elastic/ml-core (Team:ML) |
) { | ||
this.vocabularyConfig = Optional.ofNullable(vocabularyConfig) | ||
.orElse(new VocabularyConfig(InferenceIndexConstants.nativeDefinitionStore())); | ||
this.tokenization = tokenization == null ? Tokenization.createDefault() : tokenization; | ||
this.resultsField = resultsField; | ||
this.embeddingSize = embeddingSize; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is serialized as strictly non-negative there should be a check in the public constructor that a negative number hasn't been supplied.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I've added that check to the ctor, it makes no sense for the size to be <= 0
commit: 1fbbdde
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Adds an optional field
embedding_size
to the text embedding config for NLP models. The field should be set at model creation and cannot be modified later. If definedembedding_size
should be used to set the number of dimensions for the dense_vector field mapping the embedding will be indexed in.