
Default truncation to second for text similarity #713

Merged
merged 2 commits into from
Aug 5, 2024


@davidkyle (Member) commented Jul 25, 2024

NLP models have 3 truncation settings: FIRST, SECOND and NONE.

FIRST means truncate the first input. In most cases there is only 1 input (e.g. for text embeddings), so this is a sensible default.
SECOND means truncate the second input. Task types with 2 inputs include extractive question answering, where the question is one input and the context is the other. Text similarity also takes 2 inputs.
NONE means don't truncate; window the input instead.
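To make the three settings concrete, here is a minimal sketch that applies them to plain token lists. It is not eland's or Elasticsearch's implementation: `Truncate`, `truncate_pair` and `window` are hypothetical names, special tokens such as separators are ignored, and raising an error when the chosen input cannot absorb the overflow is an assumption modelled on common tokenizer behavior.

```python
from enum import Enum
from typing import List, Optional, Tuple

Tokens = List[str]


class Truncate(Enum):
    # Hypothetical enum mirroring the three settings; not eland's own type.
    FIRST = "first"
    SECOND = "second"
    NONE = "none"


def truncate_pair(
    first: Tokens, second: Optional[Tokens], max_len: int, mode: Truncate
) -> Tuple[Tokens, Optional[Tokens]]:
    """Trim one input so that len(first) + len(second) <= max_len.

    Special tokens are ignored to keep the sketch small (an assumption).
    """
    overflow = len(first) + len(second or []) - max_len
    if overflow <= 0 or mode is Truncate.NONE:
        # Nothing to cut, or truncation is disabled (use `window` instead).
        return first, second
    # With a single input, FIRST and SECOND both truncate that one input.
    if mode is Truncate.FIRST or second is None:
        if overflow >= len(first):
            raise ValueError("first input is too short to absorb the overflow")
        return first[: len(first) - overflow], second
    if overflow >= len(second):
        raise ValueError("second input is too short to absorb the overflow")
    return first, second[: len(second) - overflow]


def window(tokens: Tokens, size: int, stride: int) -> List[Tokens]:
    """Split an over-long input into overlapping windows instead of truncating it."""
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    windows, start = [], 0
    while True:
        windows.append(tokens[start : start + size])
        if start + size >= len(tokens):
            return windows
        start += stride
```

For a 2-token question and a 4-token context with a limit of 4, `truncate_pair(question, context, 4, Truncate.SECOND)` keeps the whole question and cuts the context to 2 tokens, whereas `Truncate.FIRST` would have to remove the entire question and so raises an error in this sketch.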

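For text similarity, the first input is usually the shorter one; for example, it might be the query text in a rerank operation, while the second input is the longer document being scored. In this situation it is better to truncate the second input so the query is kept whole. This change makes SECOND the default truncation for text similarity.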
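A minimal sketch of the new default, assuming a hypothetical `default_truncate` helper keyed on the `text_similarity` task type and a toy `fit` function that trims whole tokens. Neither is eland's actual code. With a 3-token query, a 10-token document and an 8-token limit, SECOND keeps the query intact, while FIRST would discard the entire query and still not fit.

```python
from typing import List, Tuple


def default_truncate(task_type: str) -> str:
    """Hypothetical helper: pick the default truncation setting for a task type."""
    # Text similarity pairs are usually (short query, long document), so cut the
    # document. Every other task keeps FIRST in this sketch, which suits the
    # common single-input case.
    return "second" if task_type == "text_similarity" else "first"


def fit(query: List[str], doc: List[str], max_len: int, truncate: str) -> Tuple[List[str], List[str]]:
    """Toy truncation: trim `doc` for "second", otherwise trim `query`."""
    overflow = max(len(query) + len(doc) - max_len, 0)
    if truncate == "second":
        return query, doc[: max(len(doc) - overflow, 0)]
    return query[: max(len(query) - overflow, 0)], doc


query = ["what", "is", "eland"]
doc = [f"d{i}" for i in range(10)]
kept_query, kept_doc = fit(query, doc, 8, default_truncate("text_similarity"))
print(kept_query, len(kept_doc))
```

The design choice is simply which input absorbs the overflow: in a rerank, losing the tail of a long document usually costs less relevance signal than losing any part of the query.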

@davidkyle added the bug (Something isn't working) and topic:NLP (Issue or PR about NLP model support and eland_import_hub_model) labels Jul 25, 2024
@davidkyle requested a review from @pquentin July 26, 2024 10:13
@pquentin (Member) left a comment

Thanks! LGTM. Just need to remove the extra print.

Review comment on eland/ml/pytorch/transformers.py (outdated, resolved)
Co-authored-by: Quentin Pradet <quentin.pradet@gmail.com>
@davidkyle merged commit fd8886d into elastic:main Aug 5, 2024
4 checks passed

2 participants