Implement prompt/instruction templates #2439

Closed
tomaarsen opened this issue Jan 23, 2024 · 4 comments · Fixed by #2477
Comments

@tomaarsen
Collaborator

tomaarsen commented Jan 23, 2024

Hello!

Context

This issue describes a feature that I am planning to include in a release before v3, or alternatively in v3 of Sentence Transformers.

Details

Many recent works, e.g. Wang et al., 2024, Li & Li, 2023, Xiao et al., 2023, and many more, use instructions/prompts to improve their model performance and instruct the models on the specific task at hand.

Ideally, Sentence Transformers should support this more easily by allowing prompt/instruction templates to be stored in the model configuration. For example, we could include the following two options in the configuration (e.g. config_sentence_transformers.json):

{
    ...
    "prompts": {
        "classification": "Classify the following text:",
        "retrieval": "Retrieve semantically similar text:",
        "clustering": "Identify the topic or theme based on the text:",
    },
    "default_prompt_name": "classification",
}
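
As a rough illustration (not the actual implementation; resolve_prompt is a hypothetical helper), the lookup against such a configuration could work roughly like this:

# Hypothetical sketch of how a prompt could be resolved from such a config;
# `resolve_prompt` is illustrative, not part of Sentence Transformers.
config = {
    "prompts": {
        "classification": "Classify the following text:",
        "retrieval": "Retrieve semantically similar text:",
        "clustering": "Identify the topic or theme based on the text:",
    },
    "default_prompt_name": "classification",
}

def resolve_prompt(config, prompt=None, prompt_name=None):
    if prompt is not None:        # an explicit prompt takes precedence
        return prompt
    if prompt_name is not None:   # otherwise look up a named prompt
        return config["prompts"][prompt_name]
    default = config.get("default_prompt_name")
    return config["prompts"][default] if default else ""

print(resolve_prompt(config, prompt_name="clustering"))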

And then the SentenceTransformer.encode method would also support prompt and prompt_name arguments:

# Using a custom prompt
embeddings = model.encode(texts, prompt="Identify the topics:")
# Using a prompt from the config
embeddings = model.encode(texts, prompt_name="clustering")
# Using the default prompt from the config, if one is defined
embeddings = model.encode(texts)

I am still quite unsure about the names of all of these arguments; I don't think they're great. Additionally, I'm considering whether the prompt should be able to include {}, which would be filled via prompt.format(text). That would allow prompts like "Classify this text: {}. That was all.", but then the end of the formatted input (including the prompt suffix) would be cut off in the case of truncation, which is not great.
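
To make the difference concrete, here is a minimal sketch (apply_prompt is a hypothetical helper, not part of the library) of a prefix-style prompt versus a {}-style template:

# Minimal sketch of the two prompt styles discussed above;
# `apply_prompt` is hypothetical, not part of Sentence Transformers.
def apply_prompt(text: str, prompt: str) -> str:
    if "{}" in prompt:
        # Template style: the text is inserted into the prompt. Any prompt
        # text after the placeholder can be lost if the input is truncated.
        return prompt.format(text)
    # Prefix style: only the tail of the input text is at risk of truncation.
    return prompt + text

print(apply_prompt("The weather is lovely.", "Classify the following text: "))
print(apply_prompt("The weather is lovely.", "Classify this text: {}. That was all."))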

I'm definitely open to suggestions or ideas here!

cc @bwanglzu @ir2718 @johneckberg @aamir-s18 as I know you're interested in my TODO list.
cc @intfloat

  • Tom Aarsen
@arbi-dev

This would be an important feature: currently, users of these models either miss out on the instruction feature or have to concatenate the instruction and query with their own template to take advantage of it.

For best results it is probably best to stick as closely as possible to the format used by the relevant model during training (including punctuation etc.). E.g. BGE and INSTRUCTOR seem to prepend the instruction.
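
For reference, a sketch of the manual workaround described above (the model name and instruction string are illustrative; the exact wording should be taken from the relevant model card):

from sentence_transformers import SentenceTransformer

# Without built-in prompt support, users prepend the instruction themselves,
# matching the model's training format as closely as possible.
model = SentenceTransformer("BAAI/bge-base-en-v1.5")
instruction = "Represent this sentence for searching relevant passages: "
queries = ["how do instruction prompts improve retrieval?"]
query_embeddings = model.encode([instruction + q for q in queries])
passage_embeddings = model.encode(["Prompts prepend a task instruction to the input text."])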

@tomaarsen
Collaborator Author

One option is to allow users to specify prompts with {} in them, e.g. "Please embed the sentence {} into a short text.". Model authors can store their own prompts in their model configurations. That way, they can ensure that the prompts always correspond to whatever was used during training.

  • Tom Aarsen

@tomaarsen
Collaborator Author

We may also want to include a configuration option for whether the instruction should be included in the pooling output. For example, for INSTRUCTOR, the instruction tokens are excluded via attention masking when pooling.
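
For illustration, a rough, self-contained sketch in plain PyTorch (not Sentence Transformers internals) of excluding leading instruction tokens from mean pooling via the attention mask:

import torch

def mean_pool_excluding_instruction(token_embeddings, attention_mask, num_instruction_tokens):
    # Zero out the mask for the leading instruction tokens so they do not
    # contribute to the mean-pooled sentence embedding.
    mask = attention_mask.clone()
    mask[:, :num_instruction_tokens] = 0
    mask = mask.unsqueeze(-1).float()
    summed = (token_embeddings * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)
    return summed / counts

# Toy example: batch of 1, 6 tokens, 4-dim embeddings, 2 instruction tokens.
embeddings = torch.randn(1, 6, 4)
attention_mask = torch.ones(1, 6)
pooled = mean_pool_excluding_instruction(embeddings, attention_mask, num_instruction_tokens=2)
print(pooled.shape)  # torch.Size([1, 4])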

  • Tom Aarsen

@ShengYun-Peng

Hi @tomaarsen, thanks for adding this new prompt feature to the library! I'm curious whether there's a way to use prompts together with the evaluators. Currently, I only find examples that pass prompt_name or prompt to model.encode, but the evaluators don't seem to take any prompt arguments, which prevents evaluating a full test set with the required prompts.
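
One possible workaround (not part of the library; the model name is a placeholder and this assumes a version where encode accepts a prompt argument) is to bind the prompt onto encode so that evaluators pick it up implicitly:

import functools
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model name
# Force a prompt into every encode() call; evaluators that internally call
# model.encode(...) would then embed all texts with this prompt.
model.encode = functools.partial(model.encode, prompt="Identify the topic or theme based on the text: ")
# evaluator = ...  # construct an evaluator as usual and call evaluator(model)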
