[DOC] Documentation for new reranking feature #6359

Closed
4 tasks
HenryL27 opened this issue Feb 6, 2024 · 7 comments · Fixed by #6368
Labels: 3 - Done (issue is done/complete), v2.12.0

Comments

@HenryL27
Contributor

HenryL27 commented Feb 6, 2024

What do you want to do?

  • Request a change to existing documentation
  • Add new documentation
  • Report a technical problem with the documentation
  • Other

Tell us about your request. Provide a summary of the request and all versions that are affected.
2.12 introduces a new feature for reranking. We need to add docs for this.

What other resources are available? Provide links to related issues, POCs, steps for testing, etc.

@martin-gaievski
Member

martin-gaievski commented Feb 7, 2024

@HenryL27 Can you please add some information on the reranker feature? For the new search processors, please list all parameters with their descriptions, whether each is required or optional, and the defaults.

@tianjing-li
Contributor

@HenryL27 thank you for tracking this. Could you also outline how best to contribute support for other models? For example, in my case I would like to add Cohere reranking to OpenSearch.

@HenryL27
Contributor Author

HenryL27 commented Feb 7, 2024

Creating a rerank pipeline:

PUT /_search/pipeline/rerank_pipeline
{
  "response_processors": [
    {
      "rerank": {
        "ml_opensearch": {
          "model_id": "<ID of a TEXT_SIMILARITY model (required)>"
        },
        "context": {
          "document_fields": [ "title", "text_representation", ...]
        }
      }
    }
  ]
}

The rerank processor takes two parameters: the context object and the rerank-type object.

The rerank-type object (keyed by name; here it's "ml_opensearch") gives the rerank processor the static information it needs across all reranking calls, for example the ID of the reranking model in ml-commons.

ml_opensearch rerank type parameters

| field name | required? | description |
| --- | --- | --- |
| model_id | required | unique ID of a TEXT_SIMILARITY model (deployed via ml-commons) |

The ml_opensearch rerank type requires context.document_fields.

The context object provides the rerank processor information necessary for generating reranking context at query time. For instance, "document_fields" specifies where to look in each search result for context to pass to the reranking model.

context parameters

| field name | dependent rerank types | description |
| --- | --- | --- |
| document_fields | ml_opensearch | nonempty list of document fields to rerank over |
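
For anyone scripting this, the pipeline body can be assembled and sanity-checked before it is sent. What follows is a minimal Python sketch, not the plugin's own validation: `build_rerank_pipeline` is a hypothetical helper name, the model ID is a placeholder, and the checks only mirror the two rules in the tables above (model_id is required, document_fields must be a nonempty list).

```python
import json


def build_rerank_pipeline(model_id, document_fields):
    """Build the body for PUT /_search/pipeline/<name> (hypothetical helper)."""
    # model_id is the only required ml_opensearch parameter.
    if not isinstance(model_id, str) or not model_id:
        raise ValueError("ml_opensearch.model_id is required")
    # document_fields must be a nonempty list; a bare string is rejected
    # so that "title" is not silently treated as a list of characters.
    if isinstance(document_fields, str) or not document_fields:
        raise ValueError("context.document_fields must be a nonempty list")
    return {
        "response_processors": [
            {
                "rerank": {
                    "ml_opensearch": {"model_id": model_id},
                    "context": {"document_fields": list(document_fields)},
                }
            }
        ]
    }


if __name__ == "__main__":
    # "my-model-id" is a placeholder, not a real model ID.
    body = build_rerank_pipeline("my-model-id", ["title", "text_representation"])
    print(json.dumps(body, indent=2))
```

The printed JSON is what you would send as the body of the PUT request shown earlier in this comment.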

@HenryL27
Contributor Author

HenryL27 commented Feb 7, 2024

Searching with a rerank pipeline:

POST /_search?search_pipeline=rerank_pipeline
{
  "query": {
    "match": {
      "text_representation": "Where is Albuquerque?"
    }
  },
  "ext": {
    "rerank": {
      "query_context": {
        "query_text": "Where is Albuquerque?"
      }
    }
  }
}

Reranking queries look like ordinary queries, except that they carry an additional "ext.rerank" section.
The query_context object must contain exactly one of the following two parameters (they are mutually exclusive):

ext.rerank.query_context parameters

| field name | description |
| --- | --- |
| query_text | the (natural language) text of the question you want to rerank over |
| query_text_path | the JSON path to the text of the question you want to rerank over |
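
The "exactly one of" rule is easy to get wrong when request bodies are built in code. Here is a small Python sketch of a client-side check, assuming a hypothetical helper named `build_rerank_search_body`; it only reflects the rule stated above and is not how the processor itself validates requests.

```python
def build_rerank_search_body(query, query_text=None, query_text_path=None):
    """Attach ext.rerank.query_context to a query (hypothetical helper)."""
    # Exactly one of the two parameters must be set: both None or both set is an error.
    if (query_text is None) == (query_text_path is None):
        raise ValueError("set exactly one of query_text or query_text_path")
    if query_text is not None:
        query_context = {"query_text": query_text}
    else:
        query_context = {"query_text_path": query_text_path}
    return {"query": query, "ext": {"rerank": {"query_context": query_context}}}


if __name__ == "__main__":
    question = "Where is Albuquerque?"
    body = build_rerank_search_body(
        {"match": {"text_representation": question}},
        query_text=question,
    )
    print(body)
```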

When specifying query_text_path, use the fully specified path. For example, for the above query, you'd set
query_text_path to query.match.text_representation.query.
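
To illustrate what "fully specified" means, the Python sketch below walks a dotted path through a request body held as a dict. It is only an illustration: `resolve_path` is a hypothetical name, this is not the plugin's resolution logic, and field names that themselves contain dots are not handled. It also assumes the match clause is written in its expanded form, with an explicit "query" key, because that is the shape the example path points into.

```python
def resolve_path(body, dotted_path):
    """Follow a dot-separated path through nested dicts (hypothetical helper)."""
    node = body
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            raise KeyError(f"path {dotted_path!r} not found at {key!r}")
        node = node[key]
    return node


# Assumption: the match clause uses the expanded form with an explicit "query" key.
request = {
    "query": {
        "match": {
            "text_representation": {"query": "Where is Albuquerque?"}
        }
    },
    "ext": {
        "rerank": {
            "query_context": {
                "query_text_path": "query.match.text_representation.query"
            }
        }
    },
}

if __name__ == "__main__":
    path = request["ext"]["rerank"]["query_context"]["query_text_path"]
    print(resolve_path(request, path))
```

A partial path such as text_representation.query would fail in this sketch, because the walk starts from the root of the request body.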

@hdhalter hdhalter added the 1 - Backlog - DEV Developer assigned to issue is responsible for creating PR. label Feb 7, 2024
@HenryL27
Contributor Author

HenryL27 commented Feb 7, 2024

@tianjing-li Model support is handled by the ml-commons plugin. If you have a model you'd like to add support for, please raise an issue and contribute it there. As I understand it, this PR may already have added support for the Cohere reranker?

@hdhalter
Contributor

hdhalter commented Feb 7, 2024

Hi @HenryL27, thanks for submitting the doc issue. Can you please submit a documentation PR with this info? We need to have a PR up in order for it to be considered for the 2.12 release. Thanks!

@hdhalter hdhalter added 2 - In progress Issue/PR: The issue or PR is in progress. and removed 1 - Backlog - DEV Developer assigned to issue is responsible for creating PR. labels Feb 7, 2024
@hdhalter hdhalter added 3 - Done Issue is done/complete and removed 2 - In progress Issue/PR: The issue or PR is in progress. labels Feb 13, 2024
@tianjing-li
Contributor

@HenryL27 thank you!
