Fix faiss index batch_size bug on python3.7 and update es config for pipelines #2965
Conversation
@@ -15,6 +15,7 @@
 parser.add_argument("--max_seq_len_query", default=64, type=int, help="The maximum total length of query after tokenization.")
 parser.add_argument("--max_seq_len_passage", default=256, type=int, help="The maximum total length of passage after tokenization.")
 parser.add_argument("--retriever_batch_size", default=16, type=int, help="The batch size of retriever to extract passage embedding for building ANN index.")
+parser.add_argument("--update_batch_size", default=100, type=int, help="The batch size of document_store to update passage embedding for building ANN index.")
What is the update_batch_size variable supposed to control?
What is the relationship between update_batch_size and retriever_batch_size?
This is a parameter of faiss's update_embeddings; its docstring explains it as:
:param batch_size: When working with large number of documents, batching can help reduce memory footprint.
retriever_batch_size is a parameter of DensePassageRetriever; its docstring explains it as:
:param batch_size: Number of questions or passages to encode at once. In case of multiple gpus, this will be the total batch size.
https://github.com/PaddlePaddle/PaddleNLP/blob/develop/applications/experimental/pipelines/pipelines/nodes/retriever/dense.py
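To make the distinction concrete, here is a minimal sketch of the two batching levels, using hypothetical helper names and a stand-in encoder rather than the real pipelines API: the outer loop writes embeddings to the document store in chunks of update_batch_size to bound memory, while the inner encoding step processes retriever_batch_size passages per forward pass.

```python
def encode_passages(passages, retriever_batch_size=16):
    """Encode passages in chunks of retriever_batch_size (the per-forward-pass
    batch of the dense retriever)."""
    embeddings = []
    for i in range(0, len(passages), retriever_batch_size):
        chunk = passages[i:i + retriever_batch_size]
        # Stand-in for the dense encoder forward pass.
        embeddings.extend([float(len(p)) for p in chunk])
    return embeddings

def update_embeddings(passages, update_batch_size=100):
    """Update the document store / ANN index in chunks of update_batch_size,
    so only one chunk of embeddings is held in memory at a time."""
    written = 0
    for i in range(0, len(passages), update_batch_size):
        batch = passages[i:i + update_batch_size]
        embs = encode_passages(batch)
        written += len(embs)  # stand-in for writing embeddings to the index
    return written
```

So one call to update_embeddings with update_batch_size=100 drives several encoder batches of retriever_batch_size=16 internally; the two values are independent knobs for memory use at different stages.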
As discussed offline, it is not appropriate to expose the update_batch_size parameter; just change the pipeline default value directly.
LGTM