Fix faiss index batch_size bug on python3.7 and update es config for … #2965

w5688414 · 2022-08-04T08:27:43Z

…pipelines

PR types

Bug fixes

PR changes

Docs
APIs

Description

Fix the faiss index batch_size bug on Python 3.7
Update the Elasticsearch (es) config
Fix the nltk download bug
Optimize the README for the inner dataset pipeline

…pipelines

tianxin1860


tianxin1860 · 2022-08-05T13:48:02Z

applications/experimental/pipelines/examples/question-answering/dense_qa_example.py

@@ -15,6 +15,7 @@
 parser.add_argument("--max_seq_len_query", default=64, type=int, help="The maximum total length of query after tokenization.")
 parser.add_argument("--max_seq_len_passage", default=256, type=int, help="The maximum total length of passage after tokenization.")
 parser.add_argument("--retriever_batch_size", default=16, type=int, help="The batch size of retriever to extract passage embedding for building ANN index.")
+parser.add_argument("--update_batch_size", default=100, type=int, help="The batch size of document_store to update passage embedding for building ANN index.")


What is the update_batch_size variable meant to control?

What is the relationship between update_batch_size and retriever_batch_size?

This is a parameter of update_embeddings in the faiss document store. Its docstring says:

:param batch_size: When working with large number of documents, batching can help reduce memory footprint.

https://github.com/PaddlePaddle/PaddleNLP/blob/develop/applications/experimental/pipelines/pipelines/document_stores/faiss.py

retriever_batch_size is a parameter of DensePassageRetriever. Its docstring says:

:param batch_size: Number of questions or passages to encode at once. In case of multiple gpus, this will be the total batch size.
https://github.com/PaddlePaddle/PaddleNLP/blob/develop/applications/experimental/pipelines/pipelines/nodes/retriever/dense.py
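To make the difference between the two batch sizes concrete, here is a minimal sketch. It is not the actual PaddleNLP code: `chunks`, `fake_encode`, and `update_embeddings_sketch` are hypothetical names, and the nesting of the two loops is an assumption drawn from the two docstrings quoted above. The outer loop pages documents out of the store `update_batch_size` at a time to bound memory, and the inner loop feeds the encoder `retriever_batch_size` passages per forward pass.

```python
from typing import Iterator, List, Sequence, Tuple


def chunks(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fake_encode(passages: Sequence[str]) -> List[List[float]]:
    """Hypothetical stand-in for one model forward pass (one vector per passage)."""
    return [[float(len(p))] for p in passages]


def update_embeddings_sketch(
    docs: Sequence[str],
    update_batch_size: int,
    retriever_batch_size: int,
) -> Tuple[List[List[float]], int]:
    """Embed all docs and return (vectors, number of encoder calls)."""
    index: List[List[float]] = []
    encoder_calls = 0
    # Outer loop: only `update_batch_size` documents are held in memory at once.
    for doc_batch in chunks(docs, update_batch_size):
        embeddings: List[List[float]] = []
        # Inner loop: the encoder sees `retriever_batch_size` passages per call.
        for enc_batch in chunks(doc_batch, retriever_batch_size):
            embeddings.extend(fake_encode(enc_batch))
            encoder_calls += 1
        # Assumption: each outer batch is written to the index in one step.
        index.extend(embeddings)
    return index, encoder_calls


docs = [f"passage {i}" for i in range(250)]
vectors, calls = update_embeddings_sketch(docs, update_batch_size=100, retriever_batch_size=16)
print(len(vectors), calls)
```

Under these assumptions the two values are independent knobs: one bounds how many documents are loaded and written per step, the other bounds how many passages go through the model at once.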

As discussed offline, the update_batch_size parameter should not be exposed. Just change the pipeline's default value directly.

tianxin1860

LGTM

Fix faiss index batch_size bug on python3.7 and update es config for …

16a7a94

…pipelines

w5688414 requested a review from tianxin1860 August 4, 2022 08:27

w5688414 self-assigned this Aug 4, 2022

w5688414 added bugfix pipelines labels Aug 4, 2022

Fix the nltk download bug and Add FAQ for mac support

df8bae6

tianxin1860 reviewed Aug 5, 2022


w5688414 added 2 commits August 9, 2022 17:04

Remove update_batch_size for fais

79061a6

Merge branch 'develop' into pip8

b3d902a

tianxin1860 approved these changes Aug 9, 2022


Merge branch 'develop' into pip8

0c2ea2f

w5688414 merged commit bb1729b into PaddlePaddle:develop Aug 9, 2022

w5688414 mentioned this pull request Aug 24, 2022

PaddleNLP 2.3.6 Release Note Candidate #3122

Closed

w5688414 deleted the pip8 branch June 7, 2023 09:28
