Structured Index of Documents #9411
Thanks for your contribution!
I suggest removing the PDF files from the code; providing a download path or URL is enough.
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
##           develop    #9411   +/-  ##
========================================
  Coverage    52.81%   52.81%
========================================
  Files          710      710
  Lines       111238   111238
========================================
  Hits         58749    58749
  Misses       52489    52489
☔ View full report in Codecov by Sentry.
Removed the PDF files and provided download links and a PDF download script.
```bash
conda install nccl -c conda-forge
conda install paddlepaddle-gpu==2.6.1 -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/ -c conda-forge
```
This install command does not appear to be the one given on the official website:
conda install paddlepaddle-gpu==2.6.2 cudatoolkit=11.7 -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/ -c conda-forge
paddlenlp==3.0.0b2
tqdm
numpy
paddleocr
Do faiss-gpu and paddleocr need specific versions or installation methods?
The script `data/source/download.sh` can be used to download the sample documents:
```bash
cd data/source
bash download.sh
```
download.sh calls jq internally; I suggest adding the command `apt install jq -y`.
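One way to surface the missing jq dependency before the download starts is a small preflight check. The sketch below is illustrative only: `require_cmds` is a hypothetical helper, not part of the PR, and it assumes nothing about what else `download.sh` needs beyond the jq call mentioned in the comment above.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): report every command that is not on PATH.
# Returns 0 when all commands are available, 1 otherwise.
require_cmds() {
    local missing=0 cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing dependency: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

It could be used as `require_cmds jq && bash download.sh`, so that a missing tool produces a clear message instead of a failure partway through the script.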
--parse_model_name_or_path Qwen/Qwen2-72B-Instruct \
--summarize_model_name_or_path Qwen/Qwen2-72B-Instruct \
--encode_model_name_or_path BAAI/bge-large-en-v1.5 \
--log_dir .logs
Running this command requires installing fitz and frontend; once they are installed, it fails with `RuntimeError: Directory 'static/' does not exist`. On inspection this appears to be related to the paddleocr version, so I suggest adapting the code to the paddleocr version.
I suggest adapting to the latest version first; if that proves difficult, paddleocr==2.7.3 runs correctly.
Only the environment installation instructions need to change; the README.md content has been tested and passes.
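If the fallback to paddleocr==2.7.3 is taken, the change amounts to pinning one line of the requirements file. The following is a minimal sketch, assuming the file lists one package per line as in the diff above; `pin_requirement` is a hypothetical helper, and whether the merged `requirements.txt` actually pins this version is not shown in the conversation.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): pin one package in a requirements file.
# Usage: pin_requirement <file> <package> <version>
# Assumes the package name contains no regex metacharacters (true for "paddleocr").
pin_requirement() {
    local file="$1" pkg="$2" version="$3" tmp
    tmp="$(mktemp)" || return 1
    # Rewrite the line holding the bare package name (with or without an
    # existing version specifier); copy every other line through unchanged.
    awk -v pkg="$pkg" -v ver="$version" '
        $0 ~ ("^" pkg "([=<>!~ ].*)?$") { print pkg "==" ver; next }
        { print }
    ' "$file" > "$tmp" && mv "$tmp" "$file"
}
```

For example, `pin_requirement requirements.txt paddleocr 2.7.3` would turn the `paddleocr` line into `paddleocr==2.7.3` and leave `paddlenlp==3.0.0b2`, `tqdm`, and `numpy` as they are.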
Update requirements.txt
Update README.md
LGTM
* Structured Index of Documents
* Replace PDFs with URLs
* Change the download method
* Update README
* Update README
* Update the environment installation instructions
PR types
New features
PR changes
Others
Description
A pipeline of Structured Index of documents