load data pipeline supports read config #70
Merged
☂️ Python Coverage
Overall Coverage
New Files: No new covered files...
Modified Files: No covered modified files...
moria97 approved these changes on Jun 19, 2024
moria97 pushed a commit that referenced this pull request on Jun 20, 2024
wwxxzz added a commit that referenced this pull request on Jun 24, 2024
* Add fast bm25
* Update
* Fix bug
* Fix bm25 bug
* Fix bug
* Refine code
* Update multi-process
* Add API to support upload local files (#67)
* support upload file via API
* add Readme for upload API
* refactor query api
* modify load_knowledge with session_config
* use tempfile.mkdtemp() to store upload files
* add docker image timezone for China (#68)
* add image zone for China
* remove unused ENV
---------
Co-authored-by: shubao.sx <shubao.sx@alibaba-inc.com>
Co-authored-by: Yue Fei <luxun.fy@alibaba-inc.com>
* load data pipeline supports read config (#70)
* Add gpu docker image timezone for China (#74)
* Add fast bm25 (#66)
* Add fast bm25
* Fix bm25 bug
* Fix bug
* Fix test
* Update dockerfile
* Fix bug
* Update
* Update docker file
* Fix empty file bug
* Fix local index error
* Fix lint
* Decouple gradio and backend
* Add ui build
* Add gunicorn
* Fix gunicorn
* Update nginx
* add nginx image
* Fix deployment issue
* Fix upload
---------
Co-authored-by: 筱文 <zxw320697@alibaba-inc.com>
Co-authored-by: paradiseHIT <paradiseHIT@gmail.com>
Co-authored-by: shubao.sx <shubao.sx@alibaba-inc.com>
moria97 added a commit that referenced this pull request on Aug 5, 2024
* Bugfix: a case that files' encodings can not be detected by chardet (#61) * Bugfix: connection error for longtime upload tasks (#62) * Fix connection error for longtime job * fix testcase bugs * support num workers for embedding model * Refactor query api and add dataframe UI * Refactor query api * Remove embedding workers * Add file: file_utils.py (#63) * Fix connection error for longtime job * fix testcase bugs * support num workers for embedding model * Refactor query api and add dataframe UI * Refactor query api * Remove embedding workers * Add file_utils --------- Co-authored-by: Yue Fei <59813791+moria97@users.noreply.github.com> * Remove local storage and enable Elasticsearch hybrid query mode (#60) * Add gpu dockerfile * Fix bug * Fix gb2312 * Update embedding batch size * Set default embedding and llm model * Update docker tag * Fix hologres check * Update registry * Fix bug * Fix tests * Add queue * Update batch size * Add async interface * Fix index conflict * Add change index parameter for FAISS * Fix batch size * Update * Modify async upload to sync (#64) * Modify async upload to sync * fix failed test * Fix faiss_path not effective in retrieval (#65) * Add API to support upload local files (#67) * support upload file via API * add Readme for upload API * refactor query api * modify load_knowledge with session_config * use tempfile.mkdtemp() to store upload files * add docker image timezone for China (#68) * add image zone for China * remove unused ENV --------- Co-authored-by: shubao.sx <shubao.sx@alibaba-inc.com> Co-authored-by: Yue Fei <luxun.fy@alibaba-inc.com> * load data pipeline supports read config (#70) * Add gpu docker image timezone for China (#74) * Add fast bm25 (#66) * Add fast bm25 * Fix bm25 bug * Fix bug * Fix test * Update readme and configuration (#77) * fix demo.toml typo, and add comments for settings.toml for embedding * update readme, add load data * Update docker.yml * Enable multiple workers to improve perf (#75) * Add fast 
bm25 * Update * Fix bug * Fix bm25 bug * Fix bug * Refine code * Update multi-process * Add API to support upload local files (#67) * support upload file via API * add Readme for upload API * refactor query api * modify load_knowledge with session_config * use tempfile.mkdtemp() to store upload files * add docker image timezone for China (#68) * add image zone for China * remove unused ENV --------- Co-authored-by: shubao.sx <shubao.sx@alibaba-inc.com> Co-authored-by: Yue Fei <luxun.fy@alibaba-inc.com> * load data pipeline supports read config (#70) * Add gpu docker image timezone for China (#74) * Add fast bm25 (#66) * Add fast bm25 * Fix bm25 bug * Fix bug * Fix test * Update dockerfile * Fix bug * Update * Update docker file * Fix empty file bug * Fix local index error * Fix lint * Decouple gradio and backend * Add ui build * Add gunicorn * Fix gunicorn * Update nginx * add nginx image * Fix deployment issue * Fix upload --------- Co-authored-by: 筱文 <zxw320697@alibaba-inc.com> Co-authored-by: paradiseHIT <paradiseHIT@gmail.com> Co-authored-by: shubao.sx <shubao.sx@alibaba-inc.com> * Add guides for env and docker (#81) * Add guides for env * add guides for docker build * Add README * Add config guide cn&en (#82) * add es setting * add es setting * add elasticsearch test * add es test * add and modify es_tokenizer test * add and modify es_tokenizer test * modify test_as_tokenizer * add skipif * fix test linter fails * fix lint problem * update test_as_analyzer * add config_guide * add navigation into readme * Add doc reference for rag query (#84) * Support evaluation for generated and open datasets (#83) * Refactor evaluation module * add UI: eval_tab * support eval UI * tmp eval * remove eval web * Support evaluation * fix pytest * Add OpenDataSet class --------- Co-authored-by: ranxia <chenanyu.cay@alibaba-inc.com> * Fix oss url for miracl dataset (#86) * fix ui es upload (#85) * Fix eas LLM (#88) * Milvus support sparse search (#87) * Upload multiple files in 
single API call (#89) * Milvus support sparse search * aload fix * Upload multiple files in one api call * Remove notebooks * Fix tests * Fix http timeout issue * Add client default timeout limitation and support UI interactive (#90) * Add client default timeout limitation and support UI interactive * support interactivate for vectordb type * Fix ui issue (#91) * Fix deps and add gpu ci tests (#92) * Fix deps and add gpu ci tests * Don't send report in 2nd pipeline * Fix empty response for empty knowledge base (#93) * Fix empty response for empty knowledge base * Add constant for empty response message * Fix dup nodes (#94) * Add error handling (#96) * Add error handling * Add upload error msg * fix data_loader (#95) * fix data_loader * fix data_loder * fix data_loader * fix data_loader * Set proper log levels (#98) * Adjust config instruction and add es instruction (#99) * add es setting * add es setting * add elasticsearch test * add es test * add and modify es_tokenizer test * add and modify es_tokenizer test * modify test_as_tokenizer * add skipif * fix test linter fails * fix lint problem * update test_as_analyzer * add config_guide * add navigation into readme * adjust config guide and add es instruction * Log stacktrace for failed requests (#100) * Load milvus collection by default (#101) * Log stacktrace for failed requests * Load milvus collection by default * Rename & Relocate figures in md (#102) * add es setting * add es setting * add elasticsearch test * add es test * add and modify es_tokenizer test * add and modify es_tokenizer test * modify test_as_tokenizer * add skipif * fix test linter fails * fix lint problem * update test_as_analyzer * add config_guide * add navigation into readme * adjust config guide and add es instruction * modify md figures * minor modification * change md path and name * 针对windows平台修改docker启动命令 (#104) * 针对windows平台修改docker启动命令 * 针对windows平台修改docker启动命令 * 针对windows平台修改docker启动命令 * make format * make format, nothing changed 
* download models from oss automatically (#97) * download models from oss automatically * download models from oss automatically * download models from oss automatically * download models from oss automatically * download models from modelscope * download models from modelscope * fix readme * Fix bug in downloading models (#106) * Fix bug * Fix log * Fix download * Add markdown reader (#105) * fix pdf reader (#107) Co-authored-by: Yue Fei <luxun.fy@alibaba-inc.com> * Personal/ranxia/pdf table summary fix (#109) * fix pdf reader * fix pdf reader table summary --------- Co-authored-by: Yue Fei <luxun.fy@alibaba-inc.com> * FiAddage number to file_name (#110) * Support stream response for LLM (PaiEAS && DashScope) (#112) * Support stream response for LLM (PaiEAS && DashScope) * Add PaiEas LLM old file * Add image node processor (#114) * Fit image in response * Add image insert * Fix llm max-token * Fix bug (#115) * Fix bugs for chinese escaped string in API header (#117) * Fix bidi version (#119) * Add fix version * Update poetry.lock * Update streaming response to body field use server sent events (#120) * Fix streaming * Fix llm and vector query * Address comment * Remove extra print * Support simple-weighted-reranker and similarity-threshold (#116) * Support nomalized cosine_sim score for different vectorDB * Support simple-weighted-reranker and similarity-threshold * [Todo] Support ES hybrid search * Support Milvus * fix path * fix open dataset * Fix url for du-retrieval dataset * Restore setting * Fix reviews * Apply node_id for weighted_reranker * jsonl reader (#124) * jsonl reader * jsonl reader * Support function_calling with booking demo tools (#122) * Add booking system demo for function_calling * Support customized function calling tools * Add testcase for agent and llm * Fix test * Fix async test * Add readme for function calling * Add readme for function calling * Remove ref figs * Add nodes enhancement by raptor (#111) * add raptor * add raptor ui support 
* fix logger bug * add node_enhancement class and modify test * fix node_enhancement setting bug * lint adjustment * poetry lock * fix poetry.lock * fix poetry issues * add a param * add token calculation for Chinese and adjust context_window * update tokenization_qwen * update file_path * merge feature and update poetry.lock * exclude pytest since no vocab file in the test env * exclude qwen.tiktoken * delete assert * Add weather tool (#125) * weather okgit add .! * fix bug * space bug --------- Co-authored-by: Yue Fei <luxun.fy@alibaba-inc.com> * Don't use parallel when data size is big (#108) * Add opensearch (#127) * Add open search. Not tested * Fix * Fix config * update docker's readme (#126) * update docker's readme * change network back * change network back * change network back * Create ci.yml (#131) * Update CI & PR pipelines (#132) * Update CI * Fix ci * Fix a few ui bugs (#133) * Support RDS postgres vector store (#134) * support rds postgers for store engine * Format * support table * Make format --------- Co-authored-by: Yue Fei <luxun.fy@alibaba-inc.com> * Fix minor bugs (#135) * Fix bug * Fix index bug * Updaet password field * Add pre-commit * Remove upload button * Refine upload * Fix pg connection string * Fix empty response for score_threshold (#136) * Fix empty response for score_threshold * Modify empty response info * Modify empty response info --------- Co-authored-by: Yue Fei <luxun.fy@alibaba-inc.com> * fix table_reader in pdf_reader (#128) * fix table_reader in pdf_reader * fix table_reader in pdf_reader * fix table_reader in pdf_reader * fix table_reader in pdf_reader * fix table_reader in pdf_reader * fix table_reader in pdf_reader * add "enable_ocr" and "enable_table_summary" (#138) * add "enable_ocr" and "enable_table_summary" * add "enable_ocr" and "enable_table_summary" * add "enable_ocr" and "enable_table_summary" * Add release pipeline and fix some bugs (#137) * Fix bug * Add release pipeline * Update * Update * Fix bug * Fix 
login * Fix empty tag * Update * Fix ui issue * Add base version tag * Fix specific version * Use pg hybrid retrieval directly * Fix image tag * Fix llm config (#139) * Fix toml merge bug (#142) * Fix configuration conflict (#143) * Fix merge bug * Fix version conflict for config file * Resolve snapshot merge conflict * Fix space outage in github runner (#144) * Fix merge bug * Fix version conflict for config file * Resolve snapshot merge conflict * Update yaml --------- Co-authored-by: Ceceliachenen <162673161+Ceceliachenen@users.noreply.github.com> Co-authored-by: wwxxzz <zxw320697@alibaba-inc.com> Co-authored-by: paradiseHIT <paradiseHIT@gmail.com> Co-authored-by: shubao.sx <shubao.sx@alibaba-inc.com> Co-authored-by: aero-xi <129151855+aero-xi@users.noreply.github.com> Co-authored-by: ranxia <chenanyu.cay@alibaba-inc.com> Co-authored-by: aero-xi <chuyu.cx@alibaba-inc.com> Co-authored-by: CharlieKoo <81978191+CharlieKoo@users.noreply.github.com> Co-authored-by: zhangdingchu <80106639+zhangdingchu@users.noreply.github.com> Co-authored-by: zt2645802240 <47960912+zt2645802240@users.noreply.github.com>
moria97 added a commit that referenced this pull request on Aug 5, 2024
* Bugfix: a case that files' encodings can not be detected by chardet (#61) * Bugfix: connection error for longtime upload tasks (#62) * Fix connection error for longtime job * fix testcase bugs * support num workers for embedding model * Refactor query api and add dataframe UI * Refactor query api * Remove embedding workers * Add file: file_utils.py (#63) * Fix connection error for longtime job * fix testcase bugs * support num workers for embedding model * Refactor query api and add dataframe UI * Refactor query api * Remove embedding workers * Add file_utils --------- * Remove local storage and enable Elasticsearch hybrid query mode (#60) * Add gpu dockerfile * Fix bug * Fix gb2312 * Update embedding batch size * Set default embedding and llm model * Update docker tag * Fix hologres check * Update registry * Fix bug * Fix tests * Add queue * Update batch size * Add async interface * Fix index conflict * Add change index parameter for FAISS * Fix batch size * Update * Modify async upload to sync (#64) * Modify async upload to sync * fix failed test * Fix faiss_path not effective in retrieval (#65) * Add API to support upload local files (#67) * support upload file via API * add Readme for upload API * refactor query api * modify load_knowledge with session_config * use tempfile.mkdtemp() to store upload files * add docker image timezone for China (#68) * add image zone for China * remove unused ENV --------- * load data pipeline supports read config (#70) * Add gpu docker image timezone for China (#74) * Add fast bm25 (#66) * Add fast bm25 * Fix bm25 bug * Fix bug * Fix test * Update readme and configuration (#77) * fix demo.toml typo, and add comments for settings.toml for embedding * update readme, add load data * Update docker.yml * Enable multiple workers to improve perf (#75) * Add fast bm25 * Update * Fix bug * Fix bm25 bug * Fix bug * Refine code * Update multi-process * Add API to support upload local files (#67) * support upload file via API * add Readme 
for upload API * refactor query api * modify load_knowledge with session_config * use tempfile.mkdtemp() to store upload files * add docker image timezone for China (#68) * add image zone for China * remove unused ENV --------- * load data pipeline supports read config (#70) * Add gpu docker image timezone for China (#74) * Add fast bm25 (#66) * Add fast bm25 * Fix bm25 bug * Fix bug * Fix test * Update dockerfile * Fix bug * Update * Update docker file * Fix empty file bug * Fix local index error * Fix lint * Decouple gradio and backend * Add ui build * Add gunicorn * Fix gunicorn * Update nginx * add nginx image * Fix deployment issue * Fix upload --------- * Add guides for env and docker (#81) * Add guides for env * add guides for docker build * Add README * Add config guide cn&en (#82) * add es setting * add es setting * add elasticsearch test * add es test * add and modify es_tokenizer test * add and modify es_tokenizer test * modify test_as_tokenizer * add skipif * fix test linter fails * fix lint problem * update test_as_analyzer * add config_guide * add navigation into readme * Add doc reference for rag query (#84) * Support evaluation for generated and open datasets (#83) * Refactor evaluation module * add UI: eval_tab * support eval UI * tmp eval * remove eval web * Support evaluation * fix pytest * Add OpenDataSet class --------- * Fix oss url for miracl dataset (#86) * fix ui es upload (#85) * Fix eas LLM (#88) * Milvus support sparse search (#87) * Upload multiple files in single API call (#89) * Milvus support sparse search * aload fix * Upload multiple files in one api call * Remove notebooks * Fix tests * Fix http timeout issue * Add client default timeout limitation and support UI interactive (#90) * Add client default timeout limitation and support UI interactive * support interactivate for vectordb type * Fix ui issue (#91) * Fix deps and add gpu ci tests (#92) * Fix deps and add gpu ci tests * Don't send report in 2nd pipeline * Fix empty 
response for empty knowledge base (#93) * Fix empty response for empty knowledge base * Add constant for empty response message * Fix dup nodes (#94) * Add error handling (#96) * Add error handling * Add upload error msg * fix data_loader (#95) * fix data_loader * fix data_loder * fix data_loader * fix data_loader * Set proper log levels (#98) * Adjust config instruction and add es instruction (#99) * add es setting * add es setting * add elasticsearch test * add es test * add and modify es_tokenizer test * add and modify es_tokenizer test * modify test_as_tokenizer * add skipif * fix test linter fails * fix lint problem * update test_as_analyzer * add config_guide * add navigation into readme * adjust config guide and add es instruction * Log stacktrace for failed requests (#100) * Load milvus collection by default (#101) * Log stacktrace for failed requests * Load milvus collection by default * Rename & Relocate figures in md (#102) * add es setting * add es setting * add elasticsearch test * add es test * add and modify es_tokenizer test * add and modify es_tokenizer test * modify test_as_tokenizer * add skipif * fix test linter fails * fix lint problem * update test_as_analyzer * add config_guide * add navigation into readme * adjust config guide and add es instruction * modify md figures * minor modification * change md path and name * 针对windows平台修改docker启动命令 (#104) * 针对windows平台修改docker启动命令 * 针对windows平台修改docker启动命令 * 针对windows平台修改docker启动命令 * make format * make format, nothing changed * download models from oss automatically (#97) * download models from oss automatically * download models from oss automatically * download models from oss automatically * download models from oss automatically * download models from modelscope * download models from modelscope * fix readme * Fix bug in downloading models (#106) * Fix bug * Fix log * Fix download * Add markdown reader (#105) * fix pdf reader (#107) * Personal/ranxia/pdf table summary fix (#109) * fix pdf reader 
* fix pdf reader table summary --------- * FiAddage number to file_name (#110) * Support stream response for LLM (PaiEAS && DashScope) (#112) * Support stream response for LLM (PaiEAS && DashScope) * Add PaiEas LLM old file * Add image node processor (#114) * Fit image in response * Add image insert * Fix llm max-token * Fix bug (#115) * Fix bugs for chinese escaped string in API header (#117) * Fix bidi version (#119) * Add fix version * Update poetry.lock * Update streaming response to body field use server sent events (#120) * Fix streaming * Fix llm and vector query * Address comment * Remove extra print * Support simple-weighted-reranker and similarity-threshold (#116) * Support nomalized cosine_sim score for different vectorDB * Support simple-weighted-reranker and similarity-threshold * [Todo] Support ES hybrid search * Support Milvus * fix path * fix open dataset * Fix url for du-retrieval dataset * Restore setting * Fix reviews * Apply node_id for weighted_reranker * jsonl reader (#124) * jsonl reader * jsonl reader * Support function_calling with booking demo tools (#122) * Add booking system demo for function_calling * Support customized function calling tools * Add testcase for agent and llm * Fix test * Fix async test * Add readme for function calling * Add readme for function calling * Remove ref figs * Add nodes enhancement by raptor (#111) * add raptor * add raptor ui support * fix logger bug * add node_enhancement class and modify test * fix node_enhancement setting bug * lint adjustment * poetry lock * fix poetry.lock * fix poetry issues * add a param * add token calculation for Chinese and adjust context_window * update tokenization_qwen * update file_path * merge feature and update poetry.lock * exclude pytest since no vocab file in the test env * exclude qwen.tiktoken * delete assert * Add weather tool (#125) * weather okgit add .! 
* fix bug * space bug --------- * Don't use parallel when data size is big (#108) * Add opensearch (#127) * Add open search. Not tested * Fix * Fix config * update docker's readme (#126) * update docker's readme * change network back * change network back * change network back * Create ci.yml (#131) * Update CI & PR pipelines (#132) * Update CI * Fix ci * Fix a few ui bugs (#133) * Support RDS postgres vector store (#134) * support rds postgers for store engine * Format * support table * Make format --------- * Fix minor bugs (#135) * Fix bug * Fix index bug * Updaet password field * Add pre-commit * Remove upload button * Refine upload * Fix pg connection string * Fix empty response for score_threshold (#136) * Fix empty response for score_threshold * Modify empty response info * Modify empty response info --------- * fix table_reader in pdf_reader (#128) * fix table_reader in pdf_reader * fix table_reader in pdf_reader * fix table_reader in pdf_reader * fix table_reader in pdf_reader * fix table_reader in pdf_reader * fix table_reader in pdf_reader * add "enable_ocr" and "enable_table_summary" (#138) * add "enable_ocr" and "enable_table_summary" * add "enable_ocr" and "enable_table_summary" * add "enable_ocr" and "enable_table_summary" * Add release pipeline and fix some bugs (#137) * Fix bug * Add release pipeline * Update * Update * Fix bug * Fix login * Fix empty tag * Update * Fix ui issue * Add base version tag * Fix specific version * Use pg hybrid retrieval directly * Fix image tag * Fix llm config (#139) * Fix toml merge bug (#142) * Fix configuration conflict (#143) * Fix merge bug * Fix version conflict for config file * Resolve snapshot merge conflict * Fix space outage in github runner (#144) * Fix merge bug * Fix version conflict for config file * Resolve snapshot merge conflict * Update yaml --------- Co-authored-by: Ceceliachenen <162673161+Ceceliachenen@users.noreply.github.com> Co-authored-by: wwxxzz <zxw320697@alibaba-inc.com> Co-authored-by: 
paradiseHIT <paradiseHIT@gmail.com> Co-authored-by: shubao.sx <shubao.sx@alibaba-inc.com> Co-authored-by: aero-xi <129151855+aero-xi@users.noreply.github.com> Co-authored-by: ranxia <chenanyu.cay@alibaba-inc.com> Co-authored-by: aero-xi <chuyu.cx@alibaba-inc.com> Co-authored-by: CharlieKoo <81978191+CharlieKoo@users.noreply.github.com> Co-authored-by: zhangdingchu <80106639+zhangdingchu@users.noreply.github.com> Co-authored-by: zt2645802240 <47960912+zt2645802240@users.noreply.github.com>
The load data pipeline now supports reading a config, mainly for generating embeddings and building the index.