load data pipeline supports read config #70

paradiseHIT · 2024-06-19T09:52:53Z

Enables the load data pipeline to read its configuration, mainly for generating embeddings and building the index.

github-actions · 2024-06-19T09:56:34Z

☂️ Python Coverage

current status: ✅

Overall Coverage

Lines	Covered	Coverage	Threshold	Status
3043	1823	60%	50%	🟢
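The reported figure follows from the two counts. 1823 / 3043 ≈ 59.9%, and the table shows 60%, so the report appears to round to the nearest whole percent (truncation would show 59%). The following small sketch reproduces that check. The function names are hypothetical and not part of the coverage action.

```python
def coverage_percent(covered: int, total: int) -> float:
    """Share of executable lines hit by the test suite, in percent."""
    return 100.0 * covered / total


def coverage_status(covered: int, total: int, threshold: float) -> tuple[int, bool]:
    """Return the rounded percentage and whether the unrounded value meets the threshold."""
    pct = coverage_percent(covered, total)
    return round(pct), pct >= threshold


if __name__ == "__main__":
    pct = coverage_percent(1823, 3043)
    print(f"{pct:.1f}%")                      # 59.9%
    print(coverage_status(1823, 3043, 50.0))  # (60, True)
```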

New Files

No new covered files...

Modified Files

No covered modified files...

updated for commit: 64da2ca by action🐍

* Add fast bm25
* Update
* Fix bug
* Fix bm25 bug
* Fix bug
* Refine code
* Update multi-process
* Add API to support upload local files (#67)
* support upload file via API
* add Readme for upload API
* refactor query api
* modify load_knowledge with session_config
* use tempfile.mkdtemp() to store upload files
* add docker image timezone for China (#68)
* add image zone for China
* remove unused ENV
---------
Co-authored-by: shubao.sx <shubao.sx@alibaba-inc.com>
Co-authored-by: Yue Fei <luxun.fy@alibaba-inc.com>
* load data pipeline supports read config (#70)
* Add gpu docker image timezone for China (#74)
* Add fast bm25 (#66)
* Add fast bm25
* Fix bm25 bug
* Fix bug
* Fix test
* Update dockerfile
* Fix bug
* Update
* Update docker file
* Fix empty file bug
* Fix local index error
* Fix lint
* Decouple gradio and backend
* Add ui build
* Add gunicorn
* Fix gunicorn
* Update nginx
* add nginx image
* Fix deployment issue
* Fix upload
---------
Co-authored-by: 筱文 <zxw320697@alibaba-inc.com>
Co-authored-by: paradiseHIT <paradiseHIT@gmail.com>
Co-authored-by: shubao.sx <shubao.sx@alibaba-inc.com>

* Bugfix: a case that files' encodings can not be detected by chardet (#61)
* Bugfix: connection error for longtime upload tasks (#62)
* Fix connection error for longtime job
* fix testcase bugs
* support num workers for embedding model
* Refactor query api and add dataframe UI
* Refactor query api
* Remove embedding workers
* Add file: file_utils.py (#63)
* Fix connection error for longtime job
* fix testcase bugs
* support num workers for embedding model
* Refactor query api and add dataframe UI
* Refactor query api
* Remove embedding workers
* Add file_utils
---------
Co-authored-by: Yue Fei <59813791+moria97@users.noreply.github.com>
* Remove local storage and enable Elasticsearch hybrid query mode (#60)
* Add gpu dockerfile
* Fix bug
* Fix gb2312
* Update embedding batch size
* Set default embedding and llm model
* Update docker tag
* Fix hologres check
* Update registry
* Fix bug
* Fix tests
* Add queue
* Update batch size
* Add async interface
* Fix index conflict
* Add change index parameter for FAISS
* Fix batch size
* Update
* Modify async upload to sync (#64)
* Modify async upload to sync
* fix failed test
* Fix faiss_path not effective in retrieval (#65)
* Add API to support upload local files (#67)
* support upload file via API
* add Readme for upload API
* refactor query api
* modify load_knowledge with session_config
* use tempfile.mkdtemp() to store upload files
* add docker image timezone for China (#68)
* add image zone for China
* remove unused ENV
---------
Co-authored-by: shubao.sx <shubao.sx@alibaba-inc.com>
Co-authored-by: Yue Fei <luxun.fy@alibaba-inc.com>
* load data pipeline supports read config (#70)
* Add gpu docker image timezone for China (#74)
* Add fast bm25 (#66)
* Add fast bm25
* Fix bm25 bug
* Fix bug
* Fix test
* Update readme and configuration (#77)
* fix demo.toml typo, and add comments for settings.toml for embedding
* update readme, add load data
* Update docker.yml
* Enable multiple workers to improve perf (#75)
* Add fast bm25
* Update
* Fix bug
* Fix bm25 bug
* Fix bug
* Refine code
* Update multi-process
* Add API to support upload local files (#67)
* support upload file via API
* add Readme for upload API
* refactor query api
* modify load_knowledge with session_config
* use tempfile.mkdtemp() to store upload files
* add docker image timezone for China (#68)
* add image zone for China
* remove unused ENV
---------
Co-authored-by: shubao.sx <shubao.sx@alibaba-inc.com>
Co-authored-by: Yue Fei <luxun.fy@alibaba-inc.com>
* load data pipeline supports read config (#70)
* Add gpu docker image timezone for China (#74)
* Add fast bm25 (#66)
* Add fast bm25
* Fix bm25 bug
* Fix bug
* Fix test
* Update dockerfile
* Fix bug
* Update
* Update docker file
* Fix empty file bug
* Fix local index error
* Fix lint
* Decouple gradio and backend
* Add ui build
* Add gunicorn
* Fix gunicorn
* Update nginx
* add nginx image
* Fix deployment issue
* Fix upload
---------
Co-authored-by: 筱文 <zxw320697@alibaba-inc.com>
Co-authored-by: paradiseHIT <paradiseHIT@gmail.com>
Co-authored-by: shubao.sx <shubao.sx@alibaba-inc.com>
* Add guides for env and docker (#81)
* Add guides for env
* add guides for docker build
* Add README
* Add config guide cn&en (#82)
* add es setting
* add es setting
* add elasticsearch test
* add es test
* add and modify es_tokenizer test
* add and modify es_tokenizer test
* modify test_as_tokenizer
* add skipif
* fix test linter fails
* fix lint problem
* update test_as_analyzer
* add config_guide
* add navigation into readme
* Add doc reference for rag query (#84)
* Support evaluation for generated and open datasets (#83)
* Refactor evaluation module
* add UI: eval_tab
* support eval UI
* tmp eval
* remove eval web
* Support evaluation
* fix pytest
* Add OpenDataSet class
---------
Co-authored-by: ranxia <chenanyu.cay@alibaba-inc.com>
* Fix oss url for miracl dataset (#86)
* fix ui es upload (#85)
* Fix eas LLM (#88)
* Milvus support sparse search (#87)
* Upload multiple files in single API call (#89)
* Milvus support sparse search
* aload fix
* Upload multiple files in one api call
* Remove notebooks
* Fix tests
* Fix http timeout issue
* Add client default timeout limitation and support UI interactive (#90)
* Add client default timeout limitation and support UI interactive
* support interactivate for vectordb type
* Fix ui issue (#91)
* Fix deps and add gpu ci tests (#92)
* Fix deps and add gpu ci tests
* Don't send report in 2nd pipeline
* Fix empty response for empty knowledge base (#93)
* Fix empty response for empty knowledge base
* Add constant for empty response message
* Fix dup nodes (#94)
* Add error handling (#96)
* Add error handling
* Add upload error msg
* fix data_loader (#95)
* fix data_loader
* fix data_loder
* fix data_loader
* fix data_loader
* Set proper log levels (#98)
* Adjust config instruction and add es instruction (#99)
* add es setting
* add es setting
* add elasticsearch test
* add es test
* add and modify es_tokenizer test
* add and modify es_tokenizer test
* modify test_as_tokenizer
* add skipif
* fix test linter fails
* fix lint problem
* update test_as_analyzer
* add config_guide
* add navigation into readme
* adjust config guide and add es instruction
* Log stacktrace for failed requests (#100)
* Load milvus collection by default (#101)
* Log stacktrace for failed requests
* Load milvus collection by default
* Rename & Relocate figures in md (#102)
* add es setting
* add es setting
* add elasticsearch test
* add es test
* add and modify es_tokenizer test
* add and modify es_tokenizer test
* modify test_as_tokenizer
* add skipif
* fix test linter fails
* fix lint problem
* update test_as_analyzer
* add config_guide
* add navigation into readme
* adjust config guide and add es instruction
* modify md figures
* minor modification
* change md path and name
* Modify the docker startup command for the Windows platform (#104)
* Modify the docker startup command for the Windows platform
* Modify the docker startup command for the Windows platform
* Modify the docker startup command for the Windows platform
* make format
* make format, nothing changed
* download models from oss automatically (#97)
* download models from oss automatically
* download models from oss automatically
* download models from oss automatically
* download models from oss automatically
* download models from modelscope
* download models from modelscope
* fix readme
* Fix bug in downloading models (#106)
* Fix bug
* Fix log
* Fix download
* Add markdown reader (#105)
* fix pdf reader (#107)
Co-authored-by: Yue Fei <luxun.fy@alibaba-inc.com>
* Personal/ranxia/pdf table summary fix (#109)
* fix pdf reader
* fix pdf reader table summary
---------
Co-authored-by: Yue Fei <luxun.fy@alibaba-inc.com>
* FiAddage number to file_name (#110)
* Support stream response for LLM (PaiEAS && DashScope) (#112)
* Support stream response for LLM (PaiEAS && DashScope)
* Add PaiEas LLM old file
* Add image node processor (#114)
* Fit image in response
* Add image insert
* Fix llm max-token
* Fix bug (#115)
* Fix bugs for chinese escaped string in API header (#117)
* Fix bidi version (#119)
* Add fix version
* Update poetry.lock
* Update streaming response to body field use server sent events (#120)
* Fix streaming
* Fix llm and vector query
* Address comment
* Remove extra print
* Support simple-weighted-reranker and similarity-threshold (#116)
* Support nomalized cosine_sim score for different vectorDB
* Support simple-weighted-reranker and similarity-threshold
* [Todo] Support ES hybrid search
* Support Milvus
* fix path
* fix open dataset
* Fix url for du-retrieval dataset
* Restore setting
* Fix reviews
* Apply node_id for weighted_reranker
* jsonl reader (#124)
* jsonl reader
* jsonl reader
* Support function_calling with booking demo tools (#122)
* Add booking system demo for function_calling
* Support customized function calling tools
* Add testcase for agent and llm
* Fix test
* Fix async test
* Add readme for function calling
* Add readme for function calling
* Remove ref figs
* Add nodes enhancement by raptor (#111)
* add raptor
* add raptor ui support
* fix logger bug
* add node_enhancement class and modify test
* fix node_enhancement setting bug
* lint adjustment
* poetry lock
* fix poetry.lock
* fix poetry issues
* add a param
* add token calculation for Chinese and adjust context_window
* update tokenization_qwen
* update file_path
* merge feature and update poetry.lock
* exclude pytest since no vocab file in the test env
* exclude qwen.tiktoken
* delete assert
* Add weather tool (#125)
* weather okgit add .!
* fix bug
* space bug
---------
Co-authored-by: Yue Fei <luxun.fy@alibaba-inc.com>
* Don't use parallel when data size is big (#108)
* Add opensearch (#127)
* Add open search. Not tested
* Fix
* Fix config
* update docker's readme (#126)
* update docker's readme
* change network back
* change network back
* change network back
* Create ci.yml (#131)
* Update CI & PR pipelines (#132)
* Update CI
* Fix ci
* Fix a few ui bugs (#133)
* Support RDS postgres vector store (#134)
* support rds postgers for store engine
* Format
* support table
* Make format
---------
Co-authored-by: Yue Fei <luxun.fy@alibaba-inc.com>
* Fix minor bugs (#135)
* Fix bug
* Fix index bug
* Updaet password field
* Add pre-commit
* Remove upload button
* Refine upload
* Fix pg connection string
* Fix empty response for score_threshold (#136)
* Fix empty response for score_threshold
* Modify empty response info
* Modify empty response info
---------
Co-authored-by: Yue Fei <luxun.fy@alibaba-inc.com>
* fix table_reader in pdf_reader (#128)
* fix table_reader in pdf_reader
* fix table_reader in pdf_reader
* fix table_reader in pdf_reader
* fix table_reader in pdf_reader
* fix table_reader in pdf_reader
* fix table_reader in pdf_reader
* add "enable_ocr" and "enable_table_summary" (#138)
* add "enable_ocr" and "enable_table_summary"
* add "enable_ocr" and "enable_table_summary"
* add "enable_ocr" and "enable_table_summary"
* Add release pipeline and fix some bugs (#137)
* Fix bug
* Add release pipeline
* Update
* Update
* Fix bug
* Fix login
* Fix empty tag
* Update
* Fix ui issue
* Add base version tag
* Fix specific version
* Use pg hybrid retrieval directly
* Fix image tag
* Fix llm config (#139)
* Fix toml merge bug (#142)
* Fix configuration conflict (#143)
* Fix merge bug
* Fix version conflict for config file
* Resolve snapshot merge conflict
* Fix space outage in github runner (#144)
* Fix merge bug
* Fix version conflict for config file
* Resolve snapshot merge conflict
* Update yaml
---------
Co-authored-by: Ceceliachenen <162673161+Ceceliachenen@users.noreply.github.com>
Co-authored-by: wwxxzz <zxw320697@alibaba-inc.com>
Co-authored-by: paradiseHIT <paradiseHIT@gmail.com>
Co-authored-by: shubao.sx <shubao.sx@alibaba-inc.com>
Co-authored-by: aero-xi <129151855+aero-xi@users.noreply.github.com>
Co-authored-by: ranxia <chenanyu.cay@alibaba-inc.com>
Co-authored-by: aero-xi <chuyu.cx@alibaba-inc.com>
Co-authored-by: CharlieKoo <81978191+CharlieKoo@users.noreply.github.com>
Co-authored-by: zhangdingchu <80106639+zhangdingchu@users.noreply.github.com>
Co-authored-by: zt2645802240 <47960912+zt2645802240@users.noreply.github.com>

load data pipeline supports read config

64da2ca

paradiseHIT requested a review from moria97 June 19, 2024 09:58

moria97 approved these changes Jun 19, 2024

paradiseHIT merged commit 5e4b667 into feature Jun 19, 2024
1 check passed

moria97 pushed a commit that referenced this pull request Jun 20, 2024

load data pipeline supports read config (#70)

d28f986

moria97 mentioned this pull request Aug 5, 2024

v0.1.0-20240802 release (#140) #146

Closed
