
Fix bugs for chinese escaped string in API header #117

Merged
wwxxzz merged 1 commit into feature from personal/xiaowen/fix_chinese_escaped_string on Jul 19, 2024


wwxxzz
Collaborator

wwxxzz commented Jul 19, 2024

No description provided.

wwxxzz requested a review from moria97 July 19, 2024 04:29
wwxxzz merged commit 18e001f into feature Jul 19, 2024
2 checks passed
wwxxzz deleted the personal/xiaowen/fix_chinese_escaped_string branch July 19, 2024 04:31
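The PR has no description, so the actual change is not recorded here. The sketch below illustrates the general problem the title names and is only an illustration.

HTTP header values are in practice limited to ASCII/Latin-1. Python's `http.client`, for instance, encodes header values as Latin-1 and raises `UnicodeEncodeError` on Chinese text. A common workaround is to percent-encode the UTF-8 bytes on the sending side and decode them on the receiving side. The sketch assumes that approach. The header name `X-File-Name` and both helper functions are hypothetical and are not taken from this project.

```python
from urllib.parse import quote, unquote


# Hypothetical helper (not from this project): percent-encode a header value
# as UTF-8 so the result is pure ASCII and safe to send in an HTTP header.
def encode_header_value(value: str) -> str:
    # safe="" also escapes "/" so the whole value is encoded uniformly
    return quote(value, safe="")


# Hypothetical helper: reverse the percent-encoding, reading the bytes as UTF-8.
def decode_header_value(raw: str) -> str:
    return unquote(raw, encoding="utf-8")


if __name__ == "__main__":
    name = "知识库.pdf"
    try:
        name.encode("latin-1")  # what a Latin-1 header encoder would attempt
    except UnicodeEncodeError:
        print("raw value cannot be sent as a Latin-1 header")

    headers = {"X-File-Name": encode_header_value(name)}  # hypothetical header name
    print(headers["X-File-Name"])                       # ASCII-only, percent-encoded
    print(decode_header_value(headers["X-File-Name"]))  # original Chinese text
```

Another option is the RFC 8187 `filename*=UTF-8''...` parameter form used in `Content-Disposition`. Which approach this PR actually took is not stated.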

☂️ Python Coverage

current status: ✅

Overall Coverage

| Lines | Covered | Coverage | Threshold | Status |
|------:|--------:|---------:|----------:|:------:|
| 4525 | 2410 | 53% | 50% | 🟢 |

New Files

No new covered files...

Modified Files

No covered modified files...

Updated for commit 18951ea by action 🐍
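The overall coverage figure is the number of covered lines divided by the total number of lines. The check below recomputes it. It assumes the bot rounds to the nearest whole percent and passes when coverage is at or above the threshold; the comment does not state the bot's exact rule.

```python
# Recompute the coverage bot's overall figure (rounding/pass rule is an assumption)
def coverage_percent(covered: int, lines: int) -> float:
    return 100.0 * covered / lines


LINES, COVERED, THRESHOLD = 4525, 2410, 50

pct = coverage_percent(COVERED, LINES)
status = "pass" if pct >= THRESHOLD else "fail"
print(f"{pct:.0f}% vs {THRESHOLD}% threshold: {status}")
```

2410 / 4525 ≈ 53.3%, which is above the 50% threshold and matches the green status.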

moria97 added a commit that referenced this pull request Aug 5, 2024
* Bugfix: a case that files' encodings can not be detected by chardet (#61)

* Bugfix: connection error for longtime upload tasks (#62)

* Fix connection error for longtime job

* fix testcase bugs

* support num workers for embedding model

* Refactor query api and add dataframe UI

* Refactor query api

* Remove embedding workers

* Add file: file_utils.py (#63)

* Fix connection error for longtime job

* fix testcase bugs

* support num workers for embedding model

* Refactor query api and add dataframe UI

* Refactor query api

* Remove embedding workers

* Add file_utils

---------

Co-authored-by: Yue Fei <59813791+moria97@users.noreply.github.com>

* Remove local storage and enable Elasticsearch hybrid query mode (#60)

* Add gpu dockerfile

* Fix bug

* Fix gb2312

* Update embedding batch size

* Set default embedding and llm model

* Update docker tag

* Fix hologres check

* Update registry

* Fix bug

* Fix tests

* Add queue

* Update batch size

* Add async interface

* Fix index conflict

* Add change index parameter for FAISS

* Fix batch size

* Update

* Modify async upload to sync (#64)

* Modify async upload to sync

* fix failed test

* Fix faiss_path not effective in retrieval (#65)

* Add API to support upload local files (#67)

* support upload file via API

* add Readme for upload API

* refactor query api

* modify load_knowledge with session_config

* use tempfile.mkdtemp() to store upload files

* add docker image timezone for China (#68)

* add image zone for China

* remove unused ENV

---------

Co-authored-by: shubao.sx <shubao.sx@alibaba-inc.com>
Co-authored-by: Yue Fei <luxun.fy@alibaba-inc.com>

* load data pipeline supports read config (#70)

* Add gpu docker image timezone for China (#74)

* Add fast bm25 (#66)

* Add fast bm25

* Fix bm25 bug

* Fix bug

* Fix test

* Update readme and configuration (#77)

* fix demo.toml typo, and add comments for settings.toml for embedding

* update readme, add load data

* Update docker.yml

* Enable multiple workers to improve perf (#75)

* Add fast bm25

* Update

* Fix bug

* Fix bm25 bug

* Fix bug

* Refine code

* Update multi-process

* Add API to support upload local files (#67)

* support upload file via API

* add Readme for upload API

* refactor query api

* modify load_knowledge with session_config

* use tempfile.mkdtemp() to store upload files

* add docker image timezone for China (#68)

* add image zone for China

* remove unused ENV

---------

Co-authored-by: shubao.sx <shubao.sx@alibaba-inc.com>
Co-authored-by: Yue Fei <luxun.fy@alibaba-inc.com>

* load data pipeline supports read config (#70)

* Add gpu docker image timezone for China (#74)

* Add fast bm25 (#66)

* Add fast bm25

* Fix bm25 bug

* Fix bug

* Fix test

* Update dockerfile

* Fix bug

* Update

* Update docker file

* Fix empty file bug

* Fix local index error

* Fix lint

* Decouple gradio and backend

* Add ui build

* Add gunicorn

* Fix gunicorn

* Update nginx

* add nginx image

* Fix deployment issue

* Fix upload

---------

Co-authored-by: 筱文 <zxw320697@alibaba-inc.com>
Co-authored-by: paradiseHIT <paradiseHIT@gmail.com>
Co-authored-by: shubao.sx <shubao.sx@alibaba-inc.com>

* Add guides for env and docker (#81)

* Add guides for env

* add guides for docker build

* Add README

* Add config guide cn&en (#82)

* add es setting

* add es setting

* add elasticsearch test

* add es test

* add and modify es_tokenizer test

* add and modify es_tokenizer test

* modify test_as_tokenizer

* add skipif

* fix test linter fails

* fix lint problem

* update test_as_analyzer

* add config_guide

* add navigation into readme

* Add doc reference for rag query (#84)

* Support evaluation for generated and open datasets (#83)

* Refactor evaluation module

* add UI: eval_tab

* support eval UI

* tmp eval

* remove eval web

* Support evaluation

* fix pytest

* Add OpenDataSet class

---------

Co-authored-by: ranxia <chenanyu.cay@alibaba-inc.com>

* Fix oss url for miracl dataset (#86)

* fix ui es upload (#85)

* Fix eas LLM (#88)

* Milvus support sparse search (#87)

* Upload multiple files in single API call (#89)

* Milvus support sparse search

* aload fix

* Upload multiple files in one api call

* Remove notebooks

* Fix tests

* Fix http timeout issue

* Add client default timeout limitation and support UI interactive (#90)

* Add client default timeout limitation and support UI interactive

* support interactive mode for vectordb type

* Fix ui issue (#91)

* Fix deps and add gpu ci tests (#92)

* Fix deps and add gpu ci tests

* Don't send report in 2nd pipeline

* Fix empty response for empty knowledge base (#93)

* Fix empty response for empty knowledge base

* Add constant for empty response message

* Fix dup nodes (#94)

* Add error handling (#96)

* Add error handling

* Add upload error msg

* fix data_loader (#95)

* fix data_loader

* fix data_loader

* fix data_loader

* fix data_loader

* Set proper log levels (#98)

* Adjust config instruction and add es instruction (#99)

* add es setting

* add es setting

* add elasticsearch test

* add es test

* add and modify es_tokenizer test

* add and modify es_tokenizer test

* modify test_as_tokenizer

* add skipif

* fix test linter fails

* fix lint problem

* update test_as_analyzer

* add config_guide

* add navigation into readme

* adjust config guide and add es instruction

* Log stacktrace for failed requests (#100)

* Load milvus collection by default (#101)

* Log stacktrace for failed requests

* Load milvus collection by default

* Rename & Relocate figures in md (#102)

* add es setting

* add es setting

* add elasticsearch test

* add es test

* add and modify es_tokenizer test

* add and modify es_tokenizer test

* modify test_as_tokenizer

* add skipif

* fix test linter fails

* fix lint problem

* update test_as_analyzer

* add config_guide

* add navigation into readme

* adjust config guide and add es instruction

* modify md figures

* minor modification

* change md path and name

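* Modify the docker startup command for the Windows platform (#104)

* Modify the docker startup command for the Windows platform

* Modify the docker startup command for the Windows platform

* Modify the docker startup command for the Windows platform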

* make format

* make format, nothing changed

* download models from oss automatically (#97)

* download models from oss automatically

* download models from oss automatically

* download models from oss automatically

* download models from oss automatically

* download models from modelscope

* download models from modelscope

* fix readme

* Fix bug in downloading models (#106)

* Fix bug

* Fix log

* Fix download

* Add markdown reader (#105)

* fix pdf reader (#107)

Co-authored-by: Yue Fei <luxun.fy@alibaba-inc.com>

* Personal/ranxia/pdf table summary fix (#109)

* fix pdf reader

* fix pdf reader table summary

---------

Co-authored-by: Yue Fei <luxun.fy@alibaba-inc.com>

* Add page number to file_name (#110)

* Support stream response for LLM (PaiEAS && DashScope) (#112)

* Support stream response for LLM (PaiEAS && DashScope)

* Add PaiEas LLM old file

* Add image node processor (#114)

* Fit image in response

* Add image insert

* Fix llm max-token

* Fix bug (#115)

* Fix bugs for chinese escaped string in API header (#117)

* Fix bidi version (#119)

* Add fix version

* Update poetry.lock

* Update streaming response to body field use server sent events (#120)

* Fix streaming

* Fix llm and vector query

* Address comment

* Remove extra print

* Support simple-weighted-reranker and similarity-threshold (#116)

* Support normalized cosine_sim score for different vectorDB

* Support simple-weighted-reranker and similarity-threshold

* [Todo] Support ES hybrid search

* Support Milvus

* fix path

* fix open dataset

* Fix url for du-retrieval dataset

* Restore setting

* Fix reviews

* Apply node_id for weighted_reranker

* jsonl reader (#124)

* jsonl reader

* jsonl reader

* Support function_calling with booking demo tools (#122)

* Add booking system demo for function_calling

* Support customized function calling tools

* Add testcase for agent and llm

* Fix test

* Fix async test

* Add readme for function calling

* Add readme for function calling

* Remove ref figs

* Add nodes enhancement by raptor (#111)

* add raptor

* add raptor ui support

* fix logger bug

* add node_enhancement class and modify test

* fix node_enhancement setting bug

* lint adjustment

* poetry lock

* fix poetry.lock

* fix poetry issues

* add a param

* add token calculation for Chinese and adjust context_window

* update tokenization_qwen

* update file_path

* merge feature and update poetry.lock

* exclude pytest since no vocab file in the test env

* exclude qwen.tiktoken

* delete assert

* Add weather tool (#125)

* weather ok

* fix bug

* space bug

---------

Co-authored-by: Yue Fei <luxun.fy@alibaba-inc.com>

* Don't use parallel when data size is big (#108)

* Add opensearch (#127)

* Add open search. Not tested

* Fix

* Fix config

* update docker's readme (#126)

* update docker's readme

* change network back

* change network back

* change network back

* Create ci.yml (#131)

* Update CI & PR pipelines (#132)

* Update CI

* Fix ci

* Fix a few ui bugs (#133)

* Support RDS postgres vector store (#134)

* support rds postgres for store engine

* Format

* support table

* Make format

---------

Co-authored-by: Yue Fei <luxun.fy@alibaba-inc.com>

* Fix minor bugs (#135)

* Fix bug

* Fix index bug

* Update password field

* Add pre-commit

* Remove upload button

* Refine upload

* Fix pg connection string

* Fix empty response for score_threshold (#136)

* Fix empty response for score_threshold

* Modify empty response info

* Modify empty response info

---------

Co-authored-by: Yue Fei <luxun.fy@alibaba-inc.com>

* fix table_reader in pdf_reader (#128)

* fix table_reader in pdf_reader

* fix table_reader in pdf_reader

* fix table_reader in pdf_reader

* fix table_reader in pdf_reader

* fix table_reader in pdf_reader

* fix table_reader in pdf_reader

* add "enable_ocr" and "enable_table_summary" (#138)

* add "enable_ocr" and "enable_table_summary"

* add "enable_ocr" and "enable_table_summary"

* add "enable_ocr" and "enable_table_summary"

* Add release pipeline and fix some bugs (#137)

* Fix bug

* Add release pipeline

* Update

* Update

* Fix bug

* Fix login

* Fix empty tag

* Update

* Fix ui issue

* Add base version tag

* Fix specific version

* Use pg hybrid retrieval directly

* Fix image tag

* Fix llm config (#139)

* Fix toml merge bug (#142)

* Fix configuration conflict (#143)

* Fix merge bug

* Fix version conflict for config file

* Resolve snapshot merge conflict

* Fix space outage in github runner (#144)

* Fix merge bug

* Fix version conflict for config file

* Resolve snapshot merge conflict

* Update yaml

---------

Co-authored-by: Ceceliachenen <162673161+Ceceliachenen@users.noreply.github.com>
Co-authored-by: wwxxzz <zxw320697@alibaba-inc.com>
Co-authored-by: paradiseHIT <paradiseHIT@gmail.com>
Co-authored-by: shubao.sx <shubao.sx@alibaba-inc.com>
Co-authored-by: aero-xi <129151855+aero-xi@users.noreply.github.com>
Co-authored-by: ranxia <chenanyu.cay@alibaba-inc.com>
Co-authored-by: aero-xi <chuyu.cx@alibaba-inc.com>
Co-authored-by: CharlieKoo <81978191+CharlieKoo@users.noreply.github.com>
Co-authored-by: zhangdingchu <80106639+zhangdingchu@users.noreply.github.com>
Co-authored-by: zt2645802240 <47960912+zt2645802240@users.noreply.github.com>
moria97 added a commit that referenced this pull request Aug 5, 2024
moria97 added a commit that referenced this pull request Aug 5, 2024

2 participants