multimodal image retriever adjustment #167

Merged

Ceceliachenen
Collaborator

  1. Change the image size
  2. Limit the number of images
  3. Fix the retriever answer UI format

@Ceceliachenen Ceceliachenen requested a review from wwxxzz August 27, 2024 11:43
@Ceceliachenen Ceceliachenen merged commit 79c0a3f into personal/ranxia/MinerU Aug 27, 2024
@Ceceliachenen Ceceliachenen deleted the personal/ranxia/multimodal_adjust_images branch August 27, 2024 11:52
Ceceliachenen added a commit that referenced this pull request Sep 4, 2024
* add minerU

* add minerU

* add minerU

* Fix nodes id and simi_topK

* remove image url from text

* remove image url from text

* remove image url from text

* Support FAQ query w/o image (#162)

* Support FAQ query w/o image

* Using LLM when query w/o images

* Personal/ranxia/mineru enhancement (#164)

* remove repeat nodes

* show multiple pictures in media

* show multiple pictures in media

* Install miner with poetry (#165)

* fix retriever

* Support OSS Data Loader (#166)

* Support oss data loader

* Skip file which has been uploaded

* Support oss prefix via api

* 1. change image size (#167)

2. limit image number
3. fix retriever answer ui format

* adjust image score (#169)

* merge feature

* merge feature

* merge feature

* merge feature

* Fix bug (#173)

* Support chunk text-overflow display (#170)

* Fix bugs

* Support text-overflow

* Support text-overflow

* Support load MinerU config file automatically (#175)

* Support load MinerU config file automatically

* Modify

* Direct writing the config rather than copying

* Fix multi_modal build docker (#176)

* fix load_config (#177)

* change multimodal prompt (#178)

* Test Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix test bug (#174) (#179)

Co-authored-by: Yue Fei <luxun.fy@alibaba-inc.com>

* Fix Dockerfile (#180)

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix docker env (#181)

* Fix Dockerfile

* Fix bugs

* Fix docker env

* Fix docker env

* Fix docker env (#183)

* Fix Dockerfile

* Fix bugs

* Fix docker env

* Fix docker env

* Fix docker env

* Fix docker env

* Fix docker env

* Bugfix

* Bugfix for EAS (#184)

* Fix Dockerfile

* Fix bugs

* Fix docker env

* Fix docker env

* Fix docker env

* Fix docker env

* Fix docker env

* Bugfix

* Bugfix

* Fix detectron link (#182)

* Update detectron dependency (#185)

* Update dependency

* update poetry lock

* fix multimodal_config and prompt (#186)

* fix MinerU readme (#189)

* Add timeout and more logs (#188)

* Personal/ranxia/fix miner u readme (#190)

* fix MinerU readme

* fix MinerU readme

* Personal/ranxia/fix miner u readme (#191)

* fix MinerU readme

* fix MinerU readme

* fix MinerU config

* fix MinerU bug (#192)

* Personal/ranxia/fix test and review bug (#193)

* fix MinerU bug

* fix MinerU bug

* fix MinerU bug

* fix MinerU bug

* fix MinerU bug

* fix MinerU bug

* fix MinerU bug

---------

Co-authored-by: 筱文 <zxw320697@alibaba-inc.com>
Co-authored-by: Yue Fei <luxun.fy@alibaba-inc.com>
moria97 added a commit that referenced this pull request Sep 6, 2024
* Replace PaiEas LLM with LLI-integration and upgrade python to 3.11 (#148)

* Replace PaiEas LLM with LLI-integration and upgrade python version to 3.11

* Replace MyFCDashScope with OpenAILike class

* Fix pyproject dependency

* bug fix (#149)

* Support postgresql load user dict (#150)

* make format

* Allow not install extension pg_jieba

* table name data_default

* Convert raptor processor to TransformComponent (#151)

* update raptor using transform

* modify raptor with transform

* modify raptor and dataloader

---------

Co-authored-by: Yue Fei <luxun.fy@alibaba-inc.com>

* Add clip model (#130)

* Update

* Add clip model

* Fix oss cache

* Fix cache

* Pdf reader upload image

* Add multimodal

* Update config

* Use two embedding

* Add text_image node

* Add tests

* Fix tests

* fix multi_modal_vector

---------

Co-authored-by: 燃夏 <chenanyu.cay@alibaba-inc.com>

* Fix docker base image (#152)

* change insert to be sync (#153)

* Personal/ranxia/fix image readme (#155)

* fix multi_modal and readme

* fix multi_modal and readme

* fix multi_modal and readme

* fix multi_modal image (#156)

* Support Agentic RAG with intent and functioncalling (#154)

* Add intent detection module

* Remove LlmQuery class

* Support API

* Refactor agent module and format toml

* Refactor module tool

* Refactor query api

* Add demo and UI

* remove

* Fix reviews

* Add test for intent and api

* Add web search (#161)

* Add web search

* Fix lint

* Fix bug

* Update timeout

* Fix bug

* Fix jieba bug (#163)

* Support PAI-EAS MultiModal LLM (#168)

* Support minicpm

* Fix issue

* Bugfix: PaiEas LLM endpoint & max_tokens (#171)

* Fix dashscope interface (#172)

* Fix dashscope llm

* Fix bug

* Fix test bug (#174)

* add minerU (#160)

* add minerU

* add minerU

* add minerU

* Fix nodes id and simi_topK

* remove image url from text

* remove image url from text

* remove image url from text

* Support FAQ query w/o image (#162)

* Support FAQ query w/o image

* Using LLM when query w/o images

* Personal/ranxia/mineru enhancement (#164)

* remove repeat nodes

* show multiple pictures in media

* show multiple pictures in media

* Install miner with poetry (#165)

* fix retriever

* Support OSS Data Loader (#166)

* Support oss data loader

* Skip file which has been uploaded

* Support oss prefix via api

* 1. change image size (#167)

2. limit image number
3. fix retriever answer ui format

* adjust image score (#169)

* merge feature

* merge feature

* merge feature

* merge feature

* Fix bug (#173)

* Support chunk text-overflow display (#170)

* Fix bugs

* Support text-overflow

* Support text-overflow

* Support load MinerU config file automatically (#175)

* Support load MinerU config file automatically

* Modify

* Direct writing the config rather than copying

* Fix multi_modal build docker (#176)

* fix load_config (#177)

* change multimodal prompt (#178)

* Test Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix test bug (#174) (#179)

Co-authored-by: Yue Fei <luxun.fy@alibaba-inc.com>

* Fix Dockerfile (#180)

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix docker env (#181)

* Fix Dockerfile

* Fix bugs

* Fix docker env

* Fix docker env

* Fix docker env (#183)

* Fix Dockerfile

* Fix bugs

* Fix docker env

* Fix docker env

* Fix docker env

* Fix docker env

* Fix docker env

* Bugfix

* Bugfix for EAS (#184)

* Fix Dockerfile

* Fix bugs

* Fix docker env

* Fix docker env

* Fix docker env

* Fix docker env

* Fix docker env

* Bugfix

* Bugfix

* Fix detectron link (#182)

* Update detectron dependency (#185)

* Update dependency

* update poetry lock

* fix multimodal_config and prompt (#186)

* fix MinerU readme (#189)

* Add timeout and more logs (#188)

* Personal/ranxia/fix miner u readme (#190)

* fix MinerU readme

* fix MinerU readme

* Personal/ranxia/fix miner u readme (#191)

* fix MinerU readme

* fix MinerU readme

* fix MinerU config

* fix MinerU bug (#192)

* Personal/ranxia/fix test and review bug (#193)

* fix MinerU bug

* fix MinerU bug

* fix MinerU bug

* fix MinerU bug

* fix MinerU bug

* fix MinerU bug

* fix MinerU bug

---------

Co-authored-by: 筱文 <zxw320697@alibaba-inc.com>
Co-authored-by: Yue Fei <luxun.fy@alibaba-inc.com>

* fix multimodal readme and config (#195)

* nl2sql refactoring (#194)

* change insert to be sync

* add nl2sql

* nl2sql setting

* nl2sql setting

* fix test bug

* fix bugs

* data analysis retriever and synthesizer

* fix tests bugs

* add data_analysis ui

* update poetry.lock

* remove unnecessary comment

* add fault tolerance if no file provided

* add minor fault tolerance

* add upload_datasheet

* nl2sql refactor and add db ui

* restore retriever & synthesizer

* update poetry.lock

* Fix list merge

* bug fix

* add default display

---------

Co-authored-by: 陆逊 <luxun.fy@alibaba-inc.com>

* Personal/xi/nl2sql UI (#196)

* change insert to be sync

* add nl2sql

* nl2sql setting

* nl2sql setting

* fix test bug

* fix bugs

* data analysis retriever and synthesizer

* fix tests bugs

* add data_analysis ui

* update poetry.lock

* remove unnecessary comment

* add fault tolerance if no file provided

* add minor fault tolerance

* add upload_datasheet

* nl2sql refactor and add db ui

* restore retriever & synthesizer

* update poetry.lock

* Fix list merge

* bug fix

* add default display

* data_analysis ui update

---------

Co-authored-by: 陆逊 <luxun.fy@alibaba-inc.com>

* Personal/ranxia/change max new tokens (#199)

* set multimodal llm max_new_tokens

* set multimodal llm max_new_tokens

* Add trace (#197)

* Add trace

* Fix bug

* Push to hangzhou region by default

* Fix the default configuration bug for tables and descriptions (#198)

* change insert to be sync

* add nl2sql

* nl2sql setting

* nl2sql setting

* fix test bug

* fix bugs

* data analysis retriever and synthesizer

* fix tests bugs

* add data_analysis ui

* update poetry.lock

* remove unnecessary comment

* add fault tolerance if no file provided

* add minor fault tolerance

* add upload_datasheet

* nl2sql refactor and add db ui

* restore retriever & synthesizer

* update poetry.lock

* Fix list merge

* bug fix

* add default display

* data_analysis ui update

* fix table & description & query_output bugs

* fix inconsistency between frontend and backend data structures

---------

Co-authored-by: 陆逊 <luxun.fy@alibaba-inc.com>

* Fix nginx routing (#200)

* Fix nginx routing (#202)

* Fix nginx routing

* Fix nginx config

* add data_analysis doc (#201)

Co-authored-by: Yue Fei <luxun.fy@alibaba-inc.com>

* Resolve conflict

---------

Co-authored-by: wwxxzz <zxw320697@alibaba-inc.com>
Co-authored-by: aero-xi <chuyu.cx@alibaba-inc.com>
Co-authored-by: zt2645802240 <47960912+zt2645802240@users.noreply.github.com>
Co-authored-by: 燃夏 <chenanyu.cay@alibaba-inc.com>
moria97 added a commit that referenced this pull request Sep 6, 2024
* Replace PaiEas LLM with LLI-integration and upgrade python to 3.11 (#148)

* Replace PaiEas LLM with LLI-integration and upgrade python version to 3.11

* Replace MyFCDashScope with OpenAILike class

* Fix pyproject dependency

* bug fix (#149)

* Support postgresql load user dict (#150)

* make format

* Allow not install extension pg_jieba

* table name data_default

* Convert raptor processor to TransformComponent (#151)

* update raptor using transform

* modify raptor with transform

* modify raptor and dataloader

---------

Co-authored-by: Yue Fei <luxun.fy@alibaba-inc.com>

* Add clip model (#130)

* Update

* Add clip model

* Fix oss cache

* Fix cache

* Pdf reader upload image

* Add multimodal

* Update config

* Use two embedding

* Add text_image node

* Add tests

* Fix tests

* fix multi_modal_vector

---------

Co-authored-by: 燃夏 <chenanyu.cay@alibaba-inc.com>

* Fix docker base image (#152)

* change insert to be sync (#153)

* Personal/ranxia/fix image readme (#155)

* fix multi_modal and readme

* fix multi_modal and readme

* fix multi_modal and readme

* fix multi_modal image (#156)

* Support Agentic RAG with intent and functioncalling (#154)

* Add intent detection module

* Remove LlmQuery class

* Support API

* Refactor agent module and format toml

* Refactor module tool

* Refactor query api

* Add demo and UI

* remove

* Fix reviews

* Add test for intent and api

* Add web search (#161)

* Add web search

* Fix lint

* Fix bug

* Update timeout

* Fix bug

* Fix jieba bug (#163)

* Support PAI-EAS MultiModal LLM (#168)

* Support minicpm

* Fix issue

* Bugfix: PaiEas LLM endpoint & max_tokens (#171)

* Fix dashscope interface (#172)

* Fix dashscope llm

* Fix bug

* Fix test bug (#174)

* add minerU (#160)

* add minerU

* add minerU

* add minerU

* Fix nodes id and simi_topK

* remove image url from text

* remove image url from text

* remove image url from text

* Support FAQ query w/o image (#162)

* Support FAQ query w/o image

* Using LLM when query w/o images

* Personal/ranxia/mineru enhancement (#164)

* remove repeat nodes

* show multiple pictures in media

* show multiple pictures in media

* Install miner with poetry (#165)

* fix retriever

* Support OSS Data Loader (#166)

* Support oss data loader

* Skip file which has been uploaded

* Support oss prefix via api

* 1. change image size (#167)

2. limit image number
3. fix retriever answer ui format

* adjust image score (#169)

* merge feature

* merge feature

* merge feature

* merge feature

* Fix bug (#173)

* Support chunk text-overflow display (#170)

* Fix bugs

* Support text-overflow

* Support text-overflow

* Support load MinerU config file automatically (#175)

* Support load MinerU config file automatically

* Modify

* Direct writing the config rather than copying

* Fix multi_modal build docker (#176)

* fix load_config (#177)

* change multimodal prompt (#178)

* Test Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix test bug (#174) (#179)

Co-authored-by: Yue Fei <luxun.fy@alibaba-inc.com>

* Fix Dockerfile (#180)

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix docker env (#181)

* Fix Dockerfile

* Fix bugs

* Fix docker env

* Fix docker env

* Fix docker env (#183)

* Fix Dockerfile

* Fix bugs

* Fix docker env

* Fix docker env

* Fix docker env

* Fix docker env

* Fix docker env

* Bugfix

* Bugfix for EAS (#184)

* Fix Dockerfile

* Fix bugs

* Fix docker env

* Fix docker env

* Fix docker env

* Fix docker env

* Fix docker env

* Bugfix

* Bugfix

* Fix detectron link (#182)

* Update detectron dependency (#185)

* Update dependency

* update poetry lock

* fix multimodal_config and prompt (#186)

* fix MinerU readme (#189)

* Add timeout and more logs (#188)

* Personal/ranxia/fix miner u readme (#190)

* fix MinerU readme

* fix MinerU readme

* Personal/ranxia/fix miner u readme (#191)

* fix MinerU readme

* fix MinerU readme

* fix MinerU config

* fix MinerU bug (#192)

* Personal/ranxia/fix test and review bug (#193)

* fix MinerU bug

* fix MinerU bug

* fix MinerU bug

* fix MinerU bug

* fix MinerU bug

* fix MinerU bug

* fix MinerU bug

---------

Co-authored-by: 筱文 <zxw320697@alibaba-inc.com>
Co-authored-by: Yue Fei <luxun.fy@alibaba-inc.com>

* fix multimodal readme and config (#195)

* nl2sql refactoring (#194)

* change insert to be sync

* add nl2sql

* nl2sql setting

* nl2sql setting

* fix test bug

* fix bugs

* data analysis retriever and synthesizer

* fix tests bugs

* add data_analysis ui

* update poetry.lock

* remove unnecessary comment

* add fault tolerance if no file provided

* add minor fault tolerance

* add upload_datasheet

* nl2sql refactor and add db ui

* restore retriever & synthesizer

* update poetry.lock

* Fix list merge

* bug fix

* add default display

---------

Co-authored-by: 陆逊 <luxun.fy@alibaba-inc.com>

* Personal/xi/nl2sql UI (#196)

* change insert to be sync

* add nl2sql

* nl2sql setting

* nl2sql setting

* fix test bug

* fix bugs

* data analysis retriever and synthesizer

* fix tests bugs

* add data_analysis ui

* update poetry.lock

* remove unnecessary comment

* add fault tolerance if no file provided

* add minor fault tolerance

* add upload_datasheet

* nl2sql refactor and add db ui

* restore retriever & synthesizer

* update poetry.lock

* Fix list merge

* bug fix

* add default display

* data_analysis ui update

---------

Co-authored-by: 陆逊 <luxun.fy@alibaba-inc.com>

* Personal/ranxia/change max new tokens (#199)

* set multimodal llm max_new_tokens

* set multimodal llm max_new_tokens

* Add trace (#197)

* Add trace

* Fix bug

* Push to hangzhou region by default

* Fix the default configuration bug for tables and descriptions (#198)

* change insert to be sync

* add nl2sql

* nl2sql setting

* nl2sql setting

* fix test bug

* fix bugs

* data analysis retriever and synthesizer

* fix tests bugs

* add data_analysis ui

* update poetry.lock

* remove unnecessary comment

* add fault tolerance if no file provided

* add minor fault tolerance

* add upload_datasheet

* nl2sql refactor and add db ui

* restore retriever & synthesizer

* update poetry.lock

* Fix list merge

* bug fix

* add default display

* data_analysis ui update

* fix table & description & query_output bugs

* fix inconsistency between frontend and backend data structures

---------

Co-authored-by: 陆逊 <luxun.fy@alibaba-inc.com>

* Fix nginx routing (#200)

* Fix nginx routing (#202)

* Fix nginx routing

* Fix nginx config

* add data_analysis doc (#201)

Co-authored-by: Yue Fei <luxun.fy@alibaba-inc.com>

* Resolve conflict

* Fix session_id bug (#204)

---------

Co-authored-by: wwxxzz <zxw320697@alibaba-inc.com>
Co-authored-by: aero-xi <chuyu.cx@alibaba-inc.com>
Co-authored-by: zt2645802240 <47960912+zt2645802240@users.noreply.github.com>
Co-authored-by: 燃夏 <chenanyu.cay@alibaba-inc.com>
moria97 added a commit that referenced this pull request Sep 6, 2024
* Replace PaiEas LLM with LLI-integration and upgrade python to 3.11 (#148)

* Replace PaiEas LLM with LLI-integration and upgrade python version to 3.11

* Replace MyFCDashScope with OpenAILike class

* Fix pyproject dependency

* bug fix (#149)

* Support postgresql load user dict (#150)

* make format

* Allow not install extension pg_jieba

* table name data_default

* Convert raptor processor to TransformComponent (#151)

* update raptor using transform

* modify raptor with transform

* modify raptor and dataloader

---------



* Add clip model (#130)

* Update

* Add clip model

* Fix oss cache

* Fix cache

* Pdf reader upload image

* Add multimodal

* Update config

* Use two embedding

* Add text_image node

* Add tests

* Fix tests

* fix multi_modal_vector

---------



* Fix docker base image (#152)

* change insert to be sync (#153)

* Personal/ranxia/fix image readme (#155)

* fix multi_modal and readme

* fix multi_modal and readme

* fix multi_modal and readme

* fix multi_modal image (#156)

* Support Agentic RAG with intent and functioncalling (#154)

* Add intent detection module

* Remove LlmQuery class

* Support API

* Refactor agent module and format toml

* Refactor module tool

* Refactor query api

* Add demo and UI

* remove

* Fix reviews

* Add test for intent and api

* Add web search (#161)

* Add web search

* Fix lint

* Fix bug

* Update timeout

* Fix bug

* Fix jieba bug (#163)

* Support PAI-EAS MultiModal LLM (#168)

* Support minicpm

* Fix issue

* Bugfix: PaiEas LLM endpoint & max_tokens (#171)

* Fix dashscope interface (#172)

* Fix dashscope llm

* Fix bug

* Fix test bug (#174)

* add minerU (#160)

* add minerU

* add minerU

* add minerU

* Fix nodes id and simi_topK

* remove image url from text

* remove image url from text

* remove image url from text

* Support FAQ query w/o image (#162)

* Support FAQ query w/o image

* Using LLM when query w/o images

* Personal/ranxia/mineru enhancement (#164)

* remove repeat nodes

* show multiple pictures in media

* show multiple pictures in media

* Install miner with poetry (#165)

* fix retriever

* Support OSS Data Loader (#166)

* Support oss data loader

* Skip file which has been uploaded

* Support oss prefix via api

* 1. change image size (#167)

2. limit image number
3. fix retriever answer ui format

* adjust image score (#169)

* merge feature

* merge feature

* merge feature

* merge feature

* Fix bug (#173)

* Support chunk text-overflow display (#170)

* Fix bugs

* Support text-overflow

* Support text-overflow

* Support load MinerU config file automatically (#175)

* Support load MinerU config file automatically

* Modify

* Direct writing the config rather than copying

* Fix multi_modal build docker (#176)

* fix load_config (#177)

* change multimodal prompt (#178)

* Test Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix test bug (#174) (#179)



* Fix Dockerfile (#180)

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix docker env (#181)

* Fix Dockerfile

* Fix bugs

* Fix docker env

* Fix docker env

* Fix docker env (#183)

* Fix Dockerfile

* Fix bugs

* Fix docker env

* Fix docker env

* Fix docker env

* Fix docker env

* Fix docker env

* Bugfix

* Bugfix for EAS (#184)

* Fix Dockerfile

* Fix bugs

* Fix docker env

* Fix docker env

* Fix docker env

* Fix docker env

* Fix docker env

* Bugfix

* Bugfix

* Fix detectron link (#182)

* Update detectron dependency (#185)

* Update dependency

* update poetry lock

* fix multimodal_config and prompt (#186)

* fix MinerU readme (#189)

* Add timeout and more logs (#188)

* Personal/ranxia/fix miner u readme (#190)

* fix MinerU readme

* fix MinerU readme

* Personal/ranxia/fix miner u readme (#191)

* fix MinerU readme

* fix MinerU readme

* fix MinerU config

* fix MinerU bug (#192)

* Personal/ranxia/fix test and review bug (#193)

* fix MinerU bug

* fix MinerU bug

* fix MinerU bug

* fix MinerU bug

* fix MinerU bug

* fix MinerU bug

* fix MinerU bug

---------




* fix multimodal readme and config (#195)

* nl2sql refactoring (#194)

* change insert to be sync

* add nl2sql

* nl2sql setting

* nl2sql setting

* fix test bug

* fix bugs

* data analysis retriever and synthesizer

* fix tests bugs

* add data_analysis ui

* update poetry.lock

* remove unnecessary comment

* add fault tolerance if no file provided

* add minor fault tolerance

* add upload_datasheet

* nl2sql refactor and add db ui

* restore retriever & synthesizer

* update poetry.lock

* Fix list merge

* bug fix

* add default display

---------



* Personal/xi/nl2sql UI (#196)

* change insert to be sync

* add nl2sql

* nl2sql setting

* nl2sql setting

* fix test bug

* fix bugs

* data analysis retriever and synthesizer

* fix tests bugs

* add data_analysis ui

* update poetry.lock

* remove unnecessary comment

* add fault tolerance if no file provided

* add minor fault tolerance

* add upload_datasheet

* nl2sql refactor and add db ui

* restore retriever & synthesizer

* update poetry.lock

* Fix list merge

* bug fix

* add default display

* data_analysis ui update

---------



* Personal/ranxia/change max new tokens (#199)

* set multimodal llm max_new_tokens

* set multimodal llm max_new_tokens

* Add trace (#197)

* Add trace

* Fix bug

* Push to hangzhou region by default

* Fix the default configuration bug for tables and descriptions (#198)

* change insert to be sync

* add nl2sql

* nl2sql setting

* nl2sql setting

* fix test bug

* fix bugs

* data analysis retriever and synthesizer

* fix tests bugs

* add data_analysis ui

* update poetry.lock

* remove unnecessary comment

* add fault tolerance if no file provided

* add minor fault tolerance

* add upload_datasheet

* nl2sql refactor and add db ui

* restore retriever & synthesizer

* update poetry.lock

* Fix list merge

* bug fix

* add default display

* data_analysis ui update

* fix table & description & query_output bugs

* fix inconsistency between frontend and backend data structures

---------



* Fix nginx routing (#200)

* Fix nginx routing (#202)

* Fix nginx routing

* Fix nginx config

* add data_analysis doc (#201)



* Resolve conflict

* Fix session_id bug (#204)

---------

Co-authored-by: wwxxzz <zxw320697@alibaba-inc.com>
Co-authored-by: aero-xi <chuyu.cx@alibaba-inc.com>
Co-authored-by: zt2645802240 <47960912+zt2645802240@users.noreply.github.com>
Co-authored-by: 燃夏 <chenanyu.cay@alibaba-inc.com>
moria97 added a commit that referenced this pull request Nov 22, 2024
* Replace PaiEas LLM with LLI-integration and upgrade python to 3.11 (#148)

* Replace PaiEas LLM with LLI-integration and upgrade python version to 3.11

* Replace MyFCDashScope with OpenAILike class

* Fix pyproject dependency

* bug fix (#149)

* Support postgresql load user dict (#150)

* make format

* Allow not install extension pg_jieba

* table name data_default

* Convert raptor processor to TransformComponent (#151)

* update raptor using transform

* modify raptor with transform

* modify raptor and dataloader

---------

Co-authored-by: Yue Fei <luxun.fy@alibaba-inc.com>

* Add clip model (#130)

* Update

* Add clip model

* Fix oss cache

* Fix cache

* Pdf reader upload image

* Add multimodal

* Update config

* Use two embedding

* Add text_image node

* Add tests

* Fix tests

* fix multi_modal_vector

---------

Co-authored-by: 燃夏 <chenanyu.cay@alibaba-inc.com>

* Fix docker base image (#152)

* change insert to be sync (#153)

* Personal/ranxia/fix image readme (#155)

* fix multi_modal and readme

* fix multi_modal and readme

* fix multi_modal and readme

* fix multi_modal image (#156)

* Support Agentic RAG with intent and functioncalling (#154)

* Add intent detection module

* Remove LlmQuery class

* Support API

* Refactor agent module and format toml

* Refactor module tool

* Refactor query api

* Add demo and UI

* remove

* Fix reviews

* Add test for intent and api

* Add web search (#161)

* Add web search

* Fix lint

* Fix bug

* Update timeout

* Fix bug

* Fix jieba bug (#163)

* Support PAI-EAS MultiModal LLM (#168)

* Support minicpm

* Fix issue

* Bugfix: PaiEas LLM endpoint & max_tokens (#171)

* Fix dashscope interface (#172)

* Fix dashscope llm

* Fix bug

* Fix test bug (#174)

* add minerU (#160)

* add minerU

* add minerU

* add minerU

* Fix nodes id and simi_topK

* remove image url from text

* remove image url from text

* remove image url from text

* Support FAQ query w/o image (#162)

* Support FAQ query w/o image

* Using LLM when query w/o images

* Personal/ranxia/mineru enhancement (#164)

* remove repeat nodes

* show multiple pictures in media

* show multiple pictures in media

* Install miner with poetry (#165)

* fix retriever

* Support OSS Data Loader (#166)

* Support oss data loader

* Skip file which has been uploaded

* Support oss prefix via api

* 1. change image size (#167)

2. limit image number
3. fix retriever answer ui format

* adjust image score (#169)

* merge feature

* merge feature

* merge feature

* merge feature

* Fix bug (#173)

* Support chunk text-overflow display (#170)

* Fix bugs

* Support text-overflow

* Support text-overflow

* Support load MinerU config file automatically (#175)

* Support load MinerU config file automatically

* Modify

* Direct writing the config rather than copying

* Fix multi_modal build docker (#176)

* fix load_config (#177)

* change multimodal prompt (#178)

* Test Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix test bug (#174) (#179)

Co-authored-by: Yue Fei <luxun.fy@alibaba-inc.com>

* Fix Dockerfile (#180)

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix docker env (#181)

* Fix Dockerfile

* Fix bugs

* Fix docker env

* Fix docker env

* Fix docker env (#183)

* Fix Dockerfile

* Fix bugs

* Fix docker env

* Fix docker env

* Fix docker env

* Fix docker env

* Fix docker env

* Bugfix

* Bugfix for EAS (#184)

* Fix Dockerfile

* Fix bugs

* Fix docker env

* Fix docker env

* Fix docker env

* Fix docker env

* Fix docker env

* Bugfix

* Bugfix

* Fix detectron link (#182)

* Update detectron dependency (#185)

* Update dependency

* update poetry lock

* fix multimodal_config and prompt (#186)

* fix MinerU readme (#189)

* Add timeout and more logs (#188)

* Personal/ranxia/fix miner u readme (#190)

* fix MinerU readme

* fix MinerU readme

* Personal/ranxia/fix miner u readme (#191)

* fix MinerU readme

* fix MinerU readme

* fix MinerU config

* fix MinerU bug (#192)

* Personal/ranxia/fix test and review bug (#193)

* fix MinerU bug

* fix MinerU bug

* fix MinerU bug

* fix MinerU bug

* fix MinerU bug

* fix MinerU bug

* fix MinerU bug

---------

Co-authored-by: 筱文 <zxw320697@alibaba-inc.com>
Co-authored-by: Yue Fei <luxun.fy@alibaba-inc.com>

* fix multimodal readme and config (#195)

* nl2sql refactoring (#194)

* change insert to be sync

* add nl2sql

* nl2sql setting

* nl2sql setting

* fix test bug

* fix bugs

* data analysis retriever and synthesizer

* fix tests bugs

* add data_analysis ui

* update poetry.lock

* remove unnecessary comment

* add fault tolerance if no file provided

* add minor fault tolerance

* add upload_datasheet

* nl2sql refactor and add db ui

* restore retriever & synthesizer

* update poetry.lock

* Fix list merge

* bug fix

* add default display

---------

Co-authored-by: 陆逊 <luxun.fy@alibaba-inc.com>

* Personal/xi/nl2sql UI (#196)

* change insert to be sync

* add nl2sql

* nl2sql setting

* nl2sql setting

* fix test bug

* fix bugs

* data analysis retriever and synthesizer

* fix tests bugs

* add data_analysis ui

* update poetry.lock

* remove unnecessary comment

* add fault tolerance if no file provided

* add minor fault tolerance

* add upload_datasheet

* nl2sql refactor and add db ui

* restore retriever & synthesizer

* update poetry.lock

* Fix list merge

* bug fix

* add default display

* data_analysis ui update

---------

Co-authored-by: 陆逊 <luxun.fy@alibaba-inc.com>

* Personal/ranxia/change max new tokens (#199)

* set multimodal llm max_new_tokens

* set multimodal llm max_new_tokens

* Add trace (#197)

* Add trace

* Fix bug

* Push to hangzhou region by default

* Fix the default configuration bug for tables and descriptions (#198)

* change insert to be sync

* add nl2sql

* nl2sql setting

* nl2sql setting

* fix test bug

* fix bugs

* data analysis retriever and synthesizer

* fix tests bugs

* add data_analysis ui

* update poetry.lock

* remove unnecessary comment

* add fault tolerance if no file provided

* add minor fault tolerance

* add upload_datasheet

* nl2sql refactor and add db ui

* restore retriever & synthesizer

* update poetry.lock

* Fix list merge

* bug fix

* add default display

* data_analysis ui update

* fix table & description & query_output bugs

* fix inconsistency between frontend and backend data structures

---------

Co-authored-by: 陆逊 <luxun.fy@alibaba-inc.com>

* Fix nginx routing (#200)

* Fix nginx routing (#202)

* Fix nginx routing

* Fix nginx config

* add data_analysis doc (#201)

Co-authored-by: Yue Fei <luxun.fy@alibaba-inc.com>

* Resolve conflict

* Fix session_id bug (#204)

* Fix session_id bug (#205) (#206)

* Replace PaiEas LLM with LLI-integration and upgrade python to 3.11 (#148)

* Replace PaiEas LLM with LLI-integration and upgrade python version to 3.11

* Replace MyFCDashScope with OpenAILike class

* Fix pyproject dependency

* bug fix (#149)

* Support postgresql load user dict (#150)

* make format

* Allow not install extension pg_jieba

* table name data_default

* Convert raptor processor to TransformComponent (#151)

* update raptor using transform

* modify raptor with transform

* modify raptor and dataloader

---------



* Add clip model (#130)

* Update

* Add clip model

* Fix oss cache

* Fix cache

* Pdf reader upload image

* Add multimodal

* Update config

* Use two embedding

* Add text_image node

* Add tests

* Fix tests

* fix multi_modal_vector

---------



* Fix docker base image (#152)

* change insert to be sync (#153)

* Personal/ranxia/fix image readme (#155)

* fix multi_modal and readme

* fix multi_modal and readme

* fix multi_modal and readme

* fix multi_modal image (#156)

* Support Agentic RAG with intent and functioncalling (#154)

* Add intent detection module

* Remove LlmQuery class

* Support API

* Refactor agent module and format toml

* Refactor module tool

* Refactor query api

* Add demo and UI

* remove

* Fix reviews

* Add test for intent and api

* Add web search (#161)

* Add web search

* Fix lint

* Fix bug

* Update timeout

* Fix bug

* Fix jieba bug (#163)

* Support PAI-EAS MultiModal LLM (#168)

* Support minicpm

* Fix issue

* Bugfix: PaiEas LLM endpoint & max_tokens (#171)

* Fix dashscope interface (#172)

* Fix dashscope llm

* Fix bug

* Fix test bug (#174)

* add minerU (#160)

* add minerU

* add minerU

* add minerU

* Fix nodes id and simi_topK

* remove image url from text

* remove image url from text

* remove image url from text

* Support FAQ query w/o image (#162)

* Support FAQ query w/o image

* Using LLM when query w/o images

* Personal/ranxia/mineru enhancement (#164)

* remove repeat nodes

* show multiple pictures in media

* show multiple pictures in media

* Install miner with poetry (#165)

* fix retriever

* Support OSS Data Loader (#166)

* Support oss data loader

* Skip file which has been uploaded

* Support oss prefix via api

* 1. change image size (#167)

2. limit image number
3. fix retriever answer ui format

* adjust image score (#169)

* merge feature

* merge feature

* merge feature

* merge feature

* Fix bug (#173)

* Support chunk text-overflow display (#170)

* Fix bugs

* Support text-overflow

* Support text-overflow

* Support load MinerU config file automatically (#175)

* Support load MinerU config file automatically

* Modify

* Direct writing the config rather than copying

* Fix multi_modal build docker (#176)

* fix load_config (#177)

* change multimodal prompt (#178)

* Test Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix test bug (#174) (#179)



* Fix Dockerfile (#180)

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix Dockerfile

* Fix docker env (#181)

* Fix Dockerfile

* Fix bugs

* Fix docker env

* Fix docker env

* Fix docker env (#183)

* Fix Dockerfile

* Fix bugs

* Fix docker env

* Fix docker env

* Fix docker env

* Fix docker env

* Fix docker env

* Bugfix

* Bugfix for EAS (#184)

* Fix Dockerfile

* Fix bugs

* Fix docker env

* Fix docker env

* Fix docker env

* Fix docker env

* Fix docker env

* Bugfix

* Bugfix

* Fix detectron link (#182)

* Update detectron dependency (#185)

* Update dependency

* update poetry lock

* fix multimodal_config and prompt (#186)

* fix MinerU readme (#189)

* Add timeout and more logs (#188)

* Personal/ranxia/fix miner u readme (#190)

* fix MinerU readme

* fix MinerU readme

* Personal/ranxia/fix miner u readme (#191)

* fix MinerU readme

* fix MinerU readme

* fix MinerU config

* fix MinerU bug (#192)

* Personal/ranxia/fix test and review bug (#193)

* fix MinerU bug

* fix MinerU bug

* fix MinerU bug

* fix MinerU bug

* fix MinerU bug

* fix MinerU bug

* fix MinerU bug

---------




* fix multimodal readme and config (#195)

* nl2sql refactoring (#194)

* change insert to be sync

* add nl2sql

* nl2sql setting

* nl2sql setting

* fix test bug

* fix bugs

* data analysis retriever and synthesizer

* fix tests bugs

* add data_analysis ui

* update poetry.lock

* remove unnecessary comment

* add fault tolerance if no file provided

* add minor fault tolerance

* add upload_datasheet

* nl2sql refactor and add db ui

* restore retriever & synthesizer

* update poetry.lock

* Fix list merge

* bug fix

* add default display

---------



* Personal/xi/nl2sql UI (#196)

* change insert to be sync

* add nl2sql

* nl2sql setting

* nl2sql setting

* fix test bug

* fix bugs

* data analysis retriever and synthesizer

* fix tests bugs

* add data_analysis ui

* update poetry.lock

* remove unnecessary comment

* add fault tolerance if no file provided

* add minor fault tolerance

* add upload_datasheet

* nl2sql refactor and add db ui

* restore retriever & synthesizer

* update poetry.lock

* Fix list merge

* bug fix

* add default display

* data_analysis ui update

---------



* Personal/ranxia/change max new tokens (#199)

* set multimodal llm max_new_tokens

* set multimodal llm max_new_tokens

* Add trace (#197)

* Add trace

* Fix bug

* Push to hangzhou region by default

* Fix default config bug for tables and descriptions (#198)

* change insert to be sync

* add nl2sql

* nl2sql setting

* nl2sql setting

* fix test bug

* fix bugs

* data analysis retriever and synthesizer

* fix tests bugs

* add data_analysis ui

* update poetry.lock

* remove unnecessary comment

* add fault tolerance if no file provided

* add minor fault tolerance

* add upload_datasheet

* nl2sql refactor and add db ui

* restore retriever & synthesizer

* update poetry.lock

* Fix list merge

* bug fix

* add default display

* data_analysis ui update

* fix table & description & query_output bugs

* fix inconsistency between frontend and backend data structures

---------



* Fix nginx routing (#200)

* Fix nginx routing (#202)

* Fix nginx routing

* Fix nginx config

* add data_analysis doc (#201)



* Resolve conflict

* Fix session_id bug (#204)

---------

Co-authored-by: wwxxzz <zxw320697@alibaba-inc.com>
Co-authored-by: aero-xi <chuyu.cx@alibaba-inc.com>
Co-authored-by: zt2645802240 <47960912+zt2645802240@users.noreply.github.com>
Co-authored-by: 燃夏 <chenanyu.cay@alibaba-inc.com>

* add multi headings (#207)

* add multi headings

* add multi headings

* add multi headings

* add multi headings

* Support MLLM & OSS Configuration on WebUI (#208)

* Add UI for mllm & oss configuration

* Support OSS config

* Support Oss cfg

* Support Oss cfg

* fix default trace handler (#209)

* Fixbug: UI error if oss cfg is none (#210)

* Fixbug: when oss cfg is none

* Fixbug

* Add reference to db analysis (#212)

* add db ref

* add reference

* fix no headings case (#211)

* fix no headings case

* fix no headings case

* Fix elastic search threading unsafe bug (#215)

* Fix bug

* Remove duplicate properties

* Add eval tab (#214)

* Support ImageVectorStore for other DBs (#213)

* Support image store for hologres and milvus

* Support ES

* Support Opensearch

* Support ImageStore for OS/PG/HOLO/Milvus

* Add schema config for os

* Fix spell bug

* Fix mllm in cfg

* Fix cfg

* Add pai settings cfg

* Add pai settings cfg

* Fix bug

* Fix index bug

* Fix poetry toml and image name

---------

Co-authored-by: 陆逊 <luxun.fy@alibaba-inc.com>

* Release hotfix image (#218)

* Expose data analysis prompt (#216)

* add db ref

* add reference

* add nl2sql prompt

* use one button

* Update create table with cache

* add fault tolerance to custom prompt

---------

Co-authored-by: 陆逊 <luxun.fy@alibaba-inc.com>

* Fix release yaml

* refine nl2sql tiny (#219)

* Fix llm display bug (#220)

* Update ui&reference (#221)

* update ui button and reference

* update reference display & table match logic

* delete button

* Fix postprocessor bug: callback_manager (#223)

* Fix error for downloading oss model_info file (#227)

* Remove Eval Tab (#228)

* Remove Eval Tab

* Fix duckduckgo-search version

* Personal/ranxia/query transform (#224)

* query_transform

* query_transform

* query_transform & fix load_data

* query_transform & fix load_data

---------

Co-authored-by: Yue Fei <luxun.fy@alibaba-inc.com>

* update data_analysis doc (#230)

* fix query_transform (#231)

* Refactor for multimodal (#232)

* Refactor

* Fix mm embedding

* Fix node id bug

* Fix retriever

* Add faiss debug

* Fix reranker

* Fix tests

* Fix oss upload

* Update

* Fix milvus weights

* Fix opensearch multithreading

* Fix UI

* Add multimodal prompt template (#233)

* Add multimodal template

* Fix load data (#235)

* Support unique node_id for postgresql (#236)

* Support more advanced embed models (#237)

* Support more advanced embed models

* Support more advanced embed models

* Add model introduction link in huggingface

* Md parser (#238)

* pdf_reader & md_parser

* pdf_reader & md_parser

* pdf_reader & md_parser

* pdf_reader & md_parser

* pdf_reader & md_parser

* pdf_reader & md_parser

* pdf_reader & md_parser

* Fix chinese character bug in streaming json & fix pickle load bug for bm25 index (#239)

* Fix bug

* Fix linter

* Fix dup file error (#241)

* Fix dup file error

* Fix max_score bug

* Fix lint

* Remove empty flag

* fix image display format (#240)

* fix image display format

* fix md parser and image display format

* fix image node hash

* fix upload dir

* Add llm module (#242)

* Add llm module

* Fix bug

* Fix test

* Fix test failures (#244)

* Fix test failures

* Fix llm connection error

* Fix asyncio error

* Fix eventloop error

* Improve api call stability

* fix llm settings (#243)

Co-authored-by: Yue Fei <luxun.fy@alibaba-inc.com>

* Refactor evaluation: remove module & support qca generator (#245)

* Modify Eval Pipeline

* Add rag QCA dataset generator

* Add predicted qca dataset generator

* Remove evaluation module

* Fix

---------

Co-authored-by: Yue Fei <luxun.fy@alibaba-inc.com>

* fix image retriever (#246)

* fix image retriever

* fix image retriever

---------

Co-authored-by: Yue Fei <luxun.fy@alibaba-inc.com>

* Remove model_name, use model. (#248)

* Refactor & add index management (#249)

* Refactor

* Delete modules

* Fix tests

* Address comments

* Fix trace old config name

* Fix config

* Update

* Update cnclip

* Add index readme (#251)

* Add readme

* Update image size

* Fix update index (#252)

* Fix update index

* Fix threshold bug

* docx_reader (#250)

* docx_reader

* docx_reader

* docx_reader

* docx_reader

* docx_reader

* Personal/xi/nl2sql op1 (#254)

* add syn prompt

* add data_sample

* update parse by lstrip

* update test

* Support multi evaluators and experiments pipeline (#247)

* Add evaluator and metrics

* Add evaluator and metrics

* Add eval experiment pipeline

* Modify entry file

* Modify entry file

* Modify result file

* Fix

* Refactor evaluation

* Fix int value (#256)

* Personal/ranxia/html reader (#255)

* html_reader

* html_reader

* html_reader

* html_reader

* fix docx reader (#257)

* fix docx reader

* fix docx reader

* fix docx reader

* Support auto evaluation for multi-modal (#258)

* Support MM: text and image eval

* Support MultiModal Eval

* Update agent module (#259)

* Update agent

* Update application

* Address comment

* Resolve trace app name from environment variable (#253)

* Add trace namespace

* Fix app name

* Use arms python trace

* remove setup tracing

* Update arms startup command

* Add instrument

* Update main

* Update aliyun-bootstrap

* Fix agent/trace bugs (#260)

* Fix bugs

* Fix test bug

* Fix test

* Add agent doc (#261)

* Fix bugs

* Fix test bug

* Fix test

* Update agent doc

* Update doc

* address comment

* Fix tools in pyproject.toml (#262)

* Fix agent bugs (#264)

* Fix bugs

* Fix test bug

* Fix test

* Update agent doc

* Update doc

* address comment

* Fix agent

* Address comment

* Remove duplicate logger from PaiPDFReader (#263)

* Remove duplicate logger from PaiPDFReader

* Add ModelScopeDownloader

* Use logger from loguru not logging

* Remove ak sk info

* Add predicted_node_score for eval data (#265)

* Add predicted_node_score for eval data

* Update description

* Personal/ranxia/pptx reader (#266)

* pai ppt reader

* pai ppt reader & fix oss cache

* fix poetry

* pptx reader

* Update docker.yml (#267)

* Update docker.yml

* Update llm api key (#268)

* fix excel reader (#271)

* Fix frontend bug (#270)

* Fix ui bug

* Fix arms package link error

* Fix bug

* Fix exclude key

* Remove row_number

* Fix template error

* Update trace instrument command (#272)

* Fix ui bug

* Fix arms package link error

* Fix bug

* Fix exclude key

* Remove row_number

* Fix template error

* Fix dockerfile

* Clean docker build cache (#273)

* Clear docker build cache

* Update settings

* Fix v1 api bug (#274)

* fix da display

* Fix lint

---------

Co-authored-by: wwxxzz <zxw320697@alibaba-inc.com>
Co-authored-by: aero-xi <chuyu.cx@alibaba-inc.com>
Co-authored-by: zt2645802240 <47960912+zt2645802240@users.noreply.github.com>
Co-authored-by: 燃夏 <chenanyu.cay@alibaba-inc.com>