docx_reader #250
Merged
wwxxzz reviewed Oct 21, 2024
moria97 reviewed Oct 21, 2024
moria97 reviewed Oct 21, 2024
moria97 approved these changes Oct 22, 2024
moria97 added a commit that referenced this pull request Nov 22, 2024
* Replace PaiEas LLM with LLI-integration and upgrade python to 3.11 (#148) * Replace PaiEas LLM with LLI-integration and upgrade python version to 3.11 * Replace MyFCDashScope with OpenAILike class * Fix pyproject dependency * bug fix (#149) * Support postgresql load user dict (#150) * make format * Allow not install extension pg_jieba * table name data_default * Convert raptor processor to TransformComponent (#151) * udpate raptor using transform * modify raptor with transform * modify raptor and dataloader --------- Co-authored-by: Yue Fei <luxun.fy@alibaba-inc.com> * Add clip model (#130) * Update * Add clip model * Fix oss cache * Fix cache * Pdf reader upload image * Add multimodal * Update config * Use two embedding * Add text_image node * Add tests * Fix tests * fix multi_modal_vector --------- Co-authored-by: 燃夏 <chenanyu.cay@alibaba-inc.com> * Fix docker base image (#152) * change insert to be sync (#153) * Personal/ranxia/fix image readme (#155) * fix multi_modal and readme * fix multi_modal and readme * fix multi_modal and readme * fix multi_modal image (#156) * Support Agentic RAG with intent and functioncalling (#154) * Add intent detection module * Remove LlmQuery class * Support API * Refactor agent module and format toml * Refactor module tool * Refactor query api * Add demo and UI * remove * Fix reviews * Add test for intent and api * Add web search (#161) * Add web search * Fix lint * Fix bug * Update timeout * Fix bug * Fix jieba bug (#163) * Support PAI-EAS MultiModal LLM (#168) * Support minicpm * Fix issue * Bugfix: PaiEas LLM endpoint & max_tokens (#171) * Fix dashscope interface (#172) * Fix dashscope llm * Fix bug * Fix test bug (#174) * add minerU (#160) * add minerU * add minerU * add minerU * Fix nodes id and simi_topK * remove image url from text * remove image url from text * remove image url from text * Support FAQ query w/o image (#162) * Support FAQ query w/o image * Using LLM when query w/o images * Personal/ranxia/mineru 
enhancement (#164) * remove repeat nodes * show multiple pictures in media * show multiple pictures in media * Install miner with poetry (#165) * fix retriever * Support OSS Data Loader (#166) * Support oss data loader * Skip file which has been uploaded * Support oss prefix via api * 1. change image size (#167) 2. limit image number 3. fix retriever answer ui format * adjust image score (#169) * merge feature * merge feature * merge feature * merge feature * Fix bug (#173) * Support chunk text-overflow display (#170) * Fix bugs * Support text-overflow * Support text-overflow * Support load MinerU config file automatically (#175) * Support load MinerU config file automatically * Modify * Direct writing the config rather than copying * Fix multi_modal build docker (#176) * fix load_config (#177) * change multimodal prompt (#178) * Test Dockerfile * Fix Dockerfile * Fix Dockerfile * Fix Dockerfile * Fix Dockerfile * Fix Dockerfile * Fix test bug (#174) (#179) Co-authored-by: Yue Fei <luxun.fy@alibaba-inc.com> * Fix Dockerfile (#180) * Fix Dockerfile * Fix Dockerfile * Fix Dockerfile * Fix Dockerfile * Fix Dockerfile * Fix Dockerfile * Fix Dockerfile * Fix Dockerfile * Fix Dockerfile * Fix Dockerfile * Fix Dockerfile * Fix Dockerfile * Fix Dockerfile * Fix docker env (#181) * Fix Dockerfile * Fix bugs * Fix docker env * Fix docker env * Fix docker env (#183) * Fix Dockerfile * Fix bugs * Fix docker env * Fix docker env * Fix docker env * Fix docker env * Fix docker env * Bugfix * Bugfix for EAS (#184) * Fix Dockerfile * Fix bugs * Fix docker env * Fix docker env * Fix docker env * Fix docker env * Fix docker env * Bugfix * Bugfix * Fix detectron link (#182) * Update detectron dependency (#185) * Update dependency * udpate poetry lock * fix multimodal_config and prompt (#186) * fix MinerU readme (#189) * Add timeout and more logs (#188) * Personal/ranxia/fix miner u readme (#190) * fix MinerU readme * fix MinerU readme * Personal/ranxia/fix miner u readme (#191) * fix 
MinerU readme * fix MinerU readme * fix MinerU config * fix MinerU bug (#192) * Personal/ranxia/fix test and review bug (#193) * fix MinerU bug * fix MinerU bug * fix MinerU bug * fix MinerU bug * fix MinerU bug * fix MinerU bug * fix MinerU bug --------- Co-authored-by: 筱文 <zxw320697@alibaba-inc.com> Co-authored-by: Yue Fei <luxun.fy@alibaba-inc.com> * fix multimodal readme and config (#195) * nl2sql refactoring (#194) * change insert to be sync * add nl2sql * nl2sql setting * nl2sql setting * fix test bug * fix bugs * data analysis retriever and synthesizer * fix tests bugs * add data_analysis ui * update poetry.lock * remove unnecessary comment * add fault tolerance if no file provided * add minor fault tolerance * add upload_datasheet * nl2sql refactor and add db ui * restore retriever & synthesizer * update poetry.lock * Fix list merge * bug fix * add default display --------- Co-authored-by: 陆逊 <luxun.fy@alibaba-inc.com> * Personal/xi/nl2sql UI (#196) * change insert to be sync * add nl2sql * nl2sql setting * nl2sql setting * fix test bug * fix bugs * data analysis retriever and synthesizer * fix tests bugs * add data_analysis ui * update poetry.lock * remove unnecessary comment * add fault tolerance if no file provided * add minor fault tolerance * add upload_datasheet * nl2sql refactor and add db ui * restore retriever & synthesizer * update poetry.lock * Fix list merge * bug fix * add default display * data_analysis ui update --------- Co-authored-by: 陆逊 <luxun.fy@alibaba-inc.com> * Personal/ranxia/change max new tokens (#199) * set multimodal llm max_new_tokens * set multimodal llm max_new_tokens * Add trace (#197) * Add trace * Fix bug * Push to hangzhou region by default * 修复tables和descriptions默认配置bug (#198) * change insert to be sync * add nl2sql * nl2sql setting * nl2sql setting * fix test bug * fix bugs * data analysis retriever and synthesizer * fix tests bugs * add data_analysis ui * update poetry.lock * remove unnecessary comment * add fault 
tolerance if no file provided * add minor fault tolerance * add upload_datasheet * nl2sql refactor and add db ui * restore retriever & synthesizer * update poetry.lock * Fix list merge * bug fix * add default display * data_analysis ui update * fix table & description & query_output bugs * fix inconsistency between frontend and backend data structures --------- Co-authored-by: 陆逊 <luxun.fy@alibaba-inc.com> * Fix nginx routing (#200) * Fix nginx routing (#202) * Fix nginx routing * Fix nginx config * add data_analysis doc (#201) Co-authored-by: Yue Fei <luxun.fy@alibaba-inc.com> * Resolve conflict * Fix session_id bug (#204) * Fix session_id bug (#205) (#206) * Replace PaiEas LLM with LLI-integration and upgrade python to 3.11 (#148) * Replace PaiEas LLM with LLI-integration and upgrade python version to 3.11 * Replace MyFCDashScope with OpenAILike class * Fix pyproject dependency * bug fix (#149) * Support postgresql load user dict (#150) * make format * Allow not install extension pg_jieba * table name data_default * Convert raptor processor to TransformComponent (#151) * udpate raptor using transform * modify raptor with transform * modify raptor and dataloader --------- * Add clip model (#130) * Update * Add clip model * Fix oss cache * Fix cache * Pdf reader upload image * Add multimodal * Update config * Use two embedding * Add text_image node * Add tests * Fix tests * fix multi_modal_vector --------- * Fix docker base image (#152) * change insert to be sync (#153) * Personal/ranxia/fix image readme (#155) * fix multi_modal and readme * fix multi_modal and readme * fix multi_modal and readme * fix multi_modal image (#156) * Support Agentic RAG with intent and functioncalling (#154) * Add intent detection module * Remove LlmQuery class * Support API * Refactor agent module and format toml * Refactor module tool * Refactor query api * Add demo and UI * remove * Fix reviews * Add test for intent and api * Add web search (#161) * Add web search * Fix lint * Fix 
bug * Update timeout * Fix bug * Fix jieba bug (#163) * Support PAI-EAS MultiModal LLM (#168) * Support minicpm * Fix issue * Bugfix: PaiEas LLM endpoint & max_tokens (#171) * Fix dashscope interface (#172) * Fix dashscope llm * Fix bug * Fix test bug (#174) * add minerU (#160) * add minerU * add minerU * add minerU * Fix nodes id and simi_topK * remove image url from text * remove image url from text * remove image url from text * Support FAQ query w/o image (#162) * Support FAQ query w/o image * Using LLM when query w/o images * Personal/ranxia/mineru enhancement (#164) * remove repeat nodes * show multiple pictures in media * show multiple pictures in media * Install miner with poetry (#165) * fix retriever * Support OSS Data Loader (#166) * Support oss data loader * Skip file which has been uploaded * Support oss prefix via api * 1. change image size (#167) 2. limit image number 3. fix retriever answer ui format * adjust image score (#169) * merge feature * merge feature * merge feature * merge feature * Fix bug (#173) * Support chunk text-overflow display (#170) * Fix bugs * Support text-overflow * Support text-overflow * Support load MinerU config file automatically (#175) * Support load MinerU config file automatically * Modify * Direct writing the config rather than copying * Fix multi_modal build docker (#176) * fix load_config (#177) * change multimodal prompt (#178) * Test Dockerfile * Fix Dockerfile * Fix Dockerfile * Fix Dockerfile * Fix Dockerfile * Fix Dockerfile * Fix test bug (#174) (#179) * Fix Dockerfile (#180) * Fix Dockerfile * Fix Dockerfile * Fix Dockerfile * Fix Dockerfile * Fix Dockerfile * Fix Dockerfile * Fix Dockerfile * Fix Dockerfile * Fix Dockerfile * Fix Dockerfile * Fix Dockerfile * Fix Dockerfile * Fix Dockerfile * Fix docker env (#181) * Fix Dockerfile * Fix bugs * Fix docker env * Fix docker env * Fix docker env (#183) * Fix Dockerfile * Fix bugs * Fix docker env * Fix docker env * Fix docker env * Fix docker env * Fix docker env 
* Bugfix * Bugfix for EAS (#184) * Fix Dockerfile * Fix bugs * Fix docker env * Fix docker env * Fix docker env * Fix docker env * Fix docker env * Bugfix * Bugfix * Fix detectron link (#182) * Update detectron dependency (#185) * Update dependency * udpate poetry lock * fix multimodal_config and prompt (#186) * fix MinerU readme (#189) * Add timeout and more logs (#188) * Personal/ranxia/fix miner u readme (#190) * fix MinerU readme * fix MinerU readme * Personal/ranxia/fix miner u readme (#191) * fix MinerU readme * fix MinerU readme * fix MinerU config * fix MinerU bug (#192) * Personal/ranxia/fix test and review bug (#193) * fix MinerU bug * fix MinerU bug * fix MinerU bug * fix MinerU bug * fix MinerU bug * fix MinerU bug * fix MinerU bug --------- * fix multimodal readme and config (#195) * nl2sql refactoring (#194) * change insert to be sync * add nl2sql * nl2sql setting * nl2sql setting * fix test bug * fix bugs * data analysis retriever and synthesizer * fix tests bugs * add data_analysis ui * update poetry.lock * remove unnecessary comment * add fault tolerance if no file provided * add minor fault tolerance * add upload_datasheet * nl2sql refactor and add db ui * restore retriever & synthesizer * update poetry.lock * Fix list merge * bug fix * add default display --------- * Personal/xi/nl2sql UI (#196) * change insert to be sync * add nl2sql * nl2sql setting * nl2sql setting * fix test bug * fix bugs * data analysis retriever and synthesizer * fix tests bugs * add data_analysis ui * update poetry.lock * remove unnecessary comment * add fault tolerance if no file provided * add minor fault tolerance * add upload_datasheet * nl2sql refactor and add db ui * restore retriever & synthesizer * update poetry.lock * Fix list merge * bug fix * add default display * data_analysis ui update --------- * Personal/ranxia/change max new tokens (#199) * set multimodal llm max_new_tokens * set multimodal llm max_new_tokens * Add trace (#197) * Add trace * Fix bug * Push 
to hangzhou region by default * 修复tables和descriptions默认配置bug (#198) * change insert to be sync * add nl2sql * nl2sql setting * nl2sql setting * fix test bug * fix bugs * data analysis retriever and synthesizer * fix tests bugs * add data_analysis ui * update poetry.lock * remove unnecessary comment * add fault tolerance if no file provided * add minor fault tolerance * add upload_datasheet * nl2sql refactor and add db ui * restore retriever & synthesizer * update poetry.lock * Fix list merge * bug fix * add default display * data_analysis ui update * fix table & description & query_output bugs * fix inconsistency between frontend and backend data structures --------- * Fix nginx routing (#200) * Fix nginx routing (#202) * Fix nginx routing * Fix nginx config * add data_analysis doc (#201) * Resolve conflict * Fix session_id bug (#204) --------- Co-authored-by: wwxxzz <zxw320697@alibaba-inc.com> Co-authored-by: aero-xi <chuyu.cx@alibaba-inc.com> Co-authored-by: zt2645802240 <47960912+zt2645802240@users.noreply.github.com> Co-authored-by: 燃夏 <chenanyu.cay@alibaba-inc.com> * add multi headings (#207) * add multi headings * add multi headings * add multi headings * add multi headings * Support MLLM & OSS Configuration on WebUI (#208) * Add UI for mllm & oss configuration * Support OSS config * Support Oss cfg * Support Oss cfg * fix default trace handler (#209) * Fixbug: UI error if oss cfg is none (#210) * Fixbug: when oss cfg is none * Fixbug * db分析增加reference (#212) * add db ref * add reference * fix no headings case (#211) * fix no headings case * fix no headings case * Fix elastic search threading unsafe bug (#215) * Fix bug * Remote duplicate properties * Add eval tab (#214) * Support ImageVectorStore for other DBs (#213) * Support image store for hologres and milvus * Support ES * Support Opensearch * Support ImageStore for OS/PG/HOLO/Milvus * Add schema config for os * Fix spell bug * Fix mllm in cfg * Fix cfg * Add pai settings cfg * Add pai settings cfg * Fix 
bug * Fix index bug * Fix poetry toml and image name --------- Co-authored-by: 陆逊 <luxun.fy@alibaba-inc.com> * Release hotfix image (#218) * 增加data analysis prompt透出 (#216) * add db ref * add reference * add nl2sql prompt * use one button * Update create table with cache * add fault tolerance to custom prompt --------- Co-authored-by: 陆逊 <luxun.fy@alibaba-inc.com> * Fix release yaml * refine nl2sql tiny (#219) * Fix llm display bug (#220) * Update ui&reference (#221) * update ui button and reference * update refernece display & table match logic * delete button * Fix postprocessor bug: callback_manager (#223) * Fix error for downloading oss model_info file (#227) * Remove Eval Tab (#228) * Remove Eval Tab * Fix duckduckgo-search version * Personal/ranxia/query transform (#224) * query_transform * query_transform * query_transform & fix load_data * query_transform & fix load_data --------- Co-authored-by: Yue Fei <luxun.fy@alibaba-inc.com> * update data_analysis doc (#230) * fix query_transform (#231) * Refactor for multimodal (#232) * Refactor * Fix mm embedding * Fix node id bug * Fix retriever * Add faiss debug * Fix reranker * Fix tests * Fix oss upload * Update * Fix milvus weights * Fix opensearch multithreading * Fix UI * Add multimodal prompt template (#233) * Add multimodal template * Fix load data (#235) * Support unique node_id for postgresql (#236) * Support more advanced embed models (#237) * Support more advanced embed models * Support more advanced embed models * Add model introduction link in huggingface * Md parser (#238) * pdf_reader & md_parser * pdf_reader & md_parser * pdf_reader & md_parser * pdf_reader & md_parser * pdf_reader & md_parser * pdf_reader & md_parser * pdf_reader & md_parser * Fix chinese character bug in streaming json & fix pickle load bug for bm25 index (#239) * Fix bug * Fix linter * Fix dup file error (#241) * Fix dup file error * Fix max_score bug * Fix lint * Remove empty flag * fix image display format (#240) * fix image 
display format * fix md parser and image display format * fix image node hash * fix upload dir * Add llm module (#242) * Add llm module * Fix bug * Fix test * Fix test failures (#244) * Fix test failures * Fix llm connection error * Fix asyncio error * Fix eventloop error * Improve api call stability * fix llm settings (#243) Co-authored-by: Yue Fei <luxun.fy@alibaba-inc.com> * Refactor evaluation: remove module & support qca generator (#245) * Modfy Eval Pipeline * Add rag QCA dataset generator * Add predicted qca dataset generator * Remove evaluetion module * Fix --------- Co-authored-by: Yue Fei <luxun.fy@alibaba-inc.com> * fix image retriever (#246) * fix image retriever * fix image retriever --------- Co-authored-by: Yue Fei <luxun.fy@alibaba-inc.com> * Remove model_name, use model. (#248) * Refactor & add index management (#249) * Refactor * Delete modules * Fix tests * Address comments * Fix trace old config name * Fix config * Update * Update cnclip * Add index readme (#251) * Add readme * Update image size * Fix update index (#252) * Fix update index * Fix threshold bug * docx_reader (#250) * docx_reader * docx_reader * docx_reader * docx_reader * docx_reader * Personal/xi/nl2sql op1 (#254) * add syn prompt * add data_sample * update parse by lstrip * update test * Support multi evaluators and experiments pipeline (#247) * Add evaluator and metrics * Add evaluator and metrics * Add eval experiment pipeline * Modify entry file * Modify entry file * Modify result file * Fix * Refactor evaluation * Fix int value (#256) * Personal/ranxia/html reader (#255) * html_reader * html_reader * html_reader * html_reader * fix docx reader (#257) * fix docx reader * fix docx reader * fix docx reader * Support auto evaluation for multi-modal (#258) * Support MM: text and image eval * Support MultiModal Eval * Update agent module (#259) * Update agent * Update application * Address comment * Resolve trace app name from environment variable (#253) * Add trace namespace * 
Fix app name * Use arms python trace * remove setup tracing * Update arms startup comamnd * Add instrument * Update main * Update aliyun-bootstrap * Fix agent/trace bugs (#260) * Fix bugs * Fix test bug * Fix test * Add agent doc (#261) * Fix bugs * Fix test bug * Fix test * Update agent doc * Update doc * address comment * Fix tools in pyproject.toml (#262) * Fix agent bugs (#264) * Fix bugs * Fix test bug * Fix test * Update agent doc * Update doc * address comment * Fix agent * Address comment * Remove duplicate logger from PaiPDFReader (#263) * Remove duplicate logger from PaiPDFReader * Add ModelScopeDownloader * Use logger from loguru not logging * Remove ak sk info * Add predicted_node_score for eval data (#265) * Add predicted_node_score for eval data * Update description * Personal/ranxia/pptx reader (#266) * pai ppt reader * pai ppt reader & fix oss cache * fix poetry * pptx reader * Update docker.yml (#267) * Update docker.yml * Update llm api key (#268) * fix excel reader (#271) * Fix frontend bug (#270) * Fix ui bug * Fix arms package link error * Fix bug * Fix exclude key * Remove row_number * Fix template error * Update trace instrument command (#272) * Fix ui bug * Fix arms package link error * Fix bug * Fix exclude key * Remove row_number * Fix template error * Fix dockerfile * Clean docker build cache (#273) * Clear docker build cache * Update settings * Fix v1 api bug (#274) * fix da display * Fix lint --------- Co-authored-by: wwxxzz <zxw320697@alibaba-inc.com> Co-authored-by: aero-xi <chuyu.cx@alibaba-inc.com> Co-authored-by: zt2645802240 <47960912+zt2645802240@users.noreply.github.com> Co-authored-by: 燃夏 <chenanyu.cay@alibaba-inc.com>
No description provided.