
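This project is a hands-on implementation of batch extraction of key contract information in audit scenarios, with some modifications to the original project.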

Modifications:

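The original project could only analyze images; analysis of PDF files has been added.
The original project could only analyze one image at a time; analysis of multiple images has been added.
Added scanning of all PDF files in a folder.
Added saving of the recognition results to a CSV file.
Designed a user interface.
Packaged the project for release.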
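The folder scan and CSV summary described above can be sketched with the standard library alone. This is a minimal illustration, not the project's code: the function names `find_pdfs` and `save_results_csv`, the `file` column, and the field names in the example are hypothetical. The CSV is written as `utf-8-sig` so that Excel recognizes the encoding of Chinese text.

```python
import csv
from pathlib import Path


def find_pdfs(folder):
    """Return every .pdf file directly inside `folder`, sorted by path.

    Matching is case-insensitive, so both "a.pdf" and "B.PDF" are found.
    Replace iterdir() with rglob("*") to include subfolders as well.
    """
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def save_results_csv(rows, csv_path, fields):
    """Write one row per PDF to a summary CSV (e.g. output.csv).

    `fields` are the key-information names being extracted (hypothetical
    examples: "party_a", "amount"). Each row is a dict such as
    {"file": "a.pdf", "party_a": "..."}; missing fields become empty cells.
    """
    # utf-8-sig adds a BOM so Excel opens Chinese text correctly.
    with open(csv_path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=["file", *fields], restval="")
        writer.writeheader()
        writer.writerows(rows)
```

For example, `save_results_csv([{"file": "a.pdf", "party_a": "X"}], "output.csv", ["party_a", "amount"])` writes a header row plus one data row with an empty `amount` cell.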

Using the assistant

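Initialize AI: set the key information fields to extract, then initialize the AI model.
Select PDF directory: choose the folder containing the PDF files to analyze.
Select result directory: choose where results are saved, including the intermediate results and the summary file output.csv.
Save text extraction results: the first step of the whole process converts each PDF page to text; this option sets whether that text is saved.
Filter AI results: because of the way the PDF documents are structured, the AI output contains duplicate values; check this option to filter them out.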
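One way to implement the duplicate filter, assuming the extraction model returns, for each page, a dict mapping each field name to a list of candidates with `text` and `probability` keys (the shape produced by PaddleNLP's UIE-style information extraction; treat the exact shape as an assumption). The function name `filter_duplicates` is hypothetical. Values that differ only in whitespace are merged, and each surviving value keeps the highest probability it was seen with.

```python
def filter_duplicates(page_results):
    """Merge per-page extraction results and drop repeated values.

    `page_results` holds one dict per page, mapping each field name to a
    list of candidates like {"text": "...", "probability": 0.93}
    (assumed shape). Returns {field: [unique texts]}, ordered by each
    text's best probability, highest first.
    """
    best = {}  # field -> {normalized text: highest probability seen}
    for page in page_results:
        for field, candidates in page.items():
            seen = best.setdefault(field, {})
            for cand in candidates:
                # Collapse runs of whitespace so "ACME  Ltd" == "ACME Ltd".
                text = " ".join(cand["text"].split())
                if not text:
                    continue
                prob = cand.get("probability", 0.0)
                seen[text] = max(prob, seen.get(text, 0.0))
    # sorted() is stable, so ties keep their first-seen order.
    return {
        field: sorted(seen, key=seen.get, reverse=True)
        for field, seen in best.items()
    }
```

Keeping the highest probability rather than the first occurrence means the most confident reading of a repeated value decides its rank in the output.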

Results

Packaging process

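Designed the UI with the Tkinter Layout Assistant (Tkinter布局助手)
Implemented the front-end and back-end logic
Packaged the project with Nuitka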
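Because extraction is slow and Tkinter widgets should only be touched from the main thread, one common way to connect the UI (front end) to the processing logic (back end) is a worker thread that reports progress through a queue. This is a sketch under that assumption, not the project's design; `run_in_background`, `drain`, and the message tuples are hypothetical names, and `process_one` stands in for the per-PDF extraction step.

```python
import queue
import threading


def run_in_background(files, process_one, messages):
    """Process `files` on a worker thread, reporting through `messages`.

    `process_one(path)` is a placeholder for the per-PDF extraction.
    The worker never touches Tk widgets; it only puts tuples on the queue.
    """
    def worker():
        results = []
        for i, path in enumerate(files, start=1):
            try:
                results.append(process_one(path))
                messages.put(("progress", i, len(files), path))
            except Exception as exc:  # report the failure and keep going
                messages.put(("error", i, len(files), f"{path}: {exc}"))
        messages.put(("done", results))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def drain(messages):
    """Return every message queued so far without blocking (call from the UI)."""
    pending = []
    while True:
        try:
            pending.append(messages.get_nowait())
        except queue.Empty:
            return pending
```

In the Tkinter window, a callback scheduled with `root.after(100, ...)` can call `drain`, update a progress label for each message, and reschedule itself until it receives the `"done"` message.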

Nuitka packaging command

python -m nuitka --onefile --mingw64 --show-memory --show-progress --show-modules --enable-plugin=pylint-warnings --windows-company-name=EHOLLY --enable-plugin=tk-inter  --nofollow-import-to=paddle --nofollow-import-to=paddleocr --nofollow-import-to=paddlenlp --windows-product-name=GetPDFKeyInformation --windows-file-version=0.0.1 --windows-product-version=0.0.1  --windows-file-description="PDF关键信息提取工具" main.py

Changelog

v0.0.1: implemented the basic functionality

Repository: WuShaogui/GetKeyMessageByPaddle