Preprocessing Unstructured Data for LLM Applications #2

Merged: 10 commits merged on Jun 14, 2024


@SQ-AMD SQ-AMD commented Jun 3, 2024

Preprocessing Unstructured Data for LLM Applications

曾浩龙: 3. Normalizing the Content (规范化内容)

6forwater29 (Owner) commented Jun 4, 2024

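Thanks! This is already in good shape. A few small details to fix:

  • 1. The top-level title is missing. See the other files for reference, e.g. Chapter 3, 嵌入 Embeddings, in 选修-Large Language Models with Semantic Search.

  • 2. In the images folder, images are named [chapter number]-[index].png. For example, the second image of your Chapter 3 would be 3-2.png.

  • 3. Could you rename Chapter 3's Utils.py to Utils_Ch3.py? My chapter has one too, and other chapters may as well.

  • 4. python-pptx could be added to requirements.txt.

  • 5. Please add pip install -r requirements.txt at the beginning.

  • 6. Please add a note at the beginning on how to set the environment variable. On Windows, for example, you can use setx DLAI_API_KEY "" in cmd.

  • 7. The code in the CoT section, i.e. the PDF part, is too dense. Could you explain it in more detail? The same goes for the Word part.

[image]

  • 8. Outputs like <IPython.core.display.JSON object> appear in many places. My VS Code shows the same thing, and I am not sure whether they can be rendered there or only in Jupyter Notebook. If it cannot be changed, never mind; I am just asking for your advice!

  • 9. Please double-check that the formatting follows the PDF standard. For example, 3. 规范化内容 Normalizing the Content.ipynb has an extra space after the "3.".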
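For item 6, a minimal sketch of how a notebook could read the key once the variable is set. The helper name `get_api_key` is hypothetical and not part of this PR; only the variable name `DLAI_API_KEY` comes from the comment above. Note that a value set with `setx` is only visible in terminals opened afterwards, so the notebook kernel has to be restarted from a new terminal.

```python
import os


def get_api_key(name: str = "DLAI_API_KEY") -> str:
    """Return the API key stored in the environment variable `name`.

    Hypothetical helper for illustration; raises a readable error
    instead of silently passing an empty key to the API client.
    """
    key = os.environ.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f'{name} is not set. On Windows (cmd): setx {name} "<your key>", '
            "then open a new terminal and restart the notebook kernel."
        )
    return key
```

Calling `get_api_key()` in the first cell would make a missing or empty key fail early with a clear message rather than later inside an API call.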

Thanks!

逸涵
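On item 8 above, one possible workaround, offered as a sketch rather than a confirmed fix: as far as I know, the interactive view of `IPython.display.JSON` depends on the frontend (JupyterLab renders it, while other frontends may fall back to the plain `<IPython.core.display.JSON object>` repr). Printing indented JSON text works in any frontend. The helper name `format_json` and the sample data are hypothetical.

```python
import json


def format_json(data, indent: int = 2) -> str:
    """Serialize `data` as indented JSON text.

    ensure_ascii=False keeps Chinese text readable;
    default=str stringifies values json cannot encode (e.g. datetimes).
    """
    return json.dumps(data, indent=indent, ensure_ascii=False, default=str)


# Hypothetical sample resembling a list of parsed document elements.
sample = [{"type": "Title", "text": "规范化内容"}]
print(format_json(sample))
```

This gives up the collapsible tree view in exchange for output that looks the same in VS Code, Jupyter Notebook, and exported files.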

6forwater29 (Owner) commented

I just noticed that python-pptx is already there. One more thing:
[image]
I get an error at this spot. Does it run for you?

@6forwater29 6forwater29 merged commit 3c60dad into 6forwater29:main Jun 14, 2024