[Hackathon 7th No.19] Add the load_state_dict_from_url API to Paddle, v1 #958

Merged · 4 commits · Sep 19, 2024


zty-king (Contributor)

  • [Add] load_state_dict_from_url

  • [Update] doc

  • [Update] rfc


paddle-bot bot commented Sep 16, 2024

Your PR has been submitted. Thanks for your contribution!
Please check its format and content. For this, you can refer to Template and Demo.

### Add Python API:

```
paddle.hub.load_state_dict_from_url(url, model_dir=None, map_location=None, progress=True, check_hash=False, file_name=None)
```

Collaborator:

Why isn't the weights_only parameter aligned?

Contributor Author:

Fixed, please take a look (three screenshots attached).
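To make the roles of `model_dir` and `file_name` in the signature concrete, here is a minimal stdlib sketch of how the cached checkpoint path could be resolved. It assumes the torch.hub-style convention that the file name defaults to the last path segment of the URL; the helper name `resolve_cache_path` and the default directory `~/.cache/paddle/hub/checkpoints` are hypothetical and are not taken from the RFC.

```python
import os
from typing import Optional
from urllib.parse import urlparse

# Hypothetical default location; the RFC does not fix this path.
DEFAULT_HUB_DIR = os.path.join(os.path.expanduser("~"), ".cache", "paddle", "hub")


def resolve_cache_path(url: str, model_dir: Optional[str] = None,
                       file_name: Optional[str] = None) -> str:
    """Return the local path where weights downloaded from `url` would be cached.

    Sketch only: uses the URL's basename unless `file_name` overrides it,
    and falls back to a hypothetical default directory when `model_dir` is None.
    """
    if model_dir is None:
        model_dir = os.path.join(DEFAULT_HUB_DIR, "checkpoints")
    if file_name is None:
        # Take the last segment of the URL path, ignoring any query string.
        file_name = os.path.basename(urlparse(url).path)
    return os.path.join(model_dir, file_name)
```

For example, `resolve_cache_path("https://example.com/w/resnet18-5c106cde.pdparams?dl=1", model_dir="/tmp/ckpt")` resolves to `/tmp/ckpt/resnet18-5c106cde.pdparams` on a POSIX system.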

- Function **`_is_legacy_zip_format`**, which checks whether a file is a ZIP file:

Collaborator:

[screenshot] In the full-text preview, several bold spans are not rendered correctly. Please go over the whole document again.

Contributor Author:

Fixed, please take a look.


# 6. Testing and Acceptance Considerations

The test cases considered are as follows:

Collaborator:

The testing section here cannot simply be copied from the original template. This is not an operator-style API, so you need to spell out which parts you will test.

Contributor Author:

Fixed, please take a look:
[screenshot]

```
HASH_REGEX = re.compile(r'-([a-f0-9]*)\.')
```

- Function **download_url_to_file** downloads the file at a URL to local disk. `Paddle/PaddleMIX/paddlemix/datacopilot/misc/_download.py` already implements `download_url_to_file`; it can be used directly or integrated into hub.py.

Collaborator:

> `Paddle/PaddleMIX/paddlemix/datacopilot/misc/_download.py` already implements `download_url_to_file`; it can be used directly or integrated into hub.py.

Is there a similar implementation in the Paddle repo itself, for example in python/paddle/utils/download.py?

Contributor Author:

I've fixed the earlier items, please check them. I haven't changed this last one yet. Having looked into it, we could use `from paddle.utils.download import _download` and call `_download` to download the file. However, it drops the `progress` parameter, so no progress bar is shown on standard error; originally this parameter was optional (True or None). If the progress bar is not a requirement, the functions in python/paddle/utils/download.py can be used directly.

Collaborator:

> However, it drops the `progress` parameter, so no progress bar is shown on standard error.

Could this parameter be added to the `_download` function?

CLAassistant commented Sep 18, 2024

CLA assistant check
All committers have signed the CLA.

2. **Download the model file**: the function downloads the model file from the given URL and shows a progress bar during the download (if enabled).
3. **File hash verification**: the function can verify the hash of the downloaded file to ensure its integrity and uniqueness.
4. **Unzip support**: if the downloaded file is a zip archive, the function extracts it automatically.
5. **Load the model**: the function loads the downloaded model file into Paddle and handles the legacy format (e.g. zip files).

Collaborator:

A weight_only parameter was added; should the Goals section mention it as well?
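The `HASH_REGEX` pattern and the hash-verification goal suggest the torch.hub-style convention in which a file name such as `resnet18-5c106cde.pdparams` embeds a prefix of the file's SHA-256 digest. The sketch below assumes that convention and SHA-256 specifically; the function names `expected_hash_prefix` and `check_hash_prefix` are hypothetical.

```python
import hashlib
import os
import re
from typing import Optional

HASH_REGEX = re.compile(r'-([a-f0-9]*)\.')


def expected_hash_prefix(file_name: str) -> Optional[str]:
    """Extract the hex digest prefix embedded in a name like 'resnet18-5c106cde.pdparams'."""
    match = HASH_REGEX.search(file_name)
    return match.group(1) if match else None


def check_hash_prefix(path: str, chunk_size: int = 8192) -> None:
    """Raise RuntimeError if the file's SHA-256 digest does not start with the prefix in its name."""
    prefix = expected_hash_prefix(os.path.basename(path))
    if not prefix:
        # The regex allows an empty group, so treat "" the same as no match.
        raise RuntimeError(f"no hash prefix found in {path!r}")
    sha256 = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in chunks so large checkpoints are not read into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha256.update(chunk)
    digest = sha256.hexdigest()
    if not digest.startswith(prefix):
        raise RuntimeError(
            f"invalid hash value (expected prefix {prefix!r}, got {digest[:len(prefix)]!r})"
        )
```

Checking only a prefix keeps file names short while still catching truncated or corrupted downloads with high probability.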



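2. Use paddle.hub.load_state_dict_from_url() to load a URL that points to compressed model weights, i.e. a ZIP file; separately, manually download the ZIP files for the same URLs, unzip them by hand, load them with paddle.hub.load(), and check that the results match;

3. Use paddle.hub.load_state_dict_from_url() to load a model weight file that has already been downloaded; separately, load the same weight file with paddle.hub.load(), and check that the results match;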
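Test case 3 hinges on the cache-hit path: if the target file already exists under `model_dir`, no download should happen. One Paddle-free way to test that logic is to inject a fake downloader and count its calls. `fetch_cached`, its `downloader` argument, and `FakeDownloader` are hypothetical stand-ins for illustration, not the API proposed in the RFC.

```python
import os
from typing import Callable, List
from urllib.parse import urlparse


def fetch_cached(url: str, model_dir: str,
                 downloader: Callable[[str, str], None]) -> str:
    """Return the cached path for `url`, calling `downloader(url, dst)` only on a cache miss."""
    os.makedirs(model_dir, exist_ok=True)
    dst = os.path.join(model_dir, os.path.basename(urlparse(url).path))
    if not os.path.exists(dst):
        downloader(url, dst)
    return dst


class FakeDownloader:
    """Test double that records calls and writes fixed bytes instead of hitting the network."""

    def __init__(self, payload: bytes) -> None:
        self.payload = payload
        self.calls: List[str] = []

    def __call__(self, url: str, dst: str) -> None:
        self.calls.append(url)
        with open(dst, "wb") as f:
            f.write(self.payload)
```

Calling `fetch_cached` twice with the same URL and directory should leave exactly one entry in `FakeDownloader.calls`.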
Collaborator:

[screenshot] The formatting here is broken.

Contributor Author (zty-king, Sep 19, 2024):

This was already fixed yesterday; please look at the latest version.
[screenshot]

Contributor Author (zty-king, Sep 19, 2024):

> Could this parameter be added to the `_download` function? @luotao1

`_download()` calls `_get_download()` to issue the actual GET request and download the file. `_get_download()` already contains progress-bar logic, but it is not exposed as a parameter that lets the user choose whether to show the bar; instead, the bar is shown automatically whenever the file size can be determined. Should the logic of that existing function be changed? If so, a progress-bar parameter can be added.

Collaborator:

OK, let's leave it as is for now.
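The behavior described above (show a progress bar only when the file size is known) could be combined with an explicit `progress` switch roughly as follows. This is a stdlib-only sketch over generic binary streams, not the actual `_get_download()` code in `python/paddle/utils/download.py`; the function name `copy_with_progress` and its `total_size` and `out` parameters are hypothetical.

```python
import sys
from typing import BinaryIO, Optional, TextIO


def copy_with_progress(src: BinaryIO, dst: BinaryIO, total_size: Optional[int],
                       progress: bool = True, chunk_size: int = 8192,
                       out: TextIO = sys.stderr) -> int:
    """Copy `src` to `dst` in chunks and return the number of bytes copied.

    A percentage is written to `out` only when `progress` is True AND the
    total size is known (e.g. from a Content-Length header), so the existing
    "show a bar if the size is available" behavior gains an opt-out flag.
    """
    show = progress and bool(total_size)
    copied = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        copied += len(chunk)
        if show:
            out.write(f"\r{100 * copied // total_size:3d}%")
    if show:
        out.write("\n")
    return copied
```

Keeping the size check inside the same condition means `progress=True` stays a no-op for servers that omit the content length, which matches the current behavior described in the thread.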

luotao1 merged commit 86b9bcc into PaddlePaddle:master on Sep 19, 2024 (1 check passed)
zty-king (Contributor Author) commented Sep 26, 2024

@luotao1
Hello, is there an official site that hosts model weight files, for example files with the .pdparams suffix? I'd like to use them to test the API I wrote.

luotao1 (Collaborator) commented Sep 27, 2024
