[Performance] Optimize unified checkpoint save/load speed. #8204

Merged (12 commits) on May 8, 2024

@ZHUI (Collaborator) commented Mar 28, 2024

PR types

PR changes

Description

For facebook/llama-7b

UC/PD save/load comparison:

| Test case | UC save | PD save |
|---|---|---|
| testDP8 | 0 | 0 |
| testPP4DP2 | 47.13 | 68.36 |
| testPP4Sharding2 | 25.93 | 41.04 |
| testPP8 | 27.04 | 39.68 |
| testSharding2S1DP4 | 93.19 | 145.13 |
| testSharding2S2DP4 | 90.62 | 147.15 |
| testSharding4S1DP2 | 47.6 | 94.75 |
| testSharding4S2DP2 | 44.12 | 95.68 |
| testSharding8S1 | 25.94 | 64.37 |
| testSharding8S2 | 23.95 | 65.77 |
| testTP2PP4 | 27.92 | 35.71 |
| testTP2Sharding4 | 27.78 | 46.31 |
| testTP4DP2 | 53.74 | 51.98 |
| testTP4PP2 | 27.32 | 30.55 |
| testTP4Sharding2 | 28.35 | 31.31 |
| testTP8 | 30.63 | 23.75 |
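
To make the comparison easier to scan, the figures above can be reduced to a PD/UC ratio per test case. This is only a sketch: the PR does not state the units, so it assumes both columns are elapsed save times in the same unit, and the names `timings` and `ratios` are illustrative. The `testDP8` row (0 and 0) is left out to avoid dividing by zero.

```python
# Values copied from the table in the PR description (units not stated there).
# Each entry is (UC save, PD save).
timings = {
    "testPP4DP2": (47.13, 68.36),
    "testPP4Sharding2": (25.93, 41.04),
    "testPP8": (27.04, 39.68),
    "testSharding2S1DP4": (93.19, 145.13),
    "testSharding2S2DP4": (90.62, 147.15),
    "testSharding4S1DP2": (47.6, 94.75),
    "testSharding4S2DP2": (44.12, 95.68),
    "testSharding8S1": (25.94, 64.37),
    "testSharding8S2": (23.95, 65.77),
    "testTP2PP4": (27.92, 35.71),
    "testTP2Sharding4": (27.78, 46.31),
    "testTP4DP2": (53.74, 51.98),
    "testTP4PP2": (27.32, 30.55),
    "testTP4Sharding2": (28.35, 31.31),
    "testTP8": (30.63, 23.75),
}

# A ratio above 1 means the UC figure is the smaller of the two.
ratios = {name: pd_save / uc_save for name, (uc_save, pd_save) in timings.items()}

uc_smaller = [name for name, ratio in ratios.items() if ratio > 1]

# Print the cases sorted from the largest ratio to the smallest.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<20} PD/UC = {ratio:.2f}")

print(f"UC figure is smaller in {len(uc_smaller)} of {len(ratios)} cases")
```

Under that assumption, the UC figure is the smaller one in every listed case except `testTP4DP2` and `testTP8`.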

paddle-bot bot commented Mar 28, 2024

Thanks for your contribution!

CLAassistant commented Mar 28, 2024

CLA assistant check
All committers have signed the CLA.

codecov bot commented Mar 28, 2024

Codecov Report

Attention: Patch coverage is 73.49398%, with 66 lines in your changes missing coverage. Please review.

Project coverage is 55.42%. Comparing base (f29a7b9) to head (32132ae).
Report is 23 commits behind head on develop.

| Files | Patch % | Lines |
|---|---|---|
| paddlenlp/utils/safetensors.py | 88.26% | 23 Missing ⚠️ |
| paddlenlp/trainer/plugins/unified_checkpoint.py | 6.25% | 15 Missing ⚠️ |
| paddlenlp/transformers/model_utils.py | 37.50% | 15 Missing ⚠️ |
| paddlenlp/transformers/conversion_utils.py | 0.00% | 12 Missing ⚠️ |
| paddlenlp/trainer/trainer.py | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #8204      +/-   ##
===========================================
+ Coverage    55.37%   55.42%   +0.04%     
===========================================
  Files          613      615       +2     
  Lines        95855    96235     +380     
===========================================
+ Hits         53083    53335     +252     
- Misses       42772    42900     +128     

@ZHUI ZHUI force-pushed the uc/speed_check branch from 94b78d7 to 3abf709 Compare April 1, 2024 09:48
@ZHUI ZHUI changed the title Uc/speed check [Performance] Optimize unified checkpoint save/load speed. Apr 1, 2024
@ZHUI ZHUI requested review from DesmonDay and DrownFish19 April 1, 2024 13:04
@wawltor (Collaborator) left a comment

LGTM

@ZHUI ZHUI closed this May 8, 2024
@ZHUI ZHUI reopened this May 8, 2024
@wawltor wawltor merged commit d6ac1bd into PaddlePaddle:develop May 8, 2024
8 of 13 checks passed
@ZHUI ZHUI deleted the uc/speed_check branch May 9, 2024 03:21
ZHUI added a commit to ZHUI/PaddleNLP that referenced this pull request May 17, 2024
…dle#8204)

* opt unified checkpoint save/load speed.

* fix bug.

* add fast safe open API.

* mix file open and mmap.

* fix

* add test for read fast read tensors.

* fix

* fix tests.

* remove profile log.

* fix

* fix ci
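
The commit messages above mention a "fast safe open API" and mixing "file open and mmap". As a rough illustration of that general idea, and not the code this PR adds to `paddlenlp/utils/safetensors.py`, the sketch below parses a safetensors header with ordinary file reads and then pulls one tensor's bytes out of a memory mapping. The helper names `write_minimal_safetensors` and `read_tensor_mmap` are hypothetical, and only 1-D `F32` tensors are handled.

```python
import json
import mmap
import os
import struct
import tempfile


def write_minimal_safetensors(path, name, values):
    """Write one 1-D float32 tensor in the safetensors layout (illustration only)."""
    data = struct.pack(f"<{len(values)}f", *values)
    header = {
        name: {"dtype": "F32", "shape": [len(values)], "data_offsets": [0, len(data)]}
    }
    header_bytes = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))  # 8-byte little-endian header size
        f.write(header_bytes)                          # JSON header
        f.write(data)                                  # raw tensor bytes


def read_tensor_mmap(path, name):
    """Parse the header with plain reads, then fetch the tensor bytes through mmap."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
        start, end = header[name]["data_offsets"]
        base = 8 + header_len  # data offsets are relative to the end of the header
        count = (end - start) // 4  # 4 bytes per float32 element
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # unpack_from copies only the requested byte range out of the mapping
            return list(struct.unpack_from(f"<{count}f", mm, base + start))


path = os.path.join(tempfile.mkdtemp(), "demo.safetensors")
write_minimal_safetensors(path, "w", [1.0, 2.0, 3.5])
values = read_tensor_mmap(path, "w")
print(values)
```

The small header goes through a normal buffered read, while the mapping lets the reader touch only the byte range of the tensor it needs rather than loading the whole file.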
ZHUI added a commit that referenced this pull request May 20, 2024
* [Performance] Optimize unified checkpoint save/load speed. (#8204)

* opt unified checkpoint save/load speed.