Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trainer support simultaneously parse JSON files and cmd arguments. #7768

Merged
merged 10 commits into from
Jan 10, 2024

Conversation

greycooker
Copy link
Contributor

@greycooker greycooker commented Jan 3, 2024

PR types
New features

PR changes
LLM

Description
Add functionality to the Trainer to simultaneously parse JSON files and command line arguments.

Copy link

paddle-bot bot commented Jan 3, 2024

Thanks for your contribution!

Copy link

codecov bot commented Jan 3, 2024

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (97f6158) 57.25% compared to head (9f8e8d5) 57.22%.
Report is 8 commits behind head on develop.

Files Patch % Lines
paddlenlp/trainer/argparser.py 88.00% 3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #7768      +/-   ##
===========================================
- Coverage    57.25%   57.22%   -0.04%     
===========================================
  Files          585      587       +2     
  Lines        87977    88215     +238     
===========================================
+ Hits         50372    50477     +105     
- Misses       37605    37738     +133     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@greycooker greycooker reopened this Jan 3, 2024
@gongel
Copy link
Member

gongel commented Jan 4, 2024

需要加到CI脚本中去

@greycooker
Copy link
Contributor Author

greycooker commented Jan 4, 2024

需要加到CI脚本中去

单测文件路径是tests/trainer/test_argparser.py

@PaddlePaddle PaddlePaddle locked and limited conversation to collaborators Jan 5, 2024
@PaddlePaddle PaddlePaddle unlocked this conversation Jan 5, 2024
llm/run_pretrain.py Outdated Show resolved Hide resolved
@greycooker greycooker changed the title 增加llama预训练和微调同时解析json文件和命令行参数的功能 增加Traine同时解析json文件和命令行参数的功能 Jan 8, 2024
@greycooker greycooker changed the title 增加Traine同时解析json文件和命令行参数的功能 增加Trainer同时解析json文件和命令行参数的功能 Jan 8, 2024
@greycooker greycooker changed the title 增加Trainer同时解析json文件和命令行参数的功能 Add functionality to the Trainer to simultaneously parse JSON files and command line arguments. Jan 8, 2024
llm/finetune_generation.py Outdated Show resolved Hide resolved
Comment on lines 273 to 280
output_dir_arg = next(
(arg for arg in sys.argv if arg == "--output_dir" or arg.startswith("--output_dir=")), None
)
if output_dir_arg is None:
if "output_dir" in json_args.keys():
sys.argv.extend(["--output_dir", json_args["output_dir"]])
else:
raise ValueError("The following arguments are required: --output_dir")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

为什么这里要 特判 output_dir ?

Copy link
Contributor Author

@greycooker greycooker Jan 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

如果不特判output_dir,出现json文件里有output_dir参数,但是命令行里没有的情况,执行281行vars(self.parse_args())的时候就会报错,但是我们现在不希望让它报错,所以进行了output_dir的特判

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/PaddlePaddle/PaddleNLP/blob/d89c01130a7f27c39d762cefb15926c4c69aa711/paddlenlp/trainer/argparser.py#L177C9-L177C36

你参考一下这个函数,这个函数也是一样的支持本地文件。
看看这个是怎么处理的。

这个作为通用的parser,在这里做output_dir之类的特判是不太合理的。

JunnYu
JunnYu previously approved these changes Jan 9, 2024
Copy link
Member

@JunnYu JunnYu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@JunnYu JunnYu requested a review from ZHUI January 9, 2024 02:22
@greycooker greycooker changed the title Add functionality to the Trainer to simultaneously parse JSON files and command line arguments. Trainer support simultaneously parse JSON files and cmd arguments. Jan 9, 2024
Copy link
Collaborator

@ZHUI ZHUI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@gongel gongel merged commit 5a32534 into PaddlePaddle:develop Jan 10, 2024
10 checks passed
JunnYu pushed a commit that referenced this pull request Jan 22, 2024
…7768)

* add parse_json_file_and_cmd_lines

* change unit test file path

* Change the way the JSON file is determined

* Merge parameter parsing judgment branches and add comments.

* remove the special handling of output_dir

* Add remaining_args warning
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants