-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add IA3 Modules for Phi #1407
Add IA3 Modules for Phi #1407
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot ! 🚀
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @arnavgarg1, as per the T-few/IA3 paper, only fc2 feedforward layer would be considered. Therefore, left suggestions accordingly. Thank you for adding the target modules when using IA3 for Phi models!
src/peft/utils/constants.py
Outdated
@@ -99,6 +99,7 @@ def starcoder_model_postprocess_past_key_value(past_key_values): | |||
"RefinedWebModel": ["query_key_value", "dense_4h_to_h"], | |||
"RefinedWeb": ["query_key_value", "dense_4h_to_h"], | |||
"falcon": ["query_key_value", "dense_4h_to_h"], | |||
"phi": ["q_proj", "v_proj", "fc1", "fc2"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"phi": ["q_proj", "v_proj", "fc1", "fc2"], | |
"phi": ["q_proj", "v_proj", "fc2"], |
src/peft/utils/constants.py
Outdated
@@ -121,6 +122,7 @@ def starcoder_model_postprocess_past_key_value(past_key_values): | |||
"RefinedWeb": ["dense_4h_to_h"], | |||
"RefinedWebModel": ["dense_4h_to_h"], | |||
"falcon": ["dense_4h_to_h"], | |||
"phi": ["fc1", "fc2"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"phi": ["fc1", "fc2"], | |
"phi": ["fc2"], |
Hi @pacman100! Thanks for the great catch - totally agree that this should only be applied to the fc2 feedforward layer. I just updated my PR! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
* Add IA3 Modules for Phi * Address comments
Here's what the model architecture looks like with my proposed target modules:
This is the total number of trainable parameters using the default rank: