Rename max_length parameter to max_model_len to be in sync with vLLM #3827
Conversation
@@ -126,6 +120,14 @@ def list_of_strings(arg):
        choices=dtype_choices,
        help=f"data type to load the weights in. One of {dtype_choices}. Defaults to float16 for GPU and float32 for CPU systems",
    )
    # vLLM uses max-model-len as parameter to denote max tokens. Register the same for HuggingFace (if vllm not available)
    parser.add_argument(
        "--max-model-len",
All the other args are using _ (underscore). We should be consistent with the naming.
vLLM uses - (hyphen) separation. Using the same ensures we can make do with a single parameter name.
Wouldn't this be a breaking change for current users?
Yeah, this would be a breaking change for existing users. Should we allow both for the time being and warn that max_length will be deprecated in favour of max-model-len, while giving the latter precedence?
I'm curious whether they're internally handling the setting of parameters with different names for different backends.
@yuzisun What are your thoughts on this?
@yuzisun thoughts on this?
This should be possible:
arg_parser.add_argument('--example-one', '--example_one')
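A minimal runnable sketch of that aliasing approach (flag names chosen to match this PR, not the exact diff): both spellings are option strings for one argument, so argparse gives them a single dest.

import argparse

parser = argparse.ArgumentParser()
# Both spellings are aliases of one argument; argparse derives
# dest='max_model_len' from the first long option string.
parser.add_argument('--max-model-len', '--max_model_len', type=int)

args = parser.parse_args(['--max_model_len', '4096'])
print(args.max_model_len)  # 4096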
So I just added max_length back, which internally sets args.max_model_len.
Let me know if this works.
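In case it helps review, the dest= trick described above looks roughly like this (a simplified sketch, not the exact diff): the deprecated flag still parses, but its value lands on the new attribute name.

import argparse

parser = argparse.ArgumentParser()
# Old flag name kept for backwards compatibility; value is stored
# under the new attribute name.
parser.add_argument('--max_length', dest='max_model_len', type=int)

args = parser.parse_args(['--max_length', '2048'])
print(args.max_model_len)  # 2048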
@yuzisun @sivanantha321 @spolti please take a look
Can we support both args.max_model_len and args.max-model-len, and deprecate max_length?
/rerun-all
@@ -126,6 +127,14 @@ def list_of_strings(arg):
        choices=dtype_choices,
        help=f"data type to load the weights in. One of {dtype_choices}. Defaults to float16 for GPU and float32 for CPU systems",
    )
    # vLLM uses max-model-len as parameter to denote max tokens. Register the same for HuggingFace (if vllm not available)
    parser.add_argument(
        "--max-model-len",
"--max-model-len", | |
"--max-model-len", | |
"--max_model_len", |
We're just trying to have the same argument as vLLM. Hence the hyphen (-) instead of the underscore (_).
Also, the problem with adding --max_model_len here is that it would not work if vLLM exists. If we want to support --max_model_len, we'd need to add it separately. This ends up creating 3 CLI flags for one variable.
I thought --max-model-len is parsed to max_model_len as well.
Yeah, --max-model-len is parsed to args.max_model_len. It's just that if we want to support it, it should be added outside of if _vllm:
parser.add_argument('--foo-bar')
parser.add_argument('--foo_bar')
args = parser.parse_args(['--foo-bar', '24', '--foo_bar', '30'])
# Namespace(foo_bar='30')
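(Both option strings normalize to the same dest, foo_bar, and each store action writes to it in command-line order, so the last value given, '30', wins.)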
If vLLM exists, we import that parser, and hence it won't let us add --max-model-len again. So the final code would end up looking like:
parser.add_argument('--max_length', dest='max_model_len')
parser.add_argument('--max_model_len')
if not vllm_available():
    parser.add_argument('--max-model-len')
All three for essentially the same variable. Do we want this?
vLLM actually implements a flexible arg parser, so it can automatically convert underscores to dashes:
https://github.com/vllm-project/vllm/blob/80162c44b1d1e59a2c10f65b6adb9b0407439b1f/vllm/utils.py#L1088
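The gist of that parser, as a simplified sketch (not the exact vLLM code): long option tokens are normalized from underscores to dashes before being handed to argparse, so --max_model_len and --max-model-len both resolve to the same registered flag.

import argparse
import sys

class FlexibleArgumentParser(argparse.ArgumentParser):
    # Sketch: accept '--foo_bar' for any flag registered as '--foo-bar'.
    def parse_args(self, args=None, namespace=None):
        if args is None:
            args = sys.argv[1:]
        processed = []
        for token in args:
            if token.startswith('--'):
                # Normalize '--max_model_len[=X]' to '--max-model-len[=X]',
                # leaving any '=value' part untouched.
                key, sep, value = token.partition('=')
                processed.append(key.replace('_', '-') + sep + value)
            else:
                processed.append(token)
        return super().parse_args(processed, namespace)

parser = FlexibleArgumentParser()
parser.add_argument('--max-model-len', type=int)
print(parser.parse_args(['--max_model_len', '4096']).max_model_len)  # 4096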
@@ -71,9 +71,10 @@ def list_of_strings(arg):
    )
    parser.add_argument(
        "--max_length",
        dest="max_model_len",
dest="max_model_len", |
This is to make sure we're writing to the same argument, and hence avoid doing something like:
final_max_len = args.max_length or args.max_model_len
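A standalone sketch of that point (flag set simplified from the PR): both spellings write to one dest, so there is nothing to coalesce afterwards.

import argparse

parser = argparse.ArgumentParser()
# Deprecated flag redirected to the new attribute name...
parser.add_argument('--max_length', dest='max_model_len', type=int)
# ...and the new flag maps to the same dest automatically.
parser.add_argument('--max-model-len', type=int)

args = parser.parse_args(['--max_length', '2048'])
print(args.max_model_len)  # 2048, no manual fallback between two attributes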
        type=int,
        required=False,
-        help="max sequence length for the tokenizer",
+        help="max sequence length for the tokenizer. will be deprecated in favour of --max-model-length",
help="max sequence length for the tokenizer. will be deprecated in favour of --max-model-length", | |
help="max sequence length for the tokenizer. will be deprecated in favour of --max-model-len", |
/lgtm
/approve
What this PR does / why we need it:
vLLM uses a parameter called max-model-len to denote the max length allowed for tokenisation and processing. We have a similar parameter for the HF case, but under a different name, max_length. Rename that to use the same parameter for both backends for consistency. Now we're adding --max_model_len, which would essentially write to args.max_model_len. We'll slowly move away from --max_length.
Checklist: