
Rename max_length parameter to max_model_len to be in sync with vLLM #3827

Merged · 5 commits merged into kserve:master on Aug 25, 2024

Conversation

@Datta0 (Contributor) commented Jul 30, 2024:

What this PR does / why we need it:
vLLM uses a parameter called max-model-len to denote the maximum length allowed for tokenisation and processing. We have a similar parameter for the HuggingFace case, but under a different name, max_length. This PR renames it so that both backends use the same parameter name, for consistency.
We now add --max_model_len, which essentially writes to args.max_model_len, and we'll slowly move away from --max_length.
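
As a minimal standalone sketch of the naming relationship (plain argparse, not the actual KServe code): argparse already maps hyphens in long option names to underscores in the attribute name, which is why --max-model-len and args.max_model_len line up.

import argparse

parser = argparse.ArgumentParser()
# argparse converts hyphens in a long option name to underscores for the attribute name.
parser.add_argument("--max-model-len", type=int)

args = parser.parse_args(["--max-model-len", "4096"])
print(args.max_model_len)  # 4096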

Checklist:

  • Have you added unit/e2e tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?
Add `--max_model_len` to huggingface runtime and deprecate `--max_length`

@Datta0 marked this pull request as ready for review on July 30, 2024 11:10
@@ -126,6 +120,14 @@ def list_of_strings(arg):
         choices=dtype_choices,
         help=f"data type to load the weights in. One of {dtype_choices}. Defaults to float16 for GPU and float32 for CPU systems",
     )
+    # vLLM uses max-model-len as the parameter to denote max tokens. Register the same for HuggingFace (if vLLM is not available)
+    parser.add_argument(
+        "--max-model-len",
Member:
All the other args use _ (underscore). We should be consistent with the naming.

Contributor Author (@Datta0):
vLLM uses hyphen (-) separation. Using the same here ensures we can make do with a single parameter name.

Member:
Wouldn't this be a breaking change for current users?

@Datta0 (Contributor Author) commented Aug 2, 2024:
Yeah, this would be a breaking change for existing users. Should we allow both for the time being and warn that max_length will be deprecated in favour of max-model-len, while giving the latter precedence?
I'm curious whether they internally handle setting parameters under different names for different backends.

Member:
@yuzisun, what's your thought on this?

Contributor Author (@Datta0):
@yuzisun thoughts on this?

Contributor:
This should be possible:

arg_parser.add_argument('--example-one', '--example_one')  # registers both spellings as aliases of one option
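
For context, a quick standalone check of that argparse behaviour (not code from this PR):

import argparse

parser = argparse.ArgumentParser()
# Both spellings become aliases of one option; dest is derived from the first, i.e. example_one.
parser.add_argument('--example-one', '--example_one')
print(parser.parse_args(['--example_one', 'x']))  # Namespace(example_one='x')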

Contributor Author (@Datta0):
So I just added max_length back, which internally sets args.max_model_len.
Let me know if this works.

@Datta0 (Contributor Author) commented Aug 16, 2024:
@yuzisun @sivanantha321 @spolti please take a look

Member:
Can we support both --max_model_len and --max-model-len (both writing to args.max_model_len) and deprecate max_length?

@Datta0 (Contributor Author) commented Aug 16, 2024:
/rerun-all

@@ -126,6 +127,14 @@ def list_of_strings(arg):
         choices=dtype_choices,
         help=f"data type to load the weights in. One of {dtype_choices}. Defaults to float16 for GPU and float32 for CPU systems",
     )
+    # vLLM uses max-model-len as the parameter to denote max tokens. Register the same for HuggingFace (if vLLM is not available)
+    parser.add_argument(
+        "--max-model-len",
Member:

Suggested change:
-        "--max-model-len",
+        "--max-model-len",
+        "--max_model_len",

Contributor Author (@Datta0):

We're just trying to expose the same argument as vLLM, hence the hyphen (-) instead of the underscore (_).

Contributor Author (@Datta0):

Also, the problem with adding --max_model_len here is that it would not work if vLLM exists.
If we want to support --max_model_len, we'd need to add it separately, which ends up creating three CLI flags for one variable.

Member:

I thought --max-model-len is parsed to max_model_len as well

Contributor Author (@Datta0):

Yeah, --max-model-len is parsed to args.max_model_len.
It's just that if we want to support it, it has to be added outside of if _vllm:.

@yuzisun (Member) commented Aug 24, 2024:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--foo-bar')
parser.add_argument('--foo_bar')
args = parser.parse_args(['--foo-bar', '24', '--foo_bar', '30'])
print(args)  # Namespace(foo_bar='30') -- both options share dest foo_bar, so the last value wins

@Datta0 (Contributor Author) commented Aug 24, 2024:

If vLLM exists, we import its parser, and hence it won't let us add --max-model-len again, so the final code would end up looking like:

parser.add_argument('--max_length', dest='max_model_len')  # deprecated spelling
parser.add_argument('--max_model_len')                     # new underscore spelling
if not vllm_available():
    # vLLM's parser already registers --max-model-len, so only add it for the HF-only case.
    parser.add_argument('--max-model-len')

All three for essentially the same variable. Do we want this?

Member:

vLLM actually implements a flexible arg parser, so it can automatically convert underscores to dashes:
https://github.com/vllm-project/vllm/blob/80162c44b1d1e59a2c10f65b6adb9b0407439b1f/vllm/utils.py#L1088
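
For illustration, a minimal sketch of that idea (the real FlexibleArgumentParser at the link above differs in detail; this only shows the underscore-to-dash normalisation):

import argparse
import sys

class FlexibleArgumentParser(argparse.ArgumentParser):
    """Accepts --foo_bar as an alias for --foo-bar by normalising flags before parsing."""

    def parse_args(self, args=None, namespace=None):
        if args is None:
            args = sys.argv[1:]
        normalized = []
        for arg in args:
            if arg.startswith('--'):
                # Normalise only the flag name, leaving any '=value' payload untouched.
                flag, sep, value = arg.partition('=')
                arg = flag.replace('_', '-') + sep + value
            normalized.append(arg)
        return super().parse_args(normalized, namespace)

parser = FlexibleArgumentParser()
parser.add_argument('--max-model-len', type=int)
print(parser.parse_args(['--max_model_len', '2048']))  # Namespace(max_model_len=2048)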

@@ -71,9 +71,10 @@ def list_of_strings(arg):
     )
     parser.add_argument(
         "--max_length",
+        dest="max_model_len",
Member:

Suggested change:
-        dest="max_model_len",

@Datta0 (Contributor Author) commented Aug 19, 2024:

This is to make sure we're writing to the same attribute, so we avoid having to merge the two afterwards with something like (see the standalone sketch below):

final_max_len = args.max_length or args.max_model_len

         type=int,
         required=False,
-        help="max sequence length for the tokenizer",
+        help="max sequence length for the tokenizer. will be deprecated in favour of --max-model-length",
Member:

Suggested change:
-        help="max sequence length for the tokenizer. will be deprecated in favour of --max-model-length",
+        help="max sequence length for the tokenizer. will be deprecated in favour of --max-model-len",

@yuzisun (Member) left a review:

/lgtm
/approve

@yuzisun merged commit 1bd82fb into kserve:master on Aug 25, 2024. 57 checks passed.