
Rename max_length parameter to max_model_len to be in sync with vLLM #3827

Merged · 5 commits merged into kserve:master on Aug 25, 2024

Conversation

@Datta0 (Contributor) commented Jul 30, 2024:

What this PR does / why we need it:
vLLM uses a parameter called max-model-len to denote the maximum length allowed for tokenisation and processing. We have a similar parameter for the HuggingFace case, but under a different name, max_length. This PR renames it so that both backends use the same parameter name, for consistency.
We now add --max_model_len, which essentially writes to args.max_model_len, and we'll slowly move away from --max_length.
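
As a minimal standalone sketch of the naming relationship (plain argparse, not the actual KServe code): argparse already maps hyphens in long option names to underscores in the attribute name, which is why --max-model-len and args.max_model_len line up.

import argparse

parser = argparse.ArgumentParser()
# argparse converts hyphens in a long option name to underscores for the attribute name.
parser.add_argument("--max-model-len", type=int)

args = parser.parse_args(["--max-model-len", "4096"])
print(args.max_model_len)  # 4096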

Checklist:

  • Have you added unit/e2e tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?
Add `--max_model_len` to huggingface runtime and deprecate `--max_length`

@Datta0 marked this pull request as ready for review on July 30, 2024 11:10
@@ -126,6 +120,14 @@ def list_of_strings(arg):
         choices=dtype_choices,
         help=f"data type to load the weights in. One of {dtype_choices}. Defaults to float16 for GPU and float32 for CPU systems",
     )
+    # vLLM uses max-model-len as the parameter to denote max tokens. Register the same for HuggingFace (if vLLM is not available)
+    parser.add_argument(
+        "--max-model-len",
Member:
All the other args use _ (underscore). We should be consistent with the naming.

Contributor Author (@Datta0):
vLLM uses hyphen (-) separation. Using the same here ensures we can make do with a single parameter name.

Member:
Wouldn't this be a breaking change for current users?

@Datta0 (Contributor Author) commented Aug 2, 2024:
Yeah, this would be a breaking change for existing users. Should we allow both for the time being and warn that max_length will be deprecated in favour of max-model-len, while giving the latter precedence?
I'm curious whether they internally handle setting parameters under different names for different backends.

Member:
@yuzisun, what's your thought on this?

Contributor Author (@Datta0):
@yuzisun thoughts on this?

Contributor:
This should be possible:

arg_parser.add_argument('--example-one', '--example_one')  # registers both spellings as aliases of one option
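
For context, a quick standalone check of that argparse behaviour (not code from this PR):

import argparse

parser = argparse.ArgumentParser()
# Both spellings become aliases of one option; dest is derived from the first, i.e. example_one.
parser.add_argument('--example-one', '--example_one')
print(parser.parse_args(['--example_one', 'x']))  # Namespace(example_one='x')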

Contributor Author (@Datta0):
So I just added max_length back, which internally sets args.max_model_len.
Let me know if this works.

@Datta0 (Contributor Author) commented Aug 16, 2024:
@yuzisun @sivanantha321 @spolti please take a look

Member:
Can we support both --max_model_len and --max-model-len (both writing to args.max_model_len) and deprecate max_length?

@Datta0 (Contributor Author) commented Aug 16, 2024:
/rerun-all

@@ -126,6 +127,14 @@ def list_of_strings(arg):
         choices=dtype_choices,
         help=f"data type to load the weights in. One of {dtype_choices}. Defaults to float16 for GPU and float32 for CPU systems",
     )
+    # vLLM uses max-model-len as the parameter to denote max tokens. Register the same for HuggingFace (if vLLM is not available)
+    parser.add_argument(
+        "--max-model-len",
Member:

Suggested change:
-        "--max-model-len",
+        "--max-model-len",
+        "--max_model_len",

Contributor Author (@Datta0):

We're just trying to expose the same argument as vLLM, hence the hyphen (-) instead of the underscore (_).

Contributor Author (@Datta0):

Also, the problem with adding --max_model_len here is that it would not work if vLLM exists.
If we want to support --max_model_len, we'd need to add it separately, which ends up creating three CLI flags for one variable.

Member:

I thought --max-model-len is parsed to max_model_len as well

Contributor Author (@Datta0):

Yeah, --max-model-len is parsed to args.max_model_len.
It's just that if we want to support it, it has to be added outside of if _vllm:.

@yuzisun (Member) commented Aug 24, 2024:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--foo-bar')
parser.add_argument('--foo_bar')
args = parser.parse_args(['--foo-bar', '24', '--foo_bar', '30'])
print(args)  # Namespace(foo_bar='30') -- both options share dest foo_bar, so the last value wins

@Datta0 (Contributor Author) commented Aug 24, 2024:

If vLLM exists, we import its parser, and hence it won't let us add --max-model-len again, so the final code would end up looking like:

parser.add_argument('--max_length', dest='max_model_len')  # deprecated spelling
parser.add_argument('--max_model_len')                     # new underscore spelling
if not vllm_available():
    # vLLM's parser already registers --max-model-len, so only add it for the HF-only case.
    parser.add_argument('--max-model-len')

All three for essentially the same variable. Do we want this?

Member:

vLLM actually implements a flexible arg parser, so it can automatically convert underscores to dashes:
https://github.com/vllm-project/vllm/blob/80162c44b1d1e59a2c10f65b6adb9b0407439b1f/vllm/utils.py#L1088
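
For illustration, a minimal sketch of that idea (the real FlexibleArgumentParser at the link above differs in detail; this only shows the underscore-to-dash normalisation):

import argparse
import sys

class FlexibleArgumentParser(argparse.ArgumentParser):
    """Accepts --foo_bar as an alias for --foo-bar by normalising flags before parsing."""

    def parse_args(self, args=None, namespace=None):
        if args is None:
            args = sys.argv[1:]
        normalized = []
        for arg in args:
            if arg.startswith('--'):
                # Normalise only the flag name, leaving any '=value' payload untouched.
                flag, sep, value = arg.partition('=')
                arg = flag.replace('_', '-') + sep + value
            normalized.append(arg)
        return super().parse_args(normalized, namespace)

parser = FlexibleArgumentParser()
parser.add_argument('--max-model-len', type=int)
print(parser.parse_args(['--max_model_len', '2048']))  # Namespace(max_model_len=2048)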

@@ -71,9 +71,10 @@ def list_of_strings(arg):
     )
     parser.add_argument(
         "--max_length",
+        dest="max_model_len",
Member:

Suggested change:
-        dest="max_model_len",

@Datta0 (Contributor Author) commented Aug 19, 2024:

This is to make sure we're writing to the same attribute, so we avoid having to merge the two afterwards with something like (see the standalone sketch below):

final_max_len = args.max_length or args.max_model_len

         type=int,
         required=False,
-        help="max sequence length for the tokenizer",
+        help="max sequence length for the tokenizer. will be deprecated in favour of --max-model-length",
Member:

Suggested change:
-        help="max sequence length for the tokenizer. will be deprecated in favour of --max-model-length",
+        help="max sequence length for the tokenizer. will be deprecated in favour of --max-model-len",

@yuzisun (Member) left a review:

/lgtm
/approve

@yuzisun merged commit 1bd82fb into kserve:master on Aug 25, 2024. 57 checks passed.