
[DO NOT MERGE] Upstream codebase diff #470

Draft · wants to merge 985 commits into base: main
Conversation

@kzawora-intel kzawora-intel commented Nov 6, 2024

Scope of changes:

  • Contiguous PA
  • Multi-step scheduling
  • Automatic prefix caching
  • Padding-aware scheduling/max_num_prefill_seqs
  • Guided decoding fixes
  • FP8 support (INC/w8a8/weights_load_device)
  • ApplyToppTopkScalar sampler optimization
  • LoRA/MultiLoRA support
  • FusedMoE support
  • Model changes (adding mark_steps)
  • Tests
  • FakeHPU mode
  • CI stuff (.jenkins, .github)
  • Lots of minor stuff (RNG, FSDPA flag, reduced block fragmentation)
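Several of the items above (contiguous PA, padding-aware scheduling, limiting decode bucket sizes) revolve around padding batches up to precomputed bucket sizes so that HPU graph shapes can be reused. A minimal sketch of the bucket-selection idea, with hypothetical names and bucket values that are not taken from this PR:

```python
# Hypothetical illustration of padding-aware bucketing: pick the smallest
# bucket >= the actual batch size, so padded work stays bounded and the
# same compiled graph shape can be reused across many batches.
def find_bucket(value, buckets):
    # buckets is assumed sorted ascending and to cover the maximum value
    return min(b for b in buckets if b >= value)

BATCH_BUCKETS = [1, 2, 4, 8, 16, 32]  # illustrative values only
print(find_bucket(5, BATCH_BUCKETS))  # a batch of 5 is padded up to 8
```

The trade-off this sketches is padding overhead versus graph-recompilation cost, which is what "padding-aware scheduling" tries to balance.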

@@ -0,0 +1,35 @@
name: cpu-test

Check failure — Code scanning / Scorecard: Token-Permissions (High)

score is 0: no topLevel permission defined
Remediation tip: visit https://app.stepsecurity.io/secureworkflow, tick 'Restrict permissions for GITHUB_TOKEN', and untick other options.
NOTE: To resolve multiple issues at once, you can visit https://app.stepsecurity.io/securerepo instead.
Click the Remediation section below for further remediation help.
@kzawora-intel kzawora-intel marked this pull request as draft November 6, 2024 13:49
@kzawora-intel kzawora-intel added the habana Issues or PRs submitted by Habana Labs label Nov 8, 2024
@@ -0,0 +1,45 @@
name: codespell

Check failure — Code scanning / Scorecard: Token-Permissions (High) — same finding as above (score is 0: no topLevel permission defined).
def test_stateless_process_group(worker):
    port1 = get_open_port()
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", port1))

Check warning

Code scanning / CodeQL

Binding a socket to all network interfaces Medium test

'' binds a socket to all interfaces.

Copilot Autofix AI 7 days ago

To fix the problem, we need to bind the socket to a specific interface instead of all interfaces. In this case, we can bind it to the loopback interface 127.0.0.1, which is commonly used for local testing and development. This change will limit the socket to accept connections only from the local machine, reducing the security risks.

Suggested changeset 1: tests/distributed/test_utils.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/tests/distributed/test_utils.py b/tests/distributed/test_utils.py
--- a/tests/distributed/test_utils.py
+++ b/tests/distributed/test_utils.py
@@ -124,3 +124,3 @@
     with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
-        s.bind(("", port1))
+        s.bind(("127.0.0.1", port1))
         port2 = get_open_port()
EOF
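Beyond the patch above, the loopback-only pattern for test helpers can be sketched as follows (the helper name is illustrative, not from the vLLM codebase); binding to port 0 lets the OS pick a free port, which also avoids the race between `get_open_port()` and the later `bind`:

```python
import socket

# Sketch of the remediated pattern: bind test sockets to the loopback
# interface so they accept connections only from the local machine.
def reserve_local_port():
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("127.0.0.1", 0))  # port 0: the OS picks a free port
    return s, s.getsockname()[1]

sock, port = reserve_local_port()
print(port > 0)
sock.close()
```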
sock = socket.socket(family=family, type=socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(addr)

Check warning

Code scanning / CodeQL

Binding a socket to all network interfaces Medium

'' binds a socket to all interfaces.

Copilot Autofix AI 4 days ago

To fix the problem, we need to ensure that the socket binds to a specific interface rather than all interfaces. This can be achieved by modifying the create_server_socket function to check if the provided address is empty or 0.0.0.0 and raise an error or use a default specific interface instead.

  1. Modify the create_server_socket function to validate the address.
  2. If the address is empty or 0.0.0.0, raise an error or use a default specific interface.
  3. Update the run_server function to handle the potential error raised by create_server_socket.
Suggested changeset 1: vllm/entrypoints/openai/api_server.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/vllm/entrypoints/openai/api_server.py b/vllm/entrypoints/openai/api_server.py
--- a/vllm/entrypoints/openai/api_server.py
+++ b/vllm/entrypoints/openai/api_server.py
@@ -612,2 +612,5 @@
 def create_server_socket(addr: Tuple[str, int]) -> socket.socket:
+    if addr[0] in ("", "0.0.0.0"):
+        raise ValueError("Binding to all interfaces is not allowed. Please specify a valid IP address.")
+
     family = socket.AF_INET
@@ -640,3 +643,7 @@
     sock_addr = (args.host or "", args.port)
-    sock = create_server_socket(sock_addr)
+    try:
+        sock = create_server_socket(sock_addr)
+    except ValueError as e:
+        logger.error(e)
+        return
 
EOF
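The validation step of the suggested patch can be exercised in isolation. A minimal sketch, assuming the same wildcard-address check (the helper name is hypothetical and not part of the api_server module):

```python
# Hedged sketch of the suggested validation: reject wildcard bind
# addresses before a server socket is ever created.
def validate_bind_host(host: str) -> str:
    if host in ("", "0.0.0.0"):
        raise ValueError(
            "Binding to all interfaces is not allowed. "
            "Please specify a valid IP address.")
    return host

print(validate_bind_host("127.0.0.1"))
try:
    validate_bind_host("")
except ValueError:
    print("rejected")
```

Note that rejecting `0.0.0.0` outright changes user-visible behavior for deployments that intentionally listen on all interfaces, which is presumably why this autofix was not merged as-is.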
# Llama3.2 models more reliable.

TOOL_CALL_REGEX = re.compile(
    r"\[([a-zA-Z]+\w*\(([a-zA-Z]+\w*=.*,\s*)*([a-zA-Z]+\w*=.*\s)?\),\s*)*([a-zA-Z]+\w*\(([a-zA-Z]+\w*=.*,\s*)*([a-zA-Z]+\w*=.*\s*)?\)\s*)+\]",

Check failure

Code scanning / CodeQL

Inefficient regular expression High

This part of the regular expression may cause exponential backtracking on strings starting with '[' and containing many repetitions of 'AA(),'.
CodeQL reports the same High-severity inefficient-regular-expression finding against this pattern for several further adversarial inputs, e.g. strings starting with '[A(' or '[A(A=' and containing many repetitions of 'AA=,', ',A=', 'AA= ),A(', 'AA()', 'AA=)A(', or ')A(A='.
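One way to avoid the exponential backtracking CodeQL flags is to replace the overlapping `.*` argument values with a character class that cannot cross argument boundaries, removing the nested ambiguous quantifiers. This is a hedged sketch, not the fix adopted in the PR, and it is deliberately stricter than the original (argument values may not contain commas or parentheses):

```python
import re

# Each piece matches a disjoint alphabet, so the engine never has two
# ways to split the same input and backtracking stays linear.
ARG = r"[a-zA-Z]\w*=[^,()]*"                      # name=value, value cannot cross , ( )
CALL = rf"[a-zA-Z]\w*\((?:{ARG}(?:,\s*{ARG})*)?\)"  # name(arg, arg, ...)
SAFE_TOOL_CALL_REGEX = re.compile(rf"\[{CALL}(?:,\s*{CALL})*\]")

print(bool(SAFE_TOOL_CALL_REGEX.fullmatch("[get_weather(city=Paris, unit=C)]")))
# The adversarial prefix that stalls the original pattern is handled instantly:
print(bool(SAFE_TOOL_CALL_REGEX.fullmatch("[" + "A()," * 50 + "A()]")))
```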
CNTRYROA and others added 24 commits November 22, 2024 21:13
Signed-off-by: Chen Wu <cntryroa@gmail.com>
Signed-off-by: rickyx <rickyx@anyscale.com>
…ject#10485)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: statelesshz <hzji210@gmail.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Flavia Beo <flavia.beo@ibm.com>
Co-authored-by: Flavia Beo <flavia.beo@ibm.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
…#10616)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Limit decode bucket size to num_hpu_blocks
DarkLight1337 and others added 20 commits December 10, 2024 13:45
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com>
We want to start enforcing codeowners on our repo, but without calling
vllm upstream owners for review each time.
This PR adds the `PunicaWrapperHPU` class to handle LoRA computations on
HPU. These changes align with the LoRA flow refactoring done in the
upstream branch.
it's that time of the week again

---------

Signed-off-by: mgoin <michael@neuralmagic.com>
Signed-off-by: Sam Stoelinga <sammiestoel@gmail.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: xffxff <1247714429@qq.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Signed-off-by: kevin <kevin@anyscale.com>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
Signed-off-by: Jerzy Zagorski <jzagorsk@amazon.com>
Signed-off-by: Richard Liu <ricliu@google.com>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: Sam Stoelinga <sammiestoel@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: zhou fan <1247714429@qq.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Kevin H. Luu <kevin@anyscale.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: xendo <xendoo@gmail.com>
Co-authored-by: Jerzy Zagorski <jzagorsk@amazon.com>
Co-authored-by: Richard Liu <39319471+richardsliu@users.noreply.github.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Jeff Cook <jeff@jeffcook.io>
Co-authored-by: Diego Marinho <dmztheone@gmail.com>
Co-authored-by: Gene Der Su <e870252314@gmail.com>
Co-authored-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com>
Co-authored-by: Michał Kuligowski <mkuligowski@habana.ai>
Co-authored-by: Sanju C Sudhakaran <scsudhakaran@habana.ai>
michalkuligowski and others added 7 commits December 11, 2024 12:51
i think inception was a decent movie overall
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
With this patch, the mp executor no longer hangs at the end of the
application out of the box, and exits gracefully.
New useful checks were added upstream, but we're not running them on
habana_main per PR. This PR fixes that.
@@ -0,0 +1,32 @@
name: Lint documentation

Check failure — Code scanning / Scorecard: Token-Permissions (High) — same finding as above (score is 0: no topLevel permission defined).
Without this change, the following error can be observed:
```
[rank0]:   File "/software/users/kdamaszke/repos/vllm-fork/vllm/model_executor/models/mllama.py", line 959, in forward
[rank0]:     full_text_row_masked_out_mask = full_text_row_masked_out_mask.view(
[rank0]: RuntimeError: shape '[4, -1, 1]' is invalid for input of size 3
```
It occurs when one of the requests is removed from the batch early. In
that case, the language model still works on shapes padded to the
bucketed batch size, while the encoder input does not. This change
aligns the batch size of `encoder_seq_lens` with the expected one.
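The described fix can be sketched as padding `encoder_seq_lens` back up to the bucketed batch size; the function name and padding value are illustrative, not the PR's actual implementation:

```python
# Hypothetical sketch: when a finished request drops out of
# encoder_seq_lens while the decoder still runs on bucket-padded shapes,
# pad the encoder-side metadata so both sides agree on the batch size.
def pad_encoder_seq_lens(encoder_seq_lens, bucketed_batch_size, pad_value=0):
    padding = bucketed_batch_size - len(encoder_seq_lens)
    return list(encoder_seq_lens) + [pad_value] * max(padding, 0)

# One request removed from a bucket of 4: 3 live entries, 1 pad entry.
print(pad_encoder_seq_lens([6, 6, 6], 4))
```

With aligned lengths, the subsequent `view(batch_size, -1, 1)` on `full_text_row_masked_out_mask` no longer sees a mismatched element count.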
Labels
habana Issues or PRs submitted by Habana Labs