[BUG fix] Rebase caused spec decode fix #613

Open. xuechendi wants to merge 4 commits into base: habana_main

@xuechendi xuechendi commented Dec 11, 2024

Error reported in https://jira.habana-labs.com/browse/SW-212516

Two recently merged PRs break Spec Decode functionality:

  1. Support mllama (llama 3.2) model for HPU #491 overrides the existing WorkerWrapperBase design for speculative decoding. The following override:
if model_runner_cls is not None:
    ModelRunnerClass = model_runner_cls

is not needed, since the code below now applies model_runner_cls during initialization, following the upstream design:

if model_runner_cls is not None:
    self.model_runner = model_runner_cls(self.model_runner)
  2. Prepare sin/cos buffers for rope outside model forward #566 does not work in Spec Decode Eagle mode.
    The input tensors no longer match the earlier assumption that decode_fwd provides only one token per sequence: Spec Decode provides multiple candidate tokens as q.
    To fix this, a new environment variable, VLLM_COS_SIN_RECOMPUTE, was added. Set VLLM_COS_SIN_RECOMPUTE=true to trigger recomputation of cos and sin for spec decode.
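
To illustrate the first issue, here is a minimal sketch of the wrap-instead-of-replace pattern described in item 1. The classes `BaseModelRunner`, `SpecDecodeRunnerWrapper`, and `Worker` are hypothetical stand-ins, not the actual HPU worker code; only the `model_runner_cls` check mirrors the snippet quoted above.

```python
from typing import Callable, List, Optional


class BaseModelRunner:
    """Hypothetical stand-in for the platform model runner."""

    def execute(self, tokens: List[int]) -> List[int]:
        return [t + 1 for t in tokens]


class SpecDecodeRunnerWrapper:
    """Hypothetical wrapper that layers spec-decode behavior on a base runner."""

    def __init__(self, inner: BaseModelRunner) -> None:
        self.inner = inner

    def execute(self, tokens: List[int]) -> List[int]:
        # Delegate to the wrapped runner; a real wrapper would add logic here.
        return self.inner.execute(tokens)


class Worker:
    """Hypothetical worker that always builds the base runner first."""

    def __init__(
        self,
        model_runner_cls: Optional[Callable[[BaseModelRunner], object]] = None,
    ) -> None:
        self.model_runner: object = BaseModelRunner()
        # Wrap the existing runner instead of swapping the runner class,
        # so the base runner is preserved inside the wrapper.
        if model_runner_cls is not None:
            self.model_runner = model_runner_cls(self.model_runner)


worker = Worker(model_runner_cls=SpecDecodeRunnerWrapper)
print(type(worker.model_runner).__name__)
```

With this shape, a second `ModelRunnerClass = model_runner_cls` override would bypass the wrapping step, which is why it can be dropped.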

upstream PR10555
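
The second issue can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the HPU implementation: the helper names (`rope_cos_sin`, `recompute_enabled`, `cos_sin_for_decode`), the tiny `head_dim`, and the way the flag is parsed are all hypothetical; only the variable name `VLLM_COS_SIN_RECOMPUTE` comes from this PR.

```python
import math
import os


def rope_cos_sin(positions, head_dim=4, base=10000.0):
    """Return (cos, sin) tables with one row per position (standard RoPE frequencies)."""
    inv_freq = [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]
    cos = [[math.cos(p * f) for f in inv_freq] for p in positions]
    sin = [[math.sin(p * f) for f in inv_freq] for p in positions]
    return cos, sin


def recompute_enabled(env=None):
    # Assumption: "true" or "1" (case-insensitive) turns the flag on.
    env = os.environ if env is None else env
    return env.get("VLLM_COS_SIN_RECOMPUTE", "false").lower() in ("true", "1")


def cos_sin_for_decode(cached, positions, env=None):
    """Reuse buffers prepared outside forward unless recompute is requested."""
    if recompute_enabled(env):
        return rope_cos_sin(positions)
    return cached


# Buffers prepared under the one-token-per-sequence assumption.
cached = rope_cos_sin([7])
# Spec decode passes several candidate tokens for the same sequence.
candidate_positions = [7, 8, 9]

cos, _ = cos_sin_for_decode(cached, candidate_positions, env={"VLLM_COS_SIN_RECOMPUTE": "true"})
stale_cos, _ = cos_sin_for_decode(cached, candidate_positions, env={})
print(len(cos), len(stale_cos))
```

Without the flag, the cached table has a single row for three candidate positions, which is the shape mismatch described in item 2.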

Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
For spec decode Eagle mode, set VLLM_COS_SIN_RECOMPUTE=true

Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
@xuechendi

@michalkuligowski, please help review.

@xuechendi

@kzawora-intel, please check the fix here:
the previous mllama PR breaks spec decode, so I added a fix PR:
de79b5c

Signed-off-by: Chendi.Xue <chendi.xue@intel.com>