
Support Mixtral quantization using INC #267

Merged: 6 commits into habana_main on Sep 12, 2024

Conversation

@dudilester

No description provided.

@dudilester (Author)

Jenkins vllm-perf run of this branch:
https://tf-jenkins-ctrl01.habana-labs.com/job/static-benchmarks/job/vllm-benchmarks/185/
Functionality passed; accuracy is at 95%. I will test combinations of enabling quantization/high precision for different layers to reach 99% accuracy, but this requires only a run-time configuration change, not a code change, so it should not block merging. A sketch of that kind of configuration follows below.
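
For context, here is a minimal sketch of the kind of run-time configuration referred to above, assuming the INC (Intel Neural Compressor) maxabs JSON config format used by the Gaudi fp8 flow. The schema fields follow that flow, but the blocklisted layer name is a hypothetical placeholder, not an actual Mixtral module name from this PR:

```python
# Sketch only: write an INC-style quantization config that quantizes most
# layers to fp8 while keeping selected layers in high precision.
import json

quant_config = {
    "method": "HOOKS",
    "mode": "QUANTIZE",
    "observer": "maxabs",
    "scale_method": "maxabs_hw",
    # Layers listed here are kept in high precision; the name below is a
    # hypothetical placeholder, not taken from this PR.
    "blocklist": {"types": [], "names": ["lm_head"]},
    "dump_stats_path": "./inc_output/measure",
}

with open("quant_config.json", "w") as f:
    json.dump(quant_config, f, indent=4)
```

At run time the file is pointed to via the QUANT_CONFIG environment variable, so changing which layers are quantized versus kept in high precision is purely a configuration edit, with no code change.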

@szutenberg left a comment

Hi @dudilester ,

Your changes look fine but the testing is insufficient.

  • Your branch does not contain the latest flat_pa changes ("Port flat PA from habana_next to habana_main", #169). Please rebase your branch and rerun the tests.
  • The Gerrit change used in your tests installs a custom INC binary. Please make sure that all required changes are merged and that no custom binaries are installed.

Thanks!

@dudilester force-pushed the dev/dlester/mixtral_10.9 branch from 3c21080 to 61cc9a3 on September 11, 2024 at 11:38
@dudilester (Author) commented on Sep 11, 2024

Hi @szutenberg,

Thank you for the review.
I've rebased the branch to include the latest habana_main commits and removed the INC installation from the patch, so the tests now use the stock INC from build 1.18.0.395.

https://tf-jenkins-ctrl01.habana-labs.com/job/static-benchmarks/job/vllm-benchmarks/195/

{"mlperf-moe/mixtral-8x7b-tp1-fp8": {"rouge1": 45.9205, "rouge2": 23.7295, "rougeL": 30.8274, "rougeLsum": 42.9159, "gsm8k": 72.66, "mbxp": 57.58, "gen_len": 4163594, "gen_num": 15000, "gen_tok_len": 2132769, "tokens_per_sample": 142.2, "accuracy": 95.71, "valid": true, "warmup_duration": 512.0, "test_duration": 4143.0, "eval_duration": 243.0}}

@szutenberg left a comment

LGTM now. Thanks @dudilester 👍

@dudilester merged commit acf7d54 into habana_main on Sep 12, 2024
13 checks passed
jbyczkow added a commit to jbyczkow/vllm-fork that referenced this pull request Sep 13, 2024
zhouyu5 pushed a commit to zhouyu5/vllm-fork that referenced this pull request Sep 13, 2024
zhouyu5 pushed a commit to zhouyu5/vllm-fork that referenced this pull request Sep 20, 2024