
Support Mixtral quantization using INC #267

Merged: 6 commits into habana_main on Sep 12, 2024

Conversation

@dudilester

No description provided.

@dudilester (Author)

Jenkins vllm-perf run of this branch:
https://tf-jenkins-ctrl01.habana-labs.com/job/static-benchmarks/job/vllm-benchmarks/185/
Functionality passed; accuracy is at 95%. I will test combinations of enabling quantization/high precision for different layers to reach 99% accuracy, but this requires only a run-time configuration change, not a code change, so it should not block merging. A sketch of that kind of configuration follows below.
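
For context, here is a minimal sketch of the kind of run-time configuration referred to above, assuming the INC (Intel Neural Compressor) maxabs JSON config format used by the Gaudi fp8 flow. The schema fields follow that flow, but the blocklisted layer name is a hypothetical placeholder, not an actual Mixtral module name from this PR:

```python
# Sketch only: write an INC-style quantization config that quantizes most
# layers to fp8 while keeping selected layers in high precision.
import json

quant_config = {
    "method": "HOOKS",
    "mode": "QUANTIZE",
    "observer": "maxabs",
    "scale_method": "maxabs_hw",
    # Layers listed here are kept in high precision; the name below is a
    # hypothetical placeholder, not taken from this PR.
    "blocklist": {"types": [], "names": ["lm_head"]},
    "dump_stats_path": "./inc_output/measure",
}

with open("quant_config.json", "w") as f:
    json.dump(quant_config, f, indent=4)
```

At run time the file is pointed to via the QUANT_CONFIG environment variable, so changing which layers are quantized versus kept in high precision is purely a configuration edit, with no code change.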

@szutenberg left a comment

Hi @dudilester ,

Your changes look fine but the testing is insufficient.

  • Your branch does not contain the latest flat_pa changes ("Port flat PA from habana_next to habana_main", #169). Please rebase your branch and rerun the tests.
  • The Gerrit change used in your tests installs a custom INC binary. Please make sure that all required changes are merged and that no custom binaries are installed.

Thanks!

@dudilester force-pushed the dev/dlester/mixtral_10.9 branch from 3c21080 to 61cc9a3 on September 11, 2024 at 11:38
@dudilester (Author) commented on Sep 11, 2024

Hi @szutenberg,

Thank you for the review.
I've rebased the branch to include the latest habana_main commits and removed the INC installation from the patch, so the tests now use the stock INC from build 1.18.0.395.

https://tf-jenkins-ctrl01.habana-labs.com/job/static-benchmarks/job/vllm-benchmarks/195/

{"mlperf-moe/mixtral-8x7b-tp1-fp8": {"rouge1": 45.9205, "rouge2": 23.7295, "rougeL": 30.8274, "rougeLsum": 42.9159, "gsm8k": 72.66, "mbxp": 57.58, "gen_len": 4163594, "gen_num": 15000, "gen_tok_len": 2132769, "tokens_per_sample": 142.2, "accuracy": 95.71, "valid": true, "warmup_duration": 512.0, "test_duration": 4143.0, "eval_duration": 243.0}}

@szutenberg left a comment

LGTM now. Thanks @dudilester 👍

@dudilester merged commit acf7d54 into habana_main on Sep 12, 2024
13 checks passed
jbyczkow added a commit to jbyczkow/vllm-fork that referenced this pull request Sep 13, 2024
zhouyu5 pushed a commit to zhouyu5/vllm-fork that referenced this pull request Sep 13, 2024
zhouyu5 pushed a commit to zhouyu5/vllm-fork that referenced this pull request Sep 20, 2024