Releases: microsoft/MInference

V0.1.5.post1: Support LLaMA-3-70B, Multi-gpu, fix kernel / sqrt(dk)

13 Aug 09:17
5e37e0d

What's Changed

  • Feature(MInference): support LLaMA-3-70B-1M and multi-gpu PP by @iofu728 in #59
  • Fix(MInference): fix e2e benchmark guideline & fix A-shape multi gpu by @iofu728 in #66
  • Fix(MInference): fix the vs pattern loss / sqrt(dk) by @PiotrNawrot in #70

Full Changelog: v0.1.5...v0.1.5.post1

v0.1.5

24 Jul 11:25
b5b8745

What's Changed #27

  • add vllm>=0.4.1 by @liyucheng09 in #19, #44
  • Feature(MInference): update HF demo information, thanks @ak's sponsoring by @iofu728 in #22
  • Feature(MInference): add unittest by @iofu728 in #31, #32
  • Feature(MInference): add triton-based decoding in case flash_attn is not available by @liyucheng09 in #35
  • Feature(MInference): add e2e benchmark using vllm by @iofu728 in #49
  • Feature(MInference): support llama 3.1 by @iofu728 in #54
  • Hotfix(MInference): fix the import warnings, fix the apply_rotary_pos… by @iofu728 in #30

Full Changelog: v0.1.4...v0.1.5

V0.1.4.post4: Hotfix vllm >= 0.4.1

16 Jul 07:41
0b9c81b

What's Changed

  • Hotfix(MInference): fix vllm>=0.4.1 by @iofu728 in #44

Full Changelog: v0.1.4.post3...v0.1.4.post4

V0.1.4.post3: remove flash_attn dependency

15 Jul 05:48
50d17d9

What's Changed

  • Feature(MInference): add triton-based decoding in case flash_attn is not available by @liyucheng09 in #35

Full Changelog: v0.1.4.post2...v0.1.4.post3

V0.1.4.post2: support multi-gpu, remove pycuda

12 Jul 07:41
a880a6e

What's Changed

  • Feature(MInference): update HF demo information, thanks @ak's sponsoring by @iofu728 in #22
  • Feature(MInference): remove pycuda; #20
  • Feature(MInference): support multi-gpu; #25
  • Feature(MInference): add unittest by @iofu728 in #31 #32
  • Fix the import warnings (#28);
  • Fix apply_rotary_pos_emb_single (#25);
  • Fix the phi-3 vs kernel;

Full Changelog: v0.1.4.post1...v0.1.4.post2

V0.1.4.post1: support other vllm versions

07 Jul 07:55
dbe9029

Full Changelog: v0.1.4...v0.1.4.post1

V0.1.4: Hotfix config in pip

05 Jul 08:01
427b1dd

Full Changelog: v0.1.3...v0.1.4

V0.1.3: fix the pip setup and add bdist cache

04 Jul 06:09
00666fb

Full Changelog: v0.1.0...v0.1.3

v0.1.2

04 Jul 05:29
41f8cea
Pre-release

What's Changed

  • Hotfix(MInference): fix the pip setup by @iofu728 in #8
  • Hotfix(MInference): fix the yaml by @iofu728 in #9
  • Hotfix(MInference): fix the yaml by @iofu728 in #10

Full Changelog: v0.1.1...v0.1.2

V0.1.0: Release MInference

03 Jul 01:47
bf437a2

Features

  • release MInference code, experiment scripts, examples, pip package;
  • build the github action release pipeline;
  • add MInference experiments document (@liyucheng09) and FAQ;
  • add supported models, demo, demo video;
  • add project page;
  • add three dynamic sparse attention kernels @Starmys;

Full Changelog: https://github.com/microsoft/MInference/commits/v0.1.0