Releases: microsoft/MInference
V0.1.5.post1: Support LLaMA-3-70B, Multi-gpu, fix kernel / sqrt(dk)
What's Changed
- Feature(MInference): support LLaMA-3-70B-1M and multi-gpu PP by @iofu728 in #59
- Fix(MInference): fix e2e benchmark guideline & fix A-shape multi gpu by @iofu728 in #66
- Fix(MInference): fix the vs pattern loss / sqrt(dk) by @PiotrNawrot in #70
Full Changelog: v0.1.5...v0.1.5.post1
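The entry for #70 names a `sqrt(dk)` fix but does not say what was wrong. As background only, here is a minimal sketch of the standard scaled dot-product attention convention, where logits are divided by the square root of the key dimension before the softmax. This is not the MInference kernel; the function name `attention_weights` and the toy inputs are assumptions made for illustration.

```python
import math

def attention_weights(q, keys, scale=True):
    """Softmax over the dot products of one query vector with each key.

    Hypothetical helper for illustration; not part of MInference.
    """
    d_k = len(q)
    logits = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    if scale:
        # Standard convention: divide logits by sqrt(d_k).
        logits = [x / math.sqrt(d_k) for x in logits]
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

q = [1.0] * 64
keys = [[1.0] * 64, [0.5] * 64]
scaled = attention_weights(q, keys)
unscaled = attention_weights(q, keys, scale=False)
```

Without the scaling, the softmax saturates on the largest logit, so omitting or misapplying the factor changes which positions look important.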
v0.1.5
What's Changed #27
- add vllm>=0.4.1 by @liyucheng09 in #19, #44
- Feature(MInference): update HF demo information, thanks @ak's sponsoring by @iofu728 in #22
- Feature(MInference): add unittest by @iofu728 in #31, #32
- Feature(MInference): add triton-based decoding in case flash_attn is not available by @liyucheng09 in #35
- Feature(MInference): add e2e benchmark using vllm by @iofu728 in #49
- Feature(MInference): support llama 3.1 by @iofu728 in #54
- Hotfix(MInference): fix the import warnings, fix the apply_rotary_pos… by @iofu728 in #30
New Contributors
- @liyucheng09 made their first contribution in #19
Full Changelog: v0.1.4...v0.1.5
V0.1.4.post4: Hotfix vllm >= 0.4.1
What's Changed
Full Changelog: v0.1.4.post3...v0.1.4.post4
V0.1.4.post3: remove flash_attn dependency
What's Changed
- Feature(MInference): add triton-based decoding in case flash_attn is not available by @liyucheng09 in #35
Full Changelog: v0.1.4.post2...v0.1.4.post3
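The change in #35 makes `flash_attn` optional by adding a Triton-based decoding path. One common way to select a backend when a dependency may be missing is sketched below. It is an illustration of the general pattern, not MInference's actual code; the function name `pick_decoding_backend` and its return values are assumptions.

```python
import importlib.util

def pick_decoding_backend(preferred="flash_attn", fallback="triton"):
    """Return the name of the first importable package, or None.

    Hypothetical helper for illustration; not part of MInference.
    """
    for name in (preferred, fallback):
        # find_spec returns None when a top-level package is not installed.
        if importlib.util.find_spec(name) is not None:
            return name
    return None

backend = pick_decoding_backend()
```

Checking with `find_spec` avoids importing a heavy package just to test for its presence; a `try`/`except ImportError` around the import is an equally common alternative.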
V0.1.4.post2: support multi-gpu, remove pycuda
What's Changed
- Feature(MInference): update HF demo information, thanks @ak's sponsoring by @iofu728 in #22
- Feature(MInference): remove pycuda (#20);
- Feature(MInference): support multi-gpu (#25);
- Feature(MInference): add unittest by @iofu728 in #31 #32
- Fix the import warnings (#28);
- Fix apply_rotary_pos_emb_single (#25);
- Fix the phi-3 vs kernel;
Full Changelog: v0.1.4.post1...v0.1.4.post2
V0.1.4.post1: support other vllm versions
What's Changed
- add vllm support for 0.4.2 and 0.4.3 by @liyucheng09 in #19
New Contributors
- @liyucheng09 made their first contribution in #19
Full Changelog: v0.1.4...v0.1.4.post1
V0.1.4: Hotfix config in pip
What's Changed
Full Changelog: v0.1.3...v0.1.4
V0.1.3: fix the pip setup and add bdist cache
v0.1.2
V0.1.0: Release MInference
Features
- release MInference code, experiment scripts, examples, and pip package;
- build the GitHub Actions release pipeline;
- add MInference experiments document (@liyucheng09) and FAQ;
- add supported models, demo, and demo video;
- add project page;
- add three dynamic sparse attention kernels (@Starmys);
What's Changed
- PreRelease: v0.1.0 by @iofu728 @liyucheng09 @Starmys in #2
- Doc(MInference): update paper information by @iofu728 @liyucheng09 @Starmys in #3
New Contributors
- @iofu728 @liyucheng09 @Starmys made their first contribution in #2
Full Changelog: https://github.com/microsoft/MInference/commits/v0.1.0