microsoft / MInference Public

Releases: microsoft/MInference

V0.1.5.post1: Support LLaMA-3-70B, Multi-gpu, fix kernel / sqrt(dk)

13 Aug 09:17

This commit was created on GitHub.com and signed with GitHub’s verified signature.

GPG key ID: B5690EEEBB952194

Latest

What's Changed

Feature(MInference): support LLaMA-3-70B-1M and multi-gpu PP by @iofu728 in #59
Fix(MInference): fix e2e benchmark guideline & fix A-shape multi gpu by @iofu728 in #66
Fix(MInference): fix the vs pattern loss / sqrt(dk) by @PiotrNawrot in #70

Full Changelog: v0.1.5...v0.1.5.post1

Contributors

PiotrNawrot and iofu728

Assets 30

minference-0.1.5.post1+cu118torch2.0-cp310-cp310-linux_x86_64.whl (3.27 MB, 2024-08-13T09:29:39Z)
minference-0.1.5.post1+cu118torch2.0-cp311-cp311-linux_x86_64.whl (3.28 MB, 2024-08-13T09:29:05Z)
minference-0.1.5.post1+cu118torch2.0-cp38-cp38-linux_x86_64.whl (3.27 MB, 2024-08-13T09:29:37Z)
minference-0.1.5.post1+cu118torch2.0-cp39-cp39-linux_x86_64.whl (3.27 MB, 2024-08-13T09:30:17Z)
minference-0.1.5.post1+cu118torch2.1-cp310-cp310-linux_x86_64.whl (3.3 MB, 2024-08-13T09:29:07Z)
minference-0.1.5.post1+cu118torch2.1-cp311-cp311-linux_x86_64.whl (3.3 MB, 2024-08-13T09:29:24Z)
minference-0.1.5.post1+cu118torch2.1-cp38-cp38-linux_x86_64.whl (3.3 MB, 2024-08-13T09:29:11Z)
minference-0.1.5.post1+cu118torch2.1-cp39-cp39-linux_x86_64.whl (3.29 MB, 2024-08-13T09:30:03Z)
minference-0.1.5.post1+cu118torch2.2-cp310-cp310-linux_x86_64.whl (3.45 MB, 2024-08-13T09:29:27Z)
minference-0.1.5.post1+cu118torch2.2-cp311-cp311-linux_x86_64.whl (3.46 MB, 2024-08-13T09:30:06Z)
Source code (zip) (2024-08-13T09:17:18Z)
Source code (tar.gz) (2024-08-13T09:17:18Z)
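
The wheel filenames listed above follow the standard wheel naming layout (distribution, version, Python tag, ABI tag, platform tag), and the `+cu118torch2.0` suffix appears to identify the CUDA and PyTorch versions each wheel was built against. The following is a minimal sketch for splitting such a name into its parts, assuming that reading of the suffix; `parse_wheel_name` and `WheelInfo` are hypothetical helpers written for illustration and are not part of MInference.

```python
import re
from typing import NamedTuple


class WheelInfo(NamedTuple):
    """Fields recovered from a wheel filename (hypothetical helper type)."""
    name: str
    version: str
    cuda: str
    torch: str
    python_tag: str
    abi_tag: str
    platform_tag: str


# Assumed layout, inferred from the asset names in this release:
#   <name>-<version>+cu<CUDA>torch<TORCH>-<python>-<abi>-<platform>.whl
_WHEEL_RE = re.compile(
    r"^(?P<name>[^-]+)-(?P<version>[^+-]+)"
    r"\+cu(?P<cuda>\d+)torch(?P<torch>[\d.]+)"
    r"-(?P<py>[^-]+)-(?P<abi>[^-]+)-(?P<plat>.+)\.whl$"
)


def parse_wheel_name(filename: str) -> WheelInfo:
    """Split a release wheel filename into its components.

    Raises ValueError if the name does not match the assumed layout.
    """
    match = _WHEEL_RE.match(filename)
    if match is None:
        raise ValueError(f"unrecognized wheel filename: {filename!r}")
    return WheelInfo(
        name=match["name"],
        version=match["version"],
        cuda=match["cuda"],
        torch=match["torch"],
        python_tag=match["py"],
        abi_tag=match["abi"],
        platform_tag=match["plat"],
    )


if __name__ == "__main__":
    info = parse_wheel_name(
        "minference-0.1.5.post1+cu118torch2.0-cp310-cp310-linux_x86_64.whl"
    )
    print(info)
```

A helper like this can be used to pick the asset whose Python tag and PyTorch version match the local environment before downloading it.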

v0.1.5

24 Jul 11:25

What's Changed #27

add vllm>=0.4.1 by @liyucheng09 in #19, #44
Feature(MInference): update HF demo information, thanks @ak's sponsoring by @iofu728 in #22
Feature(MInference): add unittest by @iofu728 in #31, #32
Feature(MInference): add triton-based decoding in case flash_attn is not available by @liyucheng09 in #35
Feature(MInference): add e2e benchmark using vllm by @iofu728 in #49
Feature(MInference): support llama 3.1 by @iofu728 in #54
Hotfix(MInference): fix the import warnings, fix the apply_rotary_pos… by @iofu728 in #30

New Contributors

@liyucheng09 made their first contribution in #19

Full Changelog: v0.1.4...v0.1.5

Contributors

ak, liyucheng09, and iofu728

Assets 30

V0.1.4.post4: Hotfix vllm >= 0.4.1

16 Jul 07:41

What's Changed

Hotfix(MInference): fix vllm>=0.4.1 by @iofu728 in #44

Full Changelog: v0.1.4.post3...v0.1.4.post4

Contributors

iofu728

Assets 30

V0.1.4.post3: remove flash_attn dependency

15 Jul 05:48

What's Changed

Feature(MInference): add triton-based decoding in case flash_attn is not available by @liyucheng09 in #35

Full Changelog: v0.1.4.post2...v0.1.4.post3

Contributors

liyucheng09

Assets 30

V0.1.4.post2: support multi-gpu, remove pycuda

12 Jul 07:41

What's Changed

Feature(MInference): update HF demo information, thanks @ak's sponsoring by @iofu728 in #22
Feature(MInference): remove pycuda; #20
Feature(MInference): support multi-gpu; #25
Feature(MInference): add unittest by @iofu728 in #31 #32

Fixed #28: the import warnings;
Fixed #25: apply_rotary_pos_emb_single;
Fixed the phi-3 vs kernel;

Full Changelog: v0.1.4.post1...v0.1.4.post2

Contributors

ak and iofu728

Assets 30

V0.1.4.post1: support other vllm versions

07 Jul 07:55

What's Changed

add vllm support for 0.4.2 and 0.4.3 by @liyucheng09 in #19

New Contributors

@liyucheng09 made their first contribution in #19

Full Changelog: v0.1.4...v0.1.4.post1

Contributors

liyucheng09

Assets 30

V0.1.4: Hotfix config in pip

05 Jul 08:01

What's Changed

Hotfix(MInference): fix the configs in pip by @iofu728 in #14, #15

Full Changelog: v0.1.3...v0.1.4

Contributors

iofu728

Assets 30

V0.1.3: fix the pip setup and add bdist cache

04 Jul 06:09

What's Changed

Feature(MInference): fix unittest by @iofu728 in #4
Feature(MInference): add arXiv paper by @iofu728 in #5
Hotfix(MInference): fix the pip setup issue by @iofu728 in #6, #8, #9, #10
Feature(MInference): add bdist cache by @iofu728 in #11

Full Changelog: v0.1.0...v0.1.3

Contributors

iofu728

Assets 30

v0.1.2

04 Jul 05:29

Pre-release

What's Changed

Hotfix(MInference): fix the pip setup by @iofu728 in #8
Hotfix(MInference): fix the yaml by @iofu728 in #9
Hotfix(Minference): fix the yaml by @iofu728 in #10

Full Changelog: v0.1.1...v0.1.2

Contributors

iofu728

Assets 10

V0.1.0: Release MInference

03 Jul 01:47

Features

release MInference code, experiment scripts, examples, and pip package;
build the GitHub Actions release pipeline;
add MInference experiments document @liyucheng09, FAQ;
add supported models, demo, and demo video;
add project page;
add three dynamic sparse attention kernels @Starmys;

What's Changed

PreRelease: v0.1.0 by @iofu728 @liyucheng09 @Starmys in #2
Doc(MInference): update paper information by @iofu728 @liyucheng09 @Starmys in #3

New Contributors

@iofu728 @liyucheng09 @Starmys made their first contribution in #2

Full Changelog: https://github.com/microsoft/MInference/commits/v0.1.0

Contributors

liyucheng09, iofu728, and Starmys

Assets 10
