Releases: DefTruth/Awesome-LLM-Inference

v2.4

18 Sep 05:10

DefTruth

This commit was created on GitHub.com and signed with GitHub’s verified signature.

GPG key ID: B5690EEEBB952194

v2.4 Latest

What's Changed

🔥[RetrievalAttention] Accelerating Long-Context LLM Inference via Vector Retrieval by @DefTruth in #62
🔥[Inf-MLLM] Inf-MLLM: Efficient Streaming Inference of Multimodal Large Language Models on a Single GPU by @DefTruth in #63
Bump up to v2.4 by @DefTruth in #64

Full Changelog: v2.3...v2.4

Contributors

DefTruth

v2.3

09 Sep 01:25

DefTruth

What's Changed

🔥[CHESS] CHESS: Optimizing LLM Inference via Channel-Wise Thresholding and Selective Sparsification by @DefTruth in #59
🔥[SpMM] High Performance Unstructured SpMM Computation Using Tensor Cores by @DefTruth in #60
Bump up to v2.3 by @DefTruth in #61

Full Changelog: v2.2...v2.3

Contributors

DefTruth

v2.2

04 Sep 06:22

DefTruth

What's Changed

Add NanoFlow code link by @DefTruth in #51
🔥[ACTIVATION SPARSITY] TRAINING-FREE ACTIVATION SPARSITY IN LARGE LANGUAGE MODELS by @DefTruth in #52
🔥[Decentralized LLM] Decentralized LLM Inference over Edge Networks with Energy Harvesting by @DefTruth in #53
🔥[SJF Scheduling] Efficient LLM Scheduling by Learning to Rank by @DefTruth in #54
🔥[Speculative Decoding] Boosting Lossless Speculative Decoding via Feature Sampling and Partial Alignment Distillation by @DefTruth in #55
🔥🔥[Prompt Compression] Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference by @DefTruth in #56
🔥🔥[Context Distillation] Efficient LLM Context Distillation by @DefTruth in #57
Bump up to v2.2 by @DefTruth in #58

Full Changelog: v2.1...v2.2

Contributors

DefTruth

v2.1

28 Aug 01:53

DefTruth

What's Changed

Update README.md by @DefTruth in #40
🔥[Speculative Decoding] Parallel Speculative Decoding with Adaptive Draft Length by @DefTruth in #41
🔥[FocusLLM] FocusLLM: Scaling LLM’s Context by Parallel Decoding by @DefTruth in #42
🔥[NanoFlow] NanoFlow: Towards Optimal Large Language Model Serving Throughput by @DefTruth in #43
🔥[MagicDec] MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding by @DefTruth in #44
Add ABQ-LLM code link by @DefTruth in #46
🔥🔥[MARLIN] MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models by @DefTruth in #47
🔥[1-bit LLMs] Matmul or No Matmal in the Era of 1-bit LLMs by @DefTruth in #48
🔥🔥[FLA] FLA: A Triton-Based Library for Hardware-Efficient Implementa… by @DefTruth in #49
Bump up to v2.1 by @DefTruth in #50

Full Changelog: v2.0...v2.1

Contributors

DefTruth

v2.0

19 Aug 01:22

DefTruth

What's Changed

🔥🔥[LUT TENSOR CORE] Lookup Table Enables Efficient Low-Bit LLM Inference Acceleration by @DefTruth in #33
🔥🔥[Eigen Attention] Attention in Low-Rank Space for KV Cache Compression by @DefTruth in #34
KOALA: Enhancing Speculative Decoding for LLM via Multi-Layer Draft Heads with Adversarial Learning by @DefTruth in #35
Kraken: Inherently Parallel Transformers For Efficient Multi-Device Inference by @DefTruth in #36
🔥[ABQ-LLM] Arbitrary-Bit Quantized Inference Acceleration for Large Language Models by @DefTruth in #37
[Token Recycling] Turning Trash into Treasure: Accelerating Inference… by @DefTruth in #38
Bump up to v2.0 by @DefTruth in #39

Full Changelog: v1.9...v2.0

Contributors

DefTruth

v1.9

12 Aug 01:27

DefTruth

What's Changed

🔥[DynamoLLM] DynamoLLM: Designing LLM Inference Clusters for Performa… by @DefTruth in #28
🔥[Zero-Delay QKV Compression] Zero-Delay QKV Compression for Mitigati… by @DefTruth in #29
🔥[Automatic Inference Engine Tuning] Towards SLO-Optimized LLM Servin… by @DefTruth in #30
🔥🔥[500xCompressor] 500xCompressor: Generalized Prompt Compression for… by @DefTruth in #31
Bump up to v1.9 by @DefTruth in #32

Full Changelog: v1.8...v1.9

Contributors

DefTruth

v1.8

05 Aug 02:33

DefTruth

What's Changed

🔥[flashinfer] FlashInfer: Kernel Library for LLM Serving(@flashinfer-ai) by @DefTruth in #24
🔥[Palu] Palu: Compressing KV-Cache with Low-Rank Projection(@nycu.edu… by @DefTruth in #25
🔥[SentenceVAE] SentenceVAE: Faster, Longer and More Accurate Inferenc… by @DefTruth in #26
Bump up to v1.8 by @DefTruth in #27

Full Changelog: v1.7...v1.8

Contributors

DefTruth and flashinfer-ai

v1.7

29 Jul 00:46

DefTruth

What's Changed

Add paper "Internal Consistency and Self-Feedback in Large Language Models: A Survey" by @fan2goa1 in #21
Update README.md by @clevercool in #22

New Contributors

@fan2goa1 made their first contribution in #21
@clevercool made their first contribution in #22

Full Changelog: v1.6...v1.7

Contributors

clevercool and fan2goa1

v1.6

23 Jul 01:09

DefTruth

Full Changelog: v1.5...v1.6

v1.5

15 Jul 01:23

DefTruth

What's Changed

add MInference 1.0 from microsoft by @liyucheng09 in #20

Full Changelog: v1.3...v1.5

Contributors

liyucheng09
