Releases · vectorch-ai/ScaleLLM
v0.2.2
What's Changed
- kernel: added flash infer attention impl by @guocuimi in #327
- refactor: flatten block tables to 1d tensor by @guocuimi in #328 (layout sketched after this list)
- kernel: added script to generate instantiation for flashinfer kernels by @guocuimi in #329
- refactor: move flash attn and flash infer into attention folder by @guocuimi in #330
- kernel: port flash infer handler + wrapper logics by @guocuimi in #331
- ut: added unittests for flash infer kernels by @guocuimi in #332
- refactor: replaced last_page_len with kv_indptr for flash infer kernel by @guocuimi in #333
- feat: added support for passed-in alibi slopes for flash infer kernel by @guocuimi in #334
- refactor: move paged kv related logic into paged_kv_t by @guocuimi in #335
- ut: added fp8 kv unittests for flash infer kernel by @guocuimi in #336
- ci: added pip cache to avoid redownloading by @guocuimi in #337
- upgrade pytorch to 2.4.1 by @guocuimi in #341
- ci: run package test in docker by @guocuimi in #345
- ci: build cuda 12.4 for scalellm cpp images by @guocuimi in #346
- Upgrade pytorch to 2.5.0 by @guocuimi in #347
- ut: add more tests for different warp layout by @guocuimi in #340
- misc: attention kernel refactoring by @guocuimi in #339
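The block-table changes above (#328, #333) amount to a CSR-style layout for paged-KV metadata: variable-length per-sequence block tables become one flat tensor plus an offsets array. A minimal sketch of the idea with made-up data, not ScaleLLM's actual tensors:

```python
import torch

# Hypothetical per-sequence block tables with different lengths.
block_tables = [[3, 7, 1], [4], [9, 2]]

# Flattened 1-D layout: all block ids stored back to back ...
kv_indices = torch.tensor(sum(block_tables, []))    # [3, 7, 1, 4, 9, 2]

# ... plus CSR-style offsets, so sequence i owns
# kv_indices[kv_indptr[i] : kv_indptr[i + 1]].
lens = torch.tensor([len(t) for t in block_tables])
kv_indptr = torch.cat([torch.zeros(1, dtype=torch.long),
                       torch.cumsum(lens, dim=0)])  # [0, 3, 4, 6]
```

This avoids padding every sequence's table to the longest one and matches the kv_indptr convention the flashinfer-style kernels consume.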
Full Changelog: v0.2.1...v0.2.2
v0.2.1
What's Changed
- feat: added awq marlin qlinear by @guocuimi in #315
- build: speed up compilation for marlin kernels by @guocuimi in #316
- test: added unittests for marlin kernels by @guocuimi in #317
- refactor: clean up build warnings and refactor marlin kernels by @guocuimi in #318
- fix: clean up build warnings: "LOG" redefined by @guocuimi in #319
- cmake: make includes private and disable jinja2cpp build by @guocuimi in #320
- ci: allow build without requiring a physical gpu device by @guocuimi in #321
- fix: put item into asyncio.Queue in a thread-safe way by @guocuimi in #324 (see the sketch after this list)
- refactor: added static switch for marlin kernel dispatch by @guocuimi in #325
- feat: fix and use marlin kernel for awq by default by @guocuimi in #326
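On #324: asyncio.Queue is not thread-safe, so the standard fix is to hand the put over to the event loop's own thread. A minimal, self-contained sketch of that pattern (not ScaleLLM's actual code):

```python
import asyncio
import threading

async def main():
    queue: asyncio.Queue = asyncio.Queue()
    loop = asyncio.get_running_loop()

    def producer():
        # Never call queue.put_nowait directly from a foreign thread;
        # schedule it on the event loop's thread instead.
        for item in ("a", "b", "c"):
            loop.call_soon_threadsafe(queue.put_nowait, item)

    threading.Thread(target=producer).start()
    for _ in range(3):
        print(await queue.get())

asyncio.run(main())
```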
Full Changelog: v0.2.0...v0.2.1
v0.2.0
What's Changed
- kernel: port softcap support for flash attention by @guocuimi in #298
- test: added unittests for attention sliding window by @guocuimi in #299
- model: added gemma2 with softcap and sliding window support by @guocuimi in #300
- kernel: support kernel test in python via pybind by @guocuimi in #301
- test: added unittests for marlin fp16xint4 gemm by @guocuimi in #302
- fix: move eos out of stop token list to honor ignore_eos option by @guocuimi in #305 (semantics sketched after this list)
- refactor: move models to upper folder by @guocuimi in #306
- kernel: port gptq marlin kernel and fp8 marlin kernel by @guocuimi in #307
- rust: upgrade rust libs to latest version by @guocuimi in #309
- refactor: remove the logic that loads individual weights from shared partitions by @guocuimi in #311
- feat: added fused column parallel linear by @guocuimi in #313
- feat: added gptq marlin qlinear layer by @guocuimi in #312
- kernel: port awq repack kernel by @guocuimi in #314
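To illustrate #305: if EOS always sits in the stop-token set, ignore_eos can never take effect; keeping it out of the user stop list lets the option decide. A toy sketch of the semantics, with hypothetical names (SamplingParams and stop_token_ids are illustrative, not ScaleLLM's exact API):

```python
from dataclasses import dataclass, field

# Hypothetical names; only the EOS-handling logic is the point here.
@dataclass
class SamplingParams:
    stop_token_ids: list = field(default_factory=list)
    ignore_eos: bool = False

EOS_TOKEN_ID = 2  # e.g. a llama-family EOS id

def effective_stop_ids(params: SamplingParams) -> set:
    # EOS joins the stop set only when ignore_eos is off.
    ids = set(params.stop_token_ids)
    if not params.ignore_eos:
        ids.add(EOS_TOKEN_ID)
    return ids

print(effective_stop_ids(SamplingParams(ignore_eos=True)))  # set()
print(effective_stop_ids(SamplingParams()))                 # {2}
```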
Full Changelog: v0.1.9...v0.2.0
v0.1.9
What's Changed
- ci: cancel all previous runs if a new one is triggered by @guocuimi in #283
- pypi: fix invalid classifier by @guocuimi in #284
- refactor: remove exllama kernels by @guocuimi in #285
- kernel: added marlin dense and sparse kernels by @guocuimi in #287
- debug: added environment collection script. by @guocuimi in #288
- kernel: added triton kernel build support by @guocuimi in #289
- feat: added THUDM/glm-4* support by @guocuimi in #292
- fix: handle unfinished utf8 bytes for tiktoken tokenizer by @guocuimi in #293 (technique sketched after this list)
- triton: fix build error and add example with unittest by @guocuimi in #294
- model: added qwen2 support by @guocuimi in #295
- feat: added sliding window support for QWen2 by @guocuimi in #296
- ci: fix pytest version to avoid flakiness by @guocuimi in #297
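Background for #293: in streaming, a multi-byte UTF-8 character can be split across tokens, and decoding each fragment eagerly yields "�". The general fix is incremental decoding that buffers incomplete byte sequences, e.g. Python's stdlib decoder (illustrative of the technique, not ScaleLLM's code):

```python
import codecs

# The incremental decoder holds incomplete multi-byte sequences
# internally instead of emitting the replacement character.
dec = codecs.getincrementaldecoder("utf-8")()

# "é" is the two bytes 0xC3 0xA9, split across chunks here.
print(repr(dec.decode(b"caf\xc3")))  # 'caf' -> 0xC3 held back
print(repr(dec.decode(b"\xa9")))     # 'é'   -> completed by the next chunk
```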
Full Changelog: v0.1.8...v0.1.9
v0.1.8
v0.1.7
What's Changed
- build: fix build error with gcc-13 by @guocuimi in #264
- kernel: upgrade cutlass to 3.5.0 + cuda 12.4 for sm89 fp8 support by @guocuimi in #265
- cmake: define header only library instead of symbol link for cutlass and flashinfer by @guocuimi in #266
- feat: added range to support range-for loops by @guocuimi in #267
- kernel: added attention cpu implementation for testing by @guocuimi in #268
- build: added nvbench as submodule by @guocuimi in #269
- build: upgrade cmake required version from 3.18 to 3.26 by @guocuimi in #270
- ci: build and test in devel docker image by @guocuimi in #272
- ci: use manylinux image to build wheel and run pytest by @guocuimi in #271
- attention: added tile logic using cute::local_tile into cpu attention by @guocuimi in #273
- kernel: added playground for learning and experimenting with cute. by @guocuimi in #274
- feat: added rope scaling support for llama3.1 by @guocuimi in #277 (scaling rule sketched after this list)
- update docs for llama3.1 support and bump up version by @guocuimi in #278
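For context on #277: Llama 3.1 extends its context window by rescaling the RoPE inverse frequencies. A sketch of the published llama3-style rule, with constants from the Llama 3.1 config; this mirrors the common reference implementation, not necessarily ScaleLLM's kernels:

```python
import math

def scale_inv_freq(inv_freq, factor=8.0, low_freq_factor=1.0,
                   high_freq_factor=4.0, old_context_len=8192):
    low_wavelen = old_context_len / low_freq_factor
    high_wavelen = old_context_len / high_freq_factor
    out = []
    for f in inv_freq:
        wavelen = 2 * math.pi / f
        if wavelen < high_wavelen:    # high-frequency band: unchanged
            out.append(f)
        elif wavelen > low_wavelen:   # low-frequency band: fully rescaled
            out.append(f / factor)
        else:                         # smooth interpolation in between
            smooth = (old_context_len / wavelen - low_freq_factor) / (
                high_freq_factor - low_freq_factor)
            out.append((1 - smooth) * f / factor + smooth * f)
    return out
```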
Full Changelog: v0.1.6...v0.1.7
v0.1.6
What's Changed
- allow deploying docs when triggered on demand by @guocuimi in #253
- [model] support vision language model llava. by @liutongxuan in #178
- dev: fix issues in run_in_docker script by @guocuimi in #254
- dev: added cuda 12.4 build support by @guocuimi in #255
- build: fix multiple definition issue by @guocuimi in #256
- fix: check against num_tokens instead of num_prompt_tokens for shared blocks by @guocuimi in #257
- bugfix: fix invalid max_cache_size when device is cpu. by @liutongxuan in #259
- ci: fail test if not all tests were passed successfully by @guocuimi in #263
- Revert "[model] support vision language model llava. (#178)" by @guocuimi in #262
Full Changelog: v0.1.5...v0.1.6
v0.1.5
Major changes
- added stream options to include usage info in response
- fix multiple gpu cuda graph capture issue
What's Changed
- feat: added include_usage into stream options for stream scenarios by @guocuimi in #243 (usage sketch after this list)
- feat: added unittests for openai server by @guocuimi in #244
- [minor] use available memory to calculate cache_size by default. by @liutongxuan in #245
- refactor: only do sampling in driver worker (rank=0) by @guocuimi in #247
- fix multiple devices cuda graph capture issue by @guocuimi in #248
- revert torch.cuda.empty_cache change by @guocuimi in #249
- ci: added release workflow by @guocuimi in #250
- fix workflow by @guocuimi in #251
- fix: pass in secrets for workflow calls. by @guocuimi in #252
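Usage sketch for the include_usage stream option (#243). The parameter follows the standard OpenAI chat-completions API; the base_url and model name below are placeholders for a local ScaleLLM deployment:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="EMPTY")

stream = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
    stream_options={"include_usage": True},  # final chunk reports usage
)
for chunk in stream:
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")
    if chunk.usage:  # populated only on the last chunk
        print("\n", chunk.usage)
```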
Full Changelog: v0.1.4...v0.1.5
v0.1.4
Major changes
- Added logprobs for completion and chat APIs
- Added best_of for completion and chat APIs
What's Changed
- feat: added openai compatible logprobs support by @guocuimi in #232
- feat: added logprobs support for legacy completion api by @guocuimi in #233
- feat: added logprobs for grpc server by @guocuimi in #234
- feat: added best_of functionality for completion apis by @guocuimi in #236 (usage sketch for logprobs/best_of after this list)
- feat: added token_ids into sequence output for better debuggability. by @guocuimi in #237
- feat: added id_to_token for tokenizer to handle unfinished byte sequences ending with "�" by @guocuimi in #238
- refactor: split pybind11 binding definitions into separate files by @guocuimi in #239
- feat: added logprobs support for speculative decoding by @guocuimi in #240
- feat: added synchronization for batch inference by @guocuimi in #241
- feat: added 'repr' function for scalellm package by @guocuimi in #242
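A usage sketch for the new logprobs and best_of parameters (#232-#236), following the legacy OpenAI completions API; base_url and model are placeholders for a local ScaleLLM server:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="EMPTY")

resp = client.completions.create(
    model="meta-llama/Meta-Llama-3-8B",
    prompt="The capital of France is",
    max_tokens=8,
    logprobs=5,  # top-5 token logprobs at each position
    best_of=3,   # sample 3 candidates server-side, return the best one
)
print(resp.choices[0].text)
print(resp.choices[0].logprobs.top_logprobs[0])
```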
Full Changelog: v0.1.3...v0.1.4
v0.1.3
Major changes
- Model arg hotfix for llama3
- Added more helper functions
What's Changed
- fix: load vocab_size first, then use it to decide the model type for models shared between llama3, llama2 and Yi. by @guocuimi in #230
- feat: added with statement support to release memory and exposed helper functions for tokenizer by @guocuimi in #231 (usage sketch below)
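A sketch of the with-statement usage #231 enables; the class name and constructor arguments are illustrative rather than the exact scalellm API:

```python
from scalellm import LLM  # import path/arguments assumed, not verified

# Leaving the context releases model weights and KV-cache GPU memory
# promptly instead of waiting for garbage collection.
with LLM(model="meta-llama/Llama-2-7b-hf") as llm:
    outputs = llm.generate(["What is 2 + 2?"])
```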
Full Changelog: v0.1.2...v0.1.3