DJLServing v0.22.1 release
Key Features
- Add pytorch inf2 by @lanking520 in #535
- Adds chunked encoding support by @frankfliu in #551
- Ahead of Time Partitioning Support in FT default handler and test cases by @sindhuvahinis in #539
- Python engine streaming initial support by @rohithkrn in #573
- Adds async inference API by @frankfliu in #570
- Optimize batch inference for text generation by @siddvenk in #586
- Add default handler for AOT by @sindhuvahinis in #588
- Support text2text-generation task in deepspeed by @siddvenk in #606
- Throttles requests if all workers are busy by @frankfliu in #656
- Infer recommended LMI engine by @siddvenk in #623
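Several of the features above (Python engine streaming, batch inference, inferred entry points) revolve around the custom handler contract that DJL Serving's Python engine looks for in `model.py`. As a rough, self-contained illustration only: the `Input`/`Output` classes below are simplified stand-ins so the sketch runs without DJL Serving installed; the real classes live in the `djl_python` package and have a richer API.

```python
# Minimal sketch of the Python-engine handler contract in model.py.
# Input/Output here are simplified stand-ins, NOT the real djl_python
# classes; they exist only to make this example self-contained.

class Input:
    def __init__(self, data, properties=None):
        self._data = data          # raw request payload (bytes)
        self.properties = properties or {}

    def get_as_string(self):
        return self._data.decode("utf-8")


class Output:
    def __init__(self):
        self._content = b""

    def add(self, text):
        self._content += text.encode("utf-8")
        return self

    def get_as_string(self):
        return self._content.decode("utf-8")


def handle(inputs: Input) -> Output:
    """Entry point the Python engine invokes for each request."""
    prompt = inputs.get_as_string()
    # A real handler would run model inference here; this one echoes.
    return Output().add("echo: " + prompt)
```

For example, `handle(Input(b"hello")).get_as_string()` returns `"echo: hello"`. With #563's inferred `entryPoint`, a file following this convention can often be picked up without explicit configuration.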
Bug Fixes
- [fix] requirements.txt install check testcase by @sindhuvahinis in #537
- [python] Fixes typo in unit test by @frankfliu in #554
- [serving] Fixes GPU auto scaling bug by @frankfliu in #561
- Fix typo in streaming utils by @rohithkrn in #581
- KServe data to bytes fix by @sindhuvahinis in #577
- [serving] Fixes NeuronUtils for SageMaker by @frankfliu in #583
- [python] Fixes python startup race condition by @frankfliu in #589
- [serving] Avoid downloading from S3 multiple times by @frankfliu in #596
- make output consistent by @lanking520 in #616
- [workflow] Fixes workflow loading issue by @frankfliu in #662
Enhancements
- [ci] Upgrades gradle to 8.0.2 by @frankfliu in #540
- [ci] Uses recommended way to create task in build.gradle by @frankfliu in #541
- update deepspeed container python version to 3.9 by @rohithkrn in #546
- [inf2] Adding gptj to transformers handler by @maaquib in #542
- install git by default for all python releases by @lanking520 in #555
- Load external dependencies for workflows by @xyang16 in #556
- [python] Infer default entryPoint if not provided by @frankfliu in #5631
- [python] flush logging output before process end by @frankfliu in #567
- [serving] support load entryPoint with url by @frankfliu in #566
- [serving] deprecate s3Url and replace it with model_id by @frankfliu in #568
- Sets huggingface cache directory to /tmp in container by @lanking520 in #571
- add finalize callback function by @lanking520 in #572
- add pad token if not set by @lanking520 in #550
- Include KServe plugins in distribution by @sindhuvahinis in #552
- [python] Passing arguments to model.py by @frankfliu in #560
- update pytorch docker to py3.9 by @rohithkrn in #547
- [serving] Detect triton engine by @frankfliu in #574
- [python] Refactor PyEngine with PassthroughNDManager by @frankfliu in #578
- Minimal followup for BytesSupplier changes by @zachgk in #580
- [serving] Sets djl cache directory to /tmp by @frankfliu in #585
- [python] Makes download entryPoint atomic by @frankfliu in #587
- [python] Use NeuronUtils to detect neuron cores by @frankfliu in #593
- [python] Fixes visible neuron cores environment variable by @frankfliu in #595
- [serving] Refactor per model configuration initialization by @frankfliu in #594
- Refactor CacheManager, Working Async by @zachgk in #591
- [ci] bump up deepspeed version by @tosterberg in #597
- [serving] Avoid compile time dependency on log4j by @frankfliu in #603
- [serving] add default dtype when running in deepspeed by @tosterberg in #617
- [serving] Adds deps folder to classpath in MutableClassLoader constructor by @frankfliu in #611
- Add support for streaming batch size > 1 by @rohithkrn in #605
- add ddb paginator for DJLServing by @lanking520 in #609
- update fastertransformer to follow huggingface parameters by @lanking520 in #610
- Change billing model to pay per request by @frankfliu in #612
- Upgrade dependencies version by @frankfliu in #613
- clean up docker build script and remove transformers docker image build by @lanking520 in #61
- [AOT] Upload sharded checkpoints to S3 by @sindhuvahinis in #604
- [serving] Upgrade to DJL 0.22.0 by @frankfliu in #622
- Unify tnx experience by @lanking520 in #619
- [serving] Update DJL version to 0.22.1 by @frankfliu in #627
- [Docker] update a few versions by @lanking520 in #620
- [serving] Make chunked read timeout configurable by @frankfliu in #652
- [python][streaming] Do best-effort model type validation to fix configs without arch list by @rohithkrn in #649
- [AOT] Entrypoint download from url by @sindhuvahinis in #628
- [wlm] Moves LmiUtils.inferLmiEngine() into separate class by @frankfliu in #630
- [python][streaming] Batching fix and validate model architecture by @rohithkrn in #626
- skip special tokens by default by @lanking520 in #635
- [serving] Read x-synchronus and x-starting-token from input payload by @frankfliu in #637
- add torchvision by @lanking520 in #638
- [serving] Keep original content-type header by @frankfliu in #642
- [serving] Override inferred options in criteria by @frankfliu in #644
- Pinning aws-neuronx-* packages for Inf2 containers by @maaquib in #621
- [serving] Stop model server if plugin init failed by @frankfliu in #655
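Several of the enhancements above surface as model configuration: #568 deprecates `s3Url` in favor of `model_id`, #563 infers the `entryPoint` when omitted, and #605 extends streaming to batch size > 1. A hypothetical `serving.properties` touching those options might look like the sketch below; property names follow the LMI conventions of this release and should be verified against the current configuration docs.

```properties
engine=Python
# s3Url is deprecated as of this release (#568); use model_id instead
option.model_id=s3://my-bucket/my-model/
# optional since #563: the entryPoint can be inferred if omitted
option.entryPoint=model.py
# streaming now supports batch size > 1 (#605)
option.enable_streaming=true
batch_size=2
```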
Documentation
- [docs] Fix serving doc by @xyang16 in #548
- [docs] Adds streaming configuration document by @rohithkrn in #659
- update docs to djl 0.22.1 by @siddvenk in #664
Full Changelog: v0.21.0...v0.22.1