DJLServing v0.22.1 release
Key Features
- Add pytorch inf2 by @lanking520 in #535
- Adds chunked encoding support by @frankfliu in #551
- Ahead of Time Partitioning Support in FT default handler and test cases by @sindhuvahinis in #539
- Python engine streaming initial support by @rohithkrn in #573
- Adds async inference API by @frankfliu in #570
- Optimize batch inference for text generation by @siddvenk in #586
- Add default handler for AOT by @sindhuvahinis in #588
- Support text2text-generation task in deepspeed by @siddvenk in #606
- Throttles requests if all workers are busy by @frankfliu in #656
- Infer recommended LMI engine by @siddvenk in #623
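Several of the features above (Python engine streaming, batch inference, inferred entry points) revolve around the custom handler contract that DJL Serving's Python engine looks for in `model.py`. As a rough, self-contained illustration only: the `Input`/`Output` classes below are simplified stand-ins so the sketch runs without DJL Serving installed; the real classes live in the `djl_python` package and have a richer API.

```python
# Minimal sketch of the Python-engine handler contract in model.py.
# Input/Output here are simplified stand-ins, NOT the real djl_python
# classes; they exist only to make this example self-contained.

class Input:
    def __init__(self, data, properties=None):
        self._data = data          # raw request payload (bytes)
        self.properties = properties or {}

    def get_as_string(self):
        return self._data.decode("utf-8")


class Output:
    def __init__(self):
        self._content = b""

    def add(self, text):
        self._content += text.encode("utf-8")
        return self

    def get_as_string(self):
        return self._content.decode("utf-8")


def handle(inputs: Input) -> Output:
    """Entry point the Python engine invokes for each request."""
    prompt = inputs.get_as_string()
    # A real handler would run model inference here; this one echoes.
    return Output().add("echo: " + prompt)
```

For example, `handle(Input(b"hello")).get_as_string()` returns `"echo: hello"`. With #563's inferred `entryPoint`, a file following this convention can often be picked up without explicit configuration.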
Bug Fixes
- [fix] requirements.txt install check testcase by @sindhuvahinis in #537
- [python] Fixes typo in unit test by @frankfliu in #554
- [serving] Fixes GPU auto scaling bug by @frankfliu in #561
- Fix typo in streaming utils by @rohithkrn in #581
- KServe data to bytes fix by @sindhuvahinis in #577
- [serving] Fixes NeuronUtils for SageMaker by @frankfliu in #583
- [python] Fixes python startup race condition by @frankfliu in #589
- [serving] Avoid downloading from S3 multiple times by @frankfliu in #596
- make output consistent by @lanking520 in #616
- [workflow] Fixes workflow loading issue by @frankfliu in #662
Enhancements
- [ci] Upgrades gradle to 8.0.2 by @frankfliu in #540
- [ci] Uses recommended way to create task in build.gradle by @frankfliu in #541
- update deepspeed container python version to 3.9 by @rohithkrn in #546
- [inf2] Adding gptj to transformers handler by @maaquib in #542
- install git by default for all python releases by @lanking520 in #555
- Load external dependencies for workflows by @xyang16 in #556
- [python] Infer default entryPoint if not provided by @frankfliu in #5631
- [python] flush logging output before process end by @frankfliu in #567
- [serving] support load entryPoint with url by @frankfliu in #566
- [serving] deprecate s3Url and replace it with model_id by @frankfliu in #568
- Sets huggingface cache directory to /tmp in container by @lanking520 in #571
- add finalize callback function by @lanking520 in #572
- add pad token if not set by @lanking520 in #550
- Include KServe plugins in distribution by @sindhuvahinis in #552
- [python] Passing arguments to model.py by @frankfliu in #560
- update pytorch docker to py3.9 by @rohithkrn in #547
- [serving] Detect triton engine by @frankfliu in #574
- [python] Refactor PyEngine with PassthroughNDManager by @frankfliu in #578
- Minimal followup for BytesSupplier changes by @zachgk in #580
- [serving] Sets djl cache directory to /tmp by @frankfliu in #585
- [python] Makes download entryPoint atomic by @frankfliu in #587
- [python] Use NeuronUtils to detect neuron cores by @frankfliu in #593
- [python] Fixes visible neuron cores environment variable by @frankfliu in #595
- [serving] Refactor per model configuration initialization by @frankfliu in #594
- Refactor CacheManager, Working Async by @zachgk in #591
- [ci] bump up deepspeed version by @tosterberg in #597
- [serving] Avoid compile time dependency on log4j by @frankfliu in #603
- [serving] add default dtype when running in deepspeed by @tosterberg in #617
- [serving] Adds deps folder to classpath in MutableClassLoader constructor by @frankfliu in #611
- Add support for streaming batch size > 1 by @rohithkrn in #605
- add ddb paginator for DJLServing by @lanking520 in #609
- update fastertransformer to follow huggingface parameters by @lanking520 in #610
- Change billing model to pay per request by @frankfliu in #612
- Upgrade dependencies version by @frankfliu in #613
- clean up docker build script and remove transformers docker image build by @lanking520 in #61
- [AOT] Upload sharded checkpoints to S3 by @sindhuvahinis in #604
- [serving] Upgrade to DJL 0.22.0 by @frankfliu in #622
- Unify tnx experience by @lanking520 in #619
- [serving] Update DJL version to 0.22.1 by @frankfliu in #627
- [Docker] update a few versions by @lanking520 in #620
- [serving] Make chunked read timeout configurable by @frankfliu in #652
- [python][streaming] Do best-effort model type validation to fix configs without arch list by @rohithkrn in #649
- [AOT] Entrypoint download from url by @sindhuvahinis in #628
- [wlm] Moves LmiUtils.inferLmiEngine() into separate class by @frankfliu in #630
- [python][streaming] Batching fix and validate model architecture by @rohithkrn in #626
- skip special tokens by default by @lanking520 in #635
- [serving] Read x-synchronus and x-starting-token from input payload by @frankfliu in #637
- add torchvision by @lanking520 in #638
- [serving] Keep original content-type header by @frankfliu in #642
- [serving] Override inferred options in criteria by @frankfliu in #644
- Pinning aws-neuronx-* packages for Inf2 containers by @maaquib in #621
- [serving] Stop model server if plugin init failed by @frankfliu in #655
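Several of the enhancements above surface as model configuration: #568 deprecates `s3Url` in favor of `model_id`, #563 infers the `entryPoint` when omitted, and #605 extends streaming to batch size > 1. A hypothetical `serving.properties` touching those options might look like the sketch below; property names follow the LMI conventions of this release and should be verified against the current configuration docs.

```properties
engine=Python
# s3Url is deprecated as of this release (#568); use model_id instead
option.model_id=s3://my-bucket/my-model/
# optional since #563: the entryPoint can be inferred if omitted
option.entryPoint=model.py
# streaming now supports batch size > 1 (#605)
option.enable_streaming=true
batch_size=2
```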
Documentation
- [docs] Fix serving doc by @xyang16 in #548
- [docs] Adds streaming configuration document by @rohithkrn in #659
- update docs to djl 0.22.1 by @siddvenk in #664
Full Changelog: v0.21.0...v0.22.1