feature: Add support for Streaming Inference #4497

Merged
merged 3 commits into from
Mar 14, 2024


mufaddal-rohawala
Member

@mufaddal-rohawala mufaddal-rohawala commented Mar 13, 2024

Issue #, if available:

Description of changes:
SageMaker recently added response streaming to real-time inference to help build interactive experiences for generative AI applications such as chatbots, virtual assistants, and music generators. More information is available in the launch announcement.

With this change, the SageMaker Python SDK lets you use the existing Predictor class abstractions to call the InvokeEndpointWithResponseStream runtime API (API_runtime_InvokeEndpointWithResponseStream). Example usage:

import json

from sagemaker.iterators import LineIterator

# `predictor` is an existing Predictor attached to a deployed endpoint.
initial_args = {"ContentType": "application/json"}
data = {"inputs": "what does AWS stand for?", "parameters": {"max_new_tokens": 400}}
stream_iterator = predictor.predict_stream(
    data=json.dumps(data),
    initial_args=initial_args,
    iterator=LineIterator,
)

# Each streamed line is a JSON document; print the generated text as it arrives.
for line in stream_iterator:
    resp = json.loads(line)
    print(resp.get("outputs")[0], end='')
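
To illustrate why a line-oriented iterator is useful here, the following is a minimal sketch of line buffering over a byte stream. It is not the SDK's LineIterator implementation: `iter_lines` and `fake_chunks` are hypothetical names introduced for illustration, and the payload shape simply mirrors the `outputs` field used in the example above. The point is that streamed chunks can end in the middle of a JSON document, so complete lines have to be reassembled before parsing.

```python
import json
from typing import Iterable, Iterator


def iter_lines(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield complete newline-terminated lines from arbitrary byte chunks.

    Hypothetical helper for illustration only; not part of the SageMaker SDK.
    """
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Emit every complete line currently held in the buffer.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            yield line
    # Flush a trailing line that arrived without a newline terminator.
    if buffer:
        yield buffer


# Made-up payload chunks, deliberately split at awkward byte boundaries.
fake_chunks = [b'{"outputs": ["AW', b'S"]}\n{"outputs":', b' [" stands"]}\n']

tokens = [json.loads(line)["outputs"][0] for line in iter_lines(fake_chunks)]
print("".join(tokens))
```

Buffering until a newline means `json.loads` only ever sees whole documents, regardless of how the transport splits the bytes.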

Testing done:
Ran a notebook that exercises the predict_stream functionality and added an integration test covering it.

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

General

  • I have read the CONTRIBUTING doc
  • I certify that the changes I am introducing will be backward compatible, and I have discussed concerns about this, if any, with the Python SDK team
  • I used the commit message format described in CONTRIBUTING
  • I have passed the region in to all S3 and STS clients that I've initialized as part of this change.
  • I have updated any necessary documentation, including READMEs and API docs (if appropriate)

Tests

  • I have added tests that prove my fix is effective or that my feature works (if appropriate)
  • I have added unit and/or integration tests as appropriate to ensure backward compatibility of the changes
  • I have checked that my tests are not configured for a specific region or account (if appropriate)
  • I have used unique_name_from_base to create resource names in integ tests (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@mufaddal-rohawala mufaddal-rohawala requested a review from a team as a code owner March 13, 2024 17:42
@mufaddal-rohawala mufaddal-rohawala requested review from benieric and removed request for a team March 13, 2024 17:42

codecov bot commented Mar 13, 2024

Codecov Report

Attention: Patch coverage is 91.25%, with 7 lines in your changes missing coverage. Please review.

Project coverage is 87.35%. Comparing base (15a40ff) to head (9b43eab).

Files                         Patch %   Lines
src/sagemaker/iterators.py    89.83%    6 Missing ⚠️
src/sagemaker/exceptions.py   90.90%    1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #4497   +/-   ##
=======================================
  Coverage   87.34%   87.35%           
=======================================
  Files         388      389    +1     
  Lines       36545    36625   +80     
=======================================
+ Hits        31921    31994   +73     
- Misses       4624     4631    +7     
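
As a quick cross-check, the percentages in this report can be reproduced from the hit and line counts in the diff. The sketch below assumes percentages are truncated (not rounded) to two decimal places; that rule is inferred from the numbers shown here rather than stated in the report, and `coverage_pct` is a hypothetical helper name.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Return coverage as a percentage truncated to two decimals.

    Integer arithmetic avoids floating-point surprises before truncation.
    The truncation rule is an assumption inferred from the report's figures.
    """
    return (hits * 10000 // lines) / 100


# Patch: +73 hits out of +80 changed lines.
print(coverage_pct(73, 80))
# Head (9b43eab): 31994 hits out of 36625 lines.
print(coverage_pct(31994, 36625))
# Base (15a40ff): 31921 hits out of 36545 lines.
print(coverage_pct(31921, 36545))
```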


@akrishna1995 akrishna1995 requested review from akrishna1995 and removed request for benieric March 13, 2024 18:36
@akrishna1995 akrishna1995 self-assigned this Mar 13, 2024
@mufaddal-rohawala
Member Author

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-notebook-tests
  • Commit ID: 936e9a1
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@mufaddal-rohawala
Member Author

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: 936e9a1
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@mufaddal-rohawala
Member Author

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-pr
  • Commit ID: 936e9a1
  • Result: FAILED
  • Build Logs (available for 30 days)

@mufaddal-rohawala
Member Author

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: 936e9a1
  • Result: FAILED
  • Build Logs (available for 30 days)

@mufaddal-rohawala
Member Author

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-pr
  • Commit ID: 9b43eab
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@liujiaorr liujiaorr merged commit fada4bf into aws:master Mar 14, 2024
8 checks passed
benieric added a commit that referenced this pull request Mar 19, 2024
* fix: make sure gpus are found in local_gpu run (#4384)

* fix: make sure gpus are found in local_gpu run

* fix: black formatting

* fix: adjust unit test

* feat: pin dll version to support python3.11 to the sdk (#4472)

Co-authored-by: Ashwin Krishna <ashwikri@amazon.com>

* fix: Skip No Canvas regions for test_deploy_best_candidate (#4477)

* prepare release v2.211.0

* update development version to v2.211.1.dev0

* change: Enhance model builder selection logic to include model size (#4429)

* change: Enhance model builder selection logic to include model size

* Fix conflicts

* Address PR comments

* fix formatting

* fix formatting of test

* Fix token in tasks.json

* Increase coverage for tests

* fix formatting

* Fix requirements

* Import code instead of importing accelerate

* Fix formatting

* Setup dependencies

* change: Upgrade smp to version 2.2 (#4479)

* upgrading smp to version 2.2

* fixing linting issue

* fixing syntax error with multiline if statement

* upgrading smp to version 2.2

* fixing linting issue

* fixing syntax error with multiline if statement

* fixing formatting

---------

Co-authored-by: Andrew Tian <tinandr@amazon.com>

* feat: Update SM Python SDK for PT 2.2.0 SM DLC (#4481)

* update pt2.2 sm training dlc pysdk

* update pt2.2 sm inference dlc pysdk and region list

* fix: Create custom tarfile extractall util to fix backward compatibility issue (#4476)

* fix: Create custom tarfile extractall util to fix backward compatibility issue

* Address review comments

* fix logger.error statements

* prepare release v2.212.0

* update development version to v2.212.1.dev0

* change: Update tblib constraint (#4452)

* fix: make unit tests compatible with pytest-xdist (#4486)

* fix: make unit tests compatible with pytest-xdist

* fix failing test

* feature: Add overriding logic in ModelBuilder when task is provided (#4460)

* feat: Add Optional task to Model

* Revert "feat: Add Optional task to Model"

This reverts commit fd3e86b.

* Add override logic in ModelBuilder with task provided

* Adjusted formatting

* Add extra unit tests for invalid inputs

* Address PR comments

* Add more test inputs to integration test

* Add model_metadata field to ModelBuilder

* Update doc

* Update doc

* Adjust formatting

---------

Co-authored-by: Samrudhi Sharma <samruds@amazon.com>
Co-authored-by: Xiong Zeng <xionzeng@amazon.com>

* feature: Accept user-defined env variables for the entry-point (#4175)

* fix: Move sagemaker pysdk version check after bootstrap in remote job (#4487)

* change: enable github actions for PRs (#4489)

* change: enable github actions for PRs

* Update codebuild-ci.yml

* trigger on pull_request_target

* add source-version-override

* fix permission

* feature: Add ModelDataSource and SourceUri support for model package and while registering (#4492)

Co-authored-by: Erick Benitez-Ramos <141277478+benieric@users.noreply.github.com>

* feat: support JumpStart proprietary models (#4467)

* feat: add proprietary manifest/specs parsing

add unittests for test_cache

small refactoring

address comments and more unittests

fix linting and fix more tests

fix: pylint

feat: JumpStartModel class for prop models

* remove unused imports and fix docstyle

* fix: remove unused args

* fix: remove unused args

* fix: more unused vars

* fix: slow tests

* fix: unittests

* added more tests to cover some lines

* remove estimator warn check

* chore: address comments re performance

* fix: address comments

* complete list experience and other fixes

* fix: pylint

* add doc utils and fix pylint

* fix: docstyle

* fix: doc

* fix: default payloads

* fix: doc and tags and enums

* fix: jumpstart doc

* rename to open_weights and fix filtering

* update filter name

* doc update

* fix: black

* rename to proprietary model and fix unittests

* address comments

* fix: docstyle and flake8

* address more comments and fix doc

* put back doc utils for future refactoring

* add prop model title in doc

* doc update

---------

Co-authored-by: liujiaor <128006184+liujiaorr@users.noreply.github.com>

* chore: emit warning when no instance specific gated training env var is available, and raise exception when accept_eula flag is not supplied (#4485)

* fix: raise exception when no instance specific gated training env var available

* chore: raise client exception if accept_eula flag is not set for gated models

* chore: address flake8 errors

* chore: emit warning when instance type is chosen with no gated training artifacts

* fix: sagemaker session region not being used (#4469)

* fix: sagemaker session region not being used

* chore: add unit tests

* fix: remove all JUMPSTART_DEFAULT_REGION_NAME default arguments

* chore: use get_region_fallback throughout

* chore: remove unnecessary if statement

* chore: remove unnecessary if statement (2)

---------

Co-authored-by: Erick Benitez-Ramos <141277478+benieric@users.noreply.github.com>

* fix: add PT 2.2 support for smdistributed, pytorchddp, and torch_distributed distributions (#4480)

* Add support for smdistributed, pytorchddp, torch_distributed for PT 2.2

* formatting

* formatting

---------

Co-authored-by: liujiaor <128006184+liujiaorr@users.noreply.github.com>

* change: split coverage out from testenv in tox.ini (#4495)

Co-authored-by: Ashwin Krishna <ashwikri@amazon.com>

* change: add ci-health checks (#4493)

* feat: tgi optimum 0.0.19, 0.0.20 releases (#4496)

* feature: Add support for Streaming Inference (#4497)

* feature: Add support for Streaming Inference

* fix: codestyle-docs-test

* fix: codestyle-docs-test

* Add AutoML -> AutoMLV2 mapper (#4500)

Co-authored-by: liujiaor <128006184+liujiaorr@users.noreply.github.com>

* Skip of tests which are long running and causing the ResourceLimitInUse exception (#4504)

* Improvement of the tuner documentation (#4506)

* prepare release v2.213.0

* update development version to v2.213.1.dev0

* fix:urge customers to install latest version (#4507)

* fix: list jumpstart models with invalid version strings (#4511)

* fix: list jumpstart models with invalid versions

* docstyle

* docstyle

* pylint

* add more test

* fix

* fix: skip failing pt test (#4512)

* fix: skip failing pt test

* black-format

---------

Co-authored-by: gv <gverkes@users.noreply.github.com>
Co-authored-by: akrishna1995 <38850354+akrishna1995@users.noreply.github.com>
Co-authored-by: Ashwin Krishna <ashwikri@amazon.com>
Co-authored-by: Kalyani Nikure <110067132+knikure@users.noreply.github.com>
Co-authored-by: ci <ci>
Co-authored-by: Samrudhi Sharma <154457034+samruds@users.noreply.github.com>
Co-authored-by: adtian2 <55163384+adtian2@users.noreply.github.com>
Co-authored-by: Andrew Tian <tinandr@amazon.com>
Co-authored-by: Sirut Buasai <73297481+sirutBuasai@users.noreply.github.com>
Co-authored-by: Danny Bushkanets <d.bushkanets@gmail.com>
Co-authored-by: xiongz945 <54782408+xiongz945@users.noreply.github.com>
Co-authored-by: Samrudhi Sharma <samruds@amazon.com>
Co-authored-by: Xiong Zeng <xionzeng@amazon.com>
Co-authored-by: martinRenou <martin.renou@gmail.com>
Co-authored-by: qidewenwhen <32910701+qidewenwhen@users.noreply.github.com>
Co-authored-by: mrudulmn <161017394+mrudulmn@users.noreply.github.com>
Co-authored-by: Haotian An <33510317+Captainia@users.noreply.github.com>
Co-authored-by: liujiaor <128006184+liujiaorr@users.noreply.github.com>
Co-authored-by: evakravi <69981223+evakravi@users.noreply.github.com>
Co-authored-by: ruhanprasad <52712386+ruhanprasad@users.noreply.github.com>
Co-authored-by: Mufaddal Rohawala <89424143+mufaddal-rohawala@users.noreply.github.com>
Co-authored-by: Anton Repushko <repushko.a@gmail.com>
malav-shastri pushed a commit to malav-shastri/sagemaker-python-sdk that referenced this pull request Jun 20, 2024
* feature: Add support for Streaming Inference

* fix: codestyle-docs-test

* fix: codestyle-docs-test
jiapinw pushed a commit to jiapinw/sagemaker-python-sdk that referenced this pull request Jun 25, 2024
* feature: Add support for Streaming Inference

* fix: codestyle-docs-test

* fix: codestyle-docs-test