feature: JumpStart CuratedHub class creation and function definitions #4448

Merged

jinyoung-lim
Contributor

@jinyoung-lim commented Feb 22, 2024

Issue #, if available:

Description of changes:

Set up class and function definitions for CuratedHub.

Testing done:

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

General

  • I have read the CONTRIBUTING doc
  • I certify that the changes I am introducing will be backward compatible, and I have discussed concerns about this, if any, with the Python SDK team
  • I used the commit message format described in CONTRIBUTING
  • I have passed the region in to all S3 and STS clients that I've initialized as part of this change.
  • I have updated any necessary documentation, including READMEs and API docs (if appropriate)

Tests

  • I have added tests that prove my fix is effective or that my feature works (if appropriate)
  • I have added unit and/or integration tests as appropriate to ensure backward compatibility of the changes
  • [] I have checked that my tests are not configured for a specific region or account (if appropriate)
  • [] I have used unique_name_from_base to create resource names in integ tests (if appropriate)

Ran below locally:

black -l 100 .               
flake8
python -m pylint --rcfile=.pylintrc -j 0 src/**/sagemaker/**/jumpstart/*hub*
python -m pytest tests/unit/**/jumpstart -W ignore::DeprecationWarning

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@jinyoung-lim jinyoung-lim requested a review from a team as a code owner February 22, 2024 21:25
@jinyoung-lim jinyoung-lim requested review from mohanasudhan and removed request for a team February 22, 2024 21:25
@mufaddal-rohawala
Member

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: 4922511
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

return f"sagemaker-hubs-{region}-{account_id}"


def create_hub(
Member

Why not add this to the sagemaker.session module? It seems more appropriate there.

Contributor Author

Please see the PR description. After a discussion with @bencrabtree, we want to keep the session Hub API calls as bare-bones wrappers around the Hub API and have hubutils handle any custom logic.

Comment on lines 122 to 132
document_schema_version,
hub_name,
hub_content_name,
hub_content_type,
hub_content_document,
hub_content_display_name,
hub_content_description,
hub_content_version,
hub_content_markdown,
hub_content_search_keywords,
tags,
Member

For this many arguments, can we use keyword-style args? hub_name=hub_name, ...

Contributor Author

Good point. Will do.
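One way to enforce the keyword-style suggestion is in the signature itself. A minimal sketch follows; the function name `build_hub_content_request` is hypothetical, and which parameters are required versus optional is an assumption, since this excerpt only shows the argument names. The bare `*` makes every parameter keyword-only, so a positional call raises `TypeError` instead of silently misassigning values.

```python
from typing import Any, Dict, List, Optional


def build_hub_content_request(
    *,  # bare star: every parameter below must be passed by keyword
    document_schema_version: str,
    hub_name: str,
    hub_content_name: str,
    hub_content_type: str,
    hub_content_document: str,
    hub_content_display_name: Optional[str] = None,
    hub_content_description: Optional[str] = None,
    hub_content_version: Optional[str] = None,
    hub_content_markdown: Optional[str] = None,
    hub_content_search_keywords: Optional[List[str]] = None,
    tags: Optional[List[Dict[str, str]]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: collect the arguments, dropping optional ones left as None."""
    request: Dict[str, Any] = {
        "document_schema_version": document_schema_version,
        "hub_name": hub_name,
        "hub_content_name": hub_content_name,
        "hub_content_type": hub_content_type,
        "hub_content_document": hub_content_document,
        "hub_content_display_name": hub_content_display_name,
        "hub_content_description": hub_content_description,
        "hub_content_version": hub_content_version,
        "hub_content_markdown": hub_content_markdown,
        "hub_content_search_keywords": hub_content_search_keywords,
        "tags": tags,
    }
    return {key: value for key, value in request.items() if value is not None}


# Call site in the keyword style the reviewer asked for (values are illustrative).
request = build_hub_content_request(
    document_schema_version="2.0.0",
    hub_name="my-hub",
    hub_content_name="my-model",
    hub_content_type="model",
    hub_content_document="{}",
)
```

With keyword-only parameters, reordering the signature later cannot break existing callers, which matters for a call with eleven similarly typed string arguments.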

"""

if hub_bucket_name is None:
hub_bucket_name = _generate_default_hub_bucket_name(sagemaker_session)
Collaborator

We need to create the default bucket here too if we're going to have a default value. I'd actually suggest making the value required until that is the case.

Contributor Author

Hmmm that is a good point. Will handle one way or the other.
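The default name in this thread comes from the `sagemaker-hubs-{region}-{account_id}` f-string shown at the start of the review. A sketch of that helper plus the fallback logic follows, assuming region and account ID are passed in explicitly (the PR derives them from `sagemaker_session`) and checking only a subset of the S3 bucket naming rules; `resolve_hub_bucket_name` is a hypothetical name.

```python
import re
from typing import Optional

# Subset of the S3 bucket naming rules: 3-63 characters of lowercase letters,
# digits, dots and hyphens, beginning and ending with a letter or digit.
_BUCKET_NAME_PATTERN = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def generate_default_hub_bucket_name(region: str, account_id: str) -> str:
    """Build the default hub bucket name with the same f-string as the diff."""
    bucket_name = f"sagemaker-hubs-{region}-{account_id}"
    if not _BUCKET_NAME_PATTERN.fullmatch(bucket_name):
        raise ValueError(f"Not a valid S3 bucket name: {bucket_name!r}")
    return bucket_name


def resolve_hub_bucket_name(bucket_name: Optional[str], region: str, account_id: str) -> str:
    """Use the caller's bucket if given; otherwise fall back to the default name.

    The fallback is only safe once the default bucket is also created and its
    ownership verified; until then, requiring bucket_name is the simpler option.
    """
    if bucket_name is not None:
        return bucket_name
    return generate_default_hub_bucket_name(region, account_id)
```

Until the default bucket is actually created, making `bucket_name` required (as suggested) avoids handing out a name that may not point at a bucket the caller owns.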


def list_hub_contents(
hub_name: str,
hub_content_type: HubDataType.MODEL or HubDataType.NOTEBOOK,
Collaborator

@bencrabtree bencrabtree Feb 22, 2024

Can this type be just HubDataType?

Edit: we actually should rename it HubContentType

Contributor Author

HubDataType also has HUB
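Note that without `from __future__ import annotations`, the annotation `HubDataType.MODEL or HubDataType.NOTEBOOK` is evaluated as an ordinary `or` expression when the function is defined and collapses to the single member `HubDataType.MODEL`; either way it names one value rather than a type. A separate enum, in the spirit of the suggested `HubContentType` rename, keeps `HUB` out of the accepted values. A sketch, assuming both enums mix in `str`; `validate_hub_content_type` is hypothetical.

```python
from enum import Enum


class HubDataType(str, Enum):
    """Every kind of hub object, including the hub itself."""

    HUB = "hub"
    MODEL = "model"
    NOTEBOOK = "notebook"


class HubContentType(str, Enum):
    """Only the kinds of object that can live inside a hub (no HUB member)."""

    MODEL = "model"
    NOTEBOOK = "notebook"


# What the original annotation actually evaluates to: HubDataType.MODEL.
collapsed_annotation = HubDataType.MODEL or HubDataType.NOTEBOOK


def validate_hub_content_type(hub_content_type: HubContentType) -> HubContentType:
    """Normalize a member or raw string; raises ValueError for anything else, e.g. "hub"."""
    return HubContentType(hub_content_type)
```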

@mufaddal-rohawala
Member

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: 4922511
  • Result: FAILED
  • Build Logs (available for 30 days)

@mufaddal-rohawala
Member

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-notebook-tests
  • Commit ID: 4922511
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@mufaddal-rohawala
Member

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-pr
  • Commit ID: 4922511
  • Result: FAILED
  • Build Logs (available for 30 days)

@mufaddal-rohawala
Member

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: 4922511
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@mufaddal-rohawala
Member

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: 45a4a4d
  • Result: FAILED
  • Build Logs (available for 30 days)

@mufaddal-rohawala
Member

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: 45a4a4d
  • Result: FAILED
  • Build Logs (available for 30 days)

@mufaddal-rohawala
Member

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-pr
  • Commit ID: 45a4a4d
  • Result: FAILED
  • Build Logs (available for 30 days)

@mufaddal-rohawala
Member

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: 45a4a4d
  • Result: FAILED
  • Build Logs (available for 30 days)

@mufaddal-rohawala
Member

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-notebook-tests
  • Commit ID: 45a4a4d
  • Result: FAILED
  • Build Logs (available for 30 days)

@mufaddal-rohawala
Member

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: f4da2ad
  • Result: FAILED
  • Build Logs (available for 30 days)

@mufaddal-rohawala
Member

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: f4da2ad
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@mufaddal-rohawala
Member

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-notebook-tests
  • Commit ID: f4da2ad
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@mufaddal-rohawala
Member

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-pr
  • Commit ID: f4da2ad
  • Result: FAILED
  • Build Logs (available for 30 days)

@mufaddal-rohawala
Member

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: f4da2ad
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

self.get_region(),
self._s3_client
)
model_specs = JumpStartModelSpecs(hub_content.content_document, is_hub_content=True)
# TODO: Parse HubContentDescription
Collaborator

We'll actually parse hub_model_description.hub_content_document inside of JumpStartModelSpecs. You can see I stubbed out the code for from_content_document in that data class

src/sagemaker/jumpstart/cache.py (outdated, resolved)
self,
hub_name: str,
region: str = JUMPSTART_DEFAULT_REGION_NAME,
sagemaker_session: Optional[Session] = None,
Collaborator

We'll have conflicts in my PR #4439, but we should default to DEFAULT_JUMPSTART_SAGEMAKER_SESSION here

Contributor Author

I actually thought about that and decided to derive the session from the region if it is left None. DEFAULT_JUMPSTART_SAGEMAKER_SESSION is basically a Session with JUMPSTART_DEFAULT_REGION_NAME, which is us-west-2. It is a bit weird to accept both a region and a session, even with the region check on line 37. I guess we could derive the region from the sagemaker_session instead.

Collaborator

Sorry, just seeing this reply. Yes, you're totally right. We don't need the region param since we can derive it from the session.
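A sketch of the agreed direction follows: accept only a session, derive the region from it, and guard against `None` explicitly. `StubSession` stands in for `sagemaker.session.Session` (only its `boto_region_name` attribute is modeled) and `DEFAULT_SESSION` stands in for `DEFAULT_JUMPSTART_SAGEMAKER_SESSION`; the constructor shape is an assumption, not the merged code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubSession:
    """Stand-in for sagemaker.session.Session; only boto_region_name is modeled."""

    boto_region_name: str


# Stand-in for DEFAULT_JUMPSTART_SAGEMAKER_SESSION, whose region is us-west-2 per the thread.
DEFAULT_SESSION = StubSession(boto_region_name="us-west-2")


class CuratedHub:
    """Sketch: the session is the single source of truth for the region."""

    def __init__(self, hub_name: str, sagemaker_session: Optional[StubSession] = None) -> None:
        self.hub_name = hub_name
        # Explicit None guard, so later calls never dereference a missing session.
        self._sagemaker_session = (
            sagemaker_session if sagemaker_session is not None else DEFAULT_SESSION
        )
        # No separate `region` argument that could disagree with the session.
        self.region = self._sagemaker_session.boto_region_name
```

Because Hub is not JumpStart-specific (a point raised further down in this review), the fallback could just as well be a plain `Session` for the caller's region; the guard stays the same either way.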

src/sagemaker/jumpstart/curated_hub/curated_hub.py (outdated, resolved)
) -> HubContentDescription:
"""Returns descriptive information about the Hub model."""

sagemaker_session = Session(boto3.Session(region_name=region))
Collaborator

For both functions in this file, I think we should 1/ accept a custom SM Session and 2/ default to DEFAULT_JUMPSTART_SAGEMAKER_SESSION.

Contributor Author

The reason for not using DEFAULT_JUMPSTART_SAGEMAKER_SESSION is that Hub is technically not just for JumpStart (JS).

"""Enum for Hub data storage objects."""

HUB = "hub"
MODEL = "model"
NOTEBOOK = "notebook"

@classmethod
@property
def content_only(cls):
Collaborator

This is nice!
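One caveat on stacking `@classmethod` over `@property`: class methods wrapping other descriptors work on Python 3.9 and 3.10, were deprecated in 3.11, and were removed in 3.13, where `HubDataType.content_only` no longer behaves like a class-level property. A plain class method avoids that, at the cost of call sites writing `HubDataType.content_only()`. A sketch, assuming the enum mixes in `str` and that `content_only` is meant to return every member except `HUB`:

```python
from enum import Enum
from typing import List


class HubDataType(str, Enum):
    """Enum for Hub data storage objects (members copied from the diff)."""

    HUB = "hub"
    MODEL = "model"
    NOTEBOOK = "notebook"

    @classmethod
    def content_only(cls) -> List["HubDataType"]:
        """Return all members except HUB, i.e. the types that describe hub content."""
        return [member for member in cls if member is not cls.HUB]
```

Methods defined in an `Enum` body are not turned into members, so `list(HubDataType)` still has exactly three entries.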

@@ -1001,6 +1007,206 @@ def __init__(
self.id_info = id_info


class HubContentDependency(JumpStartDataHolderType):
Collaborator

I actually didn't know we had this in the response :) Curious to see how we can leverage this in the copy workflow

return json_obj


class HubContentDescription(JumpStartDataHolderType):
Collaborator

I think we can just call this HubContent

self.hub_content_version: str = json_obj["hub_content_version"]
self.hub_name: str = json_obj["hub_name"]

def to_json(self) -> Dict[str, Any]:
Contributor

nit: can this method be inherited from a parent class? It looks the same as the to_json of HubDescription.
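On the nit: if every data holder lists its fields in `__slots__`, one `to_json` on a base class can serve all of them. A sketch with a hypothetical base `HubDataHolder`; the `HubDescription` fields are invented for illustration, while the `HubContentDescription` fields come from the diff.

```python
from typing import Any, Dict


class HubDataHolder:
    """Hypothetical base class for hub data holders that declare __slots__."""

    __slots__ = ()

    def to_json(self) -> Dict[str, Any]:
        """Serialize each attribute named in the concrete class's __slots__ that is set."""
        # self.__slots__ resolves to the most-derived class's declaration only.
        return {attr: getattr(self, attr) for attr in self.__slots__ if hasattr(self, attr)}


class HubDescription(HubDataHolder):
    __slots__ = ("hub_name", "hub_arn")

    def __init__(self, json_obj: Dict[str, Any]) -> None:
        self.hub_name: str = json_obj["hub_name"]
        self.hub_arn: str = json_obj["hub_arn"]


class HubContentDescription(HubDataHolder):
    __slots__ = ("hub_content_version", "hub_name")

    def __init__(self, json_obj: Dict[str, Any]) -> None:
        self.hub_content_version: str = json_obj["hub_content_version"]
        self.hub_name: str = json_obj["hub_name"]
```

If a deeper hierarchy ever splits fields across several classes, `to_json` would need to walk the MRO and merge each class's `__slots__`.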

@mufaddal-rohawala
Member

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: dddd0a6
  • Result: FAILED
  • Build Logs (available for 30 days)

hub_info = hub.describe()
return JumpStartCachedContentValue(formatted_content=hub_info)
if data_type == HubType.HUB:
hub_name, _, _, _ = hub_utils.get_info_from_hub_resource_arn(id_info)
Member

Can you make the return type a data class in the next PR?

Contributor Author

Clarifying question: Did you mean something like this?

            hub_info: HubInfo = hub_utils.get_info_from_hub_resource_arn(id_info)

Member

yes, exactly!
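A sketch of the data-class return type agreed on here. Field names follow how the result is used later in the thread (`info.hub_name`, `info.region`, `info.hub_content_name`, `info.hub_content_version`); the ARN layouts `arn:aws:sagemaker:<region>:<account>:hub/<hub-name>` and `arn:aws:sagemaker:<region>:<account>:hub-content/<hub-name>/<content-type>/<content-name>/<version>` are assumptions to verify against the SageMaker API reference.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HubInfo:
    """Hypothetical return type for get_info_from_hub_resource_arn."""

    region: str
    account_id: str
    hub_name: str
    hub_content_type: Optional[str] = None
    hub_content_name: Optional[str] = None
    hub_content_version: Optional[str] = None


def get_info_from_hub_resource_arn(arn: str) -> HubInfo:
    """Parse a hub or hub-content ARN (resource layout assumed, see above)."""
    # Generic ARN shape: arn:partition:service:region:account-id:resource
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[2] != "sagemaker":
        raise ValueError(f"Not a SageMaker ARN: {arn!r}")
    region, account_id, resource = parts[3], parts[4], parts[5]
    resource_type, _, path = resource.partition("/")
    segments = path.split("/")
    if resource_type == "hub" and len(segments) == 1 and segments[0]:
        return HubInfo(region=region, account_id=account_id, hub_name=segments[0])
    if resource_type == "hub-content" and len(segments) == 4:
        hub_name, content_type, content_name, version = segments
        return HubInfo(
            region=region,
            account_id=account_id,
            hub_name=hub_name,
            hub_content_type=content_type,
            hub_content_name=content_name,
            hub_content_version=version,
        )
    raise ValueError(f"Unrecognized hub resource ARN: {arn!r}")
```

Compared with the 4-tuple unpacking in the diff, callers read fields by name, and adding a field later does not break existing call sites.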

if bucket_name is None:
bucket_name: str = generate_default_hub_bucket_name(sagemaker_session)

sagemaker_session._create_s3_bucket_if_it_does_not_exist(
Member

You may want to do an ownership check, in case someone snipes the bucket.

Contributor

+1. There is also the question of how we would work around this if the bucket is already taken. Perhaps we can say this is very unlikely, but we should at least emit a warning in that case.

Contributor Author

Aye aye. It seems that the base function does have the ownership check. I wanted to try reusing existing functions, but I think we just need to write our own version so we can check ownership.
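A sketch of a bucket helper with an ownership check, assuming a boto3 S3 client. It relies on the `ExpectedBucketOwner` parameter of `head_bucket`, which makes S3 reject the request with 403 when the bucket belongs to a different account, and it logs a warning when the name is already taken, as suggested above. The function name `ensure_hub_bucket` and its boolean return contract are hypothetical.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger(__name__)


def _error_code(error: Exception) -> str:
    # botocore's ClientError carries the parsed error response on `.response`.
    return str(getattr(error, "response", {}).get("Error", {}).get("Code", ""))


def ensure_hub_bucket(s3_client: Any, bucket_name: str, region: str, account_id: str) -> bool:
    """Create bucket_name if needed; return True only if account_id owns it.

    s3_client is assumed to be a boto3 S3 client exposing head_bucket,
    create_bucket and exceptions.ClientError.
    """
    client_error = s3_client.exceptions.ClientError
    try:
        # With ExpectedBucketOwner, S3 answers 403 if another account owns the bucket.
        s3_client.head_bucket(Bucket=bucket_name, ExpectedBucketOwner=account_id)
        return True
    except client_error as error:
        code = _error_code(error)
        if code == "403":
            # 403 can also mean missing permissions; either way the bucket is not safe to use.
            logger.warning("Bucket %s exists but is not usable by account %s.", bucket_name, account_id)
            return False
        if code != "404":
            raise

    # The bucket does not exist yet. us-east-1 must omit the LocationConstraint.
    create_kwargs: Dict[str, Any] = {"Bucket": bucket_name}
    if region != "us-east-1":
        create_kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    try:
        s3_client.create_bucket(**create_kwargs)
    except client_error as error:
        # Another account may have claimed the name between the two calls.
        if _error_code(error) == "BucketAlreadyExists":
            logger.warning("Bucket %s was taken by another account.", bucket_name)
            return False
        raise
    return True
```

Returning `False` instead of raising leaves the caller free to fall back to an explicitly supplied bucket; raising would be the stricter choice.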

)

model_specs = JumpStartModelSpecs(DescribeHubContentsResponse(hub_model_description), is_hub_content=True)
Collaborator

An alternative would be to wrap the response in Session with DescribeHubContentResponse. I'd like to hear @evakravi's or @akozd's opinion on this.

Member

We should do one or the other and stick with it throughout the session module. Personally, I'd prefer returning the class rather than a dictionary.

Contributor Author

I did make data classes for all Hub API responses. However, using them as return types in session would cause a circular dependency: we would have to import jumpstart.types in session and import session in jumpstart.types, which suggests this is not a good organization. I will leave the session return types as dictionaries and typecast somewhere else.
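A sketch of the "typecast somewhere else" approach: the session layer keeps returning plain dictionaries, and a class in a jumpstart module wraps them, so `session` never imports `jumpstart.types` and the import cycle disappears. It uses the singular `DescribeHubContentResponse` name suggested in the next thread, reads the snake_case keys the diff's classes use, and assumes the content document may arrive as a JSON string; all of that is illustrative.

```python
import json
from typing import Any, Dict, Union


class DescribeHubContentResponse:
    """Typed wrapper built from the plain dict that the session layer returns.

    Defined in a jumpstart module: session never imports it, so there is no cycle.
    """

    def __init__(self, json_obj: Dict[str, Any]) -> None:
        self.hub_name: str = json_obj["hub_name"]
        self.hub_content_name: str = json_obj["hub_content_name"]
        self.hub_content_type: str = json_obj["hub_content_type"]
        self.hub_content_version: str = json_obj["hub_content_version"]
        document: Union[str, Dict[str, Any]] = json_obj["hub_content_document"]
        # Parse the document once if it arrives as a JSON string.
        self.hub_content_document: Dict[str, Any] = (
            json.loads(document) if isinstance(document, str) else document
        )
```

At the call site, the dictionary from `describe_hub_content` would be passed straight into this constructor inside the jumpstart code, keeping the dependency one-directional.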

return json_obj


class DescribeHubContentsResponse(JumpStartDataHolderType):
Collaborator

Typo? I think this should be DescribeHubContentResponse, singular

@mufaddal-rohawala
Member

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-notebook-tests
  • Commit ID: dddd0a6
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@mufaddal-rohawala
Member

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: dddd0a6
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@mufaddal-rohawala
Member

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: dddd0a6
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@mufaddal-rohawala
Member

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-pr
  • Commit ID: dddd0a6
  • Result: FAILED
  • Build Logs (available for 30 days)

@mufaddal-rohawala
Member

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-pr
  • Commit ID: dddd0a6
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@evakravi evakravi merged commit 352a5c1 into aws:master-jumpstart-curated-hub Feb 29, 2024
5 of 6 checks passed
hub = CuratedHub(hub_name=info.hub_name, region=info.region)
hub_content = hub.describe_model(
model_name=info.hub_content_name, model_version=info.hub_content_version
hub_model_description: Dict[str, Any] = self._sagemaker_session.describe_hub_content(
Contributor

Don't we need an existence check on the session variable since it could be None?

Contributor Author

I had self._sagemaker_session default to DEFAULT_JUMPSTART_SAGEMAKER_SESSION unless it was changed... I will take a look in the next PR.

)

model_specs = JumpStartModelSpecs(DescribeHubContentsResponse(hub_model_description), is_hub_content=True)
Contributor

Can we keep the constructor for JumpStartModelSpecs as taking just the spec dictionary? The from_hub_content() method could easily be a function outside of that class. This keeps us from going down a route where we may end up violating Single Responsibility Principle on JumpStartModelSpecs.

Contributor Author

@whittech1 To clarify, are you pointing out the typecast in

model_specs = JumpStartModelSpecs(DescribeHubContentsResponse(hub_model_description), is_hub_content=True)

and instead it should be:

model_specs = JumpStartModelSpecs(hub_model_description, is_hub_content=True)

?

Contributor

@whittech1 whittech1 Mar 6, 2024

I am saying that the constructor for JumpStartModelSpecs does not need is_hub_content as a flag. The caller can do:

def convert_hub_content_to_js_model_specs(hub_content):
    ...
    # return a dict that matches the dictionary the `JumpStartModelSpecs` constructor expects


...

spec_dict = convert_hub_content_to_js_model_specs(DescribeHubContentsResponse(hub_model_description))
model_specs = JumpStartModelSpecs(spec_dict)
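A runnable version of the reviewer's suggestion, with a stand-in `JumpStartModelSpecs` that models only two fields: the constructor accepts nothing but a spec dictionary, and a free function owns the hub-content mapping. The input here is a plain dict rather than the response wrapper, and the field mapping (`hub_content_name` to `model_id`, `hub_content_version` to `version`) is an assumption for illustration.

```python
from typing import Any, Dict


class JumpStartModelSpecs:
    """Stand-in for the SDK class: the constructor only ever receives a spec dict."""

    def __init__(self, spec: Dict[str, Any]) -> None:
        self.model_id: str = spec["model_id"]
        self.version: str = spec["version"]


def convert_hub_content_to_js_model_specs(hub_content: Dict[str, Any]) -> Dict[str, Any]:
    """Translate hub-content fields into the spec-dict shape (mapping is illustrative)."""
    return {
        "model_id": hub_content["hub_content_name"],
        "version": hub_content["hub_content_version"],
    }


hub_content = {"hub_content_name": "my-model", "hub_content_version": "1.0.0"}
model_specs = JumpStartModelSpecs(convert_hub_content_to_js_model_specs(hub_content))
```

Keeping the conversion outside the class means `JumpStartModelSpecs` needs no `is_hub_content` flag, and the mapping can be unit-tested on its own.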

jinyoung-lim added a commit to jinyoung-lim/sagemaker-python-sdk that referenced this pull request Mar 8, 2024
bencrabtree pushed a commit to bencrabtree/sagemaker-python-sdk that referenced this pull request Mar 13, 2024
bencrabtree pushed a commit to bencrabtree/sagemaker-python-sdk that referenced this pull request Mar 13, 2024
benieric added a commit that referenced this pull request Mar 15, 2024
* prepare release v2.210.0

* update development version to v2.210.1.dev0

* feat: Add new Triton DLC URIs (#4432)

* Add new Triton DLC URIs

* Update according to black and pylint

* feat: Support selective pipeline execution between function step and regular step (#4392)

* feat: Add AutoMLV2 support (#4461)

* Add AutoMLV2 support

* Improvements of the integration tests

---------

Co-authored-by: Anton Repushko <repuanto@amazon.com>

* feature: Add TensorFlow 2.14 image configs (#4446)

* fix: remove enable_network_isolation from the python doc (#4465)

Co-authored-by: Rohan Gujarathi <gujrohan@amazon.com>

* doc: Add doc for new feature processor APIs and classes (#4250)

* fix: properly close sagemaker config file after loading config (#4457)

Closes #4456

* feat: instance specific jumpstart host requirements (#4397)

* feat: instance specific jumpstart host requirements

* chore: add js support for copies resource requirement, enforce coupling with ResourceRequirements class

* fix: typing

* fix: pylint

* change: Bump Apache Airflow version to 2.8.2 (#4470)

* Update tox.ini

* Update test_requirements.txt

* fix: make sure gpus are found in local_gpu run (#4384)

* fix: make sure gpus are found in local_gpu run

* fix: black formatting

* fix: adjust unit test

* feat: pin dll version to support python3.11 to the sdk (#4472)

Co-authored-by: Ashwin Krishna <ashwikri@amazon.com>

* fix: Skip No Canvas regions for test_deploy_best_candidate (#4477)

* prepare release v2.211.0

* update development version to v2.211.1.dev0

* change: Enhance model builder selection logic to include model size (#4429)

* change: Enhance model builder selection logic to include model size

* Fix conflicts

* Address PR comments

* fix formatting

* fix formatting of test

* Fix token in tasks.json

* Increase coverage for tests

* fix formatting

* Fix requirements

* Import code instead of importing accelerate

* Fix formatting

* Setup dependencies

* change: Upgrade smp to version 2.2 (#4479)

* upgrading smp to version 2.2

* fixing linting issue

* fixing syntax error with multiline if statement

* upgrading smp to version 2.2

* fixing linting issue

* fixing syntax error with multiline if statement

* fixing formatting

---------

Co-authored-by: Andrew Tian <tinandr@amazon.com>

* feat: Update SM Python SDK for PT 2.2.0 SM DLC (#4481)

* update pt2.2 sm training dlc pysdk

* update pt2.2 sm inference dlc pysdk and region list

* fix: Create custom tarfile extractall util to fix backward compatibility issue (#4476)

* fix: Create custom tarfile extractall util to fix backward compatibility issue

* Address review comments

* fix logger.error statements

* prepare release v2.212.0

* update development version to v2.212.1.dev0

* change: Update tblib constraint (#4452)

* fix: make unit tests compatible with pytest-xdist (#4486)

* fix: make unit tests compatible with pytest-xdist

* fix failing test

* feature: Add overriding logic in ModelBuilder when task is provided (#4460)

* feat: Add Optional task to Model

* Revert "feat: Add Optional task to Model"

This reverts commit fd3e86b.

* Add override logic in ModelBuilder with task provided

* Adjusted formatting

* Add extra unit tests for invalid inputs

* Address PR comments

* Add more test inputs to integration test

* Add model_metadata field to ModelBuilder

* Update doc

* Update doc

* Adjust formatting

---------

Co-authored-by: Samrudhi Sharma <samruds@amazon.com>
Co-authored-by: Xiong Zeng <xionzeng@amazon.com>

* feature: Accept user-defined env variables for the entry-point (#4175)

* fix: Move sagemaker pysdk version check after bootstrap in remote job (#4487)

* change: enable github actions for PRs (#4489)

* change: enable github actions for PRs

* Update codebuild-ci.yml

* trigger on pull_request_target

* add source-version-override

* fix permission

* feature: Add ModelDataSource and SourceUri support for model package and while registering (#4492)

Co-authored-by: Erick Benitez-Ramos <141277478+benieric@users.noreply.github.com>

* feat: support JumpStart proprietary models (#4467)

* feat: add proprietary manifest/specs parsing

add unittests for test_cache

small refactoring

address comments and more unittests

fix linting and fix more tests

fix: pylint

feat: JumpStartModel class for prop models

* remove unused imports and fix docstyle

* fix: remove unused args

* fix: remove unused args

* fix: more unused vars

* fix: slow tests

* fix: unittests

* added more tests to cover some lines

* remove estimator warn check

* chore: address comments re performance

* fix: address comments

* complete list experience and other fixes

* fix: pylint

* add doc utils and fix pylint

* fix: docstyle

* fix: doc

* fix: default payloads

* fix: doc and tags and enums

* fix: jumpstart doc

* rename to open_weights and fix filtering

* update filter name

* doc update

* fix: black

* rename to proprietary model and fix unittests

* address comments

* fix: docstyle and flake8

* address more comments and fix doc

* put back doc utils for future refactoring

* add prop model title in doc

* doc update

---------

Co-authored-by: liujiaor <128006184+liujiaorr@users.noreply.github.com>

* chore: emit warning when no instance specific gated training env var is available, and raise exception when accept_eula flag is not supplied (#4485)

* fix: raise exception when no instance specific gated training env var available

* chore: raise client exception if accept_eula flag is not set for gated models

* chore: address flake8 errors

* chore: emit warning when instance type is chosen with no gated training artifacts

* change: bump jinja2 to 3.1.3 in doc/requirments.txt (#4421) (#4423)

* change: bump jinja2 to 3.1.3 in doc/requirments.txt (#4421)

* change: bump jinja2 to 3.1.3 in doc/requirments.txt

* Update requirements.txt

* feature: TGI 1.4.0 (#4424)

* documentation: fix the ClarifyCheckStep documentation to mention PDP (#4259)

* documentation: fix the ClarifyCheckStep documentation to mention PDP support

* fix: break the lines to meet pylint requirement

---------

Co-authored-by: Shing Lyu <shinglyu@amazon.nl>

* documentation: Explain the ClarifyCheckStep and QualityCheckStep parameters (#4261)

* documentation: explain the ClarifyCheckStep and QualityCheckStep parameters

* fix: remove trailing space

---------

Co-authored-by: Shing Lyu <shinglyu@amazon.nl>

* feat: Telemetry metrics (#4414)

* Emit additional telemetry metrics

* Fix unit tests

* Emit endpoint failure to telemetry

* Address PR Comments

* Emit latency in telemetry

* Address PR Comments

* Addressed PR Comments

* Address PR Comments

* Fix tests

* Fix integ tests

---------

Co-authored-by: Jonathan Makunga <makung@amazon.com>
Co-authored-by: Erick Benitez-Ramos <141277478+benieric@users.noreply.github.com>

* documentation: change order of pipelines topics (#4427)

* prepare release v2.208.0

* update development version to v2.208.1.dev0

* feature: AutoGluon 1.0.0 image_uris update (#4426)

---------

Co-authored-by: Erick Benitez-Ramos <141277478+benieric@users.noreply.github.com>
Co-authored-by: Jinyoung Lim <jj.lim418@gmail.com>
Co-authored-by: Shing Lyu <shing.lyu@gmail.com>
Co-authored-by: Shing Lyu <shinglyu@amazon.nl>
Co-authored-by: Jonathan Makunga <54963715+makungaj1@users.noreply.github.com>
Co-authored-by: Jonathan Makunga <makung@amazon.com>
Co-authored-by: stacicho <stacicho@amazon.com>
Co-authored-by: ci <ci>
Co-authored-by: tonyhu <tonyhoo@users.noreply.github.com>

* feat: add hub and hubcontent support in retrieval function for jumpstart model cache (#4438)

* feat: jsch jumpstart estimator support (#4439)

* Master jumpstart curated hub (#4464)

* add hub_arn support for accept_types, content_types, serializers, deserializers, and predictor (#4463)

* feature: JumpStart CuratedHub class creation and function definitions (#4448)

* MultiPartCopy with Sync Algorithm (#4475)

* first pass at sync function with util classes

* adding tests and update clases

* linting

* file generator class inheritance

* lint

* multipart copy and algorithm updates

* modularize sync

* reformatting folders

* testing for sync

* do not tolerate vulnerable

* remove prints

* handle multithreading progress bar

* update tests

* optimize function and add hub bucket prefix

* docstrings and linting

* rebase with master

* bad rebase

* trying to fix codecov

* uncomment codebuild-ci

---------

Co-authored-by: ci <ci>
Co-authored-by: Nikhil Kulkarni <knikhil29@gmail.com>
Co-authored-by: qidewenwhen <32910701+qidewenwhen@users.noreply.github.com>
Co-authored-by: Anton Repushko <repushko.a@gmail.com>
Co-authored-by: Anton Repushko <repuanto@amazon.com>
Co-authored-by: Sai Parthasarathy Miduthuri <54188298+saimidu@users.noreply.github.com>
Co-authored-by: Rohan Gujarathi <gujarathi.rohan@gmail.com>
Co-authored-by: Rohan Gujarathi <gujrohan@amazon.com>
Co-authored-by: cansun <80425164+can-sun@users.noreply.github.com>
Co-authored-by: Justin <justinm088@hotmail.com>
Co-authored-by: evakravi <69981223+evakravi@users.noreply.github.com>
Co-authored-by: Kalyani Nikure <110067132+knikure@users.noreply.github.com>
Co-authored-by: gv <gverkes@users.noreply.github.com>
Co-authored-by: akrishna1995 <38850354+akrishna1995@users.noreply.github.com>
Co-authored-by: Ashwin Krishna <ashwikri@amazon.com>
Co-authored-by: Samrudhi Sharma <154457034+samruds@users.noreply.github.com>
Co-authored-by: adtian2 <55163384+adtian2@users.noreply.github.com>
Co-authored-by: Andrew Tian <tinandr@amazon.com>
Co-authored-by: Sirut Buasai <73297481+sirutBuasai@users.noreply.github.com>
Co-authored-by: Danny Bushkanets <d.bushkanets@gmail.com>
Co-authored-by: Erick Benitez-Ramos <141277478+benieric@users.noreply.github.com>
Co-authored-by: xiongz945 <54782408+xiongz945@users.noreply.github.com>
Co-authored-by: Samrudhi Sharma <samruds@amazon.com>
Co-authored-by: Xiong Zeng <xionzeng@amazon.com>
Co-authored-by: martinRenou <martin.renou@gmail.com>
Co-authored-by: mrudulmn <161017394+mrudulmn@users.noreply.github.com>
Co-authored-by: Haotian An <33510317+Captainia@users.noreply.github.com>
Co-authored-by: liujiaor <128006184+liujiaorr@users.noreply.github.com>
Co-authored-by: Jinyoung Lim <jj.lim418@gmail.com>
Co-authored-by: Shing Lyu <shing.lyu@gmail.com>
Co-authored-by: Shing Lyu <shinglyu@amazon.nl>
Co-authored-by: Jonathan Makunga <54963715+makungaj1@users.noreply.github.com>
Co-authored-by: Jonathan Makunga <makung@amazon.com>
Co-authored-by: stacicho <stacicho@amazon.com>
Co-authored-by: tonyhu <tonyhoo@users.noreply.github.com>
bencrabtree pushed a commit to bencrabtree/sagemaker-python-sdk that referenced this pull request Mar 18, 2024
benieric pushed a commit that referenced this pull request Mar 18, 2024
* fix: Move sagemaker pysdk version check after bootstrap in remote job (#4487)

* feat: support JumpStart proprietary models (#4467)

* feat: add proprietary manifest/specs parsing

add unittests for test_cache

small refactoring

address comments and more unittests

fix linting and fix more tests

fix: pylint

feat: JumpStartModel class for prop models

* remove unused imports and fix docstyle

* fix: remove unused args

* fix: remove unused args

* fix: more unused vars

* fix: slow tests

* fix: unittests

* added more tests to cover some lines

* remove estimator warn check

* chore: address comments re performance

* fix: address comments

* complete list experience and other fixes

* fix: pylint

* add doc utils and fix pylint

* fix: docstyle

* fix: doc

* fix: default payloads

* fix: doc and tags and enums

* fix: jumpstart doc

* rename to open_weights and fix filtering

* update filter name

* doc update

* fix: black

* rename to proprietary model and fix unittests

* address comments

* fix: docstyle and flake8

* address more comments and fix doc

* put back doc utils for future refactoring

* add prop model title in doc

* doc update

---------

Co-authored-by: liujiaor <128006184+liujiaorr@users.noreply.github.com>

* feat: add hub and hubcontent support in retrieval function for jumpstart model cache (#4438)

* feat: jsch jumpstart estimator support (#4439)

* Master jumpstart curated hub (#4464)

* add hub_arn support for accept_types, content_types, serializers, deserializers, and predictor (#4463)

* feature: JumpStart CuratedHub class creation and function definitions (#4448)

* MultiPartCopy with Sync Algorithm (#4475)

* first pass at sync function with util classes

* adding tests and update classes

* linting

* file generator class inheritance

* lint

* multipart copy and algorithm updates

* modularize sync

* reformatting folders

* testing for sync

* do not tolerate vulnerable

* remove prints

* handle multithreading progress bar

* update tests

* optimize function and add hub bucket prefix

* docstrings and linting

* rebase with master

* bad rebase

* support for gated and training unsupported

* merge with master-curated-jumpstart

* linting

* update types

* update

* update bootstrap

* fix codecov

---------

Co-authored-by: qidewenwhen <32910701+qidewenwhen@users.noreply.github.com>
Co-authored-by: Haotian An <33510317+Captainia@users.noreply.github.com>
Co-authored-by: liujiaor <128006184+liujiaorr@users.noreply.github.com>
Co-authored-by: Jinyoung Lim <jj.lim418@gmail.com>
bencrabtree pushed a commit to bencrabtree/sagemaker-python-sdk that referenced this pull request Mar 20, 2024
bencrabtree pushed a commit to bencrabtree/sagemaker-python-sdk that referenced this pull request Mar 21, 2024
bencrabtree pushed a commit to bencrabtree/sagemaker-python-sdk that referenced this pull request Mar 23, 2024
bencrabtree pushed a commit to bencrabtree/sagemaker-python-sdk that referenced this pull request Mar 23, 2024
bencrabtree pushed a commit to bencrabtree/sagemaker-python-sdk that referenced this pull request Mar 23, 2024

6 participants