Describe the bug
While Model / FrameworkModel's prepare_container_def() supports manually configuring script mode environment variables for an existing model.tar.gz package, HuggingFaceModel's override implementation does not. User-configured env={"SAGEMAKER_PROGRAM": ..., "SAGEMAKER_SUBMIT_DIRECTORY": ...} values are ignored, regardless of whether re-packing of new entry point code is requested.
This is important when importing large (multi-GB) pre-trained models to SageMaker inference, because it forces us to use the SDK class's re-packing functionality to add inference code, which is significantly slower in some cases: it can add tens of minutes of extra delay.
To reproduce
Prepare a model.tar.gz in S3 that already contains a code/inference.py alongside (whatever) model artifacts. For a simple reproduction, you could use no model artifacts at all and add a trivial custom model loader to inference.py, something like def model_fn(model_dir): return lambda x: x.
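For illustration, a minimal sketch of such a handler (the identity "model" simply echoes inputs back):

# code/inference.py -- trivial model loader for reproduction purposes
def model_fn(model_dir):
    # Ignore any artifacts in model_dir and just return an identity "model"
    return lambda x: x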
In my current use case, my model artifacts are about 5GB and constructing/uploading this archive takes ~10min - regardless of whether the small script code is included.
Create and deploy a Hugging Face Model from the archive on S3 via the SageMaker Python SDK, indicating via environment variables which code directory and entry point should be used:
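A sketch of the deployment code, assuming a placeholder S3 URI, an already-defined execution role, and the framework versions listed under System information (SAGEMAKER_SUBMIT_DIRECTORY points at the code folder inside the extracted model.tar.gz on the endpoint):

from sagemaker.huggingface import HuggingFaceModel

model = HuggingFaceModel(
    model_data="s3://example-bucket/model.tar.gz",  # placeholder: pre-packed archive incl. code/inference.py
    role=role,  # assumes a SageMaker execution role is already defined
    transformers_version="4.17",
    pytorch_version="1.10",
    py_version="py38",
    env={
        # These are ignored/overridden by HuggingFaceModel.prepare_container_def():
        "SAGEMAKER_PROGRAM": "inference.py",
        "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code",
    },
)
predictor = model.deploy(initial_instance_count=1, instance_type="ml.g4dn.xlarge")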
Observed behavior
The endpoint will fail to find the inference.py entry point (and therefore will not use the custom model_fn(), and will fail to load).
This is because HuggingFaceModel overrides the SAGEMAKER_PROGRAM and SAGEMAKER_SUBMIT_DIRECTORY environment variables to empty values, even though no entry_point or source_dir is provided.
Expected behavior
The HuggingFaceModel should propagate the user-specified environment variables, to support using a pre-prepared model.tar.gz without re-packing. In this case, the container would find the pre-loaded inference.py entry point and correctly use the overriding model_fn.
Screenshots or logs
N/A
System information
A description of your system. Please provide:
SageMaker Python SDK version: 2.92.1
Framework name (eg. PyTorch) or algorithm (eg. KMeans): HuggingFace
Framework version: 4.17
Python version: py38
CPU or GPU: GPU
Custom Docker image (Y/N): N
Additional context
I am able to deploy a working endpoint by having my code folder and inference.py locally and adding these options to the model: HuggingFaceModel(source_dir="code", entry_point="inference.py", ...).
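For reference, a sketch of this working (but slow) configuration, using the same placeholder names as the example above:

model = HuggingFaceModel(
    model_data="s3://example-bucket/model-raw.tar.gz",  # placeholder: archive without code
    role=role,
    transformers_version="4.17",
    pytorch_version="1.10",
    py_version="py38",
    source_dir="code",           # local folder containing inference.py
    entry_point="inference.py",  # triggers the SDK's download/re-pack/re-upload step
)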
The problem is that this more than doubles the time and resources taken to prepare the package:
10min to produce an initial "model-raw.tar.gz" and upload it to S3
10min for the SageMaker SDK to download that archive, extract and re-pack it to add the code folder, and re-upload it to a new location
Since the use case here is just to prepare the model from local artifacts + code, it would also be OK if model_data were able to accept a local, uncompressed folder, as the 10min tarball creation would still only need to be done once. From my tests, though, this doesn't seem to be possible?
As an interim measure, users could override this behaviour with a patch like this:
class PatchedHuggingFaceModel(HuggingFaceModel):
    """Modified Model class to allow manually setting SM Script Mode env vars"""

    def prepare_container_def(self, *args, **kwargs):
        # Call the parent function:
        result = super().prepare_container_def(*args, **kwargs)
        # ...But allow our manual env vars configuration to override the internals:
        manual_env = dict(self.env)
        result["Environment"].update(manual_env)
        return result
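With this patch, substituting PatchedHuggingFaceModel for HuggingFaceModel in the deployment snippet above should let the user-supplied SAGEMAKER_PROGRAM and SAGEMAKER_SUBMIT_DIRECTORY values survive prepare_container_def(), so the pre-packed archive can deploy without re-packing.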