HuggingFaceModel does not properly accept script mode environment variables #3361

Closed
athewsey opened this issue Sep 16, 2022 · 1 comment · Fixed by #4175

@athewsey
Collaborator

Describe the bug

While Model / FrameworkModel's prepare_container_def() supports (here) manually configuring script mode environment variables for an existing model.tar.gz package, HuggingFaceModel's override implementation does not (here). User-configured env={"SAGEMAKER_PROGRAM": ..., "SAGEMAKER_SUBMIT_DIRECTORY": ..., ...} values are ignored regardless of whether re-packing of new entrypoint code is requested.

This is important for importing large (multi-GB) pre-trained models to SageMaker inference, because it forces us to use the SDK class' re-packing functionality to add inference code... which can be significantly slower: it adds tens of minutes of extra delay in some cases.

To reproduce

  • Prepare a model.tar.gz in S3 that already contains a code/inference.py alongside (whatever) model artifacts. For a simple reproduction, you could use no model artifacts at all and add a trivial custom model loader to inference.py, something like def model_fn(model_dir): return lambda x: x (a minimal sketch follows these steps).

In my current use case, my model artifacts are about 5GB and constructing/uploading this archive takes ~10min - regardless of whether the small script code is included.

  • Create and deploy a Hugging Face Model from the archive on S3 via SageMaker Python SDK, indicating what code directory and entry point should be used:
model = HuggingFaceModel(
    model_data = "s3://.../model.tar.gz",  # (Contains code/inference.py)  
    role=sagemaker.get_execution_role(),
    py_version="py38",
    pytorch_version="1.10",
    transformers_version="4.17",
    env={
        "SAGEMAKER_CONTAINER_LOG_LEVEL": "20",
        "SAGEMAKER_PROGRAM": "inference.py",
        "SAGEMAKER_REGION": "ap-southeast-1",
        "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code",
    },
)
predictor = model.deploy(instance_type="ml.g4dn.xlarge", initial_instance_count=1)
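
For reference, here is a minimal sketch of the archive from the first step (names are illustrative; model_fn is the standard SageMaker inference toolkit hook):

# code/inference.py - trivial custom model loader for the reproduction.
# Returns an identity "model" so no real artifacts are required.
def model_fn(model_dir):
    return lambda x: x

The expected layout inside model.tar.gz is then:

model.tar.gz
├── (model artifacts, if any, at the root)
└── code/
    └── inference.py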

Observed behavior

The endpoint will fail to find the inference.py entry point, so it will not use the custom model_fn() and the model will fail to load.

This is because HuggingFaceModel overrides the SAGEMAKER_PROGRAM and SAGEMAKER_SUBMIT_DIRECTORY environment variables with empty values even though no entry_point or source_dir were provided.
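
Roughly, the override builds the environment in this order (an illustrative sketch of the clobbering, not the SDK's exact code):

# Illustration only - approximate order of operations in prepare_container_def():
deploy_env = dict(self.env)  # user-provided env vars are applied first...
deploy_env.update(self._script_mode_env_vars())  # ...then the script mode vars,
# which are empty strings when no entry_point/source_dir were given,
# overwriting the user's SAGEMAKER_PROGRAM / SAGEMAKER_SUBMIT_DIRECTORY.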

Expected behavior

The HuggingFaceModel should correctly propagate the user-specified environment variables, to support using a pre-prepared model.tar.gz without re-packing. In this case, the container would find the pre-loaded inference.py entry point and correctly use the custom model_fn.

Screenshots or logs

N/A

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: 2.92.1
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): HuggingFace
  • Framework version: 4.17
  • Python version: py38
  • CPU or GPU: GPU
  • Custom Docker image (Y/N): N

Additional context

I am able to deploy a working endpoint by having my code folder and inference.py locally and adding these options to the model: HuggingFaceModel(source_dir="code", entry_point="inference.py", ...).

The problem is that this more than doubles the time and resources taken to prepare the package:

  • 10min to produce an initial "model-raw.tar.gz" and load to S3
  • 10min for the SageMaker SDK to download that archive, extract and re-pack it to add code folder, and re-upload to a new location

Since the use case here is just to prepare the model from local artifacts + code, it would also be OK if model_data were able to accept a local, uncompressed folder, as the 10min tarball creation would then only need to be done once. From my tests, though, this doesn't seem to be possible?
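
(For what it's worth, the one-time packing step can at least be scripted. A minimal sketch using only the standard library, with placeholder local folders "model/" and "code/":)

import tarfile

# Pack local artifacts + code into a pre-packed model.tar.gz, once, up front.
with tarfile.open("model.tar.gz", "w:gz") as tar:
    tar.add("model", arcname=".")    # model artifacts at the archive root
    tar.add("code", arcname="code")  # inference.py, requirements.txt, ...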

@athewsey athewsey added the bug label Sep 16, 2022
@athewsey
Collaborator Author

As an interim measure, users could override this behaviour with a patch like this:

from sagemaker.huggingface import HuggingFaceModel


class PatchedHuggingFaceModel(HuggingFaceModel):
    """Modified Model class to allow manually setting SM Script Mode env vars"""

    def prepare_container_def(self, *args, **kwargs):
        # Call the parent function:
        result = super().prepare_container_def(*args, **kwargs)
        # ...But allow our manual env vars configuration to override the internals:
        manual_env = dict(self.env)
        result["Environment"].update(manual_env)
        return result

Now you are free to:

model = PatchedHuggingFaceModel(
    model_data="s3://doc-example-bucket/pre-packed/model.tar.gz"
    env={
        # "SAGEMAKER_CONTAINER_LOG_LEVEL": "20",
        "SAGEMAKER_PROGRAM": "inference.py",
        # "SAGEMAKER_REGION": region,
        "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code",
    },
    ...
)
model.deploy(...)

(...Where the pre-uploaded tarball already contains code/inference.py, code/requirements.txt and whatever else)
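
To sanity-check the patch before deploying, you can inspect the container definition it produces (prepare_container_def accepts an instance_type argument):

container_def = model.prepare_container_def(instance_type="ml.g4dn.xlarge")
print(container_def["Environment"])  # should now include your SAGEMAKER_PROGRAM etc.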
