Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MLServer model-settings.json uri conversion #40

Closed
xvnyv opened this issue Mar 27, 2023 · 5 comments · Fixed by #43
Closed

MLServer model-settings.json uri conversion #40

xvnyv opened this issue Mar 27, 2023 · 5 comments · Fixed by #43

Comments

@xvnyv
Copy link
Contributor

xvnyv commented Mar 27, 2023

I am experiencing some strange behaviour regarding the uri conversion of the parameters.uri field in the model-settings.json file. I have created a model-settings.json file that is placed in the same S3 folder as my model. In most cases, the uri conversion works as expected.

For example, with my model filename as my-model.joblib, I provided the following model-settings.json file

{
  "implementation": "mlserver_sklearn.SKLearnModel",
  "name": "sklearn-example",
  "parameters": {
    "uri": "my-model.joblib"
  }
}

and the name and parameters.uri fields were correctly overwritten into these values when I checked the file from within the serving runtime pod

{
  "implementation": "mlserver_sklearn.SKLearnModel",
  "name": "sklearn-mnist-2__isvc-a6929dd134",
  "parameters": {
    "uri": "/models/_mlserver_models/sklearn-mnist-2__isvc-a6929dd134/my-model.joblib"
  }
}

However, the peculiar thing is that if I were to name my model file as mnist.joblib instead and change my model-settings.json to the following,

{
  "implementation": "mlserver_sklearn.SKLearnModel",
  "name": "sklearn-example",
  "parameters": {
    "uri": "mnist.joblib"
  }
}

then my model would fail to load due to this error in the mlserver container:

2023-03-27 09:46:20,423 [mlserver.grpc] ERROR - Invalid URI specified for model sklearn-mnist-2__isvc-a6929dd134 (/models/_mlserver_models/sklearn-mnist-2__isvc-a6929dd134/models/sklearn-mnist-2__isvc-a6929dd134/sklearn-model/mnist.joblib)
Traceback (most recent call last):
  File "/venv/lib/python3.11/site-packages/mlserver/grpc/utils.py", line 44, in _inner
    return await f(self, request, context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/mlserver/grpc/model_repository.py", line 24, in _inner
    return await f(self, request, context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/mlserver/grpc/model_repository.py", line 48, in RepositoryModelLoad
    await self._handlers.load(request.model_name)
  File "/venv/lib/python3.11/site-packages/mlserver/handlers/model_repository.py", line 67, in load
    model = await self._model_registry.load(model_settings)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/mlserver/registry.py", line 283, in load
    return await self._models[model_settings.name].load(model_settings)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/mlserver/registry.py", line 141, in load
    await self._load_model(new_model)
  File "/venv/lib/python3.11/site-packages/mlserver/registry.py", line 158, in _load_model
    await model.load()
  File "/venv/lib/python3.11/site-packages/mlserver_sklearn/sklearn.py", line 34, in load
    model_uri = await get_model_uri(
                ^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/mlserver/utils.py", line 44, in get_model_uri
    raise InvalidModelURI(settings.name, full_model_path)
mlserver.errors.InvalidModelURI: Invalid URI specified for model sklearn-mnist-2__isvc-a6929dd134 (/models/_mlserver_models/sklearn-mnist-2__isvc-a6929dd134/models/sklearn-mnist-2__isvc-a6929dd134/sklearn-model/mnist.joblib)

I have checked the model-settings.json file in the serving runtime pod and it appears that the parameters.uri value has indeed been set to /models/_mlserver_models/sklearn-mnist-2__isvc-a6929dd134/models/sklearn-mnist-2__isvc-a6929dd134/sklearn-model/mnist.joblib.

I am currently using MLServer v1.2.3 and kserve v0.9.0.

May I know if anyone is able to reproduce this? If so, are there any clues as to what the issue might be? And if not, what might I be doing wrong that is causing this error?

@tjohnson31415
Copy link
Contributor

Hello @xvnyv. Thanks for including all of these details in the issue! That is strange behavior, it seems that the path is being replicated for some reason, and I don't see how the name of the file should matter 🤔

The model files in /models/_mlserver_models/$MODEL_ID should have symlinks back to the unmodified downloaded files in /models/$MODEL_ID (the model-settings.json file is not a symlink because it is modified). So the path logged as /models/_mlserver_models/sklearn-mnist-2__isvc-a6929dd134/models/sklearn-mnist-2__isvc-a6929dd134/sklearn-model/mnist.joblib looks like the resolved symlink path was treated as a relative path instead of absolute (./models instead of /models). I'll see if I can reproduce this, but a few ideas of things to check in the meantime:

  • Please check the structure of the files in both /models and /models/_mlserver_models to see if the symlinks are correct (i.e. pointing to valid files)
  • You should be able to see the original model-settings.json file in /models/$MODEL_ID just to double-check that the source JSON has the content you expect in case the uri somehow got set to models/sklearn-mnist-2__isvc-a6929dd134/sklearn-model/mnist.joblib in the source file
  • Something to try (that shouldn't be necessary), is to specify the uri with ./ prefixed

@xvnyv
Copy link
Contributor Author

xvnyv commented Mar 28, 2023

Hi @tjohnson31415, thanks for the quick response! I have tried out your suggestions and this is what I have found:

  • I have checked and the symlinks are pointing to the correct files:
nobody@modelmesh-serving-mlserver-1-5d9bd78df5-zc4r9:/opt/mlserver$ ls -la /models/_mlserver_models/mnist-name__isvc-9539dbe97a/
total 16
drwxr-sr-x 2 nobody nogroup 4096 Mar 28 01:37 .
drwxr-sr-x 5 nobody nogroup 4096 Mar 28 01:37 ..
lrwxrwxrwx 1 nobody nogroup   73 Mar 28 01:37 mnist.joblib -> /models/mnist-name__isvc-9539dbe97a/sklearn-model-mnist-name/mnist.joblib
-rw-r--r-- 1 nobody nogroup  258 Mar 28 01:37 model-settings.json

nobody@modelmesh-serving-mlserver-1-5d9bd78df5-zc4r9:/opt/mlserver$ ls -la /models/mnist-name__isvc-9539dbe97a/sklearn-model-mnist-name/
total 352
drwxr-sr-x 2 nobody nogroup   4096 Mar 28 01:35 .
drwxr-sr-x 3 nobody nogroup   4096 Mar 28 01:35 ..
-rw-r--r-- 1 nobody nogroup 344817 Mar 28 01:35 mnist.joblib
-rw-r--r-- 1 nobody nogroup    134 Mar 28 01:35 model-settings.json
  • I have also checked the original model-settings.json and it looks like the uri is the same as the value I provided
nobody@modelmesh-serving-mlserver-1-5d9bd78df5-zc4r9:/opt/mlserver$ cat /models/mnist-name__isvc-9539dbe97a/sklearn-model-mnist-name/model-settings.json
{
  "implementation": "mlserver_sklearn.SKLearnModel",
  "name": "sklearn-example",
  "parameters": {
    "uri": "mnist.joblib"
  }
}
  • I have attempted to set the uri to ./mnist.joblib instead, but the same issue persists
nobody@modelmesh-serving-mlserver-1-5d9bd78df5-zc4r9:/opt/mlserver$ cat /models/mnist-name__isvc-9539dbe97a/sklearn-model-mnist-name/model-settings.json
{
  "implementation": "mlserver_sklearn.SKLearnModel",
  "name": "sklearn-example",
  "parameters": {
    "uri": "./mnist.joblib"
  }
}
2023-03-28 01:40:26,412 [mlserver.grpc] ERROR - Invalid URI specified for model mnist-name__isvc-9539dbe97a (/models/_mlserver_models/mnist-name__isvc-9539dbe97a/models/mnist-name__isvc-9539dbe97a/sklearn-model-mnist-name/mnist.joblib)
Traceback (most recent call last):
  File "/venv/lib/python3.11/site-packages/mlserver/grpc/utils.py", line 44, in _inner
    return await f(self, request, context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/mlserver/grpc/model_repository.py", line 24, in _inner
    return await f(self, request, context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/mlserver/grpc/model_repository.py", line 48, in RepositoryModelLoad
    await self._handlers.load(request.model_name)
  File "/venv/lib/python3.11/site-packages/mlserver/handlers/model_repository.py", line 67, in load
    model = await self._model_registry.load(model_settings)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/mlserver/registry.py", line 283, in load
    return await self._models[model_settings.name].load(model_settings)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/mlserver/registry.py", line 141, in load
    await self._load_model(new_model)
  File "/venv/lib/python3.11/site-packages/mlserver/registry.py", line 158, in _load_model
    await model.load()
  File "/venv/lib/python3.11/site-packages/mlserver_sklearn/sklearn.py", line 34, in load
    model_uri = await get_model_uri(
                ^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/mlserver/utils.py", line 44, in get_model_uri
    raise InvalidModelURI(settings.name, full_model_path)
mlserver.errors.InvalidModelURI: Invalid URI specified for model mnist-name__isvc-9539dbe97a (/models/_mlserver_models/mnist-name__isvc-9539dbe97a/models/mnist-name__isvc-9539dbe97a/sklearn-model-mnist-name/mnist.joblib)

@tjohnson31415
Copy link
Contributor

tjohnson31415 commented Mar 28, 2023

Yep, everything there checks out.

I was able to reproduce it as well and the adapter code logs the incorrect rewrite directly:

1.6799840658311071e+09 INFO Rewrote model uri in settings file {"before": "mnist.joblib", "after": "/opt/app/model-mesh-mlserver-adapter/server/testdata/generated/_mlserver_models/sklearn-rewrite/opt/app/model-mesh-mlserver-adapter/server/testdata/generated/sklearn-rewrite/model/mnist.joblib"}

The relevant code in the MLServer adapter is here. I did some additional testing and think that I understand the problem: util.SecureJoin will resolve symlinks if the path that it is joining has any symlinks and it enforces that the symlink is within the scope of the first path element. So this issue comes up if the model files are symlinked before the model-settings.json is processed, which is dependent on the filesystem ordering of the files from ioutil.ReadDir... Definitely a bug in the code.

Since you found the bug, would you like to contribute a fix?

@xvnyv
Copy link
Contributor Author

xvnyv commented Mar 30, 2023

@tjohnson31415 ohh thanks for finding the issue, that makes sense. Sure, I'll do up a fix once I have some time available.

kserve-oss-bot pushed a commit that referenced this issue Apr 13, 2023
#### Motivation

As documented in [this issue](#40), there is currently a bug in the conversion of the `uri` provided in the MLServer `model-settings.json` file if the model files are symlinked before the uri conversion occurs. This PR is intended to fix the issue by ensuring that `model-settings.json` will always be processed first.

#### Modifications

The function `adaptModelLayoutForRuntime` in the MLServer adapter `server.go` was modified to ensure that, if present, the `model-settings.json` file will be placed at the head of the list of files to be processed. This makes sure that the config file will always be processed before any symlinks are created for the model files.

A unit test was also added to ensure that the bug was fixed.

#### Result

Incorrect uri conversion bug no longer occurs even when the model files alphabetically precede the filename`model-settings.json`.


Signed-off-by: xinyitan <xin.yi.tan@sap.com>

Co-authored-by: xinyitan <xin.yi.tan@sap.com>
@ckadner
Copy link
Member

ckadner commented Apr 14, 2023

Resolved by #43

@ckadner ckadner closed this as completed Apr 14, 2023
Jooho pushed a commit to red-hat-data-services/modelmesh-runtime-adapter that referenced this issue May 16, 2023
…ahub-io#43)

#### Motivation

As documented in [this issue](kserve#40), there is currently a bug in the conversion of the `uri` provided in the MLServer `model-settings.json` file if the model files are symlinked before the uri conversion occurs. This PR is intended to fix the issue by ensuring that `model-settings.json` will always be processed first.

#### Modifications

The function `adaptModelLayoutForRuntime` in the MLServer adapter `server.go` was modified to ensure that, if present, the `model-settings.json` file will be placed at the head of the list of files to be processed. This makes sure that the config file will always be processed before any symlinks are created for the model files.

A unit test was also added to ensure that the bug was fixed.

#### Result

Incorrect uri conversion bug no longer occurs even when the model files alphabetically precede the filename`model-settings.json`.


Signed-off-by: xinyitan <xin.yi.tan@sap.com>

Co-authored-by: xinyitan <xin.yi.tan@sap.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants