
Automatic Image Annotation: Steps for custom model deployment #3457

Closed
vgupta13 opened this issue Jul 25, 2021 · 9 comments
Labels: question (Further information is requested)

Comments

@vgupta13

Hello team, I have a question about how to create our own function.yaml, main.py and model_loader.py files for an object detection model fine-tuned on a custom dataset, starting from the TensorFlow pre-trained model zoo (e.g., SSD MobileNet). Could you please help me create these files or share the documentation, if available?

As far as I remember, in the previous version of CVAT this was possible through .bin and .xml files (produced by the OpenVINO Model Optimizer) together with a label_map.json and an interp.py script. With the architectural changes (e.g., the introduction of Nuclio for serverless deployment), the traditional way is deprecated, which makes the new workflow difficult for me to grasp.

PS: I have gone through the serverless tutorial instructions and other related issues, but couldn't find the answer.
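
To make the question concrete, below is the rough skeleton of function.yaml that I understand I need to produce, pieced together from the existing TensorFlow detector definitions in the serverless directory (the function name, labels, base image and build steps here are placeholders, not a working configuration):

metadata:
  name: tf-my-custom-detector
  namespace: cvat
  annotations:
    name: My fine-tuned SSD MobileNet
    type: detector
    framework: tensorflow
    spec: |
      [
        { "id": 1, "name": "car" },
        { "id": 2, "name": "person" }
      ]

spec:
  description: SSD MobileNet fine-tuned on a custom dataset
  runtime: 'python:3.6'
  handler: main:handler
  eventTimeout: 30s

  build:
    image: cvat/tf.my_custom_detector
    baseImage: tensorflow/tensorflow:2.1.1
    directives:
      preCopy:
        - kind: WORKDIR
          value: /opt/nuclio
      postCopy:
        - kind: RUN
          value: pip install pillow pyyaml

  triggers:
    myHttpTrigger:
      maxWorkers: 1
      kind: 'http'

  platform:
    attributes:
      restartPolicy:
        name: always
        maximumRetryCount: 3
      mountMode: volume

main.py would then expose init_context/handler and model_loader.py would wrap the actual model loading and inference, but these are exactly the parts I am unsure how to adapt for my fine-tuned model.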

@nmanovic
Contributor

@vgupta13 , could you please read https://openvinotoolkit.github.io/cvat/docs/manual/advanced/serverless-tutorial/ and clarify what is not clear in the tutorial? We will improve it.

nmanovic added the question (Further information is requested) label on Jul 26, 2021
@vgupta13
Author

@nmanovic for example: what changes are required in the function.yaml file when the user wants to symbolically link a local directory containing the custom model instead of downloading it from the internet?

Here is the postCopy section from function.yaml that I have configured:

  postCopy:
    - kind: RUN
      value: ln -s /mnt/c/Users/vgupta/models/faster_rcnn_inception_resnet_v2_640_640_coco17_tpu_8 faster_rcnn
    - kind: RUN
      value: pip install pillow pyyaml

The deployment was successful, but the container keeps restarting. On further investigation at the Docker level, the log below indicates that the symbolic link might not be working.

l{"datetime": "2021-07-29 17:31:35,325", "level": "error", "message": "Caught unhandled exception while initializing", "with": {"err": "/opt/nuclio/faster_rcnn/frozen_inference_graph.pb; No such file or directory", "traceback": "Traceback (most recent call last):\n File "/opt/nuclio/_nuclio_wrapper.py", line 350, in run_wrapper\n args.trigger_name)\n File "/opt/nuclio/_nuclio_wrapper.py", line 80, in init\n getattr(entrypoint_module, 'init_context')(self._context)\n File "/opt/nuclio/main.py", line 12, in init_context\n model_handler = ModelLoader(model_path)\n File "/opt/nuclio/model_loader.py", line 15, in init\n serialized_graph = fid.read()\n File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/lib/io/file_io.py", line 122, in read\n self._preread_check()\n File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/lib/io/file_io.py", line 84, in _preread_check\n compat.as_bytes(self.__name), 1024 * 512)\ntensorflow.python.framework.errors_impl.NotFoundError: /opt/nuclio/faster_rcnn/frozen_inference_graph.pb; No such file or directory\n", "worker_id": "0"}

Could you please help me configure the function.yaml file correctly in this case?

@vgupta13
Author

@nmanovic here is the deployment log:

Deploying custommodel/faster_rcnn_inception_v2_coco function...
21.07.31 10:30:12.880 nuctl (I) Deploying function {"name": ""}
21.07.31 10:30:12.880 nuctl (I) Building {"versionInfo": "Label: 1.5.16, Git commit: ae43a6a560c2bec42d7ccfdf6e8e11a1e3cc3774, OS: linux, Arch: amd64, Go version: go1.14.3", "name": ""}
21.07.31 10:30:13.704 nuctl (I) Cleaning up before deployment {"functionName": "tf-faster-rcnn-inception-v2-coco"}
21.07.31 10:30:13.866 nuctl (I) Staging files and preparing base images
21.07.31 10:30:13.927 nuctl (I) Building processor image {"imageName": "cvat/tf.faster_rcnn_inception_v2_coco:latest"}
21.07.31 10:30:13.927 nuctl.platform.docker (I) Pulling image {"imageName": "quay.io/nuclio/handler-builder-python-onbuild:1.5.16-amd64"}
21.07.31 10:30:18.362 nuctl.platform.docker (W) Docker command outputted to stderr - this may result in errors {"workingDir": "/tmp/nuclio-build-016383421/staging", "cmd": "docker build --network host --force-rm -t nuclio-onbuild-c42gk6evpa8ds2nq3b90 -f /tmp/nuclio-build-016383421/staging/Dockerfile.onbuild --build-arg NUCLIO_BUILD_LOCAL_HANDLER_DIR=handler --build-arg NUCLIO_LABEL=1.5.16 --build-arg NUCLIO_ARCH=amd64 .", "stderr": "#1 [internal] load build definition from Dockerfile.onbuild\n#1 sha256:326e0e7c4bdbf056b396dc918fcb7a8fbb2dcc136135d072f7a1d263dbd0c110\n#1 transferring dockerfile: 148B done\n#1 DONE 0.0s\n\n#2 [internal] load .dockerignore\n#2 sha256:8e59153fefd67665dc68fa40823b08776fe7436a505cc35c17f18a2b95d896cd\n#2 transferring context: 2B done\n#2 DONE 0.0s\n\n#3 [internal] load metadata for quay.io/nuclio/handler-builder-python-onbuild:1.5.16-amd64\n#3 sha256:3d4c217b8dd4735b204a58a813d2ff5a52b7e584fa30653b3b5b38b4c94539f0\n#3 DONE 0.0s\n\n#4 [1/1] FROM quay.io/nuclio/handler-builder-python-onbuild:1.5.16-amd64\n#4 sha256:a3bb79ea4dc1e3f9ab4814a1d071094b6d7a3946ba3c32c40ce3f16b9f75d0fd\n#4 CACHED\n\n#5 exporting to image\n#5 sha256:e8c613e07b0b7ff33893b694f7759a10d42e180f2b4dc349fb57dc6b71dcab00\n#5 exporting layers done\n#5 writing image sha256:708a12fe13b56186c5cc413457939deb77bd05f4cea06e23899aee91ce2adb48 done\n#5 naming to docker.io/library/nuclio-onbuild-c42gk6evpa8ds2nq3b90 done\n#5 DONE 0.0s\n\nUse 'docker scan' to run Snyk tests against images to find vulnerabilities and learn how to fix them\n"}
21.07.31 10:30:19.715 nuctl.platform.docker (I) Pulling image {"imageName": "quay.io/nuclio/uhttpc:0.0.1-amd64"}
21.07.31 10:30:24.337 nuctl.platform.docker (W) Docker command outputted to stderr - this may result in errors {"workingDir": "/tmp/nuclio-build-016383421/staging", "cmd": "docker build --network host --force-rm -t nuclio-onbuild-c42gk7uvpa8ds2nq3b9g -f /tmp/nuclio-build-016383421/staging/Dockerfile.onbuild --build-arg NUCLIO_LABEL=1.5.16 --build-arg NUCLIO_ARCH=amd64 --build-arg NUCLIO_BUILD_LOCAL_HANDLER_DIR=handler .", "stderr": "#1 [internal] load build definition from Dockerfile.onbuild\n#1 sha256:54a1f2da32310304b7d2eb125b04d1989a25f1f892bc9858622ed0c1a07ed4b8\n#1 transferring dockerfile: 123B done\n#1 DONE 0.0s\n\n#2 [internal] load .dockerignore\n#2 sha256:434af54efe86cfbbbde8403b2d7cd06068991034a1de07d317ff2fd915c26dd7\n#2 transferring context: 2B done\n#2 DONE 0.0s\n\n#3 [internal] load metadata for quay.io/nuclio/uhttpc:0.0.1-amd64\n#3 sha256:4aba9ea98350709bc44db6d4b0f46352caf4a7f2c8d89585e87a03a7a14007b3\n#3 DONE 0.0s\n\n#4 [1/1] FROM quay.io/nuclio/uhttpc:0.0.1-amd64\n#4 sha256:92bc73b1ee90a814263ae3231095a19e72949aebd7bf4477b119cea12b042ff9\n#4 CACHED\n\n#5 exporting to image\n#5 sha256:e8c613e07b0b7ff33893b694f7759a10d42e180f2b4dc349fb57dc6b71dcab00\n#5 exporting layers done\n#5 writing image sha256:1d7a80c3d68af5b05150876d5d966124a5b38181a67443addc1eef553b11de08 done\n#5 naming to docker.io/library/nuclio-onbuild-c42gk7uvpa8ds2nq3b9g done\n#5 DONE 0.0s\n\nUse 'docker scan' to run Snyk tests against images to find vulnerabilities and learn how to fix them\n"}
21.07.31 10:30:25.334 nuctl.platform (I) Building docker image {"image": "cvat/tf.faster_rcnn_inception_v2_coco:latest"}
21.07.31 10:30:43.543 nuctl.platform.docker (W) Docker command outputted to stderr - this may result in errors {"workingDir": "/tmp/nuclio-build-016383421/staging", "cmd": "docker build --network host --force-rm -t cvat/tf.faster_rcnn_inception_v2_coco:latest -f /tmp/nuclio-build-016383421/staging/Dockerfile.processor --build-arg NUCLIO_ARCH=amd64 --build-arg NUCLIO_BUILD_LOCAL_HANDLER_DIR=handler --build-arg NUCLIO_LABEL=1.5.16 .", "stderr": "#1 [internal] load build definition from Dockerfile.processor\n#1 sha256:ac0b95736b20df537b0ce6ea86c327f5800e51ba2603b12e4a9727ac331f121b\n#1 DONE 0.0s\n\n#1 [internal] load build definition from Dockerfile.processor\n#1 sha256:ac0b95736b20df537b0ce6ea86c327f5800e51ba2603b12e4a9727ac331f121b\n#1 transferring dockerfile: 965B done\n#1 DONE 0.0s\n\n#2 [internal] load .dockerignore\n#2 sha256:5a4ba6c83eed4b662dbd7570974a90a7552ce544aefacba608f465815f9e76d3\n#2 transferring context: 2B done\n#2 DONE 0.0s\n\n#3 [internal] load metadata for docker.io/tensorflow/tensorflow:2.1.1\n#3 sha256:7fa3c602749e04f3e28638c9c1b1e7bdcb2b985270e5d63e4f0916d207fd9b37\n#3 ...\n\n#4 [auth] tensorflow/tensorflow:pull token for registry-1.docker.io\n#4 sha256:6ec0c2fc726b5d0f923b1e41c7ab94c168049aada9bbee274d1910116d393730\n#4 DONE 0.0s\n\n#3 [internal] load metadata for docker.io/tensorflow/tensorflow:2.1.1\n#3 sha256:7fa3c602749e04f3e28638c9c1b1e7bdcb2b985270e5d63e4f0916d207fd9b37\n#3 DONE 4.9s\n\n#5 [1/9] FROM docker.io/tensorflow/tensorflow:2.1.1@sha256:2904f332656af61c76145523676f8431dd32f64800b4211a97bc7b7d0176a8db\n#5 sha256:f8a4395b3ab109af9030013f88dae77d74a9d1a9114ee8fbde8584bbfe6f267c\n#5 DONE 0.0s\n\n#7 [internal] load build context\n#7 sha256:2b05d18e8e32d73eb7e89d26a44d88841e400eaf0a2ce0969d2c725079328cc3\n#7 transferring context: 11.86kB done\n#7 DONE 0.0s\n\n#9 [4/9] COPY artifacts/py /opt/nuclio/\n#9 sha256:e318a963a4127a6567080a5ab2ec35d13eb498c5433a1db9e13ca39ef84a36c7\n#9 CACHED\n\n#6 [2/9] WORKDIR /opt/nuclio\n#6 sha256:b25837a8bd7302ff0e6ac34acaae355cc321d72495ca3bb7d3825798db469509\n#6 CACHED\n\n#8 [3/9] COPY artifacts/processor /usr/local/bin/processor\n#8 sha256:c2c746be05dc24a2f420a00962210f5d2c14411054fff10c3edf6113dca196b1\n#8 CACHED\n\n#10 [5/9] COPY artifacts/uhttpc /usr/local/bin/uhttpc\n#10 sha256:991d471e43b7d107a676d9977f0b65c330fee4e1995db8696569ef7def86cbe9\n#10 CACHED\n\n#11 [6/9] COPY handler /opt/nuclio\n#11 sha256:22c55155ccd923b5d4445cc0fa339fb370629337818b89bd9040ff2a6386da2e\n#11 DONE 0.1s\n\n#12 [7/9] RUN ln -s /mnt/c/users/vgupta/models/faster_rcnn_inception_resnet_v2_640_640_coco17_tpu_8 faster_rcnn\n#12 sha256:cf0d1e1d99c2aa66802950c06cfd1be7bad8ccdc618f5974cf00aef120adad07\n#12 DONE 0.2s\n\n#13 [8/9] RUN pip install pillow pyyaml\n#13 sha256:97ab69801d2809b5843e9a98a953bb6839aedf2ce744f5488c644ec1c83393f7\n#13 2.025 Collecting pillow\n#13 2.319 Downloading Pillow-8.3.1-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (3.0 MB)\n#13 3.884 Collecting pyyaml\n#13 3.916 Downloading PyYAML-5.4.1-cp36-cp36m-manylinux1_x86_64.whl (640 kB)\n#13 4.453 Installing collected packages: pillow, pyyaml\n#13 4.931 Successfully installed pillow-8.3.1 pyyaml-5.4.1\n#13 5.246 WARNING: You are using pip version 20.1.1; however, version 21.2.2 is available.\n#13 5.246 You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.\n#13 DONE 5.8s\n\n#14 [9/9] RUN pip install nuclio-sdk msgpack --no-index --find-links /opt/nuclio/whl\n#14 sha256:227a4369d49ad382c407c185d3f9520a95cc230428c2173a4331162bc2f76f91\n#14 
0.814 Looking in links: /opt/nuclio/whl\n#14 0.850 Processing ./whl/nuclio_sdk-0.2.0-py2.py3-none-any.whl\n#14 0.885 Processing ./whl/msgpack-0.6.1.tar.gz\n#14 1.315 Building wheels for collected packages: msgpack\n#14 1.317 Building wheel for msgpack (setup.py): started\n#14 5.554 Building wheel for msgpack (setup.py): finished with status 'done'\n#14 5.555 Created wheel for msgpack: filename=msgpack-0.6.1-cp36-cp36m-linux_x86_64.whl size=229870 sha256=b38e1f8719d077cb4daf10f09977df7c720542187df64758a15e83964c6e8b68\n#14 5.555 Stored in directory: /root/.cache/pip/wheels/19/df/c3/08424fe5285667aff8a58a69c995c723ffa81c4208138d6d0f\n#14 5.555 Successfully built msgpack\n#14 5.760 Installing collected packages: nuclio-sdk, msgpack\n#14 5.850 Successfully installed msgpack-0.6.1 nuclio-sdk-0.2.0\n#14 DONE 6.0s\n\n#15 exporting to image\n#15 sha256:e8c613e07b0b7ff33893b694f7759a10d42e180f2b4dc349fb57dc6b71dcab00\n#15 exporting layers\n#15 exporting layers 0.2s done\n#15 writing image sha256:35cb4a614ce9e5e68eee20205e117a922ce55b82ee5365f1b5ec7fc32748d909 0.0s done\n#15 naming to docker.io/cvat/tf.faster_rcnn_inception_v2_coco:latest done\n#15 DONE 0.3s\n\nUse 'docker scan' to run Snyk tests against images to find vulnerabilities and learn how to fix them\n"}
21.07.31 10:30:43.543 nuctl.platform (I) Pushing docker image into registry {"image": "cvat/tf.faster_rcnn_inception_v2_coco:latest", "registry": ""}
21.07.31 10:30:43.543 nuctl.platform (I) Docker image was successfully built and pushed into docker registry {"image": "cvat/tf.faster_rcnn_inception_v2_coco:latest"}
21.07.31 10:30:43.543 nuctl (I) Build complete {"result": {"Image":"cvat/tf.faster_rcnn_inception_v2_coco:latest","UpdatedFunctionConfig":{"metadata":{"name":"tf-faster-rcnn-inception-v2-coco","namespace":"nuclio","labels":{"nuclio.io/project-name":"cvat"},"annotations":{"framework":"tensorflow","name":"Faster RCNN via Tensorflow","spec":"[\n { "id": 1, "name": "car" },\n { "id": 2, "name": "aeroplane" },\n { "id": 3, "name": "book" },\n { "id": 4, "name": "fish" }\n]\n","type":"detector"}},"spec":{"description":"Faster RCNN from Tensorflow Object Detection API","handler":"main:handler","runtime":"python:3.6","resources":{},"image":"cvat/tf.faster_rcnn_inception_v2_coco:latest","targetCPU":75,"triggers":{"default-http":{"class":"","kind":"http","name":"default-http","maxWorkers":1}},"volumes":[{"volume":{"name":"volume-1","hostPath":{"path":"/mnt/c/users/vgupta/cvat/serverless/common"}},"volumeMount":{"name":"volume-1","mountPath":"/opt/nuclio/common"}}],"build":{"image":"cvat/tf.faster_rcnn_inception_v2_coco","baseImage":"tensorflow/tensorflow:2.1.1","directives":{"postCopy":[{"kind":"RUN","value":"ln -s /mnt/c/users/vgupta/models/faster_rcnn_inception_resnet_v2_640_640_coco17_tpu_8 faster_rcnn"},{"kind":"RUN","value":"pip install pillow pyyaml"}],"preCopy":[{"kind":"WORKDIR","value":"/opt/nuclio"}]},"codeEntryType":"image"},"platform":{"attributes":{"mountMode":"volume","restartPolicy":{"maximumRetryCount":3,"name":"always"}}},"readinessTimeoutSeconds":60,"securityContext":{},"eventTimeout":"30s"}}}}
21.07.31 10:30:45.925 nuctl.platform (I) Waiting for function to be ready {"timeout": 60}
21.07.31 10:30:48.332 nuctl (I) Function deploy complete {"functionName": "tf-faster-rcnn-inception-v2-coco", "httpPort": 0}
NAMESPACE | NAME | PROJECT | STATE | NODE PORT | REPLICAS
nuclio | tf-faster-rcnn-inception-v2-coco | cvat | ready | 0 | 1/1

@vgupta13
Author

vgupta13 commented Aug 1, 2021

@nmanovic I found the source of the problem: the symbolic link pointed to local host storage, which is not visible inside the Docker build. My workaround is to put the model file on cloud storage (e.g., Google Drive), get a shareable link, add directives that use wget inside the Docker config to download and unpack the archive, and then create the symbolic link.
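
The postCopy directives I ended up with look roughly like this (the download URL and archive name are placeholders, not my actual shareable link):

  postCopy:
    - kind: RUN
      value: wget -O model.tar.gz "https://example.com/faster_rcnn_model.tar.gz"
    - kind: RUN
      value: tar -xzf model.tar.gz
    - kind: RUN
      value: ln -s /opt/nuclio/faster_rcnn_inception_resnet_v2_640_640_coco17_tpu_8 faster_rcnn
    - kind: RUN
      value: pip install pillow pyyaml

However, I am now getting another issue related to the function invocation: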

Error: Inference status for the task 1 is failed. requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http://nuclio:8070/api/function_invocations

I see another thread on this topic, but so far no success. Let me know if you have any leads.

PS: As you might be aware, TensorFlow deprecated tf.Session in TF 2.x, and it is no longer trivial to convert a TF 2.x model into a TF 1.x frozen graph. Could you please consider extending support to TF 2.x?

@nmanovic
Contributor

nmanovic commented Aug 2, 2021

@vgupta13 , why do you think that only a TF1 frozen graph can be used? You can use even your own DL framework inside serverless functions; we don't impose any limitations. If you write the instructions for your serverless function correctly, it should work. See the troubleshooting section in the serverless tutorial; it will probably answer some of your questions. The debugging section can help as well.
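
As a concrete illustration (a minimal sketch only, assuming a TF2 SavedModel export from the Object Detection API and the usual response format for a CVAT detector; the model path and label mapping are placeholders), main.py can load such a model directly, without any frozen graph or tf.Session:

  # main.py -- sketch for a TF2 SavedModel detector; not a tested CVAT function
  import base64
  import io
  import json

  import numpy as np
  import tensorflow as tf
  from PIL import Image

  def init_context(context):
      context.logger.info("Init context...  0%")
      # tf.saved_model.load handles TF2 exports directly
      context.user_data.model = tf.saved_model.load("/opt/nuclio/faster_rcnn/saved_model")
      context.logger.info("Init context...100%")

  def handler(context, event):
      data = event.body
      image = Image.open(io.BytesIO(base64.b64decode(data["image"]))).convert("RGB")
      threshold = float(data.get("threshold", 0.5))

      # TF2 Object Detection API models take a uint8 tensor of shape [1, H, W, 3]
      input_tensor = tf.convert_to_tensor(np.array(image))[tf.newaxis, ...]
      output = context.user_data.model(input_tensor)

      boxes = output["detection_boxes"][0].numpy()
      scores = output["detection_scores"][0].numpy()
      classes = output["detection_classes"][0].numpy().astype(int)
      width, height = image.size

      results = []
      for box, score, cls in zip(boxes, scores, classes):
          if score >= threshold:
              ymin, xmin, ymax, xmax = box
              results.append({
                  "confidence": str(float(score)),
                  # map the class id to a name from your label file in a real function
                  "label": str(int(cls)),
                  "points": [float(xmin * width), float(ymin * height),
                             float(xmax * width), float(ymax * height)],
                  "type": "rectangle",
              })

      return context.Response(body=json.dumps(results), headers={},
                              content_type="application/json", status_code=200)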

@JoshChristie

@vgupta13 what are the contents of /mnt/c/Users/vgupta/models/faster_rcnn_inception_resnet_v2_640_640_coco17_tpu_8?

The error says that it can't find /opt/nuclio/faster_rcnn/frozen_inference_graph.pb.
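
The ln -s in postCopy runs while the image is being built, and /mnt/c/Users/vgupta/... exists on the host (WSL) but not inside the build container, so the link ends up dangling. If the model should stay on the host, one option (a sketch only, not tested) is to drop the build-time symlink and mount the directory at runtime through the volumes section of function.yaml, mirroring the volume-1 entry already present in your deployed config:

  volumes:
    - volume:
        name: model-volume
        hostPath:
          path: /mnt/c/Users/vgupta/models/faster_rcnn_inception_resnet_v2_640_640_coco17_tpu_8
      volumeMount:
        name: model-volume
        mountPath: /opt/nuclio/faster_rcnn

Alternatively, downloading and unpacking the model archive during the build (as in the wget workaround above) keeps everything inside the image.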

@vgupta13
Author

vgupta13 commented Aug 6, 2021

@JoshChristie as I already mentioned, it was due to the symbolic link directives inside the function.yaml file. I have resolved this issue.

@nmanovic : the HTTPError: 500 on the function invocation has been resolved after providing a port number explicitly inside the function.yaml. It would be really nice if you could add instructions for defining a port inside function.yaml to the serverless tutorial.
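
For reference, the trigger section that worked for me looks roughly like this (the port value is arbitrary, and I am not sure whether this is the recommended way to pin it):

  triggers:
    myHttpTrigger:
      maxWorkers: 1
      kind: 'http'
      attributes:
        port: 32001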

@OnceUponATimeMathley

@vgupta13 ,

I want to deploy a skeleton-type label through a serverless function, but the label has an SVG template, and I define the label in the function.yaml file. In annotations.spec, how can I check which keys and values the dictionary should have (ex: "id": 1, "name": "person", ...)?
How can I define the skeleton label in the function.yaml file? Thanks.

[screenshot attachment]

@vipulg13

@OnceUponATimeMathley sorry for the late response to your query. Did you manage to find a solution?
