Nuclio Automatic/SemiAutomatic AI Tool Functions not running on GPU #2489

Closed
machinsk opened this issue Nov 25, 2020 · 3 comments

machinsk commented Nov 25, 2020

Not sure if this is an elephant in the room, with Intel not wanting to support Nvidia GPUs, but the models CVAT uses with the Nuclio serverless functions do not run with GPU acceleration. The documentation is not great on this topic, but I've pieced together everything I could find.

I'm deploying CVAT to an AWS EC2 g4dn.xlarge instance (2nd Generation Intel Xeon, 4 vCPUs, 16 GB RAM, Nvidia Tesla T4 GPU with 16 GB VRAM) running Ubuntu 20.04. I can get all the functions from serverless/deploy.sh to appear under Models, and most of them run (the ones that don't are a different issue), but every function that does run, runs on the CPU.

I've tried running with and without nvidia-docker (which the installation instructions do not mention). I've tried the suggestion here. I have the Nvidia gaming drivers for G4 instances installed. I have Docker version 19.03.13, Nvidia driver 445.48, and CUDA version 11.0.

We even tried changing cvat/requirements/base.txt to use tensorflow-gpu instead of tensorflow.

To test, we create a task with a 180-frame video. On the /task page in the UI, we click the ellipsis beside Action, select Automatic annotation, and (mostly) choose the Faster RCNN via Tensorflow model. Then, in the terminal, we run gpustat to check usage on the Tesla T4 GPU, which has only read zero so far. Meanwhile, htop shows over 100% CPU utilization on a nuclio task as the progress bar moves in the UI.
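For anyone who wants to log this check rather than watch gpustat by hand, here is a minimal sketch that polls nvidia-smi instead (a swapped-in tool, not what was used above). It assumes nvidia-smi is on the PATH and that every GPU reports numeric values; the function names `parse_gpu_stats`, `sample_gpu`, and `watch` are made up for illustration.

```python
import subprocess
import time

# nvidia-smi query that prints one "utilization, memory" CSV line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_stats(output):
    """Turn nvidia-smi CSV output into (utilization_percent, memory_used_mib) tuples."""
    stats = []
    for line in output.strip().splitlines():
        # Assumption: both fields are plain integers (no "[N/A]" entries).
        util, mem = (field.strip() for field in line.split(","))
        stats.append((int(util), int(mem)))
    return stats


def sample_gpu():
    """Run nvidia-smi once and return the parsed per-GPU stats."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_stats(result.stdout)


def watch(samples=30, interval=1.0):
    """Print GPU stats once per interval while an annotation job is running."""
    for _ in range(samples):
        print(sample_gpu())
        time.sleep(interval)
```

Calling `watch()` in a second terminal while the automatic annotation runs should show non-zero utilization if the model is really on the GPU; a steady stream of zeros matches what gpustat reported here.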

Our next step was going to be trying a P3 instance, as the documentation suggests (someone please add a link to this in the installation guide). The Nuclio installation instructions badly need improvement too.

Thanks for the help.

nmanovic (Contributor) commented

@machinsk, a serverless function with a DL model inside is just a Docker container with some specific bindings to the Nuclio framework and Python code to run the model. If the function itself can run on a GPU, it will run on a GPU. For now, none of the functions in the CVAT repository are optimized to run on a GPU. If somebody can contribute and improve them, we will be more than happy to accept the PR.
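Since the function is just a container, one quick thing to rule out is whether the GPU is exposed to that container at all. Below is a rough heuristic sketch, not part of CVAT or Nuclio: it assumes the NVIDIA container runtime is in use, which sets the `NVIDIA_VISIBLE_DEVICES` environment variable and normally mounts `nvidia-smi` into the container. The helper name `gpu_exposed_to_container` is hypothetical.

```python
import os
import shutil


def gpu_exposed_to_container(environ=None, which=shutil.which):
    """Heuristically check whether the NVIDIA runtime exposed a GPU to this container.

    Assumption: NVIDIA_VISIBLE_DEVICES is empty, "void", or "none" when no GPU
    is exposed, and nvidia-smi is mounted in when one is.
    """
    environ = os.environ if environ is None else environ
    visible = environ.get("NVIDIA_VISIBLE_DEVICES", "")
    has_devices = visible not in ("", "void", "none")
    has_tool = which("nvidia-smi") is not None
    return has_devices and has_tool
```

Running this inside the function's container (for example from the handler's init code) only tells you the device is reachable; the model code and its framework build still have to be GPU-enabled for inference to actually use it.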

@nmanovic nmanovic self-assigned this Nov 25, 2020

jahaniam (Contributor) commented Dec 8, 2020

@nmanovic I agree that the documentation for semi-automatic annotation is missing many steps; I had to dig into the code in both CVAT and Nuclio. After some digging, I was finally able to run the TensorFlow Faster RCNN model on the GPU. I will open a PR soon.

nmanovic (Contributor) commented Jul 6, 2021

It looks like the issue was resolved by @jahaniam. I will close it.

@nmanovic nmanovic closed this as completed Jul 6, 2021