
build successfully, but container didn't start #58

Open
bltcn opened this issue Nov 3, 2021 · 15 comments

@bltcn

bltcn commented Nov 3, 2021

Windows 11, WSL2, Ubuntu 18.04

[screenshot: 微信图片_20211103184203]
How can I deal with it?

@bltcn changed the title from "编译成功之后,但是不启动container" ("after building successfully, the container doesn't start") to "build successfully, but container didn't start" on Nov 3, 2021
@SthPhoenix
Owner

Hi! I haven't tested the image on Windows.
Have you checked the container logs?

@bltcn
Author

bltcn commented Nov 4, 2021

Preparing models...
[04:56:39] INFO - Preparing 'glintr100' model...
[04:56:39] INFO - Building TRT engine for glintr100...
[TensorRT] WARNING: onnx2trt_utils.cpp:362: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[TensorRT] WARNING: Detected invalid timing cache, setup a local cache instead
[TensorRT] WARNING: GPU error during getBestTactic: Conv_0 : invalid argument
[TensorRT] ERROR: 10: [optimizer.cpp::computeCosts::1855] Error Code 10: Internal Error (Could not find any implementation for node Conv_0.)
Traceback (most recent call last):
  File "prepare_models.py", line 54, in <module>
    prepare_models()
  File "prepare_models.py", line 49, in prepare_models
    prepare_backend(model_name=model, backend_name=backend_name, im_size=max_size, force_fp16=force_fp16,
  File "/app/modules/model_zoo/getter.py", line 157, in prepare_backend
    convert_onnx(temp_onnx_model,
  File "/app/modules/converters/onnx_to_trt.py", line 84, in convert_onnx
    assert not isinstance(engine, type(None))
AssertionError
Starting InsightFace-REST using 1 workers.
[04:56:51] INFO - 1
[04:56:51] INFO - MAX_BATCH_SIZE: 1
[04:56:51] INFO - Reshaping ONNX inputs to: (1, 3, 640, 640)
[04:56:51] INFO - In shape: [dim_value: 1
, dim_value: 3
, dim_param: "?"
, dim_param: "?"
]
[04:56:51] INFO - Building TRT engine for scrfd_10g_gnkps...
[TensorRT] WARNING: onnx2trt_utils.cpp:362: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[TensorRT] WARNING: Detected invalid timing cache, setup a local cache instead
[TensorRT] WARNING: GPU error during getBestTactic: Conv_0 + Relu_1 : invalid argument
[TensorRT] ERROR: 10: [optimizer.cpp::computeCosts::1855] Error Code 10: Internal Error (Could not find any implementation for node Conv_0 + Relu_1.)
Traceback (most recent call last):
  File "/usr/local/bin/uvicorn", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/uvicorn/main.py", line 425, in main
    run(app, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/uvicorn/main.py", line 447, in run
    server.run()
  File "/usr/local/lib/python3.8/dist-packages/uvicorn/server.py", line 68, in run
    return asyncio.run(self.serve(sockets=sockets))
  File "/usr/lib/python3.8/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.8/dist-packages/uvicorn/server.py", line 76, in serve
    config.load()
  File "/usr/local/lib/python3.8/dist-packages/uvicorn/config.py", line 448, in load
    self.loaded_app = import_from_string(self.app)
  File "/usr/local/lib/python3.8/dist-packages/uvicorn/importer.py", line 21, in import_from_string
    module = importlib.import_module(module_str)
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 848, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/app/./app.py", line 36, in <module>
    processing = Processing(det_name=configs.models.det_name, rec_name=configs.models.rec_name,
  File "/app/./modules/processing.py", line 180, in __init__
    self.model = FaceAnalysis(det_name=det_name, rec_name=rec_name, ga_name=ga_name, device=device,
  File "/app/./modules/face_model.py", line 78, in __init__
    self.det_model = Detector(det_name=det_name, device=device, max_size=self.max_size,
  File "/app/./modules/face_model.py", line 37, in __init__
    self.retina = get_model(det_name, backend_name=backend_name, force_fp16=force_fp16, im_size=max_size,
  File "/app/./modules/model_zoo/getter.py", line 203, in get_model
    model_path = prepare_backend(model_name, backend_name, im_size=im_size, max_batch_size=max_batch_size,
  File "/app/./modules/model_zoo/getter.py", line 157, in prepare_backend
    convert_onnx(temp_onnx_model,
  File "/app/./modules/converters/onnx_to_trt.py", line 84, in convert_onnx
    assert not isinstance(engine, type(None))
AssertionError

@SthPhoenix
Owner

Have you tried running other GPU-based containers on WSL2, like the TensorFlow benchmarks, to verify your WSL2 is properly configured for GPU usage?
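
For reference, a common smoke test for GPU access under WSL2 looks something like this (a sketch: the images named here are public NVIDIA samples, not anything from this repo):

# Run NVIDIA's nbody benchmark sample inside a container
docker run --rm --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -benchmark

# Or just check that the GPU is visible at all
docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi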

@SthPhoenix
Owner

[link to NVIDIA's CUDA on WSL documentation; referenced as the "Nvidia page above" in a later comment]
@bltcn
Author

bltcn commented Nov 4, 2021

Run "nbody -benchmark [-numbodies=]" to measure performance.
-fullscreen (run n-body simulation in fullscreen mode)
-fp64 (use double precision floating point values for simulation)
-hostmem (stores simulation data in host memory)
-benchmark (run benchmark to measure performance)
-numbodies= (number of bodies (>= 1) to run in simulation)
-device= (where d=0,1,2.... for the CUDA device to use)
-numdevices= (where i=(number of CUDA devices > 0) to use for simulation)
-compare (compares simulation results running once on the default GPU and once on the CPU)
-cpu (run n-body simulation on the CPU)
-tipsy=<file.bin> (load a tipsy model file for simulation)

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

Windowed mode
Simulation data stored in video memory
Single precision floating point simulation
1 Devices used for simulation
GPU Device 0: "Pascal" with compute capability 6.1

Compute 6.1 CUDA device: [NVIDIA GeForce GTX 1060]
10240 bodies, total time for 10 iterations: 8.868 ms
= 118.245 billion interactions per second
= 2364.896 single-precision GFLOP/s at 20 flops per interaction

@SthPhoenix
Owner

Hm, then TensorRT should work as expected.

I can double-check that the latest published version of InsightFace-REST works out of the box, but unfortunately I can't help you with running it on Windows.

@SthPhoenix
Owner

I have checked building from scratch with a clean clone of the repo - everything works as intended on Ubuntu 20.04.

Looks like it's a WSL-related problem.

@bltcn
Author

bltcn commented Nov 5, 2021

Thanks, I have tested the CPU version and it works fine. Maybe there is something wrong with the parameters in this case.

@SthPhoenix
Owner

SthPhoenix commented Nov 5, 2021

Quote from Nvidia page above:

> With the NVIDIA Container Toolkit for Docker 19.03, only --gpus all is supported.

This might be the case, since deploy_trt.sh tries to set a specific GPU. Try replacing line 99 with --gpus all.
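
For illustration, the kind of edit meant here (a sketch; the surrounding docker run arguments, image name, and port are assumptions, only the --gpus flag is the point):

# before: pinning a specific device, e.g.
#   docker run ... --gpus '"device=0"' ... insightface-rest
# after: expose all GPUs, the only mode supported by Docker 19.03 under WSL2
docker run --rm --gpus all -p 18081:18081 insightface-rest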

Though according to the same document, there might also be issues with the pinned memory required by TensorRT, and with concurrent CUDA streams.

If pinned memory is also an issue, you can try adding RUN $PIP_INSTALL onnxruntime-gpu to Dockerfile_trt and switching the inference backend to onnx in deploy_trt.sh at line 105.
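
A sketch of those two changes ($PIP_INSTALL is the variable named above; INFERENCE_BACKEND is an assumed name for whatever deploy_trt.sh sets at line 105):

# Dockerfile_trt: add the ONNXRuntime GPU backend
RUN $PIP_INSTALL onnxruntime-gpu

# deploy_trt.sh: switch inference from TensorRT to ONNXRuntime
INFERENCE_BACKEND=onnx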

@bltcn
Author

bltcn commented Nov 5, 2021

Thanks, I will try.

@SthPhoenix
Owner

Hi! Any updates? Have you managed to run it under WSL2?

@bltcn
Author

bltcn commented Dec 23, 2021

Sorry, I just saw your reply. I will try.

@SthPhoenix
Owner

> Sorry, I just saw your reply. I will try.

Looks like WSL2 just wasn't supported by TensorRT before, but according to the changelog the latest TensorRT version should support it. Try using the 21.12 TensorRT image.
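
For example, the base image in Dockerfile_trt would be bumped to something like this (the exact FROM line in the repo may differ):

FROM nvcr.io/nvidia/tensorrt:21.12-py3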

@talebolano

> Sorry, I just saw your reply. I will try.

> Looks like WSL2 just wasn't supported by TensorRT before, but according to the changelog the latest TensorRT version should support it. Try using the 21.12 TensorRT image.

I tried the 21.12 and 22.01 TensorRT images; unfortunately, both failed. 21.12 reports "GPU error during getBestTactic", 22.01 reports "Cuda failure: integrity checks failed".

@SthPhoenix
Owner

> I tried the 21.12 and 22.01 TensorRT images; unfortunately, both failed. 21.12 reports "GPU error during getBestTactic", 22.01 reports "Cuda failure: integrity checks failed".

Have you tried running other GPU-based containers on WSL2 to ensure everything is installed correctly?
