Runtime error before my code returns #2804

Closed
yueyihua opened this issue Jan 9, 2020 · 3 comments

yueyihua commented Jan 9, 2020

Describe the bug
Runtime error between inference and code return:
terminate called after throwing an instance of 'onnxruntime::OnnxRuntimeException'
what(): /home/yyh/3rdparty/onnxruntime-bak/onnxruntime/core/providers/cuda/cuda_call.cc:97 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true] /home/yyh/3rdparty/onnxruntime-bak/onnxruntime/core/providers/cuda/cuda_call.cc:91 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true] CUDA failure 4: driver shutting down ; GPU=32767 ; hostname=greenet ; expr=cudaEventSynchronize(e);
Stacktrace:

Program received signal SIGABRT, Aborted.
0x00007fffec35b2c7 in raise () from /usr/lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-13.el7.x86_64 cairo-1.15.12-3.el7.x86_64 expat-2.1.0-8.el7.x86_64 ffmpeg-libs-3.4.6-1.el7.x86_64 fontconfig-2.13.0-4.3.el7.x86_64 freetype-2.8-12.el7_6.1.x86_64 fribidi-1.0.2-1.el7.x86_64 gdk-pixbuf2-2.36.12-3.el7.x86_64 gflags-2.1.1-6.el7.x86_64 glib2-2.56.1-4.el7_6.x86_64 glibc-2.17-260.el7_6.6.x86_64 glog-0.3.3-8.el7.x86_64 gmp-6.0.0-11.el7.x86_64 gnutls-3.3.8-12.el7.x86_64 graphite2-1.3.10-1.el7_3.x86_64 gsm-1.0.13-11.el7.x86_64 harfbuzz-1.7.5-2.el7.x86_64 hdf5-1.8.12-11.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-37.el7_6.x86_64 lame-libs-3.100-1.el7.x86_64 leveldb-1.12.0-11.el7.x86_64 libX11-1.6.5-2.el7.x86_64 libXau-1.0.8-2.1.el7.x86_64 libXext-1.3.3-3.el7.x86_64 libXfixes-5.0.3-1.el7.x86_64 libXrender-0.9.10-1.el7.x86_64 libaec-1.0.4-1.el7.x86_64 libblkid-2.23.2-21.el7.x86_64 libbluray-0.2.3-5.el7.x86_64 libcom_err-1.42.9-7.el7.x86_64 libcroco-0.6.8-5.el7.x86_64 libdrm-2.4.91-3.el7.x86_64 libffi-3.0.13-11.el7.x86_64 libgcc-4.8.5-39.el7.x86_64 libgcrypt-1.5.3-12.el7.x86_64 libgfortran-4.8.5-39.el7.x86_64 libgomp-4.8.5-39.el7.x86_64 libgpg-error-1.12-3.el7.x86_64 libmfx-1.21-2.el7.x86_64 libmount-2.23.2-21.el7.x86_64 libogg-1.3.0-7.el7.x86_64 libpng-1.5.13-7.el7_2.x86_64 libquadmath-4.8.5-39.el7.x86_64 librsvg2-2.40.20-1.el7.x86_64 libselinux-2.5-14.1.el7.x86_64 libtasn1-3.8-2.el7.x86_64 libthai-0.1.14-9.el7.x86_64 libtheora-1.1.1-8.el7.x86_64 libuuid-2.23.2-21.el7.x86_64 libva-1.8.3-1.el7.x86_64 libvdpau-1.1.1-3.el7.x86_64 libvorbis-1.3.3-8.el7.1.x86_64 libxcb-1.13-1.el7.x86_64 libxml2-2.9.1-6.el7_2.3.x86_64 lmdb-libs-0.9.22-2.el7.x86_64 nettle-2.7.1-4.el7.x86_64 numactl-libs-2.0.12-3.el7_7.1.x86_64 openblas-serial-0.3.3-2.el7.x86_64 opencore-amr-0.1.5-6.el7.x86_64 openjpeg2-2.3.1-1.el7.x86_64 openssl-libs-1.0.2k-19.el7.x86_64 opus-1.0.2-6.el7.x86_64 p11-kit-0.20.7-3.el7.x86_64 pango-1.42.4-2.el7_6.x86_64 pcre-8.32-14.el7.x86_64 pixman-0.34.0-1.el7.x86_64 
snappy-1.1.0-3.el7.x86_64 soxr-0.1.2-1.el7.x86_64 speex-1.2-0.19.rc1.el7.x86_64 trousers-0.3.11.2-3.el7.x86_64 vo-amrwbenc-0.1.3-1.el7.x86_64 x264-libs-0.148-23.20170521gitaaa9aa8.el7.x86_64 x265-libs-2.9-3.el7.x86_64 xvidcore-1.3.4-2.el7.x86_64 xz-libs-5.1.2-9alpha.el7.x86_64 zlib-1.2.7-18.el7.x86_64 zvbi-0.2.35-1.el7.x86_64
(gdb) bt
#0 0x00007fffec35b2c7 in raise () from /usr/lib64/libc.so.6
#1 0x00007fffec35c9b8 in abort () from /usr/lib64/libc.so.6
#2 0x00007fffecc9a2cd in __gnu_cxx::__verbose_terminate_handler () at /home/eaverin/Downloads/gcc-build-dev/gcc-6.2.0/libstdc++-v3/libsupc++/vterminate.cc:95
#3 0x00007fffecc982a6 in __cxxabiv1::__terminate (handler=) at /home/eaverin/Downloads/gcc-build-dev/gcc-6.2.0/libstdc++-v3/libsupc++/eh_terminate.cc:47
#4 0x00007fffecc972d9 in __cxa_call_terminate (ue_header=ue_header@entry=0xbd9ad0) at /home/eaverin/Downloads/gcc-build-dev/gcc-6.2.0/libstdc++-v3/libsupc++/eh_call.cc:54
#5 0x00007fffecc97c2d in __cxxabiv1::__gxx_personality_v0 (version=, actions=, exception_class=5138137972254386944, ue_header=, context=0x7fffffffd3a0)
at /home/eaverin/Downloads/gcc-build-dev/gcc-6.2.0/libstdc++-v3/libsupc++/eh_personality.cc:676
#6 0x00007fffec7018a3 in ?? () from /usr/lib64/libgcc_s.so.1
#7 0x00007fffec701dd7 in _Unwind_Resume () from /usr/lib64/libgcc_s.so.1
#8 0x00007fffed4e79a3 in onnxruntime::CudaCall<cudaError, true> (retCode=cudaErrorCudartUnloading, exprString=0x7fffedc6f089 "cudaEventSynchronize(e)", libName=0x7fffedc6efc7 "CUDA",
successCode=cudaSuccess, msg=0x7fffedc6efa3 "") at /home/yyh/3rdparty/onnxruntime-bak/onnxruntime/core/providers/cuda/cuda_call.cc:95
#9 0x00007fffed082e26 in onnxruntime::CUDAExecutionProvider::~CUDAExecutionProvider (this=0xc69ab0, __in_chrg=)
at /home/yyh/3rdparty/onnxruntime-bak/onnxruntime/core/providers/cuda/cuda_execution_provider.cc:100
#10 0x00007fffed082fd8 in onnxruntime::CUDAExecutionProvider::~CUDAExecutionProvider (this=0xc69ab0, __in_chrg=)
at /home/yyh/3rdparty/onnxruntime-bak/onnxruntime/core/providers/cuda/cuda_execution_provider.cc:107
#11 0x00007fffecffcf80 in std::default_delete<onnxruntime::IExecutionProvider>::operator() (this=0x12d8600, __ptr=0xc69ab0) at /usr/include/c++/4.8.2/bits/unique_ptr.h:67
#12 0x00007fffecff5f09 in std::unique_ptr<onnxruntime::IExecutionProvider, std::default_delete<onnxruntime::IExecutionProvider> >::~unique_ptr (this=0x12d8600, __in_chrg=)
at /usr/include/c++/4.8.2/bits/unique_ptr.h:184
#13 0x00007fffed00fa5a in std::_Destroy<std::unique_ptr<onnxruntime::IExecutionProvider, std::default_delete<onnxruntime::IExecutionProvider> > > (__pointer=0x12d8600)
at /usr/include/c++/4.8.2/bits/stl_construct.h:93
#14 0x00007fffed00a552 in std::_Destroy_aux::__destroy<std::unique_ptr<onnxruntime::IExecutionProvider, std::default_delete<onnxruntime::IExecutionProvider> >> (__first=0x12d8600,
__last=0x12d8610) at /usr/include/c++/4.8.2/bits/stl_construct.h:103
#15 0x00007fffed005035 in std::_Destroy<std::unique_ptr<onnxruntime::IExecutionProvider, std::default_delete<onnxruntime::IExecutionProvider> > > (__first=0x12d8600, __last=0x12d8610)
at /usr/include/c++/4.8.2/bits/stl_construct.h:126
#16 0x00007fffecffceaf in std::_Destroy<std::unique_ptr<onnxruntime::IExecutionProvider, std::default_delete<onnxruntime::IExecutionProvider> >, std::unique_ptr<onnxruntime::IExecutionProvider, std::default_delete<onnxruntime::IExecutionProvider> > > (__first=0x12d8600, __last=0x12d8610) at /usr/include/c++/4.8.2/bits/stl_construct.h:151
#17 0x00007fffecff5d75 in std::vector<std::unique_ptr<onnxruntime::IExecutionProvider, std::default_delete<onnxruntime::IExecutionProvider> >, std::allocator<std::unique_ptr<onnxruntime::IExecutionProvider, std::default_delete<onnxruntime::IExecutionProvider> > > >::~vector (this=0xc4fad0, __in_chrg=) at /usr/include/c++/4.8.2/bits/stl_vector.h:415
#18 0x00007fffed03c20a in onnxruntime::ExecutionProviders::~ExecutionProviders (this=0xc4fad0, __in_chrg=)
at /home/yyh/3rdparty/onnxruntime-bak/onnxruntime/core/framework/execution_providers.h:21
#19 0x00007fffed03d79e in onnxruntime::InferenceSession::~InferenceSession (this=0xc4f780, __in_chrg=)
at /home/yyh/3rdparty/onnxruntime-bak/onnxruntime/core/session/inference_session.cc:268
#20 0x00007fffed03da4e in onnxruntime::InferenceSession::~InferenceSession (this=0xc4f780, __in_chrg=)
at /home/yyh/3rdparty/onnxruntime-bak/onnxruntime/core/session/inference_session.cc:286
#21 0x00007fffecff2e00 in OrtApis::ReleaseSession (value=0xc4f780) at /home/yyh/3rdparty/onnxruntime-bak/onnxruntime/core/session/onnxruntime_c_api.cc:1476
#22 0x00000000004051c4 in Ort::OrtRelease(OrtSession*) ()
#23 0x0000000000405d3d in Ort::Base::~Base() ()
#24 0x000000000040578e in Ort::Session::~Session() ()
#25 0x00000000004067b0 in MNIST::~MNIST() ()
#26 0x00000000004067e6 in std::default_delete<MNIST>::operator()(MNIST*) const ()
#27 0x00000000004062bd in std::unique_ptr<MNIST, std::default_delete<MNIST> >::~unique_ptr() ()
#28 0x00007fffec35ec29 in __run_exit_handlers () from /usr/lib64/libc.so.6
#29 0x00007fffec35ec77 in exit () from /usr/lib64/libc.so.6
#30 0x00007fffec34749c in __libc_start_main () from /usr/lib64/libc.so.6
#31 0x00000000004021d9 in _start ()

My code:
"""
// 3. Fill the input buffer from the cv::Mat and run inference on each row
float* input = mnist_->input_image_.data();
for (int i = 0; i < pred_total; ++i)
{
    // Clear the input buffer, then copy row i of imgs (one flattened image) into it
    std::fill(mnist_->input_image_.begin(), mnist_->input_image_.end(), 0.f);
    float* one_imgs_addr = &imgs.at<float>(i, 0);
    memcpy(input, one_imgs_addr, imgs.cols * sizeof(float));

    // Compare the predicted digit with the label for this row
    int64_t y_pred = mnist_->Run();
    int64_t y_real = labs.at<uint8_t>(i, 0);
    if (y_pred == y_real)
        acce_total++;
}

// Report elapsed time and accuracy
double delta = ((double)getTickCount() - timeStart) / getTickFrequency();
std::cout << "Total infer time: " << delta << " sec" << std::endl;
std::cout << "Total accuracy: " << acce_total * 1.0f / pred_total << std::endl << std::endl;

return 0;

"""


System information

  • OS Platform and Distribution: Linux CentOS 7.1
  • ONNX Runtime installed from (source or binary): source
  • ONNX Runtime version: v1.1.0
  • Python version: 3.6
  • Visual Studio version (if applicable):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version: 10.1
  • GPU model and memory: Tesla T4, 15079MiB

skottmckay (Contributor) commented Jan 16, 2020

Given this:
#26 0x00000000004067e6 in std::default_delete<MNIST>::operator()(MNIST*) const ()
#27 0x00000000004062bd in std::unique_ptr<MNIST, std::default_delete<MNIST> >::~unique_ptr() ()
#28 0x00007fffec35ec29 in __run_exit_handlers () from /usr/lib64/libc.so.6

It seems your program is exiting while an instance of your MNIST class is still alive, and that instance holds an onnxruntime::InferenceSession. While that InferenceSession is being cleaned up, it attempts to clean up the CUDA side of things, but the error seems to indicate that the CUDA driver is already shutting down (possibly due to some other call in __run_exit_handlers).

Can you try explicitly freeing your MNIST instance prior to main() returning?
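
A minimal sketch of that suggestion, assuming the MNIST wrapper lives in a std::unique_ptr with static storage duration (which the __run_exit_handlers frame suggests). It does not use ONNX Runtime or CUDA: FakeMnist, g_mnist, g_log and run_like_main are hypothetical stand-ins, and the destructor only records when it runs. The point is the ordering: calling reset() destroys the wrapper before the function returns instead of leaving it to the exit handlers.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Event log. Declared before g_mnist so that it is destroyed after it
// during static destruction (objects are destroyed in reverse order).
std::vector<std::string> g_log;

// Hypothetical stand-in for the MNIST class from the issue. The real class
// owns an Ort::Session; this one only records when its destructor runs.
struct FakeMnist {
    ~FakeMnist() { g_log.push_back("session released"); }
};

// Static storage duration: if still set at exit, the destructor would run
// from the exit handlers, after main() has returned.
std::unique_ptr<FakeMnist> g_mnist;

// Hypothetical body of main().
int run_like_main() {
    g_mnist.reset(new FakeMnist());
    g_log.push_back("inference done");

    // Explicitly free the wrapper (and, in the real program, its session)
    // while the process is still fully alive.
    g_mnist.reset();
    assert(g_mnist == nullptr);

    g_log.push_back("returning");
    return 0;
}
```

In the real program the equivalent change would be a single mnist_.reset() just before return 0, assuming mnist_ is the static std::unique_ptr<MNIST> seen in the backtrace.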

faxu added the pending label Jan 28, 2020

faxu (Contributor) commented Jan 28, 2020

@yueyihua Were you able to resolve this issue?

faxu (Contributor) commented Feb 27, 2020

Closing due to inactivity. Please reopen as needed.
