Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[EXAMPLE] The otlp_http client crashes on exit #1867

Closed
marcalff opened this issue Dec 14, 2022 · 7 comments · Fixed by #1868
Closed

[EXAMPLE] The otlp_http client crashes on exit #1867

marcalff opened this issue Dec 14, 2022 · 7 comments · Fixed by #1868
Assignees
Labels
bug Something isn't working

Comments

@marcalff
Copy link
Member

marcalff commented Dec 14, 2022

Do not start any opentelemetry-collector process.

Execute:

./examples/otlp/example_otlp_http

It crashes in:

#0  0x0000000000668f6e in opentelemetry::v1::exporter::otlp::(anonymous namespace)::ResponseHandler::OnEvent (this=0x886610, state=opentelemetry::v1::ext::http::client::SessionState::Cancelled, reason=...) at /home/malff/CODE/MARC_GITHUB/opentelemetry-cpp/exporters/otlp/src/otlp_http_client.cc:303
#1  0x000000000078fcfa in opentelemetry::v1::ext::http::client::curl::HttpOperation::DispatchEvent (this=0x862c30, type=opentelemetry::v1::ext::http::client::SessionState::Cancelled, reason="No error") at /home/malff/CODE/MARC_GITHUB/opentelemetry-cpp/ext/src/http/client/curl/http_operation_curl.cc:224
#2  0x0000000000790d03 in opentelemetry::v1::ext::http::client::curl::HttpOperation::Cleanup (this=0x862c30) at /home/malff/CODE/MARC_GITHUB/opentelemetry-cpp/ext/src/http/client/curl/http_operation_curl.cc:344
#3  0x000000000078083f in opentelemetry::v1::ext::http::client::curl::Session::FinishOperation (this=0x888c50) at /home/malff/CODE/MARC_GITHUB/opentelemetry-cpp/ext/src/http/client/curl/http_client_curl.cc:115
#4  0x0000000000780732 in opentelemetry::v1::ext::http::client::curl::HttpClient::CleanupSession (this=0x864f30, session_id=4) at /home/malff/CODE/MARC_GITHUB/opentelemetry-cpp/ext/src/http/client/curl/http_client_curl.cc:257
#5  0x000000000078049a in opentelemetry::v1::ext::http::client::curl::Session::CancelSession (this=0x888c50) at /home/malff/CODE/MARC_GITHUB/opentelemetry-cpp/ext/src/http/client/curl/http_client_curl.cc:97
#6  0x0000000000781089 in opentelemetry::v1::ext::http::client::curl::HttpClient::CancelAllSessions (this=0x864f30) at /home/malff/CODE/MARC_GITHUB/opentelemetry-cpp/ext/src/http/client/curl/http_client_curl.cc:195
#7  0x0000000000663499 in opentelemetry::v1::exporter::otlp::OtlpHttpClient::Shutdown (this=0x865210, timeout=...) at /home/malff/CODE/MARC_GITHUB/opentelemetry-cpp/exporters/otlp/src/otlp_http_client.cc:833
#8  0x0000000000637164 in opentelemetry::v1::exporter::otlp::OtlpHttpExporter::Shutdown (this=0x869a70, timeout=...) at /home/malff/CODE/MARC_GITHUB/opentelemetry-cpp/exporters/otlp/src/otlp_http_exporter.cc:137
#9  0x0000000000627758 in opentelemetry::v1::sdk::trace::SimpleSpanProcessor::Shutdown (this=0x8841f0, timeout=...) at /home/malff/CODE/MARC_GITHUB/opentelemetry-cpp/sdk/include/opentelemetry/sdk/trace/simple_processor.h:66
#10 0x0000000000632efe in opentelemetry::v1::sdk::trace::MultiSpanProcessor::Shutdown (this=0x866180, timeout=...) at /home/malff/CODE/MARC_GITHUB/opentelemetry-cpp/sdk/include/opentelemetry/sdk/trace/multi_span_processor.h:131
#11 0x0000000000632794 in opentelemetry::v1::sdk::trace::TracerContext::Shutdown (this=0x866620) at /home/malff/CODE/MARC_GITHUB/opentelemetry-cpp/sdk/src/trace/tracer_context.cc:58
#12 0x0000000000628ec0 in opentelemetry::v1::sdk::trace::TracerProvider::~TracerProvider (this=0x86eb30) at /home/malff/CODE/MARC_GITHUB/opentelemetry-cpp/sdk/src/trace/tracer_provider.cc:49
#13 0x0000000000628f09 in opentelemetry::v1::sdk::trace::TracerProvider::~TracerProvider (this=0x86eb30) at /home/malff/CODE/MARC_GITHUB/opentelemetry-cpp/sdk/src/trace/tracer_provider.cc:43
#14 0x000000000061086c in std::default_delete<opentelemetry::v1::trace::TracerProvider>::operator() (this=0x885fc0, __ptr=0x86eb30) at /usr/bin/../lib64/gcc/x86_64-suse-linux/12/../../../../include/c++/12/bits/unique_ptr.h:95
#15 0x00000000006106ad in std::_Sp_counted_deleter<opentelemetry::v1::trace::TracerProvider*, std::default_delete<opentelemetry::v1::trace::TracerProvider>, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=0x885fb0) at /usr/bin/../lib64/gcc/x86_64-suse-linux/12/../../../../include/c++/12/bits/shared_ptr_base.h:527
#16 0x0000000000610d6e in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x885fb0) at /usr/bin/../lib64/gcc/x86_64-suse-linux/12/../../../../include/c++/12/bits/shared_ptr_base.h:346
#17 0x000000000061024a in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x84ea78 <opentelemetry::v1::trace::Provider::GetProvider()::provider+16>) at /usr/bin/../lib64/gcc/x86_64-suse-linux/12/../../../../include/c++/12/bits/shared_ptr_base.h:1071
#18 0x0000000000615409 in std::__shared_ptr<opentelemetry::v1::trace::TracerProvider, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x84ea70 <opentelemetry::v1::trace::Provider::GetProvider()::provider+8>) at /usr/bin/../lib64/gcc/x86_64-suse-linux/12/../../../../include/c++/12/bits/shared_ptr_base.h:1524
#19 0x000000000060fd85 in std::shared_ptr<opentelemetry::v1::trace::TracerProvider>::~shared_ptr (this=0x84ea70 <opentelemetry::v1::trace::Provider::GetProvider()::provider+8>) at /usr/bin/../lib64/gcc/x86_64-suse-linux/12/../../../../include/c++/12/bits/shared_ptr.h:175
#20 0x000000000061519a in opentelemetry::v1::nostd::shared_ptr<opentelemetry::v1::trace::TracerProvider>::shared_ptr_wrapper::~shared_ptr_wrapper (this=0x84ea68 <opentelemetry::v1::trace::Provider::GetProvider()::provider>) at /home/malff/CODE/MARC_GITHUB/opentelemetry-cpp/api/include/opentelemetry/nostd/shared_ptr.h:44
#21 0x000000000060fd5d in opentelemetry::v1::nostd::shared_ptr<opentelemetry::v1::trace::TracerProvider>::~shared_ptr (this=0x84ea68 <opentelemetry::v1::trace::Provider::GetProvider()::provider>) at /home/malff/CODE/MARC_GITHUB/opentelemetry-cpp/api/include/opentelemetry/nostd/shared_ptr.h:114
#22 0x00007ffff6bdcae9 in __run_exit_handlers () from /lib64/libc.so.6
#23 0x00007ffff6bdcc7a in exit () from /lib64/libc.so.6
#24 0x00007ffff6bc42a4 in __libc_start_main () from /lib64/libc.so.6
#25 0x000000000060efea in _start () at ../sysdeps/x86_64/start.S:120
(gdb)

The line that crashes is:

OTEL_INTERNAL_LOG_ERROR(error_message.str());

From the call stack, after main() is done, C++ exit handlers now cleanup global variables.

In particular, the global trace provider shared pointer is destroyed, causing a shutdown on the provider.
This causes a shutdown on HttpClient, and the pending sessions.
The session cancel event causes a message to be written to the global log handler,
which is most likely long gone by now.

The opentelemetry-cpp shutdown needs to be more orderly.

@marcalff marcalff added the bug Something isn't working label Dec 14, 2022
@marcalff marcalff self-assigned this Dec 14, 2022
@marcalff
Copy link
Member Author

Fix below, I will send a PR to fix all clients.

diff --git a/examples/otlp/http_main.cc b/examples/otlp/http_main.cc
index 8c52dba2..52b11562 100644
--- a/examples/otlp/http_main.cc
+++ b/examples/otlp/http_main.cc
@@ -36,6 +36,13 @@ void InitTracer()
   // Set the global trace provider
   trace::Provider::SetTracerProvider(provider);
 }
+
+void CleanupTracer()
+{
+  std::shared_ptr<opentelemetry::trace::TracerProvider> none;
+  trace::Provider::SetTracerProvider(none);
+}
+
 }  // namespace
 
 /*
@@ -77,4 +84,6 @@ int main(int argc, char *argv[])
   InitTracer();
 
   foo_library();
+
+  CleanupTracer();
 }

marcalff added a commit to marcalff/opentelemetry-cpp that referenced this issue Dec 14, 2022
@owent
Copy link
Member

owent commented Dec 15, 2022

This problem seems relate to #1718 and #1767 .Maybe it's a better way to use a guard variable to tell whether the global handle is still available when calling GlobalLogHandler::GetLogHandler() than explicit cleanup at exit?
I meet the same problems before and my solution is calling *::Shutdown() before exit.

@sirzooro
Copy link

sirzooro commented Dec 15, 2022

This problem is not limited to examples, lots of other apps where people already use opentelemetry-cpp library or will use it in the future are affected too. This problem requires proper fix in opentelemetry-cpp library itself.

Looks that library needs some global cleanup handler which would deinit everything in proper order. One approach is to have priority queue, where every part of opentelemetry-cpp could register its own deinit routine. This queue will be then processed in exit handler.

I also would like to see new global function to explicitly initiate shutdown when app exits. This can be useful for apps which register their own log handler using opentelemetry::sdk::common::internal_log::GlobalLogHandler::SetLogHandler - it may be important to shut down opentelemetry-cpp first before shutting down app logger, as opposite order may result in a crash or missed logs.

@marcalff
Copy link
Member Author

Hi @sirzooro

I agree with you that the opentelemetry-cpp library needs a more robust way to initialize and shutdown.

This will be needed in particular to support features like the OTEL_SDK_DISABLED or OTEL_LOG_LEVEL from the spec:

https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/sdk-environment-variables.md

because currently there is no obvious places to put initialization code to implement these features.

Would the following pattern be better ?

main()
{
  // lookup OTEL_SDK_DISABLED, OTEL_LOG_LEVEL, perform all internal init in a controlled way
  opentelemetry::sdk::init();

  // optional, add global log handlers

  InitTracer();

  perform_work();

  // shutdown traces, metrics and logs global providers : this can write logs
  opentelemetry::sdk::shutdown();

  // optional, remove global log handlers

  // perform all internal cleanup in a controlled way
  opentelemetry::sdk::cleanup();
}

One issue here is that the global tracer provider is held with a shared_ptr, so if the application code still holds a reference to it, shutdown() can shut the tracer provider down, but can not remove it.

Agreed, CleanupTracer() is a work around, I will be happy to have more robust fix as well.

@sirzooro
Copy link

sirzooro commented Dec 15, 2022

It looks that for me it would not change things too much. You can go with it if it is better for you. I have helper library for OpenTelemetry, which performs initialization and shutdown. Main in my app looks like below. It would be fine as long as everything can be put into single OtelHelper::init() and OtelHelper::shutdown() calls.

main()
{
  AppLogger::init();
  OtelHelper::init();
  // app init also creates tracer instance for the app, and inits other libs which may create their tracers too
  App::init();

  App::run();

  App::shutdown();
  OtelHelper::shutdown();
  AppLogger::shutdown();
}

@lalitb
Copy link
Member

lalitb commented Dec 16, 2022

I can't reproduce the crash, which platform is it ?

labhas@DESKTOP-D0BLHPQ:~/obs/ot/opentelemetry-cpp/build/examples/otlp$ ./example_otlp_http
[Error] File: /home/labhas/obs/ot/opentelemetry-cpp/exporters/otlp/src/otlp_http_client.cc:213 [OTLP HTTP Client] Session state: connection failed.Couldn't connect to server
[Error] File: /home/labhas/obs/ot/opentelemetry-cpp/exporters/otlp/src/otlp_http_exporter.cc:107 [OTLP HTTP Client] ERROR: Export 1 trace span(s) error: 1
[Error] File: /home/labhas/obs/ot/opentelemetry-cpp/exporters/otlp/src/otlp_http_client.cc:213 [OTLP HTTP Client] Session state: connection failed.Couldn't connect to server
[Error] File: /home/labhas/obs/ot/opentelemetry-cpp/exporters/otlp/src/otlp_http_exporter.cc:107 [OTLP HTTP Client] ERROR: Export 1 trace span(s) error: 1
[Error] File: /home/labhas/obs/ot/opentelemetry-cpp/exporters/otlp/src/otlp_http_client.cc:213 [OTLP HTTP Client] Session state: connection failed.Couldn't connect to server
[Error] File: /home/labhas/obs/ot/opentelemetry-cpp/exporters/otlp/src/otlp_http_exporter.cc:107 [OTLP HTTP Client] ERROR: Export 1 trace span(s) error: 1
[Error] File: /home/labhas/obs/ot/opentelemetry-cpp/exporters/otlp/src/otlp_http_client.cc:213 [OTLP HTTP Client] Session state: connection failed.Couldn't connect to server
[Error] File: /home/labhas/obs/ot/opentelemetry-cpp/exporters/otlp/src/otlp_http_exporter.cc:107 [OTLP HTTP Client] ERROR: Export 1 trace span(s) error: 1
labhas@DESKTOP-D0BLHPQ:~/obs/ot/opentelemetry-cpp/build/examples/otlp$ .

I thought #1767 should have fixed the destruction order for GlobalLogHandler and *Provider, with GlobalLogHandler being destroyed in the end.

@marcalff
Copy link
Member Author

marcalff commented Jan 3, 2023

I can't reproduce the crash, which platform is it ?

Reproduced with opentelemetry-cpp 9197bf3 (current code as of Jan 3rd, 2023),
using clang 13, on linux (opensuse 15.4).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants