Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assertion on non-empty upstream_requests in Router::Filter destructor in stress test #9054

Closed
qiwzhang opened this issue Nov 18, 2019 · 9 comments
Labels
Milestone

Comments

@qiwzhang
Copy link
Contributor

qiwzhang commented Nov 18, 2019

Title: Envoy crash with assertion on upstream_requests not empty in Router::Filter destructor in stress test using wrk

Description:

\During stress test using wrk with command similar to following:

wrk -t 1 --timeout 2m -c 10 -d 60s

with these data

= 1 threads and 10 connections
= Thread Stats Avg Stdev Max +/- Stdev
= Latency 38.91ms 5.71ms 101.70ms 90.33%
= Req/Sec 257.68 28.93 303.00 71.83%
= 15411 requests in 1.00m, 3.48MB read
= Requests/sec: 256.63
= Transfer/sec: 59.40KB

Call Stack:

#0  raise (sig=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x000055db2583a4bb in Envoy::SignalAction::sigHandler (sig=6, info=0x7f7f22b47e70, context=0x7f7f22b47d40) at external/envoy/source/common/signal/signal_action.cc:74
#2  
#3  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#4  0x00007f7f25bc9535 in __GI_abort () at abort.c:79
#5  0x000055db2530a25c in Envoy::Router::Filter::~Filter (this=0x55db28cf5b90, __in_chrg=) at external/envoy/source/common/router/router.cc:253
#6  0x000055db25308894 in Envoy::Router::ProdFilter::~ProdFilter (this=0x55db28cf5b90, __in_chrg=) at bazel-out/k8-dbg/bin/external/envoy/source/common/router/_virtual_includes/router_lib/common/router/router.h:594
#7  0x000055db24a09493 in __gnu_cxx::new_allocator::destroy (this=0x55db28cf5b90, __p=0x55db28cf5b90) at /usr/include/c++/7/ext/new_allocator.h:140
#8  0x000055db24a09301 in std::allocator_traits >::destroy (__a=..., __p=0x55db28cf5b90) at /usr/include/c++/7/bits/alloc_traits.h:487
#9  0x000055db24a08ffd in std::_Sp_counted_ptr_inplace, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=0x55db28cf5b80) at /usr/include/c++/7/bits/shared_ptr_base.h:535
#10 0x000055db24618270 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x55db28cf5b80) at /usr/include/c++/7/bits/shared_ptr_base.h:154
#11 0x000055db24616ceb in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x55db2b770588, __in_chrg=) at /usr/include/c++/7/bits/shared_ptr_base.h:684
#12 0x000055db24616586 in std::__shared_ptr::~__shared_ptr (this=0x55db2b770580, __in_chrg=) at /usr/include/c++/7/bits/shared_ptr_base.h:1123
#13 0x000055db246165a2 in std::shared_ptr::~shared_ptr (this=0x55db2b770580, __in_chrg=) at /usr/include/c++/7/bits/shared_ptr.h:93
#14 0x000055db253de61e in Envoy::Http::ConnectionManagerImpl::ActiveStreamDecoderFilter::~ActiveStreamDecoderFilter (this=0x55db2b770540, __in_chrg=, __vtt_parm=)
    at bazel-out/k8-dbg/bin/external/envoy/source/common/http/_virtual_includes/conn_manager_lib/common/http/conn_manager_impl.h:206
#15 0x000055db253de692 in Envoy::Http::ConnectionManagerImpl::ActiveStreamDecoderFilter::~ActiveStreamDecoderFilter (this=0x55db2b770540, __in_chrg=, __vtt_parm=)
    at bazel-out/k8-dbg/bin/external/envoy/source/common/http/_virtual_includes/conn_manager_lib/common/http/conn_manager_impl.h:206
#16 0x000055db253de6e2 in std::default_delete::operator() (this=0x55db28f35a30, __ptr=0x55db2b770540) at /usr/include/c++/7/bits/unique_ptr.h:78
#17 0x000055db253dc6fb in std::unique_ptr >::~unique_ptr (this=0x55db28f35a30,
    __in_chrg=) at /usr/include/c++/7/bits/unique_ptr.h:268
#18 0x000055db253e2dfe in __gnu_cxx::new_allocator > > >::destroy > > (this=0x55db2975f238, __p=0x55db28f35a30)
    at /usr/include/c++/7/ext/new_allocator.h:140
#19 0x000055db253e0673 in std::allocator_traits > > > >::destroy > > (__a=..., __p=0x55db28f35a30)
    at /usr/include/c++/7/bits/alloc_traits.h:487
#20 0x000055db253de364 in std::__cxx11::_List_base >, std::allocator > > >::_M_clear (this=0x55db2975f238) at /usr/include/c++/7/bits/list.tcc:76
#21 0x000055db253dc448 in std::__cxx11::_List_base >, std::allocator > > >::~_List_base (this=0x55db2975f238, __in_chrg=)
    at /usr/include/c++/7/bits/stl_list.h:442
#22 0x000055db253db200 in std::__cxx11::list >, std::allocator > > >::~list (this=0x55db2975f238, __in_chrg=)
    at /usr/include/c++/7/bits/stl_list.h:733
#23 0x000055db253c9ee9 in Envoy::Http::ConnectionManagerImpl::ActiveStream::~ActiveStream (this=0x55db2975f180, __in_chrg=) at external/envoy/source/common/http/conn_manager_impl.cc:542
#24 0x000055db253ca08e in Envoy::Http::ConnectionManagerImpl::ActiveStream::~ActiveStream (this=0x55db2975f180, __in_chrg=) at external/envoy/source/common/http/conn_manager_impl.cc:587
#25 0x000055db246c77ca in std::default_delete::operator() (this=0x55db29ea8800, __ptr=0x55db2975f180) at /usr/include/c++/7/bits/unique_ptr.h:78
#26 0x000055db24f5ab97 in std::unique_ptr >::reset (this=0x55db29ea8800, __p=0x55db2975f180) at /usr/include/c++/7/bits/unique_ptr.h:376
#27 0x000055db24f58392 in Envoy::Event::DispatcherImpl::clearDeferredDeleteList (this=0x55db283986e0) at external/envoy/source/common/event/dispatcher_impl.cc:95
#28 0x000055db24f576cd in Envoy::Event::DispatcherImpl::::operator()(void) const (__closure=0x55db28509b50) at external/envoy/source/common/event/dispatcher_impl.cc:42
#29 0x000055db24f59ad1 in std::_Function_handler >::_M_invoke(const std::_Any_data &) (
    __functor=...) at /usr/include/c++/7/bits/std_function.h:316
#30 0x000055db246bf992 in std::function::operator()() const (this=0x55db28509b50) at /usr/include/c++/7/bits/std_function.h:706
#31 0x000055db24f94a00 in Envoy::Event::TimerImpl::::operator()(int, short, void *) const (__closure=0x0, arg=0x55db28509ad0) at external/envoy/source/common/event/timer_impl.cc:23
#32 0x000055db24f94ab9 in Envoy::Event::TimerImpl::::_FUN(int, short, void *) () at external/envoy/source/common/event/timer_impl.cc:23
#33 0x000055db255f67ba in event_process_active_single_queue (base=0x55db2848c580, activeq=0x55db283646d0, max_to_process=2147483647, endtime=0x0)
    at /home/qiwzhang/.cache/bazel/_bazel_qiwzhang/60020ca4eb4bbfdb82eb82dc35e1335d/sandbox/linux-sandbox/1893/execroot/gcpproxy/external/com_github_libevent_libevent/event.c:1713
#34 0x000055db255f6ca1 in event_process_active (base=0x55db2848c580)
    at /home/qiwzhang/.cache/bazel/_bazel_qiwzhang/60020ca4eb4bbfdb82eb82dc35e1335d/sandbox/linux-sandbox/1893/execroot/gcpproxy/external/com_github_libevent_libevent/event.c:1805
#35 0x000055db255f7557 in event_base_loop (base=0x55db2848c580, flags=0)
    at /home/qiwzhang/.cache/bazel/_bazel_qiwzhang/60020ca4eb4bbfdb82eb82dc35e1335d/sandbox/linux-sandbox/1893/execroot/gcpproxy/external/com_github_libevent_libevent/event.c:2047
#36 0x000055db24f93516 in Envoy::Event::LibeventScheduler::run (this=0x55db28398730, mode=Envoy::Event::Dispatcher::RunType::Block) at external/envoy/source/common/event/libevent_scheduler.cc:47
#37 0x000055db24f59707 in Envoy::Event::DispatcherImpl::run (this=0x55db283986e0, type=Envoy::Event::Dispatcher::RunType::Block) at external/envoy/source/common/event/dispatcher_impl.cc:194
#38 0x000055db24f4225c in Envoy::Server::WorkerImpl::threadRoutine (this=0x55db2855a150, guard_dog=...) at external/envoy/source/server/worker_impl.cc:110
#39 0x000055db24f41b00 in Envoy::Server::WorkerImpl::::operator()(void) const (__closure=0x55db28a01268) at external/envoy/source/server/worker_impl.cc:75
#40 0x000055db24f42b6e in std::_Function_handler >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/7/bits/std_function.h:316
#41 0x000055db246bf992 in std::function::operator()() const (this=0x55db28a01268) at /usr/include/c++/7/bits/std_function.h:706
#42 0x000055db2583fae8 in Envoy::Thread::ThreadImplPosix::::operator()(void *) const (__closure=0x0, arg=0x55db28a01260) at external/envoy/source/common/common/posix/thread_impl.cc:33
#43 0x000055db2583fb0c in Envoy::Thread::ThreadImplPosix::::_FUN(void *) () at external/envoy/source/common/common/posix/thread_impl.cc:35
#44 0x00007f7f25d6ffa3 in start_thread (arg=) at pthread_create.c:486
#45 0x00007f7f25ca04cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
@qiwzhang
Copy link
Contributor Author

This is from

ENVOY_SHA1 = "619eb5735f038a568df6497f28fac1bf809b2379" # 11.06.2019

@junr03 junr03 added the bug label Nov 18, 2019
@junr03
Copy link
Member

junr03 commented Nov 18, 2019

@snowp could you take a look at this?

@snowp
Copy link
Contributor

snowp commented Nov 18, 2019

It seems like somehow we're handling the defer delete for the HCM ActiveStream without all the upstream_requests_ being removed. The only place we call deferredDelete we very explicitly call onDestroy (which resets all upstream requests) for all the HCM filters before moving the stream onto the list of deferred deletions.

I'm wondering if what's going on is that the ActiveStream is being shut down between newStream being called on the HCM and decodeHeaders called on the stream: since we defer delete the ActiveStream, I think it might be possible for a decodeHeaders event to be enqueued while the event to destroy (e.g. due to a timeout, which wouldn't yet stop the headers from being processed) the stream has yet to be executed, resulting in a new upstream request being created between onDestroy and the list of deferred deletes being cleared.

If this is the case we should probably just bail out early in decodeHeaders based on state_.destroyed_

@alyssawilk

@qiwzhang
Copy link
Contributor Author

In our test setup, we had a http filter making async_client call. While the async_client call is in progress, the filter is paused with STOP. This filter is not handling onDestroy() to cancel active async_client call. Could this be the root cause?

But, the upsteram_stream_ in Router is created on the Router::decodeHeader(). I assume if my filter decodeHeader() is paused with STOP, Router::decodeHeader() should not be called. Since Router::upstream_requests is not empty when crash, it means my filter async_client call should have returned, it should have returned CONTINUE so that Router::decodeHeader() is called.

@alyssawilk
Copy link
Contributor

If this is reproducible, any chance you can rerun with -l trace and share the logs of the offending connection with us? It might shortcut a lot of debugging!

@qiwzhang
Copy link
Contributor Author

It is a stress test. with over 10k requests within 1m, with "-l trace", it will generate too much logs. problem may not happen.

@qiwzhang
Copy link
Contributor Author

reproduce steps is: run a stress test with over 10k request per minute for a couple hours.

@qiwzhang
Copy link
Contributor Author

Update: I am trying to cancel the async_client in-flight request when filter is gone in my http custom filter to see if it fixes the problem.

Will update the result here.

@mattklein123 mattklein123 added the help wanted Needs help! label Nov 21, 2019
@mattklein123 mattklein123 added this to the 1.13.0 milestone Nov 21, 2019
@qiwzhang
Copy link
Contributor Author

After I cancel the in-flight async_client request on my custom Http Filter::onDestroy(), the problem went away. Hence I close the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants