Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SSL related leak #8149

Closed
rgs1 opened this issue Sep 4, 2019 · 6 comments
Closed

SSL related leak #8149

rgs1 opened this issue Sep 4, 2019 · 6 comments
Labels

Comments

@rgs1
Copy link
Member

rgs1 commented Sep 4, 2019

We are seeing a memory leak when using the following tls_context config:

   "tls_context": {
      "common_tls_context": {
        "tls_certificate_sds_secret_configs": [
          {
            "name": "certificate_chain",
            "sds_config": {
              "api_config_source": {
                "api_type": "GRPC",
                "grpc_services": [
                  {
                    "envoy_grpc": {
                      "cluster_name": "local_sds"
                    }
                  }
                ]
              },
              "initial_fetch_timeout": "10s"
            }
          }
        ],
        "validation_context_sds_secret_config": {
          "name": "foo",
          "sds_config": {
            "api_config_source": {
              "api_type": "GRPC",
              "grpc_services": [
                {
                  "envoy_grpc": {
                    "cluster_name": "local_sds"
                  }
                }
              ]
            },
            "initial_fetch_timeout": "10s"
          }
        }
      }

Per pprof:

Total: 123.1 MB                                                                                                                                            
    98.9  80.3%  80.3%     98.9  80.3% OPENSSL_malloc                                                                                                      
    10.5   8.5%  88.9%     10.5   8.5% OPENSSL_realloc                                                                                                     
     1.3   1.1%  89.9%      2.2   1.8% Envoy::Event::DispatcherImpl::createClientConnection                                                                
     1.2   0.9%  90.9%      1.8   1.4% Envoy::Event::DispatcherImpl::createServerConnection                    
     0.8   0.7%  91.6%      1.2   1.0% Envoy::Http::CodecClientProd::CodecClientProd                                 
     0.8   0.7%  92.2%      0.8   0.7% Envoy::Buffer::OwnedImpl::reserve                                 
     0.8   0.6%  92.9%      0.8   0.6% std::make_unique@8e1970
...

This still happens with master as of yesterday, but apparently started a week or so ago. I'll try to bisect and see if any of the recent BoringSSL updates might be related.

cc: @fishcakez @derekargueta

@rgs1
Copy link
Member Author

rgs1 commented Sep 4, 2019

Possible leak per pprof:

994      (0000000000ac0112) ??:0:OPENSSL_malloc
         (0000000000a46555) ??:0:asn1_enc_save
         (0000000000a43a26) ??:0:asn1_item_ex_d2i
         (0000000000a447d2) ??:0:asn1_template_noexp_d2i
         (0000000000a43e58) ??:0:asn1_template_ex_d2i
         (0000000000a436cb) ??:0:asn1_item_ex_d2i
         (0000000000a42fd2) ??:0:ASN1_item_d2i
         (0000000000ac1312) ??:0:parse_x509
         (0000000000ac1058) ??:0:PEM_X509_INFO_read_bio
         (000000000063d832) ??:0:Envoy::Extensions::TransportSockets::Tls::ContextImpl::ContextImpl
         (0000000000642af5) ??:0:Envoy::Extensions::TransportSockets::Tls::ClientContextImpl::ClientContextImpl
         (000000000064ad36) ??:0:Envoy::Extensions::TransportSockets::Tls::ContextManagerImpl::createSslClientContext
         (0000000000635d3a) ??:0:Envoy::Extensions::TransportSockets::Tls::ClientSslSocketFactory::onAddOrUpdateSecret
         (000000000063cd97) ??:0:std::_Function_handler::_M_invoke
         (00000000006d74f0) ??:0:Envoy::Secret::SdsApi::onConfigUpdate
         (0000000000850d49) ??:0:Envoy::Config::GrpcMuxSubscriptionImpl::onConfigUpdate
         (000000000084d999) ??:0:Envoy::Config::GrpcMuxImpl::onDiscoveryResponse
         (000000000084ea52) ??:0:Envoy::Grpc::AsyncStreamCallbacks::onReceiveMessageRaw
         (00000000008765c6) ??:0:Envoy::Grpc::AsyncStreamImpl::onData
         (000000000087adf3) ??:0:Envoy::Http::AsyncStreamImpl::encodeData
         (0000000000888697) ??:0:Envoy::Router::Filter::UpstreamRequest::decodeData
         (00000000008ef728) ??:0:Envoy::Http::Http2::ConnectionImpl::onFrameReceived
         (000000000098482e) ??:0:nghttp2_session_on_data_received
         (0000000000986506) ??:0:nghttp2_session_mem_recv
         (00000000008eede8) ??:0:Envoy::Http::Http2::ConnectionImpl::dispatch
         (000000000086afe5) ??:0:Envoy::Http::CodecClient::onData
         (000000000086ba8c) ??:0:Envoy::Http::CodecClient::CodecReadFilter::onData
         (00000000006c1a80) ??:0:Envoy::Network::FilterManagerImpl::onContinueReading
         (00000000006be0eb) ??:0:Envoy::Network::ConnectionImpl::onReadReady
         (00000000006bdbb0) ??:0:Envoy::Network::ConnectionImpl::onFileEvent
         (00000000006b8a24) ??:0:Envoy::Event::FileEventImpl::assignEvents::$_0::__invoke
         (00000000009c2fdc) ??:0:event_process_active_single_queue

@htuch
Copy link
Member

htuch commented Sep 4, 2019

This looks like it's on the config side during context creation. @PiotrSikora any thoughts?

@htuch htuch added the bug label Sep 4, 2019
@rgs1
Copy link
Member Author

rgs1 commented Sep 4, 2019

Yep, I am trying to confirm it's actually on the config side. On the data side I am also slightly suspicious of #6326.

@lizan
Copy link
Member

lizan commented Sep 4, 2019

Also possibly due to recent BoringSSL upgrade: #7952

@ipuustin
Copy link
Member

ipuustin commented Sep 5, 2019

@rgs1: I looked at the #6326 changes related to this, but I did not see anything too suspicious. Please tell where the git bisect ends. My first thought was that X509_up_ref() at context_impl.cc:117 might be too much, but it has been there for a long time.

rgs1 pushed a commit to rgs1/envoy that referenced this issue Sep 20, 2019
Per @ipuustin's suggestion, dropping the `X509_up_ref()` at:

https://github.com/envoyproxy/envoy/blob/master/source/extensions/transport_sockets/tls/context_impl.cc#L121

fixes the mem leak reported at envoyproxy#8149. We do have a local deterministic
repro for this, but I'll try to provide a unit test for this as well.

Fixes envoyproxy#8149.

Signed-off-by: Raul Gutierrez Segales <rgs@pinterest.com>
@rgs1
Copy link
Member Author

rgs1 commented Sep 24, 2019

So, this was just a badly configured connection pool with a huge upstream.. Sorry for the false alarm.

@rgs1 rgs1 closed this as completed Sep 24, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants