Missing or underreported errors for ActiveRecord::ConnectionTimeoutError #1563

Closed
chris-canipe opened this issue Sep 2, 2021 · 10 comments · Fixed by #1564

@chris-canipe

Describe the bug

We're seeing ActiveRecord::ConnectionTimeoutError errors in Airbrake and Splunk that are either:

  1. Not reported in Sentry at all; or
  2. Reported in Sentry at a much lower rate.

To Reproduce

Use up all pooled database connections and ask for more.
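Roughly speaking, something like the following triggers it (a minimal sketch, not our actual code: it just checks out every pooled connection in background threads and then asks for one more):

pool = ActiveRecord::Base.connection_pool

# Hold every pooled connection in a separate thread.
holders = pool.size.times.map do
  Thread.new { pool.with_connection { sleep 30 } }
end

sleep 1 # give the threads time to check their connections out

# This checkout waits up to `checkout_timeout` (5s by default) and then
# raises ActiveRecord::ConnectionTimeoutError.
pool.checkout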

Expected behavior

The number of errors reported in Sentry is comparable to the number reported in other tools.

Actual behavior

The errors are not reported in Sentry, or are reported at a much lower rate than in other tools; for example:

| Sentry | Splunk | Airbrake |
| ------ | ------ | -------- |
| 0      | 1.5k   | 1.3k     |
| 84     | 33.7k  | 27.2k    |
| 0      | 590    | 506      |
| 0      | 6k     | 5k       |
| 0      | 25.3k  | 21.4k    |
| 0      | 3k     | 2.4k     |

Environment

  • Ruby Version: 2.6.7
  • SDK Version: We've experienced this issue in 4.3.4, 4.5.1, and 4.6.5
  • Integration Versions:
    • Rails 5.2.6
    • Sidekiq 5.2.9

Sentry Config

Sentry.init do |config|
  config.debug = true

  config.environment = ENV.fetch('SENTRY_ENVIRONMENT') do
    case ENV['NAMESPACE'].to_s.underscore
    when 'portal_live'
      'production'
    when 'portal_staging'
      'staging'
    else
      'development'
    end
  end

  config.breadcrumbs_logger = %i[active_support_logger http_logger]

  # Custom, unrelated errors have been redacted
  config.excluded_exceptions += [
    'ActiveSupport::DeprecationException',
    'Redlock::LockNotRetrievedError',
    'Sidekiq::Limiter::OverLimit'
  ]

  config.inspect_exception_causes_for_exclusion = false

  # This was a recent addition: the issue happened prior to this
  config.rails.report_rescued_exceptions = false

  config.release = "portal@#{ENV['CURRENT_SHA']}"

  config.send_default_pii = false

  # This was a recent addition: the issue happened prior to this
  config.send_modules = false

  # This has changed throughout the history of the issue.
  # Effectively, this returns a minuscule value for most transactions.
  config.traces_sampler = ->(sampling_context) do
    # ...
  end

  # This has changed throughout the history of the issue.
  # We have logging and unit testing in place to ensure this is not the culprit.
  # Additionally, there are cases where we do see ActiveRecord::ConnectionTimeoutError
  # in Sentry, just at a much lower rate, further absolving any changes here.
  config.before_send = ->(event, hint) do
    # ...
  end
end
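For completeness, here's a purely hypothetical before_send (not our redacted hook above) just to illustrate how such a hook could silently drop events, which is why we log around and unit-test ours:

# Hypothetical example only — this is NOT our actual hook.
# Returning nil from before_send tells the SDK to drop the event, so a
# clause like this would swallow every ActiveRecord::ConnectionTimeoutError.
config.before_send = ->(event, hint) do
  next nil if hint[:exception].is_a?(ActiveRecord::ConnectionTimeoutError)
  event
end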
@st0012
Collaborator

st0012 commented Sep 2, 2021

thanks for reporting. I need additional information before diving into the issue:

  • did you use all 3 SDKs together, or did you test them one at a time?
  • can you share the result of bundle exec rake middleware?

@chris-canipe
Author

Stan, we were using one SDK at a time.

bundle exec rake middleware:

use Webpacker::DevServerProxy
use Webpacker::DevServerProxy
use Rack::Cors
use SecureHeaders::Middleware
use Rack::Sendfile
use ActionDispatch::Static
use ActionDispatch::Static
use ActionDispatch::Executor
use Sentry::Rails::CaptureExceptions
use ActiveSupport::Cache::Strategy::LocalCache::Middleware
use Rack::Runtime
use Rack::MethodOverride
use ActionDispatch::RequestId
use Rack::Timeout
use RequestStore::Middleware
use ActionDispatch::RemoteIp
use Sprockets::Rails::QuietAssets
use Rails::Rack::Logger
use ActionDispatch::ShowExceptions
use ActionDispatch::DebugExceptions
use BetterErrors::Middleware
use ActionDispatch::Reloader
use ActionDispatch::Callbacks
use ActiveRecord::Migration::CheckPending
use ActionDispatch::Cookies
use ActionDispatch::Session::CookieStore
use ActionDispatch::Flash
use ActionDispatch::ContentSecurityPolicy::Middleware
use Rack::Head
use Rack::ConditionalGet
use Rack::ETag
use Rack::TempfileReaper
use Apartment::Elevators::SubdomainWithApi
use Warden::Manager
use Rack::Static
use Makara::Middleware
use Sentry::Rails::RescuedExceptionInterceptor
use Bullet::Rack
use Apartment::Reloader
run Portal3::Application.routes

@st0012
Collaborator

st0012 commented Sep 2, 2021

@chris-canipe thanks for the information 👍
I can confirm that the SDK doesn't handle the exception properly, probably because it's raised from a middleware.

does this backtrace look similar to what you see in the other services?

activerecord (6.0.4) lib/active_record/connection_adapters/abstract/connection_pool.rb:221:in `block in wait_poll'
activerecord (6.0.4) lib/active_record/connection_adapters/abstract/connection_pool.rb:210:in `loop'
activerecord (6.0.4) lib/active_record/connection_adapters/abstract/connection_pool.rb:210:in `wait_poll'
activerecord (6.0.4) lib/active_record/connection_adapters/abstract/connection_pool.rb:171:in `internal_poll'
activerecord (6.0.4) lib/active_record/connection_adapters/abstract/connection_pool.rb:297:in `internal_poll'
activerecord (6.0.4) lib/active_record/connection_adapters/abstract/connection_pool.rb:166:in `block in poll'
/Users/st0012/.rbenv/versions/2.7.2/lib/ruby/2.7.0/monitor.rb:202:in `synchronize'
/Users/st0012/.rbenv/versions/2.7.2/lib/ruby/2.7.0/monitor.rb:202:in `mon_synchronize'
activerecord (6.0.4) lib/active_record/connection_adapters/abstract/connection_pool.rb:175:in `synchronize'
activerecord (6.0.4) lib/active_record/connection_adapters/abstract/connection_pool.rb:166:in `poll'
activerecord (6.0.4) lib/active_record/connection_adapters/abstract/connection_pool.rb:875:in `acquire_connection'
activerecord (6.0.4) lib/active_record/connection_adapters/abstract/connection_pool.rb:593:in `checkout'
activerecord (6.0.4) lib/active_record/connection_adapters/abstract/connection_pool.rb:437:in `connection'
activerecord (6.0.4) lib/active_record/connection_adapters/abstract/connection_pool.rb:1125:in `retrieve_connection'
activerecord (6.0.4) lib/active_record/connection_handling.rb:221:in `retrieve_connection'
activerecord (6.0.4) lib/active_record/connection_handling.rb:189:in `connection'

# especially here
activerecord (6.0.4) lib/active_record/migration.rb:562:in `call'
actionpack (6.0.4) lib/action_dispatch/middleware/callbacks.rb:27:in `block in call'
activesupport (6.0.4) lib/active_support/callbacks.rb:101:in `run_callbacks'

# and here
actionpack (6.0.4) lib/action_dispatch/middleware/callbacks.rb:26:in `call' 
actionpack (6.0.4) lib/action_dispatch/middleware/executor.rb:14:in `call'
actionpack (6.0.4) lib/action_dispatch/middleware/actionable_exceptions.rb:18:in `call'
actionpack (6.0.4) lib/action_dispatch/middleware/debug_exceptions.rb:32:in `call'
web-console (4.1.0) lib/web_console/middleware.rb:132:in `call_app'

@chris-canipe
Author

Great find!

I assume this also explains why Sentry sometimes reported the errors, but at a lower rate? For example, for the few errors that did make it into Sentry, I'm able to verify in Splunk that they were thrown outside of middleware.

I haven't seen any of the backtraces yet. We recently enabled debug, but unfortunately that output does not go into Splunk, so we have to examine the logs live. And since these errors do not occur often, this approach to monitoring is subpar. We're working on a better solution.

Is this fixed by moving the Sentry middleware to a different level in the stack, or is it more involved?

@st0012
Collaborator

st0012 commented Sep 3, 2021

I think it may be fixed by moving Sentry::Rails::RescuedExceptionInterceptor to right after ActionDispatch::DebugExceptions, so it'll look like:

use Rails::Rack::Logger
use ActionDispatch::ShowExceptions
use ActionDispatch::DebugExceptions
use Sentry::Rails::RescuedExceptionInterceptor
# ....

I'll need to do more testing on this change to verify that it works and doesn't break anything else. But it'd be great if you could try it and tell me whether it works for you too.
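If you want to experiment with that ordering yourself in the meantime, an untested sketch of an initializer (using the stock delete / insert_after middleware stack APIs) would be:

# config/initializers/sentry_middleware_order.rb
# Untested sketch: remove RescuedExceptionInterceptor and re-insert it
# right after ActionDispatch::DebugExceptions.
Rails.application.config.middleware.delete(Sentry::Rails::RescuedExceptionInterceptor)
Rails.application.config.middleware.insert_after(
  ActionDispatch::DebugExceptions,
  Sentry::Rails::RescuedExceptionInterceptor
)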

@st0012
Collaborator

st0012 commented Sep 3, 2021

@chris-canipe I've added #1564 to fix the issue. Can you try the branch and check whether it fixes your issue?

gem "sentry-rails", github: "getsentry/sentry-ruby", branch: "fix-#1563"

@st0012 st0012 added this to the 4.8.0 milestone Sep 3, 2021
@chris-canipe
Author

Thank you! We've deployed this and I'll let you know what we find. It may take a while since these errors aren't common.

@st0012
Collaborator

st0012 commented Sep 7, 2021

@chris-canipe hey, do you have some numbers now? I think if you can see a few hundred of them, it should mean it's working; otherwise I'll start looking in other directions.

@chris-canipe
Author

@st0012 we haven't seen any show up in Splunk yet, so we can't make the comparison. I'll let you know as soon as we do; it could take weeks (based on past data).

@st0012
Collaborator

st0012 commented Sep 8, 2021

@chris-canipe I see. I've merged the fix anyway because it works in my local testing. I'll try to ship it this week.

@st0012 st0012 modified the milestones: 4.8.0, 4.7.2 Sep 8, 2021