Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pkg/util/log: flush bufferedSinks as part of crash reporter process #101562

Merged
merged 1 commit into from
Apr 24, 2023

Commits on Apr 21, 2023

  1. pkg/util/log: flush bufferedSinks as part of crash reporter process

    It was brought to our attention that SQL pods in serverless environments
    experiencing panics were not showing the panic logs in downstream log
    collection systems like Splunk.
    
    This caused an investigation into the crash reporter, where we found
    that the crash reporter *only* flushed the file logging sinks, but
    *not* the buffered network sinks.
    
    Because of this, when the crash reporter logged the panic details to
    the OPS channel, and the panic log was sent through a fluent-server
    logging sink, it would simply be buffered. The crash reported would
    then flush the *file* sinks before propogating the panic again to
    kill the process. Since we didn't trigger & wait for the flush of
    the buffered network sinks, the panic almost never made it to the
    downstream fluentbit collector in time. In the case of SQL pods,
    where log files are not maintained once the container is destroyed,
    this meant these panic logs would be lost entirely.
    
    This patch updates the log flush system with a new function called
    `FlushAll`. Previously, we had `Flush`, which would flush all the
    file logging sinks. This has been renamed to `FlushFileSinks`. Due
    to its widespread use throughout the code base, we intentionally
    maintain separation between the flushing of just the file sinks
    specifically, and the flushing of *all* buffered logging sinks
    (including network sinks), to avoid changing the semantics of the
    pre-existing function and its widespread usages.
    
    NB: The `FlushAll` is a synchronous operation. It will wait for each
    buffered logging sink to finish flushing before allowing the crash
    reporter to proceed. This ensures that the buffers are fully drained
    prior to propogating the panic and killing the process.
    
    Release note: none
    abarganier committed Apr 21, 2023
    Configuration menu
    Copy the full SHA
    69a6a8f View commit details
    Browse the repository at this point in the history