-
Notifications
You must be signed in to change notification settings - Fork 732
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
subscriber: fix incorrect filter selection + improve filter performance #583
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
hawkw
force-pushed
the
eliza/fast-filters
branch
from
February 13, 2020 22:19
8003804
to
19b54fc
Compare
hawkw
changed the title
[DRAFT] various performance improvements for filters
subscriber: fix incorrect filter selection + improve filter performance
Feb 14, 2020
hawkw
added
crate/tracing
Related to the `tracing` crate
kind/bug
Something isn't working
kind/perf
Performance improvements
labels
Feb 14, 2020
I believe @samschlegel has also seen very significant reductions in the performance impact of slow filtering in more real-world benchmarks. |
davidbarsky
approved these changes
Feb 14, 2020
yaahc
approved these changes
Feb 14, 2020
hawkw
added a commit
that referenced
this pull request
Feb 14, 2020
Changed - **filter**: `EnvFilter` directive selection now behaves correctly (i.e. like `env_logger`) (#583) Fixed - **filter**: Fixed `EnvFilter` incorrectly allowing less-specific filter directives to enable events that are disabled by more-specific filters (#583) - **filter**: Multiple significant `EnvFilter` performance improvements, especially when filtering events generated by `log` records (#578, #583) - **filter**: Replaced `BTreeMap` with `Vec` in `DirectiveSet`, improving iteration performance significantly with typical numbers of filter directives (#580) Signed-off-by: Eliza Weisman <eliza@buoyant.io>
hawkw
added a commit
that referenced
this pull request
Feb 14, 2020
Changed - **filter**: `EnvFilter` directive selection now behaves correctly (i.e. like `env_logger`) (#583) Fixed - **filter**: Fixed `EnvFilter` incorrectly allowing less-specific filter directives to enable events that are disabled by more-specific filters (#583) - **filter**: Multiple significant `EnvFilter` performance improvements, especially when filtering events generated by `log` records (#578, #583) - **filter**: Replaced `BTreeMap` with `Vec` in `DirectiveSet`, improving iteration performance significantly with typical numbers of filter directives (#580) Signed-off-by: Eliza Weisman <eliza@buoyant.io>
hawkw
added a commit
that referenced
this pull request
Feb 14, 2020
Changed - **filter**: `EnvFilter` directive selection now behaves correctly (i.e. like `env_logger`) (#583) Fixed - **filter**: Fixed `EnvFilter` incorrectly allowing less-specific filter directives to enable events that are disabled by more-specific filters (#583) - **filter**: Multiple significant `EnvFilter` performance improvements, especially when filtering events generated by `log` records (#578, #583) - **filter**: Replaced `BTreeMap` with `Vec` in `DirectiveSet`, improving iteration performance significantly with typical numbers of filter directives (#580) Signed-off-by: Eliza Weisman <eliza@buoyant.io>
hawkw
added a commit
that referenced
this pull request
Feb 14, 2020
Changed - **filter**: `EnvFilter` directive selection now behaves correctly (i.e. like `env_logger`) (#583) Fixed - **filter**: Fixed `EnvFilter` incorrectly allowing less-specific filter directives to enable events that are disabled by more-specific filters (#583) - **filter**: Multiple significant `EnvFilter` performance improvements, especially when filtering events generated by `log` records (#578, #583) - **filter**: Replaced `BTreeMap` with `Vec` in `DirectiveSet`, improving iteration performance significantly with typical numbers of filter directives (#580) Signed-off-by: Eliza Weisman <eliza@buoyant.io>
hawkw
added a commit
that referenced
this pull request
Feb 14, 2020
### Changed - **filter**: `EnvFilter` directive selection now behaves correctly (i.e. like `env_logger`) (#583) ### Fixed - **filter**: Fixed `EnvFilter` incorrectly allowing less-specific filter directives to enable events that are disabled by more-specific filters (#583) - **filter**: Multiple significant `EnvFilter` performance improvements, especially when filtering events generated by `log` records (#578, #583) - **filter**: Replaced `BTreeMap` with `Vec` in `DirectiveSet`, improving iteration performance significantly with typical numbers of filter directives (#580) A big thank-you to @samschlegel for lots of help with `EnvFilter` performance tuning in this release! Signed-off-by: Eliza Weisman <eliza@buoyant.io>
hawkw
added a commit
that referenced
this pull request
Mar 5, 2020
## Motivation Recent changes to `tracing-subscriber` (#580 and #583) introduced some regressions in filter directive selection. In particular, directive selection appears to depend on the _order_ in which directives were specified in a env filter string, or on the order in which they were added using `add_directive`, rather than on specificity. This regression is due to the change that switched the storage of filter directives in `DirectiveSet`s from a `BTreeSet` to a `Vec`. Previously, the `DirectiveSet::add` and `DirectiveSet::extend` methods both relied on the inherent ordering of `BTreeSet`s. After changing to a `Vec`, the `DirectiveSet::add` method was changed to use a binary search to find the correct position for each directive, and use `Vec::insert` to add the directive at that position. This is correct behavior. However, the `Extend` (and therefore also `FromIterator`) implementations _did not use_ `add_directive` --- instead, they simply called `extend` on the underlying data structure. This was fine previously, when we could rely on the sorted nature of `BTreeSet`s, but now, it means that when a directive set is created from an iterator (such as when parsing a string with multiple filter directives!), the ordering of the directive set is based on the iterator's ordering, rather than sorted. We didn't catch this bug because all of our tests happen to put the least specific directive first. When the change to using a `Vec` broke all the existing tests, I was able to "fix" them simply by adding a `.rev()` call to the iterator, based on the incorrect assumption that we were always using the sorted insertion order, and that the test failures were simply due to the binary search inserting in the opposite order as `BTreeSet`. Adding the `.rev()` call caused issue #591, where a `DirectiveSet` built by calls to `add_directive` (which _does_ obey the correct sorting) was not selecting the right filters, since we were reversing the ordering and picking the least specific directive first. ## Solution I've changed the `DirectiveSet::extend` method to call `self.add` for each directive in the iterator. Now, they are inserted at the correct position. I've also removed the call to `.rev()` so that we iterate over the correctly sorted `DirectiveSet` in most-specific-first order. I've added new tests to reproduce both issue #591 and issue #623, and confirmed that they both fail prior to this change. Fixes #591 Fixes #623 Signed-off-by: Eliza Weisman <eliza@buoyant.io>
5 tasks
I'm seeing this same bug on the latest version. Has there possibly been a regression after this was fixed? |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
crate/tracing
Related to the `tracing` crate
kind/bug
Something isn't working
kind/perf
Performance improvements
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Motivation
The benchmarks added in #581 indicate that
tracing-subscriber
'sEnvFilter
has significant overhead when filtering events generatedfrom
log
records that is not present when filtering nativetracing
events.
Additionally, the behavior of
EnvFilter
differs from theenv_logger
crate it emulates: when a less specific filter is more verbose than a
more specific filter, the more verbose filter is used. For example:
With
env_logger
- treat all targets as DEBUG except fortokio_postgres
andhyper::server::response
With
EnvFilter::from_default_env()
- treat all targets as DEBUG.This then overrides the explicit setting of INFO for the other two,
so they are DEBUG as well.
(see #512)
Solution
This branch makes some performance optimizations to filtering, to
improve performance with
log
events significantly. In particular, itmakes the following changes:
Code was previously added to record the maximum level enabled by
the static and dynamic directive sets. This would allow a fast path
for skipping events that no directive will ever enable. However, this
currently is never actually checked, so we always have to check an
event against every directive. I've fixed that.
If there are no dynamic directives that would require looking at the
dynamic span state (a TLS value), we now avoid looking at TLS.
Finally, I've fixed EnvFilter::from_default_env() acts differently to env_logger #512, by changing
enabled
to select the mostspecific static filter, rather than by checking every filter until
it finds something that enables the span. In addition to making the
behaviour correct, this also improves performance: it means we only
iterate over the statics until we find something that cares about an
event, and then return. In the previous implementation (of incorrect
behavior), we would break the iteration if the event was enabled, but
never break iteration if it was disabled. Now, we can return early.
This should improve performance significantly with a large number of
filters.
Benchmark Results
Performance in the
filter_log
benchmark is significantly improved inmost cases, while performance in the
filter
benchmark (for filteringtracing
events) is about the same as master, but significantly betterthan
filter_log
across the board. This is expected, since thetracing
events infilter
benchmark can (in most cases) participatein the callsite cache, while the
log
events cannot. Therefore, thefilter
benchmarks with static filters are measuring the overhead of asingle filter hit + multiple callsite cache loads, while the
filter_log
benchmarks are always measuring an actual filter hit.Benchmark results (vs master):
Benchmark environment:
Fixes #512
Signed-off-by: Eliza Weisman eliza@buoyant.io