Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TraceQL] Fix subtly incorrect handling of second pass conditions #3734

Merged
merged 4 commits into from
May 31, 2024

Conversation

mdisibio
Copy link
Contributor

@mdisibio mdisibio commented May 30, 2024

What this PR does:
I'm pretty happy about this PR. It started off as a metrics grouping bug, but ended up with deleting very old parquet cruft that's been on our todo list for over a year. 😎

The bug is that conditions put in the second pass could be subjected to existence checks incorrectly. The query { } | rate() by(span.foo) should plot all spans in the "nil" time-series, because the first part { } matches all spans, and the second part span.foo doesn't exist. However it was returning the dreaded No Data.

image

Adding a condition to the query fixed it.

image

The source of the bug was the very crufty and very old "requireAtLeastOneMatch" class of optimizations. These optimizations predate the AllConditions, and as we suspected they were redundant and could be deleted. On the second pass we set AllConditions = false so that the second pass is considered truly optional. However these optimizations were forcing it back into an "all-conditions-like" mode and requiring the existent of by(span.foo). Simply deleting these and relying on AllConditions led to the correct behavior.

The remaining batchRequireAtLeastOneMatchOverall cannot yet be deleted.

Here is a snippet from the query_range benchmarks before and after with cases added for this query. Notice that the queries matched 0 spans before, but now they match all spans again. (In fact because all queries have { } they match the same number of spans no matter how they are grouped)

name                                                                       old spans/op   new spans/op     delta
BackendBlockQueryRange/{}_|_rate()/3                                          2.55M ± 0%       2.55M ± 0%        ~     (all equal)
BackendBlockQueryRange/{}_|_rate()_by_(name)/3                                2.55M ± 0%       2.55M ± 0%        ~     (all equal)
BackendBlockQueryRange/{}_|_rate()_by_(resource.service.name)/3               2.55M ± 0%       2.55M ± 0%        ~     (all equal)
BackendBlockQueryRange/{}_|_rate()_by_(span.http.url)/3                       2.55M ± 0%       2.55M ± 0%        ~     (all equal)
BackendBlockQueryRange/{}_|_rate()_by_(span.foo)/3                             0.00       2552739.00 ± 0%       +Inf%  (p=0.000 n=10+10)
BackendBlockQueryRange/{}_|_rate()_by_(resource.foo)/3                         0.00       2552739.00 ± 0%       +Inf%  (p=0.000 n=10+10)
BackendBlockQueryRange/{resource.service.name=`loki-ingester`}_|_rate()/3      504k ± 0%        504k ± 0%        ~     (all equal)
BackendBlockQueryRange/{status=error}_|_rate()/3                              93.8k ± 0%       93.8k ± 0%        ~     (all equal)

Benchmarks for search show some noise, but I don't think this actually changed performance. Because queries {...} | select(...) already set AllConditions to false (because of the pipeline). TODO - This can probably be fixed now.

BenchmarkBackendBlockTraceQL
name                                              old time/op    new time/op    delta
BackendBlockTraceQL/spanAttValMatch                  133ms ± 8%     133ms ± 5%    ~     (p=0.853 n=10+10)
BackendBlockTraceQL/spanAttValNoMatch               3.80ms ± 2%    3.78ms ± 2%    ~     (p=0.633 n=10+8)
BackendBlockTraceQL/spanAttIntrinsicMatch           63.1ms ± 2%    64.5ms ± 1%  +2.26%  (p=0.001 n=10+8)
BackendBlockTraceQL/spanAttIntrinsicNoMatch         2.92ms ± 2%    2.94ms ± 2%    ~     (p=0.353 n=10+10)
BackendBlockTraceQL/resourceAttValMatch              602ms ±10%     591ms ± 2%    ~     (p=0.796 n=10+10)
BackendBlockTraceQL/resourceAttValNoMatch           2.97ms ± 4%    2.95ms ± 2%    ~     (p=0.631 n=10+10)
BackendBlockTraceQL/resourceAttIntrinsicMatch       14.4ms ± 2%    14.3ms ± 3%    ~     (p=0.315 n=10+10)
BackendBlockTraceQL/resourceAttIntrinsicMatch#01    2.95ms ± 1%    2.99ms ± 3%  +1.38%  (p=0.035 n=10+10)
BackendBlockTraceQL/mixedValNoMatch                  177ms ± 2%     176ms ± 1%  -0.89%  (p=0.013 n=10+9)
BackendBlockTraceQL/mixedValMixedMatchAnd           2.99ms ± 4%    2.96ms ± 4%    ~     (p=0.393 n=10+10)
BackendBlockTraceQL/mixedValMixedMatchOr             169ms ± 2%     168ms ± 2%    ~     (p=0.436 n=10+10)
BackendBlockTraceQL/count                            424ms ± 3%     456ms ± 2%  +7.63%  (p=0.000 n=10+9)
BackendBlockTraceQL/struct                           612ms ±10%     610ms ± 9%    ~     (p=0.796 n=10+10)
BackendBlockTraceQL/||                              83.4ms ± 6%    82.2ms ± 2%    ~     (p=0.315 n=10+10)
BackendBlockTraceQL/mixed                           52.3ms ± 1%    52.1ms ± 1%    ~     (p=0.278 n=10+9)
BackendBlockTraceQL/complex                         3.82ms ± 3%    3.79ms ± 1%    ~     (p=0.101 n=10+8)

name                                              old speed      new speed      delta
BackendBlockTraceQL/spanAttValMatch                113MB/s ± 7%   113MB/s ± 5%    ~     (p=0.853 n=10+10)
BackendBlockTraceQL/spanAttValNoMatch              541MB/s ± 2%   544MB/s ± 2%    ~     (p=0.617 n=10+8)
BackendBlockTraceQL/spanAttIntrinsicMatch          249MB/s ± 2%   244MB/s ± 1%  -2.23%  (p=0.001 n=10+8)
BackendBlockTraceQL/spanAttIntrinsicNoMatch        507MB/s ± 2%   505MB/s ± 2%    ~     (p=0.353 n=10+10)
BackendBlockTraceQL/resourceAttValMatch           24.7MB/s ± 9%  25.1MB/s ± 2%    ~     (p=0.754 n=10+10)
BackendBlockTraceQL/resourceAttValNoMatch          149MB/s ± 3%   150MB/s ± 2%    ~     (p=0.631 n=10+10)
BackendBlockTraceQL/resourceAttIntrinsicMatch      920MB/s ± 2%   925MB/s ± 3%    ~     (p=0.315 n=10+10)
BackendBlockTraceQL/resourceAttIntrinsicMatch#01   162MB/s ± 1%   160MB/s ± 3%  -1.35%  (p=0.034 n=10+10)
BackendBlockTraceQL/mixedValNoMatch               13.2MB/s ± 2%  13.3MB/s ± 1%  +0.89%  (p=0.014 n=10+9)
BackendBlockTraceQL/mixedValMixedMatchAnd          152MB/s ± 4%   154MB/s ± 4%    ~     (p=0.393 n=10+10)
BackendBlockTraceQL/mixedValMixedMatchOr          94.9MB/s ± 2%  95.2MB/s ± 2%    ~     (p=0.436 n=10+10)
BackendBlockTraceQL/count                         34.3MB/s ± 3%  32.5MB/s ± 2%  -5.41%  (p=0.000 n=10+9)
BackendBlockTraceQL/struct                        27.1MB/s ±10%  27.2MB/s ±10%    ~     (p=0.754 n=10+10)
BackendBlockTraceQL/||                             179MB/s ± 6%   181MB/s ± 2%    ~     (p=0.315 n=10+10)
BackendBlockTraceQL/mixed                          281MB/s ± 1%   282MB/s ± 1%    ~     (p=0.278 n=10+9)
BackendBlockTraceQL/complex                        488MB/s ± 1%   494MB/s ± 2%  +1.12%  (p=0.043 n=8+10)

name                                              old MB_io/op   new MB_io/op   delta
BackendBlockTraceQL/spanAttValMatch                   15.1 ± 0%      15.1 ± 0%    ~     (all equal)
BackendBlockTraceQL/spanAttValNoMatch                 2.05 ± 0%      2.05 ± 0%    ~     (all equal)
BackendBlockTraceQL/spanAttIntrinsicMatch             15.7 ± 0%      15.7 ± 0%    ~     (all equal)
BackendBlockTraceQL/spanAttIntrinsicNoMatch           1.48 ± 0%      1.48 ± 0%    ~     (all equal)
BackendBlockTraceQL/resourceAttValMatch               14.8 ± 0%      14.8 ± 0%    ~     (all equal)
BackendBlockTraceQL/resourceAttValNoMatch             0.44 ± 0%      0.44 ± 0%    ~     (all equal)
BackendBlockTraceQL/resourceAttIntrinsicMatch         13.2 ± 0%      13.2 ± 0%    ~     (all equal)
BackendBlockTraceQL/resourceAttIntrinsicMatch#01      0.48 ± 0%      0.48 ± 0%    ~     (all equal)
BackendBlockTraceQL/mixedValNoMatch                   2.34 ± 0%      2.34 ± 0%    ~     (all equal)
BackendBlockTraceQL/mixedValMixedMatchAnd             0.46 ± 0%      0.46 ± 0%    ~     (all equal)
BackendBlockTraceQL/mixedValMixedMatchOr              16.0 ± 0%      16.0 ± 0%    ~     (all equal)
BackendBlockTraceQL/count                             14.6 ± 0%      14.8 ± 0%  +1.79%  (p=0.000 n=10+10)
BackendBlockTraceQL/struct                            16.5 ± 0%      16.5 ± 0%    ~     (all equal)
BackendBlockTraceQL/||                                14.9 ± 0%      14.9 ± 0%    ~     (all equal)
BackendBlockTraceQL/mixed                             14.7 ± 0%      14.7 ± 0%    ~     (all equal)
BackendBlockTraceQL/complex                           1.86 ± 1%      1.88 ± 3%  +1.18%  (p=0.048 n=9+9)

name                                              old alloc/op   new alloc/op   delta
BackendBlockTraceQL/spanAttValMatch                 34.0MB ± 1%    33.9MB ± 1%    ~     (p=0.436 n=10+10)
BackendBlockTraceQL/spanAttValNoMatch               1.63MB ± 0%    1.63MB ± 0%    ~     (p=0.065 n=9+10)
BackendBlockTraceQL/spanAttIntrinsicMatch           15.4MB ± 1%    15.5MB ± 0%    ~     (p=0.360 n=10+8)
BackendBlockTraceQL/spanAttIntrinsicNoMatch         1.17MB ± 0%    1.17MB ± 0%    ~     (p=0.753 n=10+10)
BackendBlockTraceQL/resourceAttValMatch              323MB ± 0%     323MB ± 0%    ~     (p=0.853 n=10+10)
BackendBlockTraceQL/resourceAttValNoMatch           1.14MB ± 0%    1.14MB ± 0%    ~     (p=0.104 n=10+9)
BackendBlockTraceQL/resourceAttIntrinsicMatch       1.27MB ± 0%    1.27MB ± 0%    ~     (p=0.883 n=10+8)
BackendBlockTraceQL/resourceAttIntrinsicMatch#01    1.14MB ± 0%    1.14MB ± 0%    ~     (p=0.170 n=10+10)
BackendBlockTraceQL/mixedValNoMatch                 1.68MB ± 0%    1.68MB ± 0%    ~     (p=1.000 n=8+8)
BackendBlockTraceQL/mixedValMixedMatchAnd           1.14MB ± 0%    1.14MB ± 0%    ~     (p=0.323 n=10+10)
BackendBlockTraceQL/mixedValMixedMatchOr            2.08MB ± 4%    2.11MB ± 2%    ~     (p=0.929 n=10+10)
BackendBlockTraceQL/count                            231MB ± 0%     233MB ± 0%  +0.87%  (p=0.000 n=8+10)
BackendBlockTraceQL/struct                          12.4MB ± 2%    12.4MB ± 2%    ~     (p=1.000 n=9+9)
BackendBlockTraceQL/||                              17.3MB ± 0%    17.3MB ± 0%    ~     (p=0.052 n=10+10)
BackendBlockTraceQL/mixed                           4.13MB ± 0%    4.13MB ± 0%    ~     (p=0.912 n=10+10)
BackendBlockTraceQL/complex                         1.21MB ± 1%    1.20MB ± 0%    ~     (p=0.068 n=10+8)

name                                              old allocs/op  new allocs/op  delta
BackendBlockTraceQL/spanAttValMatch                   343k ± 0%      343k ± 0%    ~     (p=0.954 n=10+9)
BackendBlockTraceQL/spanAttValNoMatch                17.4k ± 0%     17.4k ± 0%    ~     (p=0.370 n=10+10)
BackendBlockTraceQL/spanAttIntrinsicMatch             122k ± 0%      122k ± 0%    ~     (p=0.769 n=10+9)
BackendBlockTraceQL/spanAttIntrinsicNoMatch          17.4k ± 0%     17.4k ± 0%    ~     (all equal)
BackendBlockTraceQL/resourceAttValMatch              2.34M ± 0%     2.34M ± 0%    ~     (p=0.956 n=10+10)
BackendBlockTraceQL/resourceAttValNoMatch            17.4k ± 0%     17.4k ± 0%    ~     (all equal)
BackendBlockTraceQL/resourceAttIntrinsicMatch        18.6k ± 0%     18.6k ± 0%    ~     (p=0.173 n=10+9)
BackendBlockTraceQL/resourceAttIntrinsicMatch#01     17.4k ± 0%     17.4k ± 0%    ~     (all equal)
BackendBlockTraceQL/mixedValNoMatch                  18.2k ± 0%     18.2k ± 0%    ~     (all equal)
BackendBlockTraceQL/mixedValMixedMatchAnd            17.4k ± 0%     17.4k ± 0%    ~     (all equal)
BackendBlockTraceQL/mixedValMixedMatchOr             24.4k ± 0%     24.4k ± 0%    ~     (p=0.988 n=10+10)
BackendBlockTraceQL/count                            1.53M ± 0%     1.53M ± 0%  +0.03%  (p=0.000 n=10+10)
BackendBlockTraceQL/struct                           49.7k ± 1%     49.7k ± 1%    ~     (p=0.981 n=8+8)
BackendBlockTraceQL/||                                149k ± 0%      149k ± 0%    ~     (p=0.889 n=9+10)
BackendBlockTraceQL/mixed                            44.8k ± 0%     44.8k ± 0%    ~     (p=0.892 n=9+10)
BackendBlockTraceQL/complex                          17.8k ± 0%     17.8k ± 0%    ~     (p=0.187 n=9+10)

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Copy link
Member

@mapno mapno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mdisibio mdisibio merged commit 4e85a50 into grafana:main May 31, 2024
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants