
Tempo query improve search performance #1048

Merged
pavolloffay merged 2 commits into grafana:main from tempo-query-find-traces-jobs on Oct 10, 2024

Conversation

@pavolloffay (Collaborator) commented Oct 4, 2024

Depends on grafana/tempo#4159

kubectl apply -f - <<EOF
apiVersion: tempo.grafana.com/v1alpha1
kind: TempoStack
metadata:
  name: simplest
spec:
  timeout: 90m
  extraConfig:
    tempo:
      ingester:
        max_block_duration: 2m
  storage:
    secret:
      name: s3-secret
      type: s3
  storageSize: 5Gi
  template:
    queryFrontend:
      jaegerQuery:
        enabled: true
        findTracesConcurrentRequests: 30
        ingress:
          route:
            termination: reencrypt
          type: route
EOF

Generate data

docker run --rm -it --net=host ghcr.io/open-telemetry/opentelemetry-collector-contrib/telemetrygen:latest traces --otlp-http=true --otlp-insecure --otlp-endpoint localhost:4318 --child-spans=2 --workers=3  --traces=1000
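Note: localhost:4318 assumes the distributor's OTLP/HTTP receiver is reachable locally, e.g. via a port-forward; the service name here is an assumption based on the TempoStack name simplest:

kubectl port-forward svc/tempo-simplest-distributor 4318:4318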

@pavolloffay changed the title from "Tempo query find traces jobs" to "Tempo query improve search performance" on Oct 4, 2024
twoGBQuantity = resource.MustParse("2Gi")
tenGBQuantity = resource.MustParse("10Gi")
defaultServicesDuration = metav1.Duration{Duration: time.Hour * 24 * 3}
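// default concurrency for the new Jaeger "find traces" option, discussed in the review thread below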
defaultFindTracesConcurrentRequests = 1
@pavolloffay (Collaborator, Author):

Defaulting to 1 might not be ideal. Perhaps we can default to number of queriers * 2.

@pavolloffay (Collaborator, Author):

I will keep the default at 1; here is some rationale:

Customers complain when querying a large number of traces, e.g. 500-1500. A single trace fetch from S3 can take 2 s to 8 s, so for 500 traces at ~4 s each that is 500 × 4 s = 2000 s, roughly 33 minutes. Cutting that time in half with 2 concurrent requests does not bring much value, but 30 concurrent requests improve the situation significantly; for that, customers need to scale up the queriers.
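A quick back-of-the-envelope sketch of that math (illustrative only, assuming ~4 s per trace fetch and ideal parallelism; not operator code):

package main

import "fmt"

// estimate returns the rough wall-clock time for fetching `traces` traces
// with `concurrency` parallel requests at perTrace seconds per fetch.
func estimate(traces, concurrency int, perTrace float64) float64 {
	return float64(traces) / float64(concurrency) * perTrace
}

func main() {
	for _, c := range []int{1, 2, 30} {
		fmt.Printf("concurrency %2d: ~%.0f s\n", c, estimate(500, c, 4))
	}
}

With 30 concurrent requests the same 500-trace search drops from ~33 minutes to roughly a minute, which is why the higher setting only pays off once the queriers are scaled accordingly.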

@andreasgerstmayr (Collaborator) commented Oct 10, 2024:

I think we should use a default value which improves the current situation instead of leaving it as-is.

I like your suggestion of making the default value based on the number of queriers; maybe we can leave it at 0 in the webhook and compute the default (2 * .spec.template.querier.replicas if it is set to 0) during manifest generation?

@andreasgerstmayr (Collaborator):

My above comment is for TempoStack; for TempoMonolithic we can't scale up queriers, so maybe let's go with 2, given that the default max concurrent requests per querier is 20?

@pavolloffay (Collaborator, Author):

I have changed the default for TempoMonolithic to 2 and for TempoStack to queriers * 2.
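A minimal sketch of that defaulting, assuming 0 means "unset" in the webhook and the effective value is computed during manifest generation (names and signature are illustrative, not the actual operator code):

package main

import "fmt"

// effectiveFindTracesConcurrentRequests picks the concurrency for the
// Jaeger "find traces" path when the CR does not set it explicitly.
func effectiveFindTracesConcurrentRequests(configured, querierReplicas int, monolithic bool) int {
	if configured != 0 {
		return configured // explicit value from the CR wins
	}
	if monolithic {
		return 2 // TempoMonolithic: queriers cannot be scaled up
	}
	if querierReplicas < 1 {
		querierReplicas = 1
	}
	return 2 * querierReplicas // TempoStack: scale with the querier replicas
}

func main() {
	fmt.Println(effectiveFindTracesConcurrentRequests(0, 3, false))  // 6
	fmt.Println(effectiveFindTracesConcurrentRequests(0, 0, true))   // 2
	fmt.Println(effectiveFindTracesConcurrentRequests(30, 3, false)) // 30
}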

@pavolloffay mentioned this pull request on Oct 4, 2024
@pavolloffay force-pushed the tempo-query-find-traces-jobs branch from 0203052 to a63eb5f on October 7, 2024
@codecov-commenter commented Oct 7, 2024

Codecov Report

Attention: Patch coverage is 23.07692% with 10 lines in your changes missing coverage. Please review.

Project coverage is 69.14%. Comparing base (88d46a5) to head (88c172f).

Files with missing lines              Patch %   Lines
internal/manifests/config/build.go    0.00%     10 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1048      +/-   ##
==========================================
- Coverage   69.20%   69.14%   -0.06%     
==========================================
  Files         110      110              
  Lines        7049     7059      +10     
==========================================
+ Hits         4878     4881       +3     
- Misses       1881     1888       +7     
  Partials      290      290              
Flag        Coverage Δ
unittests   69.14% <23.07%> (-0.06%) ⬇️

Flags with carried forward coverage won't be shown.


Signed-off-by: Pavol Loffay <p.loffay@gmail.com>
Signed-off-by: Pavol Loffay <p.loffay@gmail.com>
@pavolloffay merged commit 900170b into grafana:main on Oct 10, 2024. 11 checks passed.