Tempo query improve search performance #1048

Merged 2 commits on Oct 10, 2024
21 changes: 21 additions & 0 deletions .chloggen/tempo-query-find-traces-jobs.yaml
@@ -0,0 +1,21 @@
# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: enhancement

# The name of the component, or a single word describing the area of concern, (e.g. tempostack, tempomonolithic, github action)
component: tempostack, tempomonolithic

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Add tempo-query CRD option to speed up trace search.

# One or more tracking issues related to the change
issues: [1048]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext: |
The following CRD options were added to speed up trace search in the Jaeger UI/API. A trace search first
searches for trace IDs and then fetches each full trace. With these options, the requests
to fetch the full traces can run in parallel:
For `TempoStack` - `spec.template.queryFrontend.jaegerQuery.findTracesConcurrentRequests`
For `TempoMonolithic` - `spec.jaegerui.findTracesConcurrentRequests`
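The two-step search described in the changelog subtext (find trace IDs, then fetch each full trace) can be pictured as a bounded-concurrency fan-out. The Go sketch below is only an illustration of that pattern, not the tempo-query or Jaeger implementation: `findTraces` and the `fetch` callback are hypothetical names, and a buffered channel is used as a semaphore so that at most `concurrency` fetches are in flight at once.

```go
package main

import (
	"fmt"
	"sync"
)

// findTraces sketches the second step of a Jaeger trace search: given the
// trace IDs returned by the first request, fetch each full trace with at most
// `concurrency` fetches in flight. `fetch` is a hypothetical stand-in for a
// get-trace-by-ID call to Tempo.
func findTraces(ids []string, concurrency int, fetch func(string) string) []string {
	if concurrency < 1 {
		concurrency = 1 // fall back to sequential fetching
	}
	results := make([]string, len(ids))
	sem := make(chan struct{}, concurrency) // buffered channel used as a semaphore
	var wg sync.WaitGroup
	for i, id := range ids {
		sem <- struct{}{} // blocks while `concurrency` fetches are running
		wg.Add(1)
		go func(i int, id string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot when this fetch ends
			results[i] = fetch(id)   // each goroutine writes only its own index
		}(i, id)
	}
	wg.Wait()
	return results
}

func main() {
	fetch := func(id string) string { return "trace-" + id }
	fmt.Println(findTraces([]string{"a", "b", "c"}, 2, fetch)) // [trace-a trace-b trace-c]
}
```

Writing results by index keeps the output in the same order as the trace IDs, regardless of which fetch finishes first.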
2 changes: 1 addition & 1 deletion Makefile
@@ -1,8 +1,8 @@
# Current Operator version
OPERATOR_VERSION ?= 0.13.0
TEMPO_VERSION ?= 2.5.0
-TEMPO_QUERY_VERSION ?= main-2999520
JAEGER_QUERY_VERSION ?= 1.62.0
+TEMPO_QUERY_VERSION ?= main-1de25ca
TEMPO_GATEWAY_VERSION ?= main-2024-08-05-11d0d94
TEMPO_GATEWAY_OPA_VERSION ?= main-2024-04-29-914c13f
OAUTH_PROXY_VERSION=4.14
12 changes: 8 additions & 4 deletions apis/tempo/v1alpha1/tempomonolithic_defaults.go
@@ -13,10 +13,11 @@ import (
)

var (
twoGBQuantity = resource.MustParse("2Gi")
tenGBQuantity = resource.MustParse("10Gi")
defaultServicesDuration = metav1.Duration{Duration: time.Hour * 24 * 3}
defaultTimeout = metav1.Duration{Duration: time.Second * 30}
defaultFindTracesConcurrentRequests = 1
Collaborator Author:

Defaulting to 1 might not be ideal. Perhaps we can default to the number of queriers * 2.

Collaborator Author:

I will keep the default at 1; here is some rationale:

Customers complain when querying a large number of traces, e.g. 500-1500. A single trace fetch from S3 can take from 2s to 8s, so 500 traces take about 500 * 4s = 2000s, i.e. roughly 33 minutes. Cutting the time in half with 2 concurrent requests does not bring any value; 30 concurrent requests, however, improve the situation significantly, but for that customers need to scale up queriers.

andreasgerstmayr (Collaborator), Oct 10, 2024:

I think we should use a default value that improves the current situation instead of leaving it as-is.

I like your suggestion of basing the default value on the number of queriers. Maybe we can leave it at 0 in the webhook and compute the default (2 * .spec.template.querier.replicas if set to 0) during manifest generation?

Collaborator:

My comment above is for TempoStack. For TempoMonolithic we can't scale up queriers, so maybe let's go with 2, given that the default max concurrent requests per querier is 20?

Collaborator Author:

I have changed the default to 2 for TempoMonolithic and to queriers * 2 for TempoStack.

)

// Default sets all default values in a central place, instead of setting it at every place where the value is accessed.
@@ -88,6 +89,9 @@ func (r *TempoMonolithic) Default(ctrlConfig configv1alpha1.ProjectConfig) {
if r.Spec.JaegerUI.ServicesQueryDuration == nil {
r.Spec.JaegerUI.ServicesQueryDuration = &defaultServicesDuration
}
if r.Spec.JaegerUI.FindTracesConcurrentRequests == 0 {
r.Spec.JaegerUI.FindTracesConcurrentRequests = defaultFindTracesConcurrentRequests
}
}

if r.Spec.Timeout.Duration == 0 {
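The review thread settles on a fixed default of 2 for TempoMonolithic and 2 * querier replicas for TempoStack, while this commit still defaults to 1. A minimal Go sketch of the agreed rule follows. The helper names are hypothetical, and treating a missing or non-positive `.spec.template.querier.replicas` as 1 is an assumption made for this illustration.

```go
package main

import "fmt"

// Hypothetical default for TempoMonolithic, following the review thread:
// queriers cannot be scaled out there, so a fixed value of 2 is used.
const monolithicDefaultFindTracesConcurrentRequests = 2

// defaultStackFindTracesConcurrentRequests is a hypothetical helper for
// TempoStack: an explicit value wins; 0 (unset) becomes 2 * querier replicas.
// A nil or non-positive replica count is treated as 1 (an assumption).
func defaultStackFindTracesConcurrentRequests(configured int, querierReplicas *int32) int {
	if configured != 0 {
		return configured
	}
	replicas := 1
	if querierReplicas != nil && *querierReplicas > 0 {
		replicas = int(*querierReplicas)
	}
	return 2 * replicas
}

// defaultMonolithicFindTracesConcurrentRequests follows the same "0 means unset"
// check as the Default() hunk, but with the value of 2 from the thread.
func defaultMonolithicFindTracesConcurrentRequests(configured int) int {
	if configured == 0 {
		return monolithicDefaultFindTracesConcurrentRequests
	}
	return configured
}

func main() {
	three := int32(3)
	fmt.Println(defaultStackFindTracesConcurrentRequests(0, &three))  // 6
	fmt.Println(defaultStackFindTracesConcurrentRequests(40, &three)) // 40
	fmt.Println(defaultMonolithicFindTracesConcurrentRequests(0))     // 2
}
```

Computing the TempoStack value from the replica count, rather than storing a constant, means the default follows the querier count whenever the stack is scaled.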
20 changes: 13 additions & 7 deletions apis/tempo/v1alpha1/tempomonolithic_defaults_test.go
@@ -196,7 +196,8 @@ func TestMonolithicDefault(t *testing.T) {
Enabled: true,
SAR: "{\"namespace\": \"testns\", \"resource\": \"pods\", \"verb\": \"get\"}",
},
ServicesQueryDuration: &defaultServicesDuration,
FindTracesConcurrentRequests: 1,
},
Management: "Managed",
Timeout: metav1.Duration{Duration: time.Second * 30},
@@ -269,7 +270,8 @@ func TestMonolithicDefault(t *testing.T) {
Enabled: false,
SAR: "{\"namespace\": \"testns\", \"resource\": \"pods\", \"verb\": \"get\"}",
},
ServicesQueryDuration: &defaultServicesDuration,
FindTracesConcurrentRequests: 1,
},
Management: "Managed",
Timeout: metav1.Duration{Duration: time.Second * 30},
@@ -334,7 +336,8 @@ func TestMonolithicDefault(t *testing.T) {
Enabled: true,
SAR: "{\"namespace\": \"testns\", \"resource\": \"pods\", \"verb\": \"get\"}",
},
ServicesQueryDuration: &defaultServicesDuration,
FindTracesConcurrentRequests: 1,
},
Management: "Managed",
Timeout: metav1.Duration{Duration: time.Second * 30},
@@ -398,15 +401,16 @@ func TestMonolithicDefault(t *testing.T) {
Enabled: false,
SAR: "{\"namespace\": \"testns\", \"resource\": \"pods\", \"verb\": \"get\"}",
},
ServicesQueryDuration: &defaultServicesDuration,
FindTracesConcurrentRequests: 1,
},
Management: "Managed",
Timeout: metav1.Duration{Duration: time.Second * 30},
},
},
},
{
-name: "define custom duration for services list and timeout",
+name: "define custom duration for services list, timeout and find traces",
input: &TempoMonolithic{
ObjectMeta: v1.ObjectMeta{
Name: "test",
@@ -424,7 +428,8 @@ func TestMonolithicDefault(t *testing.T) {
Route: &MonolithicJaegerUIRouteSpec{
Enabled: true,
},
ServicesQueryDuration: &v1.Duration{Duration: time.Duration(100 * 100)},
FindTracesConcurrentRequests: 40,
},
Timeout: metav1.Duration{Duration: time.Hour},
},
@@ -461,7 +466,8 @@ func TestMonolithicDefault(t *testing.T) {
Enabled: false,
SAR: "{\"namespace\": \"testns\", \"resource\": \"pods\", \"verb\": \"get\"}",
},
ServicesQueryDuration: &v1.Duration{Duration: time.Duration(100 * 100)},
FindTracesConcurrentRequests: 40,
},
Management: "Managed",
Timeout: metav1.Duration{Duration: time.Hour},
13 changes: 13 additions & 0 deletions apis/tempo/v1alpha1/tempomonolithic_types.go
@@ -261,6 +261,19 @@ type MonolithicJaegerUISpec struct {
// +optional
// +operator-sdk:csv:customresourcedefinitions:type=spec,displayName="ServicesQueryDuration",xDescriptors="urn:alm:descriptor:com.tectonic.ui:advanced"
ServicesQueryDuration *metav1.Duration `json:"servicesQueryDuration,omitempty"`

// FindTracesConcurrentRequests defines how many concurrent requests a single trace search can submit.
// A trace search in Jaeger submits limit+1 requests: the first request finds trace IDs, and the remaining
// requests fetch the entire traces by ID. This property allows Jaeger to fetch those traces in parallel.
// Note that by default a single Tempo querier can process 20 concurrent search jobs.
// Increasing this property might require scaling up querier instances, especially when the error "job queue full" appears.
// See also Tempo's extraConfig:
// querier.max_concurrent_queries (20 default)
// query_frontend.max_outstanding_per_tenant (2000 default). Increase if the query-frontend returns 429.
//
// +optional
// +operator-sdk:csv:customresourcedefinitions:type=spec,displayName="FindTracesConcurrentRequests",xDescriptors="urn:alm:descriptor:com.tectonic.ui:advanced"
FindTracesConcurrentRequests int `json:"findTracesConcurrentRequests,omitempty"`
}

// MonolithicJaegerUIIngressSpec defines the settings for the Jaeger UI ingress.
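The field description says a single querier handles 20 concurrent search jobs by default and that raising this setting may require more queriers. As a back-of-the-envelope sizing aid, the hypothetical helper below estimates a lower bound on querier replicas for a given number of simultaneous searches. It assumes one querier job slot per trace fetch and ignores query-frontend sharding (one trace-by-ID request may be split into several jobs), retries and other query types, so real deployments may need more.

```go
package main

import "fmt"

// queriersNeeded is a rough lower bound on querier replicas so that
// concurrentSearches simultaneous trace searches, each fetching up to
// findTracesConcurrentRequests traces in parallel, fit into the querier job
// slots without queueing. It assumes one job slot per trace fetch and ignores
// query-frontend sharding, retries and other query types.
func queriersNeeded(concurrentSearches, findTracesConcurrentRequests, maxConcurrentQueries int) int {
	if maxConcurrentQueries < 1 {
		maxConcurrentQueries = 20 // querier.max_concurrent_queries default quoted in the field docs
	}
	jobs := concurrentSearches * findTracesConcurrentRequests
	n := (jobs + maxConcurrentQueries - 1) / maxConcurrentQueries // ceil(jobs / slots)
	if n < 1 {
		n = 1
	}
	return n
}

func main() {
	// Three users searching at once with the setting at 30.
	fmt.Println(queriersNeeded(3, 30, 20)) // 5
}
```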
13 changes: 13 additions & 0 deletions apis/tempo/v1alpha1/tempostack_types.go
@@ -618,6 +618,19 @@ type JaegerQuerySpec struct {
// +operator-sdk:csv:customresourcedefinitions:type=spec,displayName="ServicesQueryDuration"
ServicesQueryDuration *metav1.Duration `json:"servicesQueryDuration,omitempty"`

// FindTracesConcurrentRequests defines how many concurrent requests a single trace search can submit.
// A trace search in Jaeger submits limit+1 requests: the first request finds trace IDs, and the remaining
// requests fetch the entire traces by ID. This property allows Jaeger to fetch those traces in parallel.
// Note that by default a single Tempo querier can process 20 concurrent search jobs.
// Increasing this property might require scaling up querier instances, especially when the error "job queue full" appears.
// See also Tempo's extraConfig:
// querier.max_concurrent_queries (20 default)
// query_frontend.max_outstanding_per_tenant (2000 default). Increase if the query-frontend returns 429.
//
// +optional
// +operator-sdk:csv:customresourcedefinitions:type=spec,displayName="FindTracesConcurrentRequests",xDescriptors="urn:alm:descriptor:com.tectonic.ui:advanced"
FindTracesConcurrentRequests int `json:"findTracesConcurrentRequests,omitempty"`

// Authentication defines the options for the oauth proxy used to protect jaeger UI
//
// +optional
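The new field is a plain `int` with `omitempty`, so an explicit 0 and an unset field serialize identically, which is why the defaulting code treats 0 as "use the default". The Go sketch below shows this with hypothetical mirror types that contain only the fields needed to reach `spec.template.queryFrontend.jaegerQuery.findTracesConcurrentRequests`; they are not the operator's actual API types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical mirror types: only the fields needed to show the JSON path
// spec.template.queryFrontend.jaegerQuery.findTracesConcurrentRequests.
type jaegerQuerySpec struct {
	Enabled                      bool `json:"enabled,omitempty"`
	FindTracesConcurrentRequests int  `json:"findTracesConcurrentRequests,omitempty"`
}

type queryFrontendSpec struct {
	JaegerQuery jaegerQuerySpec `json:"jaegerQuery"`
}

type templateSpec struct {
	QueryFrontend queryFrontendSpec `json:"queryFrontend"`
}

type tempoStackSpec struct {
	Template templateSpec `json:"template"`
}

// specJSON renders a spec fragment with the given concurrency value.
func specJSON(concurrency int) string {
	s := tempoStackSpec{Template: templateSpec{QueryFrontend: queryFrontendSpec{
		JaegerQuery: jaegerQuerySpec{Enabled: true, FindTracesConcurrentRequests: concurrency},
	}}}
	b, err := json.Marshal(s)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(specJSON(40))
	// {"template":{"queryFrontend":{"jaegerQuery":{"enabled":true,"findTracesConcurrentRequests":40}}}}
	fmt.Println(specJSON(0)) // 0 is dropped by omitempty, same as leaving the field unset
	// {"template":{"queryFrontend":{"jaegerQuery":{"enabled":true}}}}
}
```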
@@ -74,7 +74,7 @@ metadata:
capabilities: Deep Insights
categories: Logging & Tracing,Monitoring
containerImage: ghcr.io/grafana/tempo-operator/tempo-operator:v0.13.0
-createdAt: "2024-10-07T07:11:28Z"
+createdAt: "2024-10-09T16:46:12Z"
description: Create and manage deployments of Tempo, a high-scale distributed
tracing backend.
operatorframework.io/cluster-monitoring: "true"
@@ -307,6 +307,19 @@ spec:
"{"namespace": "<tempo_stack_namespace>", "resource": "pods", "verb": "get"}'
displayName: SAR
path: jaegerui.authentication.sar
- description: 'FindTracesConcurrentRequests defines how many concurrent requests
a single trace search can submit. A trace search in Jaeger submits limit+1
requests: the first request finds trace IDs, and the remaining requests fetch
the entire traces by ID. This property allows Jaeger to fetch those traces in
parallel. Note that by default a single Tempo querier can process 20 concurrent
search jobs. Increasing this property might require scaling up querier instances,
especially when the error "job queue full" appears. See also Tempo''s extraConfig: querier.max_concurrent_queries
(20 default) and query_frontend.max_outstanding_per_tenant (2000 default).
Increase the latter if the query-frontend returns 429.'
displayName: FindTracesConcurrentRequests
path: jaegerui.findTracesConcurrentRequests
x-descriptors:
- urn:alm:descriptor:com.tectonic.ui:advanced
- description: Annotations defines the annotations of the Ingress object.
displayName: Annotations
path: jaegerui.ingress.annotations
@@ -1002,6 +1015,19 @@ spec:
path: template.queryFrontend.jaegerQuery.enabled
x-descriptors:
- urn:alm:descriptor:com.tectonic.ui:booleanSwitch
- description: 'FindTracesConcurrentRequests defines how many concurrent requests
a single trace search can submit. A trace search in Jaeger submits limit+1
requests: the first request finds trace IDs, and the remaining requests fetch
the entire traces by ID. This property allows Jaeger to fetch those traces in
parallel. Note that by default a single Tempo querier can process 20 concurrent
search jobs. Increasing this property might require scaling up querier instances,
especially when the error "job queue full" appears. See also Tempo''s extraConfig: querier.max_concurrent_queries
(20 default) and query_frontend.max_outstanding_per_tenant (2000 default).
Increase the latter if the query-frontend returns 429.'
displayName: FindTracesConcurrentRequests
path: template.queryFrontend.jaegerQuery.findTracesConcurrentRequests
x-descriptors:
- urn:alm:descriptor:com.tectonic.ui:advanced
- description: Ingress defines the options for the Jaeger Query ingress.
displayName: Jaeger Query UI Ingress Settings
path: template.queryFrontend.jaegerQuery.ingress
@@ -1427,7 +1453,7 @@ spec:
- name: RELATED_IMAGE_JAEGER_QUERY
value: docker.io/jaegertracing/jaeger-query:1.62.0
- name: RELATED_IMAGE_TEMPO_QUERY
-value: docker.io/grafana/tempo-query:main-2999520
+value: docker.io/grafana/tempo-query:main-1de25ca
- name: RELATED_IMAGE_TEMPO_GATEWAY
value: quay.io/observatorium/api:main-2024-08-05-11d0d94
- name: RELATED_IMAGE_TEMPO_GATEWAY_OPA
@@ -1576,7 +1602,7 @@ spec:
name: tempo
- image: docker.io/jaegertracing/jaeger-query:1.62.0
name: jaeger-query
-- image: docker.io/grafana/tempo-query:main-2999520
+- image: docker.io/grafana/tempo-query:main-1de25ca
name: tempo-query
- image: quay.io/observatorium/api:main-2024-08-05-11d0d94
name: tempo-gateway
11 changes: 11 additions & 0 deletions bundle/community/manifests/tempo.grafana.com_tempomonolithics.yaml
@@ -1109,6 +1109,17 @@ spec:
description: Enabled defines if the Jaeger UI component should
be created.
type: boolean
findTracesConcurrentRequests:
description: |-
FindTracesConcurrentRequests defines how many concurrent requests a single trace search can submit.
A trace search in Jaeger submits limit+1 requests: the first request finds trace IDs, and the remaining
requests fetch the entire traces by ID. This property allows Jaeger to fetch those traces in parallel.
Note that by default a single Tempo querier can process 20 concurrent search jobs.
Increasing this property might require scaling up querier instances, especially when the error "job queue full" appears.
See also Tempo's extraConfig:
querier.max_concurrent_queries (20 default)
query_frontend.max_outstanding_per_tenant (2000 default). Increase if the query-frontend returns 429.
type: integer
ingress:
description: Ingress defines the Ingress configuration for the
Jaeger UI.
11 changes: 11 additions & 0 deletions bundle/community/manifests/tempo.grafana.com_tempostacks.yaml
@@ -2391,6 +2391,17 @@ spec:
description: Enabled defines if the Jaeger Query component
should be created.
type: boolean
findTracesConcurrentRequests:
description: |-
FindTracesConcurrentRequests defines how many concurrent requests a single trace search can submit.
A trace search in Jaeger submits limit+1 requests: the first request finds trace IDs, and the remaining
requests fetch the entire traces by ID. This property allows Jaeger to fetch those traces in parallel.
Note that by default a single Tempo querier can process 20 concurrent search jobs.
Increasing this property might require scaling up querier instances, especially when the error "job queue full" appears.
See also Tempo's extraConfig:
querier.max_concurrent_queries (20 default)
query_frontend.max_outstanding_per_tenant (2000 default). Increase if the query-frontend returns 429.
type: integer
ingress:
description: Ingress defines the options for the Jaeger
Query ingress.
@@ -74,7 +74,7 @@ metadata:
capabilities: Deep Insights
categories: Logging & Tracing,Monitoring
containerImage: ghcr.io/grafana/tempo-operator/tempo-operator:v0.13.0
-createdAt: "2024-10-07T07:11:27Z"
+createdAt: "2024-10-09T16:46:11Z"
description: Create and manage deployments of Tempo, a high-scale distributed
tracing backend.
operatorframework.io/cluster-monitoring: "true"
@@ -307,6 +307,19 @@ spec:
"{"namespace": "<tempo_stack_namespace>", "resource": "pods", "verb": "get"}'
displayName: SAR
path: jaegerui.authentication.sar
- description: 'FindTracesConcurrentRequests defines how many concurrent requests
a single trace search can submit. A trace search in Jaeger submits limit+1
requests: the first request finds trace IDs, and the remaining requests fetch
the entire traces by ID. This property allows Jaeger to fetch those traces in
parallel. Note that by default a single Tempo querier can process 20 concurrent
search jobs. Increasing this property might require scaling up querier instances,
especially when the error "job queue full" appears. See also Tempo''s extraConfig: querier.max_concurrent_queries
(20 default) and query_frontend.max_outstanding_per_tenant (2000 default).
Increase the latter if the query-frontend returns 429.'
displayName: FindTracesConcurrentRequests
path: jaegerui.findTracesConcurrentRequests
x-descriptors:
- urn:alm:descriptor:com.tectonic.ui:advanced
- description: Annotations defines the annotations of the Ingress object.
displayName: Annotations
path: jaegerui.ingress.annotations
@@ -1002,6 +1015,19 @@ spec:
path: template.queryFrontend.jaegerQuery.enabled
x-descriptors:
- urn:alm:descriptor:com.tectonic.ui:booleanSwitch
- description: 'FindTracesConcurrentRequests defines how many concurrent requests
a single trace search can submit. A trace search in Jaeger submits limit+1
requests: the first request finds trace IDs, and the remaining requests fetch
the entire traces by ID. This property allows Jaeger to fetch those traces in
parallel. Note that by default a single Tempo querier can process 20 concurrent
search jobs. Increasing this property might require scaling up querier instances,
especially when the error "job queue full" appears. See also Tempo''s extraConfig: querier.max_concurrent_queries
(20 default) and query_frontend.max_outstanding_per_tenant (2000 default).
Increase the latter if the query-frontend returns 429.'
displayName: FindTracesConcurrentRequests
path: template.queryFrontend.jaegerQuery.findTracesConcurrentRequests
x-descriptors:
- urn:alm:descriptor:com.tectonic.ui:advanced
- description: Ingress defines the options for the Jaeger Query ingress.
displayName: Jaeger Query UI Ingress Settings
path: template.queryFrontend.jaegerQuery.ingress
@@ -1437,7 +1463,7 @@ spec:
- name: RELATED_IMAGE_JAEGER_QUERY
value: docker.io/jaegertracing/jaeger-query:1.62.0
- name: RELATED_IMAGE_TEMPO_QUERY
-value: docker.io/grafana/tempo-query:main-2999520
+value: docker.io/grafana/tempo-query:main-1de25ca
- name: RELATED_IMAGE_TEMPO_GATEWAY
value: quay.io/observatorium/api:main-2024-08-05-11d0d94
- name: RELATED_IMAGE_TEMPO_GATEWAY_OPA
@@ -1597,7 +1623,7 @@ spec:
name: tempo
- image: docker.io/jaegertracing/jaeger-query:1.62.0
name: jaeger-query
-- image: docker.io/grafana/tempo-query:main-2999520
+- image: docker.io/grafana/tempo-query:main-1de25ca
name: tempo-query
- image: quay.io/observatorium/api:main-2024-08-05-11d0d94
name: tempo-gateway
11 changes: 11 additions & 0 deletions bundle/openshift/manifests/tempo.grafana.com_tempomonolithics.yaml
@@ -1109,6 +1109,17 @@ spec:
description: Enabled defines if the Jaeger UI component should
be created.
type: boolean
findTracesConcurrentRequests:
description: |-
FindTracesConcurrentRequests defines how many concurrent requests a single trace search can submit.
A trace search in Jaeger submits limit+1 requests: the first request finds trace IDs, and the remaining
requests fetch the entire traces by ID. This property allows Jaeger to fetch those traces in parallel.
Note that by default a single Tempo querier can process 20 concurrent search jobs.
Increasing this property might require scaling up querier instances, especially when the error "job queue full" appears.
See also Tempo's extraConfig:
querier.max_concurrent_queries (20 default)
query_frontend.max_outstanding_per_tenant (2000 default). Increase if the query-frontend returns 429.
type: integer
ingress:
description: Ingress defines the Ingress configuration for the
Jaeger UI.