Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

YAAAAS - Yet Another Attempt At Adaptive Sampling #2966

Merged
merged 50 commits into from
Sep 8, 2021

Conversation

joe-elliott
Copy link
Member

@joe-elliott joe-elliott commented Apr 28, 2021

Which problem is this PR solving?

Short description of the changes

This PR builds on #2818 and as such @Ashmita152 is named as a coauthor. Currently it only wires config for adaptive sampling to the existing Cassandra storage implementations.

Additional changes

  • Adjusts hotrod jaeger tracer creation to allow configuration through env vars
  • Makes changes to ./docker-compose/jaeger-docker-compose.yaml to test adaptive sampling with hotrod.

@joe-elliott joe-elliott requested a review from a team as a code owner April 28, 2021 18:57
@joe-elliott joe-elliott requested a review from objectiser April 28, 2021 18:57
@mergify mergify bot requested a review from jpkrohling April 28, 2021 18:58
Signed-off-by: Joe Elliott <number101010@gmail.com>
Co-authored-by: Ashmita Bohara ashmita.bohara152@gmail.com
Signed-off-by: Joe Elliott <number101010@gmail.com>
Signed-off-by: Joe Elliott <number101010@gmail.com>
Signed-off-by: Joe Elliott <number101010@gmail.com>
@jpkrohling
Copy link
Contributor

Do you want an initial review, or should I wait for the remaining items to be done?

@joe-elliott
Copy link
Member Author

Let me move this a little farther forward before a first review. I have everything wired up but I'd like to see it work at least a little before we start reviews.

@jpkrohling jpkrohling marked this pull request as draft April 29, 2021 13:33
@jpkrohling
Copy link
Contributor

I'll mark this as a draft then.

Signed-off-by: Joe Elliott <number101010@gmail.com>
Signed-off-by: Joe Elliott <number101010@gmail.com>
@codecov
Copy link

codecov bot commented May 4, 2021

Codecov Report

Merging #2966 (86f42d8) into master (6685913) will decrease coverage by 0.08%.
The diff coverage is 85.32%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2966      +/-   ##
==========================================
- Coverage   95.90%   95.81%   -0.09%     
==========================================
  Files         242      243       +1     
  Lines       14831    14922      +91     
==========================================
+ Hits        14223    14297      +74     
- Misses        527      539      +12     
- Partials       81       86       +5     
Impacted Files Coverage Δ
plugin/sampling/leaderelection/leader_election.go 100.00% <ø> (ø)
plugin/storage/factory.go 94.47% <0.00%> (-4.24%) ⬇️
...ugin/sampling/strategystore/adaptive/aggregator.go 62.71% <33.33%> (+9.26%) ⬆️
pkg/hostname/hostname.go 53.84% <53.84%> (ø)
plugin/sampling/strategystore/static/factory.go 83.33% <57.14%> (-16.67%) ⬇️
cmd/collector/app/collector.go 75.23% <70.00%> (-0.81%) ⬇️
plugin/storage/cassandra/factory.go 97.08% <70.00%> (-2.92%) ⬇️
.../sampling/strategystore/adaptive/strategy_store.go 82.35% <82.35%> (ø)
cmd/collector/app/root_span_handler.go 100.00% <100.00%> (ø)
cmd/collector/app/span_handler_builder.go 100.00% <100.00%> (ø)
... and 10 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6685913...86f42d8. Read the comment docs.

Signed-off-by: Joe Elliott <number101010@gmail.com>
Signed-off-by: Joe Elliott <number101010@gmail.com>
Signed-off-by: Joe Elliott <number101010@gmail.com>
Signed-off-by: Joe Elliott <number101010@gmail.com>
Signed-off-by: Joe Elliott <number101010@gmail.com>
Signed-off-by: Joe Elliott <number101010@gmail.com>
@yurishkuro yurishkuro mentioned this pull request May 19, 2021
3 tasks
@wgb1990
Copy link

wgb1990 commented May 26, 2021

@joe-elliott

Sorry, I asked a very simple question. In the process of reading the source code, I found that the Store.InsertThroughput method was not called anywhere

// Store writes and retrieves sampling data to and from storage.
type Store interface {
// InsertThroughput inserts aggregated throughput for operations into storage.
InsertThroughput(throughput []*model.Throughput) error
......
}

@joe-elliott
Copy link
Member Author

Sorry, I asked a very simple question. In the process of reading the source code, I found that the Store.InsertThroughput method was not called anywhere

Thank you for pointing that out. The last time I spent time on this I was trying to track down the mechanism to add throughput. :)

The other elements were working: distributed lock and probabilities calculation, but this was not.

@yurishkuro
Copy link
Member

I realized there are a few files missing from what was available internally at Uber. I will work with @vprithvi to get them committed (they were all cleaned up for oss, but for some reason not moved).

@yurishkuro
Copy link
Member

The missing files have been added in #3065

Signed-off-by: Joe Elliott <number101010@gmail.com>
@joe-elliott
Copy link
Member Author

joe-elliott commented Jul 15, 2021

Let's go! I'll work on clawing back that last few percentage of code coverage today.

Signed-off-by: Joe Elliott <number101010@gmail.com>
model/span.go Outdated Show resolved Hide resolved
cmd/collector/app/sampling/strategystore/factory.go Outdated Show resolved Hide resolved
cmd/collector/main.go Outdated Show resolved Hide resolved
@@ -45,6 +47,10 @@ type Factory interface {

// CreateDependencyReader creates a dependencystore.Reader.
CreateDependencyReader() (dependencystore.Reader, error)

// CreateLockAndSamplingStore creates a distributedlock.Lock and samplingstore.Store.
// These are used for adaptive sampling.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sampling store could be static though, can it not? So not always for adaptive.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SamplingStore is explicitly for adaptive sampling. It is for storing and retrieving the various datapoints necessary to calculate the current sampling rates:

https://github.com/jaegertracing/jaeger/blob/master/storage/samplingstore/interface.go#L25

Signed-off-by: Joe Elliott <number101010@gmail.com>
Signed-off-by: Joe Elliott <number101010@gmail.com>
)

// NewStrategyStore creates a strategy store that holds adaptive sampling strategies.
func NewStrategyStore(options Options, metricsFactory metrics.Factory, logger *zap.Logger, lock distributedlock.Lock, store samplingstore.Store) (*Processor, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am now confused by what drives what. NewStrategyStore returns Processor? Is Processor API surface bigger than StrategyStore? If so, should this be renamed to reflect that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Processor implements the strategystore.StrategyStore interface which is used by the serving code to respond when a user hits /sampling?service="test". This is used for both adaptive and static sampling (and all future sampling served off this endpoint).

type StrategyStore interface {
	GetSamplingStrategy(ctx context.Context, serviceName string) (*sampling.SamplingStrategyResponse, error)
}

samplingstore.Store is an interface that abstracts away a backend (cassandra, elastic, ...) for use by the adaptive sampling code to record data for adaptive sampling.

type Store interface {
	InsertThroughput(throughput []*model.Throughput) error
	InsertProbabilitiesAndQPS(hostname string, probabilities model.ServiceOperationProbabilities, qps model.ServiceOperationQPS) error
	GetThroughput(start, end time.Time) ([]*model.Throughput, error)
	GetProbabilitiesAndQPS(start, end time.Time) (map[string][]model.ServiceOperationData, error)
	GetLatestProbabilities() (model.ServiceOperationProbabilities, error)
}

Signed-off-by: Joe Elliott <number101010@gmail.com>
Signed-off-by: Joe Elliott <number101010@gmail.com>
Signed-off-by: Joe Elliott <number101010@gmail.com>
Signed-off-by: Joe Elliott <number101010@gmail.com>
@joe-elliott joe-elliott requested review from jpkrohling and removed request for objectiser August 19, 2021 15:29
jpkrohling
jpkrohling previously approved these changes Aug 20, 2021
@jpkrohling jpkrohling added this to the v1.26.0 milestone Aug 23, 2021
@yurishkuro yurishkuro removed this from the v1.26.0 milestone Sep 8, 2021
yurishkuro and others added 2 commits September 8, 2021 11:45
@yurishkuro yurishkuro merged commit e345aa7 into jaegertracing:master Sep 8, 2021
@joe-elliott
Copy link
Member Author

woohoo!

@jpkrohling jpkrohling added this to the v1.27.0 milestone Sep 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Adaptive Sampling
4 participants