-
Notifications
You must be signed in to change notification settings - Fork 652
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Exemplars prototype #936
Exemplars prototype #936
Conversation
ab2c1fa
to
0be0964
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great!
My approval here should carry only so much weight because I'm not very familiar with the Python SDK but this code generally looks readable and feels familiar. I have been thinking about how precisely I would integrate sampling in the SDK, and came up with two possibilities. One of the possibilities is to implement exemplars in the Aggregators themselves, as you've done, an d another possibility is to implement something in the Accumulator that might support exemplars in a less Aggregator-dependent way. Of course, you're also showing why we want the Aggregator to see the Update, so that Histograms can select exemplars by bucket.
To get this behavior w/ the first approach is straight forward, though it requires the Aggregator API to receive the excluded labels (as you have done), whereas up until now the Aggregators only know about numbers and math--labels are orthogonal to Aggregator state.
I thought about the other possibility (that exemplars can be Accumulator functionality) because it would be nice if we could compose something with any other aggregator to make this work. Imagine a class, named ExemplarAggregator, that can be used to select Aggregators in parallel any other aggregation (e.g., Histogram, Exact, ...). This is less straight forward, because now there's some kind of dependency between exemplar selection and an independent aggregator, and that's why I'm approving your approach.
For the record though, I put up a draft PR that begins this other approach here: open-telemetry/opentelemetry-go#1023. The idea is that the Accumulator can be told which dimensions to include (thus which to exclude) for each metric instrument, and that it therefore knows when to call the sampling manager. See: https://github.com/open-telemetry/opentelemetry-go/pull/1023/files#diff-4c3de179542bac3c5ceacd2090215160R225. I haven't worked out the details.
142c4ea
to
654f185
Compare
docs/examples/exemplars/README.rst
Outdated
- Exemplars will be picked to represent the input distribution, without unquantifiable bias | ||
- A "sample_count" attribute will be set on each exemplar to quantify how many measurements each exemplar represents | ||
|
||
See 'statistical_exemplars.ipynb' for the example (TODO: how do I link this?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you not able to just link to the python file instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can also do something like this if you want it inlined in the docs.
.. literalinclude:: basic_trace.py
:language: python
:lines: 1-
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did literalinclude of the python examples, would have preferred a link to the jupyter notebook but this will work for now
@@ -0,0 +1,84 @@ | |||
# Copyright The OpenTelemetry Authors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Can we rename basic_meter
to metrics and put this folder under that in the examples?
opentelemetry-sdk/src/opentelemetry/sdk/metrics/export/exemplars.py
Outdated
Show resolved
Hide resolved
opentelemetry-sdk/src/opentelemetry/sdk/metrics/export/exemplars.py
Outdated
Show resolved
Hide resolved
opentelemetry-sdk/src/opentelemetry/sdk/metrics/export/exemplars.py
Outdated
Show resolved
Hide resolved
self.sample_set[replace_index] = exemplar | ||
|
||
def merge(self, set1, set2): | ||
combined = set1 + set2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the algorithm Josh linked, I don't think merging like this would give a uniform sample. In the extreme case, imagine k = 1
and you have two samplers; sampler1's rand_count = 1
(population size of 1) and sampler2 finishes with rand_count = 10
. sampler1's single value would be weighted much more strongly than any of the values sampler2 saw.
E.g. sampler1 see value [0]
, sampler2 sees values [1, 2, 3, ..., 10]
but keeps only k = 1 sample, say, [6]
. Randomly sampling k = 1 values from [0, 6]
would give [0]
50% of the time, even though it was one of 11 values sampled in the whole population.
I think there is a similar issue when arg_count > k
. The wiki page says each item is sampled with probability k/n
; if the samplers being merged saw different n
, then you would end up giving more weight to the sampler with smaller n
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as discussed offline, just keeping the second set's exemplars with that assumption based on the current SDK implementation. Might need to change in the future but for now that keeps this much simpler
opentelemetry-sdk/src/opentelemetry/sdk/metrics/export/exemplars.py
Outdated
Show resolved
Hide resolved
I love the Jupyter Notebook! I think it would be awesome to add nbsphinx to the docs setup so you could include the notebook directly in the docs, if others are cool with that. |
I actually tried to use nbsphinx, and found that the notebook was awful looking inline, due to its margins being the full page but the docs being only half a page. Maybe if I put it on its own page it would be different? |
Ya, not too sure, ive never used it myself. If nothing else works, you could probably render it to HTML then link or embed that somehow? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more small nits
opentelemetry-sdk/src/opentelemetry/sdk/metrics/export/exemplars.py
Outdated
Show resolved
Hide resolved
opentelemetry-sdk/src/opentelemetry/sdk/metrics/export/exemplars.py
Outdated
Show resolved
Hide resolved
|
||
def __init__( | ||
self, | ||
config: dict, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you know the exact keys and types, try a TypedDict
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't know the exact types - it may change depending on what aggregators need to know
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean for custom aggregators?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the values already used are int, str, bool, list, and there is no restriction on passing in something like a float to the config dict
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't follow you. Are you saying you would use the same key with different value types depending on the aggregator?
opentelemetry-sdk/src/opentelemetry/sdk/metrics/export/exemplars.py
Outdated
Show resolved
Hide resolved
opentelemetry-sdk/src/opentelemetry/sdk/metrics/export/exemplars.py
Outdated
Show resolved
Hide resolved
opentelemetry-sdk/src/opentelemetry/sdk/metrics/export/exemplars.py
Outdated
Show resolved
Hide resolved
bc1b06b
to
e4e8c20
Compare
I left a lengthy remark on this topic here: |
Closing for now. |
This PR introduces exemplars into the SDK (for details on exemplars, see OTEP #113)
These change are (of course) experimental, as no major backend supports "statistical" exemplars and there is no concrete SDK spec for exemplars yet (only an OTEP). So any feedback is very much appreciated :)
Built off of the views API prototype: #596