Flush export queue when it reaches max_export_batch_size
#1521
Conversation
Thanks! Code looks good, blocking design question.
```diff
@@ -194,7 +194,7 @@ def on_end(self, span: Span) -> None:

         self.queue.appendleft(span)

-        if len(self.queue) >= self.max_queue_size // 2:
+        if len(self.queue) >= self.max_export_batch_size:
```
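The effect of the changed condition can be sketched with a minimal, self-contained processor. This is a toy model for illustration only (class and attribute names are hypothetical, not the real OpenTelemetry implementation): the worker thread is assumed to wait on `condition` and export when woken.

```python
import threading
from collections import deque

class MiniBatchProcessor:
    """Toy sketch of the flush-trigger logic (hypothetical names)."""

    def __init__(self, max_queue_size=2048, max_export_batch_size=512):
        self.max_queue_size = max_queue_size
        self.max_export_batch_size = max_export_batch_size
        self.queue = deque()
        self.condition = threading.Condition()
        self.notified = 0  # counts wake-ups sent to the worker

    def on_end(self, span):
        if len(self.queue) >= self.max_queue_size:
            return  # queue full: the span is dropped
        self.queue.appendleft(span)
        # After this PR: wake the worker as soon as one full export
        # batch is available, instead of at max_queue_size // 2.
        if len(self.queue) >= self.max_export_batch_size:
            with self.condition:
                self.condition.notify()
                self.notified += 1

    def export_batch(self):
        """Pop up to max_export_batch_size spans, oldest first."""
        batch = []
        while self.queue and len(batch) < self.max_export_batch_size:
            batch.append(self.queue.pop())
        return batch

p = MiniBatchProcessor(max_queue_size=8, max_export_batch_size=4)
for i in range(4):
    p.on_end(f"span-{i}")
print(p.notified)             # 1: the 4th span triggers the wake-up
print(len(p.export_batch()))  # 4: one full batch is exported
```

With the old `max_queue_size // 2` condition, this toy processor would not notify until 4 more spans arrived, even though a full batch was already waiting.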
Thanks for the change! I wonder if it would be better to modify the exporter to flush the queue at something less than the max batch size?
I think this code was written to notify a flush well before the queue filled, to handle race conditions. I haven't looked at the code super closely recently, but IIRC there is a possibility for multiple spans to end at the same time, and as a result:
- the first span would notify the flush
- the second span would be dropped, since the flush is notified but the export_batch_size has been reached.
So maybe take something a little more cautious, like filling up to 80%, and have that be the condition for both the notify and the flush itself (refactoring both into a single function call would prevent future splitting of the code).
Queue overflow and dropped spans: that's exactly the issue I'm addressing here.
There are two different numbers: `max_queue_size = 2048` and `max_export_batch_size = 512`.
I don't think it's a good idea to flush half-full (or 80%-full) batches. That would only make `effective_export_batch_size = 0.8 * max_export_batch_size`.
The second span in your example won't be dropped as long as `max_queue_size > max_export_batch_size`, which should be true.
Maybe as an extra check the code should validate that `max_queue_size > 2 * max_export_batch_size` or so.
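The extra check proposed above could look something like this. It is a hedged sketch only: `validate_queue_config` is a hypothetical helper, not an existing SDK function, and the exact threshold is the one floated in the comment.

```python
def validate_queue_config(max_queue_size: int,
                          max_export_batch_size: int) -> None:
    """Sanity check suggested in the discussion: keep enough headroom
    between the queue and the batch size so spans arriving while a
    flush is in progress are not dropped (hypothetical helper)."""
    if max_export_batch_size <= 0:
        raise ValueError("max_export_batch_size must be positive")
    if max_queue_size <= 2 * max_export_batch_size:
        raise ValueError(
            "max_queue_size (%d) should be greater than "
            "2 * max_export_batch_size (%d)"
            % (max_queue_size, 2 * max_export_batch_size)
        )

validate_queue_config(2048, 512)  # the defaults pass: 2048 > 1024
```

With the default values the check passes, since `2048 > 2 * 512`; a configuration like `max_queue_size=1024, max_export_batch_size=512` would be rejected.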
Got it! Makes sense. Sorry about that, I misread the code the first time.
I think the checks that are already there are good (albeit they maybe require more documentation).
LGTM! Good catch, thanks for the change.
Description
Currently `BatchExportSpanProcessor` skips the sleep if it already has `max_export_batch_size` spans in the queue, 512 by default. But it interrupts that sleep only when it collects 1024 items in the queue. It should export immediately when there are `max_export_batch_size` items in the queue.
Type of change
Please delete options that are not relevant.
How Has This Been Tested?
Manually by creating spans and analysing logs
Does This PR Require a Contrib Repo Change?
Answer the following question based on these examples of changes that would require a Contrib Repo Change:
- The OTel specification has changed which prompted this PR to update the method interfaces of `opentelemetry-api/` or `opentelemetry-sdk/`
- The method interfaces of `opentelemetry-instrumentation/` have changed
- The method interfaces of `test/util` have changed
- Scripts in `scripts/` that were copied over to the Contrib repo have changed
- Configuration files that were copied over to the Contrib repo have changed (when consistency between repositories is applicable) such as in `pyproject.toml`, `isort.cfg`, `.flake8`
- When a new `.github/CODEOWNER` is added
- Major changes to project information, such as in `README.md` or `CONTRIBUTING.md`

Yes. - Link to PR:
No.
Checklist: