Conversation
Codecov Report
@@            Coverage Diff            @@
##           master      #54     +/-  ##
=========================================
- Coverage   88.46%   87.96%   -0.51%
=========================================
  Files          93       93
  Lines        2246     2260      +14
=========================================
+ Hits         1987     1988       +1
- Misses        259      272      +13
=========================================
Continue to review full report at Codecov.
btw - is calling close automatically like this in the reporter's destructor really a good idea?
On 5 February 2018 at 22:40, Ryan ***@***.***> wrote:

    btw - is calling close automatically like this in the reporter's
    destructor really a good idea?
    https://github.com/jaegertracing/cpp-client/blob/master/src/jaegertracing/reporters/RemoteReporter.h#L45
    it can cause the process to take much longer to exit than it otherwise
    would.

Yeah. Possibly not in the dtor; maybe it's best done only via an explicit
Close() from the Tracer. Need to expose internal API for that.
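The "explicit Close() from the Tracer" idea can be sketched roughly like this. Everything here is illustrative (the class and member names are invented, not the actual jaeger-client-cpp code): the reporter's worker thread is only joined from an explicit close(), so nothing delays process exit unless the caller asks for a clean shutdown.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical minimal reporter: spans are drained by a worker thread,
// and close() must be called explicitly to stop and join it. The dtor
// only stops as a last resort; the PR discussion argues the Tracer
// should drive shutdown via an explicit close, not rely on the dtor.
class Reporter {
  public:
    Reporter() : _worker([this] { run(); }) {}

    ~Reporter()
    {
        // Safety net only; callers are expected to have called close().
        stop();
    }

    void report(int span)
    {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _queue.push_back(span);
        }
        _cv.notify_one();
    }

    // Explicit close: drain remaining spans, then join the worker.
    void close() { stop(); }

    std::size_t sent() const { return _sent; }

  private:
    void run()
    {
        std::unique_lock<std::mutex> lock(_mutex);
        while (true) {
            _cv.wait(lock, [this] { return !_queue.empty() || !_running; });
            _sent += _queue.size();  // stand-in for "send the batch"
            _queue.clear();
            if (!_running) {
                break;  // queue drained and shutdown requested
            }
        }
    }

    void stop()
    {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _running = false;
        }
        _cv.notify_one();
        if (_worker.joinable()) {
            _worker.join();
        }
    }

    std::mutex _mutex;
    std::condition_variable _cv;
    std::vector<int> _queue;
    std::size_t _sent = 0;
    bool _running = true;
    std::thread _worker;  // declared last so state is ready before it starts
};
```

Because the worker drains the queue one last time before exiting, spans reported before close() are still counted as sent.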
(force-pushed from 1102a98 to 6ac58e8)
OK, so addressing #53 instead with an extension

(force-pushed from 6ac58e8 to 94b50d2)
Hrm, doesn't seem to do the job; I'm still losing spans. The general approach seems sensible: forcing a blocking flush via a separate API request. I just haven't figured out how to do it yet through all the layers involved.

Maybe something like … ?
Actually, I see the issue is that I didn't add a "poison pill" to the end of the queue; instead I used a boolean flag.

Actually, I see we do check for an empty queue in the thread's main loop, so I'm not sure why this isn't effectively the same thing. Also, now that I've read #53, I see that issue has further-reaching goals than just fixing this local problem. Might be worth adding an eager flush mechanism as you suggested.
This makes sense to me but I want to discuss with Jaeger devs about overall idea of eager flushing. Hopefully will merge soon. Thanks for the help.
src/jaegertracing/Tracer.h (outdated)

@@ -194,6 +194,11 @@ class Tracer : public opentracing::Tracer,

    void close() noexcept { Close(); }

    void Flush()
I only use title-case functions for opentracing-cpp methods. I'd prefer flush() here, unless it is added to the OT implementation and this is an override.
Fixed in latest
I'm still seeing intermittent lost spans with this patch, though less frequently. I just did some test runs with Wireshark observing the UDP communications. The lost spans definitely never get sent on the wire; they're not misplaced in the collector. The span name / operation name never appears on the wire at all. The cause is uncertain as yet, and I'm having trouble debugging effectively. I may have to write a small test program to see if I can repro.

In case it's relevant, my spans seem to be almost exactly 10s long when they do arrive, which is aberrant. Tomorrow I will examine whether the app is doing something crazy there or whether the span timestamps are bogus.
(force-pushed from 94b50d2 to 521f442)
Extend the OpenTracing API with an explicit jaegertracing::Tracer::flush()
method to force spans to be flushed eagerly without closing the tracer.
It returns only when the spans are flushed. To support this, a new
condition variable is introduced in the reporter to allow the main thread
to wait on notification from the reporter flush thread.

Fixes jaegertracing#53

Call flush() from Close(), but not from the Tracer dtor, so we follow
the spec and ensure we flush buffers on explicit Close only.

Fixes jaegertracing#52

Signed-off-by: Craig Ringer <craig@2ndquadrant.com>
(force-pushed from 521f442 to 866782d)
OK, modified as discussed. It now flushes buffers on explicit Close(). My implementation of …
@isaachier Should be ready for review/merge
I think a lot of this is overkill. Instead of adding explicit flush, maybe we can just add an option to flush everything synchronously. That way we can avoid spawning a thread altogether.
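The reviewer's alternative, flushing synchronously on the calling thread instead of signalling a worker, might look like the sketch below (names and types are illustrative, not a proposed API):

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Illustrative synchronous flush: the caller drains and "sends" the
// buffer itself, so no extra thread or condition variable is needed.
class SyncFlushReporter {
  public:
    void report(int span)
    {
        std::lock_guard<std::mutex> lock(_mutex);
        _buffer.push_back(span);
    }

    // Drain and send the buffered spans on the caller's own thread.
    void flushSync()
    {
        std::vector<int> batch;
        {
            std::lock_guard<std::mutex> lock(_mutex);
            batch.swap(_buffer);
        }
        _sent += batch.size();  // stand-in for the real network send
    }

    std::size_t sent() const { return _sent; }

  private:
    std::mutex _mutex;
    std::vector<int> _buffer;
    std::size_t _sent = 0;
};
```

The trade-off is that the network send now happens on the application thread, blocking it for the duration, whereas the condition-variable approach keeps sending on the reporter thread.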
{
    std::unique_lock<std::mutex> lock(_mutex);
    _cv_flush.wait(lock, [this]() {
        async_flush();
This definitely looks wrong. The lambda here is just meant to express a boolean condition to avoid spurious wakeups. For example, instead of `while (!running) { cv.wait(lock); }` you can do `cv.wait(lock, [&running] { return running; });`. Putting side effects in the predicate like this will potentially lead to many bugs in the program.
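The reviewer's point can be shown with a minimal, self-contained example of the correct pattern. The names (producer, consumer, ready) are illustrative; the key property is that the predicate only reads state, since wait() may re-evaluate it any number of times across spurious wakeups while holding the lock.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

std::mutex m;
std::condition_variable cv;
bool ready = false;

// Producer: change the state the predicate observes, then notify.
void producer()
{
    {
        std::lock_guard<std::mutex> lock(m);
        ready = true;
    }
    cv.notify_one();
}

// Consumer: the lambda is a pure boolean check, equivalent to
//   while (!ready) cv.wait(lock);
// so spurious wakeups simply re-test the condition and keep waiting.
int consumer()
{
    std::unique_lock<std::mutex> lock(m);
    cv.wait(lock, [] { return ready; });
    return 42;
}
```

If async_flush() were inside the predicate, as in the patch, it would be invoked once per wakeup (spurious or not), which is why the review flags it.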
Fixed in #59.
Thanks!
The RemoteReporter buffers spans, but didn't flush them on close.
So any spans Finish()ed between the last flush interval and the
Close() of the Tracer would be lost.
Fixes #52
Signed-off-by: Craig Ringer craig@2ndquadrant.com