This repository has been archived by the owner on Aug 30, 2022. It is now read-only.

Flush any pending buffers on close() #54

Closed
wants to merge 1 commit into master from flush-on-close

Conversation

@ringerc (Contributor) commented Feb 5, 2018

The RemoteReporter buffers spans but did not flush them on close(), so any
spans Finish()ed between the last flush interval and the Close() of the
Tracer would be lost.

Fixes #52

Signed-off-by: Craig Ringer craig@2ndquadrant.com

@codecov (bot) commented Feb 5, 2018

Codecov Report

Merging #54 into master will decrease coverage by 0.5%.
The diff coverage is 29.41%.


@@            Coverage Diff             @@
##           master      #54      +/-   ##
==========================================
- Coverage   88.46%   87.96%   -0.51%     
==========================================
  Files          93       93              
  Lines        2246     2260      +14     
==========================================
+ Hits         1987     1988       +1     
- Misses        259      272      +13
Impacted Files                                     Coverage Δ
src/jaegertracing/reporters/RemoteReporter.h       100% <ø> (ø) ⬆️
src/jaegertracing/reporters/Reporter.h             66.66% <0%> (-33.34%) ⬇️
src/jaegertracing/reporters/RemoteReporter.cpp     66.17% <30%> (-7.16%) ⬇️
src/jaegertracing/Tracer.h                         82% <33.33%> (-4.32%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c36adcf...866782d

@rnburn (Contributor) commented Feb 5, 2018

BTW, is calling close() automatically like this in the reporter's destructor really a good idea?
https://github.com/jaegertracing/cpp-client/blob/master/src/jaegertracing/reporters/RemoteReporter.h#L45
It can cause the process to take much longer to exit than it otherwise would.
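
For reference, the pattern being questioned is roughly the following. This is a minimal sketch with illustrative member names, not the actual RemoteReporter code:

#include <condition_variable>
#include <mutex>
#include <thread>

class ReporterSketch {
public:
    ~ReporterSketch() { close(); }  // destructor forces a full close

    void close() noexcept
    {
        // Stop the background flush thread and join it. If spans are
        // still queued, the join waits until they have been sent,
        // which can noticeably delay process exit.
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _running = false;
        }
        _cv.notify_one();
        if (_thread.joinable()) {
            _thread.join();
        }
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    bool _running = true;
    std::thread _thread;
};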

@ringerc (Contributor, Author) commented Feb 5, 2018 via email

@ringerc ringerc force-pushed the flush-on-close branch 2 times, most recently from 1102a98 to 6ac58e8 on February 7, 2018 at 07:41
@ringerc (Contributor, Author) commented Feb 7, 2018

OK, so I'm addressing #53 instead, with an extension Tracer.Flush() method to allow explicit flushes.

@ringerc (Contributor, Author) commented Feb 7, 2018

Hrm, that doesn't seem to do the job; I'm still losing spans. The general approach seems sensible: forcing a blocking flush via a separate API call. I just haven't figured out how to do it yet through all the layers involved.

@ringerc (Contributor, Author) commented Feb 7, 2018

Maybe something like

void RemoteReporter::flush()
{
    async_flush();
    std::unique_lock<std::mutex> lock(_mutex);
    _cv.wait(lock, [this]() {
        return !_running || _queue.empty();
    });
}

?

@isaachier (Contributor) commented

Actually, I see the issue is that I didn't add a "poison pill" to the end of the queue and instead used a boolean running flag, which means the shutdown is immediate. I will patch and submit a new PR. Thanks for the help here, but I will be taking a different approach, the one used in the Go client.
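
For illustration, the poison-pill pattern looks roughly like this. This is a sketch with made-up names, not the actual Go or C++ client code: the worker keeps draining the queue until it dequeues a sentinel, so spans enqueued before shutdown are still sent, whereas a plain boolean flag lets the loop stop with items still queued.

#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>

struct Item {
    bool poison;       // sentinel marking the end of work
    std::string span;  // stand-in for a real serialized span
};

class WorkQueue {
public:
    void push(Item item)
    {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _items.push_back(std::move(item));
        }
        _cv.notify_one();
    }

    // Runs on the worker thread: sends every item queued before the pill.
    void drain()
    {
        for (;;) {
            std::unique_lock<std::mutex> lock(_mutex);
            _cv.wait(lock, [this]() { return !_items.empty(); });
            Item item = std::move(_items.front());
            _items.pop_front();
            lock.unlock();
            if (item.poison) {
                return;  // everything enqueued before the pill was sent
            }
            send(item.span);
        }
    }

    void shutdown() { push({true, {}}); }

private:
    void send(const std::string&) { /* emit over the transport */ }

    std::mutex _mutex;
    std::condition_variable _cv;
    std::deque<Item> _items;
};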

@isaachier (Contributor) commented

Actually, I see that we do check for an empty queue in the thread's main loop, so I'm not sure why this isn't effectively the same thing. Also, now that I've read #53, I see this issue has further-reaching goals than just fixing this local problem. It might be worth adding an eager flush mechanism as you suggested.

@isaachier isaachier reopened this Feb 7, 2018

@isaachier (Contributor) left a review comment:

This makes sense to me, but I want to discuss the overall idea of eager flushing with the Jaeger devs. Hopefully we will merge soon. Thanks for the help.

@@ -194,6 +194,11 @@ class Tracer : public opentracing::Tracer,

void close() noexcept { Close(); }

void Flush()

Contributor commented on this diff:

I only use title-case function names for opentracing-cpp methods. I'd prefer flush here unless it is added to the OT implementation and this is an override.

Contributor Author replied:

Fixed in latest

@ringerc (Contributor, Author) commented Feb 8, 2018

I'm still seeing intermittent lost spans with this patch, though less frequently.

I just did some test runs with Wireshark observing the UDP communications. The lost spans definitely never seem to get sent on the wire; they're not just being misplaced in the collector. The span name / operation name never appears on the wire at all.

The cause is uncertain as yet. I'm having trouble debugging effectively and may have to write a small test program to see if I can reproduce it.

In case it's relevant, my spans seem to be almost exactly 10s long if they do arrive. This is ... aberrant. Tomorrow I will examine whether the app is doing something odd there or whether the span timestamps are bogus.

Extend the OpenTracing API with an explicit jaegertracing::Tracer::flush()
method to force spans to be flushed eagerly without closing the tracer. It
returns only when the spans are flushed.

To support this, a new condition variable is introduced in the reporter to allow
the main thread to wait on notification from the reporter flush thread.

Fixes jaegertracing#53

Call flush() from Close(), but not from the Tracer dtor, so we follow the spec
and ensure buffers are flushed only on an explicit Close.

Fixes jaegertracing#52

Signed-off-by: Craig Ringer <craig@2ndquadrant.com>
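
For context, here is a rough caller-side sketch of what this commit enables. The setup code and the dynamic_cast are assumptions for illustration; flush() is the extension described in the commit message above:

#include <memory>

#include <jaegertracing/Tracer.h>
#include <opentracing/tracer.h>

void doShortLivedWork()
{
    auto tracer = opentracing::Tracer::Global();
    auto span = tracer->StartSpan("short-lived-work");
    span->Finish();

    // flush() is not part of the opentracing-cpp interface, so the
    // concrete jaegertracing::Tracer type is needed to call it. It
    // returns only once the buffered spans have been flushed.
    if (auto* jaegerTracer =
            dynamic_cast<jaegertracing::Tracer*>(tracer.get())) {
        jaegerTracer->flush();
    }
}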
@ringerc (Contributor, Author) commented Feb 9, 2018

OK, modified as discussed. It now flushes buffers on explicit Close() only.

My implementation of RemoteReporter::flush() was pretty much completely wrong. It is fixed now: flush() waits on its own condition variable, and the reporter thread signals that condition variable after each wakeup (whether or not a span was flushed).
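
In outline, that corrected flush looks something like this. This is a sketch of the description above rather than the exact patch; _condFlush is an illustrative name for the reporter's dedicated flush condition variable:

void RemoteReporter::flush()
{
    std::unique_lock<std::mutex> lock(_mutex);
    _cv.notify_one();  // wake the reporter thread if it is sleeping
    // Wait on the dedicated flush condition variable, which the
    // reporter thread notifies after every wakeup (whether or not a
    // span was actually sent), until the queue drains or we stop.
    _condFlush.wait(lock, [this]() {
        return !_running || _queue.empty();
    });
}

// In the reporter thread's main loop, after each wakeup and send attempt:
//     _condFlush.notify_all();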

@ringerc (Contributor, Author) commented Feb 9, 2018

@isaachier Should be ready for review/merge

@isaachier (Contributor) left a review comment:

I think a lot of this is overkill. Instead of adding an explicit flush, maybe we can just add an option to flush everything synchronously. That way we can avoid spawning a thread altogether.
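
One way to read that suggestion is a reporter mode that sends each span as it is reported, so nothing is ever left buffered and no background thread exists. This is a sketch with stand-in types, not the actual jaeger-client-cpp Reporter interface or the eventual fix:

struct Span {};

struct Sender {
    void append(const Span&) { /* marshal the span into the current batch */ }
    void flush() { /* emit the batch over UDP/HTTP */ }
};

class SyncReporter {
public:
    explicit SyncReporter(Sender& sender) : _sender(sender) {}

    void report(const Span& span)
    {
        _sender.append(span);  // may block on the transport
        _sender.flush();       // nothing stays buffered between spans
    }

    void close() { _sender.flush(); }

private:
    Sender& _sender;
};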

{
    std::unique_lock<std::mutex> lock(_mutex);
    _cv_flush.wait(lock, [this]() {
        async_flush();

Contributor commented on this excerpt:

This definitely looks wrong. The lambda here is only meant to express a boolean condition that guards against spurious wakeups. For example, instead of while (!running) { cv.wait(lock); } you can write cv.wait(lock, [&running] { return running; });. Doing real work such as async_flush() inside the predicate will potentially lead to many bugs in the program.
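
For comparison, a correct predicate-based wait only tests state inside the lambda. This is a self-contained sketch; the done flag is illustrative:

#include <condition_variable>
#include <mutex>

std::mutex mtx;
std::condition_variable cv;
bool done = false;

void waitWithExplicitLoop()
{
    std::unique_lock<std::mutex> lock(mtx);
    while (!done) {  // the loop guards against spurious wakeups
        cv.wait(lock);
    }
}

void waitWithPredicate()
{
    std::unique_lock<std::mutex> lock(mtx);
    cv.wait(lock, []() { return done; });  // equivalent to the loop above
}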

@isaachier (Contributor) commented

Fixed in #59.

@isaachier isaachier closed this Feb 12, 2018
@ringerc (Contributor, Author) commented Feb 13, 2018 via email

Successfully merging this pull request may close these issues:

cpp-client loses spans on short-lived processes; Close() not flushing buffers?
3 participants