-
Notifications
You must be signed in to change notification settings - Fork 130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix #376: AwsXrayRemoteSampler
doesn’t poll for update
#377
Fix #376: AwsXrayRemoteSampler
doesn’t poll for update
#377
Conversation
|
27d8b19
to
6051652
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching this & and contributing this fix! Just to confirm, did you run the integration tests in the package as part of this change?
clock.nanoTime() | ||
+ TimeUnit.MILLISECONDS.toNanos( | ||
AwsXrayRemoteSampler.DEFAULT_TARGET_INTERVAL_MILLIS)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if this value is greater than INT_MAX
? Is that a possibility?
@@ -148,8 +149,8 @@ private void getAndUpdateSampler() { | |||
} | |||
|
|||
private void scheduleSamplerUpdate() { | |||
long delay = pollingIntervalNanos + RANDOM.nextInt(jitterNanos); | |||
pollFuture = executor.schedule(this::getAndUpdateSampler, delay, TimeUnit.NANOSECONDS); | |||
long delay = pollingIntervalMillis + RANDOM.nextInt(jitterMillis); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we stick with nanos and switch to Random.longs(0, jitterNanos)
instead of switching to millis? It's nice to avoid dropping precision mysteriously when possible, even in practice it generally shouldn't matter here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! This requires significantly less changes. I pushed an updated commit.
6051652
to
fbf3c6e
Compare
* returns the duration until the next scheduled sampler update or null if no next update is scheduled yet | ||
*/ | ||
@Nullable | ||
public Duration getNextSamplerUpdateScheduledDuration() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is currently only required for the test. Would that be okay?
I don't have an idea how to test this properly because we don't want to wait for 5 minutes in the test, right? As only long polling intervals trigger this bug so we should use a long polling interval.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm I think the maintainers would have final say but I don't think we want to expose additional public methods in the sampler like this. Is it possible to mock the system clock and trick the test into thinking 5 minutes have passed? Or just otherwise verifying the interval isn't negative when set to 5 minutes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can it be package-private and still accessed from test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I made the method package-private.
fbf3c6e
to
bf09a9a
Compare
@willarmiros Does running the integration tests mean running If yes, running this command returns an error on my machine:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@felixscheinost for the test failure it looks like you're not resolving the package version for some reason... Maybe you need to do a build first? https://github.com/open-telemetry/opentelemetry-java-contrib#getting-started
Not sure if @anuraaga would have any other ideas why this could happen.
* returns the duration until the next scheduled sampler update or null if no next update is scheduled yet | ||
*/ | ||
@Nullable | ||
public Duration getNextSamplerUpdateScheduledDuration() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm I think the maintainers would have final say but I don't think we want to expose additional public methods in the sampler like this. Is it possible to mock the system clock and trick the test into thinking 5 minutes have passed? Or just otherwise verifying the interval isn't negative when set to 5 minutes?
Sorry for the confusion on |
@SuppressWarnings( | ||
"OptionalGetWithoutIsPresent") // getAsLong: RANDOM.longs should always provide a value, so | ||
// we can do an unchecked unwrap here | ||
long delay = pollingIntervalNanos + RANDOM.longs(0, jitterNanos).findFirst().getAsLong(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's replace the jitterNanos
field with RANDOM.longs(0, jitterNanos)
to not construct the stream every time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I am not very used to working with streams. How would I get the next value from a stream? From reading examples and the documentation a little streams aren't supposed to use that way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think "unlimited streams" like this RANDOM.longs()
aren't very common so examples out there may look a bit weird. But in this case, just creating the stream once and using it over and over should work OK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used .iterator()
to convert the stream to an Iterator<Long>
and use next()
to get the next item. As the stream should be infinite calling hasNext()
should not be necessary.
I think one way to test that the scheduling works correctly without the This way we could use a mocked scheduler that could verify what's scheduleed. |
bf09a9a
to
c3f9ab2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks - just small nit left
// https://github.com/open-telemetry/opentelemetry-java-contrib/issues/376 | ||
@Test | ||
void testJitterTruncation() { | ||
sampler.close(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use the same pattern as defaultInitialSampler()
, not operating on the test's "base sampler" and defining a local one within try/resources.
In the default configuration `pollingIntervalNanos = 3 * 10^11` so `pollingIntervalMillis / 100 > Integer.MAX_VALUE`. Switch to storing the jitter in a `long` as well.
c3f9ab2
to
87a675d
Compare
Description:
In the default configuration
pollingIntervalNanos = 3 * 10^11
sopollingIntervalMillis / 100 > Integer.MAX_VALUE
.Switch to storing everything in milliseconds instead. This should be a non-breaking change as all the nanoseconds related logic isn’t part of the public API.
Existing Issue(s):
#376
Testing:
I don't know how this could be tested as all the logic isn't part of the public API.
Best would be a sort of integration test where time could be skipped forward and we could observe a request. Is there some code already set up in this project to do something like this?EDIT: I noticed that there both real integration tests and a test that tests polling with a very short polling interval.
I tested this manually by shadowing the two affected classes in my project and manually testing there.