Fix #376: `AwsXrayRemoteSampler` doesn’t poll for update #377

felixscheinost · 2022-06-28T15:43:21Z

Description:

In the default configuration pollingIntervalNanos = 3 * 10^11, so pollingIntervalNanos / 100 = 3 * 10^9 > Integer.MAX_VALUE.

Switch to storing everything in milliseconds instead. This should be a non-breaking change, as none of the nanosecond-related logic is part of the public API.
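The arithmetic behind the bug can be checked in isolation. The following is a minimal sketch, assuming the jitter was derived as the polling interval divided by 100 and narrowed to an `int` before being passed to `Random.nextInt`. The class and helper names (`JitterOverflowDemo`, `narrowedJitter`, `nextIntRejects`) are hypothetical and not taken from the sampler.

```java
import java.util.Random;

public class JitterOverflowDemo {

  // Assumption: the jitter is pollingIntervalNanos / 100, narrowed to int.
  static int narrowedJitter(long pollingIntervalNanos) {
    return (int) (pollingIntervalNanos / 100);
  }

  // Returns true if Random.nextInt refuses the given bound.
  static boolean nextIntRejects(int bound) {
    try {
      new Random().nextInt(bound);
      return false;
    } catch (IllegalArgumentException e) {
      // nextInt requires a strictly positive bound.
      return true;
    }
  }

  public static void main(String[] args) {
    long pollingIntervalNanos = 300_000_000_000L; // 3 * 10^11 ns = 5 minutes
    long jitter = pollingIntervalNanos / 100; // 3 * 10^9

    System.out.println(jitter > Integer.MAX_VALUE); // true
    int narrowed = narrowedJitter(pollingIntervalNanos);
    System.out.println(narrowed); // -1294967296
    System.out.println(nextIntRejects(narrowed)); // true
  }
}
```

If the exception is thrown inside the scheduling step, no further poll is scheduled, which would match the symptom described in the issue title.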

Existing Issue(s):

#376

Testing:

I don't know how this could be tested, as none of the logic is part of the public API.

~~Best would be a sort of integration test where time could be skipped forward and we could observe a request. Is there some code already set up in this project to do something like this?~~
EDIT: I noticed that there are both real integration tests and a test that tests polling with a very short polling interval.

I tested this manually by shadowing the two affected classes in my project and manually testing there.

linux-foundation-easycla · 2022-06-28T15:43:26Z

The committers listed above are authorized under a signed CLA.

✅ login: felixscheinost / name: Felix Scheinost (2539e61)

willarmiros

Thanks for catching this and contributing this fix! Just to confirm, did you run the integration tests in the package as part of this change?

willarmiros · 2022-06-29T18:36:11Z

aws-xray/src/main/java/io/opentelemetry/contrib/awsxray/XrayRulesSampler.java

+                clock.nanoTime()
+                    + TimeUnit.MILLISECONDS.toNanos(
+                        AwsXrayRemoteSampler.DEFAULT_TARGET_INTERVAL_MILLIS));


What if this value is greater than INT_MAX? Is that a possibility?

anuraaga · 2022-06-30T01:14:18Z

aws-xray/src/main/java/io/opentelemetry/contrib/awsxray/AwsXrayRemoteSampler.java

@@ -148,8 +149,8 @@ private void getAndUpdateSampler() {
  }

  private void scheduleSamplerUpdate() {
-    long delay = pollingIntervalNanos + RANDOM.nextInt(jitterNanos);
-    pollFuture = executor.schedule(this::getAndUpdateSampler, delay, TimeUnit.NANOSECONDS);
+    long delay = pollingIntervalMillis + RANDOM.nextInt(jitterMillis);


Can we stick with nanos and switch to Random.longs(0, jitterNanos) instead of switching to millis? It's nice to avoid dropping precision mysteriously when possible, even if in practice it generally shouldn't matter here.

Good point! This requires significantly fewer changes. I pushed an updated commit.

felixscheinost · 2022-06-30T09:30:35Z

aws-xray/src/main/java/io/opentelemetry/contrib/awsxray/AwsXrayRemoteSampler.java

+   * returns the duration until the next scheduled sampler update or null if no next update is scheduled yet
+   */
+  @Nullable
+  public Duration getNextSamplerUpdateScheduledDuration() {


This is currently only required for the test. Would that be okay?

I don't have an idea how to test this properly because we don't want to wait for 5 minutes in the test, right? Only long polling intervals trigger this bug, so the test would need a long polling interval.

Hmmm I think the maintainers would have final say but I don't think we want to expose additional public methods in the sampler like this. Is it possible to mock the system clock and trick the test into thinking 5 minutes have passed? Or just otherwise verifying the interval isn't negative when set to 5 minutes?

Can it be package-private and still be accessed from the test?

Good point, I made the method package-private.

felixscheinost · 2022-06-30T09:35:32Z

@willarmiros Does running the integration tests mean running ./gradlew :aws-xray:awsTest?

If yes, running this command returns an error on my machine:

> Task :aws-xray:compileAwsTestJava FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':aws-xray:compileAwsTestJava'.
> Could not resolve all files for configuration ':aws-xray:awsTestCompileClasspath'.
   > Could not find io.opentelemetry:opentelemetry-exporter-otlp-trace:.
     Required by:
         project :aws-xray

willarmiros

@felixscheinost for the test failure it looks like you're not resolving the package version for some reason... Maybe you need to do a build first? https://github.com/open-telemetry/opentelemetry-java-contrib#getting-started

Not sure if @anuraaga would have any other ideas why this could happen.

willarmiros · 2022-06-30T16:56:19Z

aws-xray/src/main/java/io/opentelemetry/contrib/awsxray/AwsXrayRemoteSampler.java

+   * returns the duration until the next scheduled sampler update or null if no next update is scheduled yet
+   */
+  @Nullable
+  public Duration getNextSamplerUpdateScheduledDuration() {


Hmmm I think the maintainers would have final say but I don't think we want to expose additional public methods in the sampler like this. Is it possible to mock the system clock and trick the test into thinking 5 minutes have passed? Or just otherwise verifying the interval isn't negative when set to 5 minutes?

anuraaga · 2022-07-01T02:25:10Z

Sorry for the confusion on awsTest - it's not run with the build currently since we don't have AWS credentials for integration tests in this repo. It seems to have drifted and doesn't build anymore... It's also not a true integration test, as it just polls and expects you to check the logs manually after operating the console. @willarmiros I'd suggest someone at AWS checks out the PR and runs it manually until we have AWS credentials in this repo for running integration tests.

anuraaga · 2022-07-01T02:20:38Z

aws-xray/src/main/java/io/opentelemetry/contrib/awsxray/AwsXrayRemoteSampler.java

+    @SuppressWarnings(
+        "OptionalGetWithoutIsPresent") // getAsLong: RANDOM.longs should always provide a value, so
+    // we can do an unchecked unwrap here
+    long delay = pollingIntervalNanos + RANDOM.longs(0, jitterNanos).findFirst().getAsLong();


Let's replace the jitterNanos field with RANDOM.longs(0, jitterNanos) to not construct the stream every time.

Sorry, I am not very used to working with streams. How would I get the next value from a stream? From the examples and documentation I have read, streams don't seem to be meant to be used that way?

I think "unlimited streams" like this RANDOM.longs() aren't very common so examples out there may look a bit weird. But in this case, just creating the stream once and using it over and over should work OK.

I used .iterator() to convert the stream to an Iterator<Long> and use next() to get the next item. As the stream should be infinite, calling hasNext() should not be necessary.
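The approach discussed in this thread can be sketched as follows: create the unbounded stream once, keep its iterator, and pull one value per scheduling step. This is an illustration only, not the merged code. The class and method names are hypothetical, and it uses `PrimitiveIterator.OfLong` (which `LongStream.iterator()` returns, and which is itself an `Iterator<Long>`) so that `nextLong()` avoids boxing.

```java
import java.util.PrimitiveIterator;
import java.util.Random;

public class JitterIteratorDemo {
  private static final Random RANDOM = new Random();

  // Builds the infinite stream once; values fall in [0, jitterNanos).
  // jitterNanos must be positive, otherwise Random.longs throws.
  static PrimitiveIterator.OfLong jitterSource(long jitterNanos) {
    return RANDOM.longs(0, jitterNanos).iterator();
  }

  // Computes the next delay by reusing the same iterator on every call.
  static long nextDelayNanos(long pollingIntervalNanos, PrimitiveIterator.OfLong jitter) {
    return pollingIntervalNanos + jitter.nextLong();
  }

  public static void main(String[] args) {
    long pollingIntervalNanos = 300_000_000_000L; // 3 * 10^11 ns
    long jitterNanos = pollingIntervalNanos / 100; // 3 * 10^9, does not fit in an int

    PrimitiveIterator.OfLong jitter = jitterSource(jitterNanos);
    for (int i = 0; i < 3; i++) {
      System.out.println(nextDelayNanos(pollingIntervalNanos, jitter));
    }
  }
}
```

Because the bound is a `long`, the 3 * 10^9 jitter from the default configuration is accepted without any narrowing cast.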

felixscheinost · 2022-07-01T07:32:55Z

I think one way to test that the scheduling works correctly without the getNextSamplerUpdateScheduledDuration method I added would be to add a (also package-private) way to override the scheduler. The scheduler could also be set in the builder and passed into the sampler.

This way we could use a mocked scheduler that could verify what's scheduled.
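A rough sketch of that idea, using a hand-written recording scheduler instead of a mocking library. Everything here is hypothetical: `Poller` is a stand-in for the sampler's scheduling logic, and `RecordingScheduler` is an illustrative test double. Neither exists in the project.

```java
import java.util.Random;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class RecordingSchedulerDemo {

  /** Test double that records the requested delay, so no real waiting is needed. */
  static final class RecordingScheduler extends ScheduledThreadPoolExecutor {
    volatile long lastDelayNanos = -1;

    RecordingScheduler() {
      super(1);
    }

    @Override
    public ScheduledFuture<?> schedule(Runnable command, long delay, TimeUnit unit) {
      lastDelayNanos = unit.toNanos(delay);
      return super.schedule(command, delay, unit);
    }
  }

  /** Hypothetical stand-in for the sampler: schedules one update with jitter. */
  static final class Poller {
    private static final Random RANDOM = new Random();
    private final ScheduledExecutorService executor;
    private final long pollingIntervalNanos;
    private final long jitterNanos;

    Poller(ScheduledExecutorService executor, long pollingIntervalNanos) {
      this.executor = executor;
      this.pollingIntervalNanos = pollingIntervalNanos;
      this.jitterNanos = pollingIntervalNanos / 100; // kept as a long
    }

    void scheduleUpdate() {
      long jitter = RANDOM.longs(0, jitterNanos).findFirst().getAsLong();
      executor.schedule(() -> {}, pollingIntervalNanos + jitter, TimeUnit.NANOSECONDS);
    }
  }

  // Schedules one update against the recording scheduler and returns the delay it saw.
  static long recordedDelayNanos(long pollingIntervalNanos) {
    RecordingScheduler scheduler = new RecordingScheduler();
    try {
      new Poller(scheduler, pollingIntervalNanos).scheduleUpdate();
      return scheduler.lastDelayNanos;
    } finally {
      scheduler.shutdownNow(); // drop the pending task so the JVM can exit
    }
  }

  public static void main(String[] args) {
    long interval = TimeUnit.MINUTES.toNanos(5);
    long delay = recordedDelayNanos(interval);
    System.out.println(delay >= interval && delay < interval + interval / 100);
  }
}
```

A test built this way can use the real 5-minute interval, since it only inspects the delay handed to the scheduler and never waits for the task to run.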

anuraaga

Thanks - just small nit left

anuraaga · 2022-07-04T08:35:58Z

aws-xray/src/test/java/io/opentelemetry/contrib/awsxray/AwsXrayRemoteSamplerTest.java

+  // https://github.com/open-telemetry/opentelemetry-java-contrib/issues/376
+  @Test
+  void testJitterTruncation() {
+    sampler.close();


Let's use the same pattern as defaultInitialSampler(), not operating on the test's "base sampler" and defining a local one within try/resources.

In the default configuration `pollingIntervalNanos = 3 * 10^11` so `pollingIntervalNanos / 100 > Integer.MAX_VALUE`. Switch to storing the jitter in a `long` as well.

felixscheinost requested a review from a team June 28, 2022 15:43

github-actions bot assigned anuraaga and willarmiros Jun 28, 2022

github-actions bot requested review from anuraaga and willarmiros June 28, 2022 15:43

felixscheinost force-pushed the feature/fix-aws-xray-remote-sampler branch 2 times, most recently from 27d8b19 to 6051652 Compare June 29, 2022 18:29

willarmiros reviewed Jun 29, 2022

anuraaga reviewed Jun 30, 2022

felixscheinost force-pushed the feature/fix-aws-xray-remote-sampler branch from 6051652 to fbf3c6e Compare June 30, 2022 09:29

felixscheinost commented Jun 30, 2022

felixscheinost force-pushed the feature/fix-aws-xray-remote-sampler branch from fbf3c6e to bf09a9a Compare June 30, 2022 09:32

willarmiros reviewed Jun 30, 2022

anuraaga reviewed Jul 1, 2022

felixscheinost force-pushed the feature/fix-aws-xray-remote-sampler branch from bf09a9a to c3f9ab2 Compare July 4, 2022 08:32

anuraaga reviewed Jul 4, 2022

Fix open-telemetry#376: AwsXrayRemoteSampler doesn’t poll for update

87a675d

In the default configuration `pollingIntervalNanos = 3 * 10^11` so `pollingIntervalNanos / 100 > Integer.MAX_VALUE`. Switch to storing the jitter in a `long` as well.

felixscheinost force-pushed the feature/fix-aws-xray-remote-sampler branch from c3f9ab2 to 87a675d Compare July 4, 2022 09:25

anuraaga approved these changes Jul 4, 2022

willarmiros approved these changes Jul 5, 2022

anuraaga merged commit dd4d335 into open-telemetry:main Jul 6, 2022
