Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SDK] Volatile reads + MetricPoint improvements #3458

Closed
wants to merge 3 commits into from

Conversation

CodeBlanch
Copy link
Member

[Builds off of #3384]

Changes

  • Switch to Volatile.Read where we were doing Interlocked.Read or Interlocked.CompareExchange (for doubles).

  • Today there are two different locking mechanisms in play for histograms:

    Now we use lock (this.histogramBuckets.LockObject) in both places. I think as it existed it was bugged. Ran it by @noahfalk he suggested just using the lock as trying to make our own mechanism is risky.

  • Some misc optimizations like in Update(long) call into dedicated Histrogram update methods instead of entering Update(double) which needs to switch again.

@CodeBlanch CodeBlanch requested a review from a team July 19, 2022 19:52
@codecov
Copy link

codecov bot commented Jul 19, 2022

Codecov Report

Merging #3458 (92b4c9c) into main (ab10e11) will increase coverage by 0.32%.
The diff coverage is 93.75%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3458      +/-   ##
==========================================
+ Coverage   86.21%   86.54%   +0.32%     
==========================================
  Files         265      264       -1     
  Lines        9598     9584      -14     
==========================================
+ Hits         8275     8294      +19     
+ Misses       1323     1290      -33     
Impacted Files Coverage Δ
src/OpenTelemetry/Metrics/MetricPoint.cs 86.42% <93.33%> (+1.53%) ⬆️
...ourceInstrumentation/DiagnosticSourceSubscriber.cs 95.12% <100.00%> (ø)
src/OpenTelemetry/Metrics/HistogramBuckets.cs 100.00% <100.00%> (ø)
...ZPages/Implementation/ZPagesExporterEventSource.cs 56.25% <0.00%> (-6.25%) ⬇️
...ter.ZPages/Implementation/ZPagesActivityTracker.cs 97.14% <0.00%> (-2.86%) ⬇️
...elemetry.Api/Context/RemotingRuntimeContextSlot.cs
...metryProtocol/Implementation/ActivityExtensions.cs 95.60% <0.00%> (+4.39%) ⬆️
...emetry.Api/Internal/OpenTelemetryApiEventSource.cs 79.41% <0.00%> (+5.88%) ⬆️
...xporter.OpenTelemetryProtocol/OtlpTraceExporter.cs 77.27% <0.00%> (+18.18%) ⬆️
... and 1 more

Copy link
Member

@alanwest alanwest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes me wonder how the stress test might be adapted to have discovered this bug. I assume with enough concurrent threads some amount of metric updates would reliably be lost by the snapshot thread resetting things.


private void UpdateHistogramSumCount(double number)
{
lock (this.histogramBuckets.LockObject)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#2951 showed good perf improvement when using own mechanism, as opposed to plain locks. If we modify the SnapShot path to also use the Interlock check for IsCriticalSectionOccupied, - would that not work??

cc : @utpilla

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cijothomas Possibly, but there be dragons. I'll paraphrase what @noahfalk told me.

Consider code like this...

   if (Interlocked.Exchange(ref this.histogramBuckets.IsCriticalSectionOccupied, 1) == 0)
   {
      this.runningValue.AsLong++;
      this.histogramBuckets.RunningSum += number;
      this.histogramBuckets.RunningBucketCounts[i]++;

      this.histogramBuckets.IsCriticalSectionOccupied = 0;
   }

The compiler is free/able (by spec) to rewrite that as:

   if (Interlocked.Exchange(ref this.histogramBuckets.IsCriticalSectionOccupied, 1) == 0)
   {
      this.histogramBuckets.IsCriticalSectionOccupied = 0; // Danger!

      this.runningValue.AsLong++;
      this.histogramBuckets.RunningSum += number;
      this.histogramBuckets.RunningBucketCounts[i]++;
   }

That is why it is risky to try and make our own mechanism. If we just use lock there, order is guaranteed to not change.

Copy link
Contributor

@utpilla utpilla Jul 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can use Interlocked.Exchange (or maybe Volatile.Write) instead of simply assigning that to zero to avoid that risk.

There is an example on the docs for Interlocked that shows this this approach: https://docs.microsoft.com/en-us/dotnet/api/system.threading.interlocked?view=net-6.0#examples. This example does not even use Volatile.Write. It just uses an Interlocked.Exchange to update that value.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@utpilla Let's say we did it like this...

if (Interlocked.Exchange(ref this.histogramBuckets.IsCriticalSectionOccupied, 1) == 0)
{
    Interlocked.Increment(ref this.runningValue.AsLong);
    Interlocked.Add(ref this.histogramBuckets.RunningSum, number); // Not possible with double, but for discussion sake
    Interlocked.Increment(ref this.histogramBuckets.RunningBucketCounts[i]);

    Interlocked.Exchange(ref this.histogramBuckets.IsCriticalSectionOccupied, 0);
}

(Or different version with a mix of Interlocked vs Volatile.)

That would work (I think) but would also be a lot slower for the happy-path?

I thought maybe Thread.MemoryBarrier would help us. But the docs do recommend a lock over that 🤷

Let's say we did it like this...

if (Interlocked.Exchange(ref this.histogramBuckets.IsCriticalSectionOccupied, 1) == 0)
{
    unchecked
    {
        this.runningValue.AsLong++;
        this.histogramBuckets.RunningSum += number;
        this.histogramBuckets.RunningBucketCounts[i]++;
    }

    Interlocked.Exchange(ref this.histogramBuckets.IsCriticalSectionOccupied, 0);
}

I don't think that works. Because the stuff in unchecked could be cached. Needs some kind of a memory fence. But I could be wrong this stuff is confusing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I suggested using Interlocked, I didn't mean that we should switch every statement to use Interlocked like shown here:

if (Interlocked.Exchange(ref this.histogramBuckets.IsCriticalSectionOccupied, 1) == 0)
{
    Interlocked.Increment(ref this.runningValue.AsLong);
    Interlocked.Add(ref this.histogramBuckets.RunningSum, number); // Not possible with double, but for discussion sake
    Interlocked.Increment(ref this.histogramBuckets.RunningBucketCounts[i]);

    Interlocked.Exchange(ref this.histogramBuckets.IsCriticalSectionOccupied, 0);
}

I was indeed referring to this:

if (Interlocked.Exchange(ref this.histogramBuckets.IsCriticalSectionOccupied, 1) == 0)
{
    unchecked
    {
        this.runningValue.AsLong++;
        this.histogramBuckets.RunningSum += number;
        this.histogramBuckets.RunningBucketCounts[i]++;
    }

    Interlocked.Exchange(ref this.histogramBuckets.IsCriticalSectionOccupied, 0);
}

I don't think that works. Because the stuff in unchecked could be cached. Needs some kind of a memory fence.

How does using a lock instead help with this? ^

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@utpilla

How does using a lock instead help with this?

Good question. It is magic 😄 @noahfalk Can you provide a little detail here? My understanding is that the compiler(?) treats the whole lock body as fenced/non-reordering?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be okay even if the instructions inside the unchecked block are re-ordered.

the stuff in unchecked could be cached

But does memory barrier/fence even help with caching/freshness?

@utpilla
Copy link
Contributor

utpilla commented Jul 20, 2022

Switch to Volatile.Read where we were doing Interlocked.Read or Interlocked.CompareExchange (for doubles).

Do we have a clear understanding of what's the use case for Volatile.Read vs Interlocked.Read? Why does Volatile.Read work in this case? And why does Interlocked.Read not work in this case?

@CodeBlanch
Copy link
Member Author

Switch to Volatile.Read where we were doing Interlocked.Read or Interlocked.CompareExchange (for doubles).

Do we have a clear understanding of what's the use case for Volatile.Read vs Interlocked.Read? Why does Volatile.Read work in this case? And why does Interlocked.Read not work in this case?

@utpilla

There was some discussion on #3384 regarding this.

@utpilla
Copy link
Contributor

utpilla commented Jul 22, 2022

Apart from the lock vs Interlocked issue discussed in this PR, I wanted to ensure that we are fully aware of the implications of using Volatile methods.

Volatile Read/Write methods, offer a weaker guarantee with regards to instruction reordering. On the other hand, Interlocked Read/Exchange methods generate full fences and provide a stronger guarantee. Are we sure that we don't risk any code correctness when using Volatile methods? (in this PR and in #3349)

@github-actions
Copy link
Contributor

This PR was marked stale due to lack of activity and will be closed in 7 days. Commenting or Pushing will instruct the bot to automatically remove the label. This bot runs once per day.

@github-actions github-actions bot added the Stale Issues and pull requests which have been flagged for closing due to inactivity label Jul 30, 2022
@github-actions
Copy link
Contributor

github-actions bot commented Aug 7, 2022

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Aug 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Stale Issues and pull requests which have been flagged for closing due to inactivity
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants