Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adopt Polly V8 in Microsoft.Extensions.Http.Resilience #4108

Merged
merged 16 commits into from
Jun 27, 2023

Conversation

martintmk
Copy link
Contributor

@martintmk martintmk commented Jun 23, 2023

Adopting the V8 version for HTTP clients.

Changes:

  • Adopting V8 API and dropping some internals that are no longer needed
  • Reorganizing some internals that are shared into /Internal folder.
  • Http-based options based now derive from Polly V8 options
  • Cleanup, simplification of unit tests
  • Replacing the token Pipeline with Strategy to align with Polly V8
  • Making some pub-internals internal again

The tests are still WIP, I'll work on them until we have 100% code coverage. I'll run performance tests once #4100 is merged.

Unblocks #4084

API Changes:

Standard Handlers

Mostly cosmetical changes, high-level API stays the same. V8 now also supports asynchronous predicates with richer arguments.

// Old
services
    .AddHttpClient("my-client")
    .AddStandardResilienceHandler()
    .Configure(options =>
    {
        options.TotalRequestTimeoutOptions.TimeoutInterval = TimeSpan.FromSeconds(40);
        options.RetryOptions.ShouldHandleException = e => e is HttpRequestException;
        options.RetryOptions.ShouldHandleResultAsError = r => !r.IsSuccessStatusCode;
    });

// New
services
    .AddHttpClient("my-client")
    .AddStandardResilienceHandler()
    .Configure(options =>
    {
        options.TotalRequestTimeoutOptions.Timeout = TimeSpan.FromSeconds(40);
        options.RetryOptions.ShouldHandle = args => args switch
        {
            { Exception: HttpRequestException } => PredicateResult.True,
            { Result: { IsSuccessStatusCode: false } } => PredicateResult.True,
            _ => PredicateResult.False,
        };
    });

Resilience Handler

Conceptually the same, but now the strategy is configured using a callback.

// Old
services
    .AddHttpClient("my-client")
    .AddResilienceHandler("my-handler")
    .AddRetryPolicy("my-retry", options => options.RetryCount = 3)
    .AddTimeoutPolicy("my-timeout", options => options.TimeoutInterval = TimeSpan.FromSeconds(4));

// New
services
    .AddHttpClient("my-client")
    .AddResilienceHandler("my-handler", builder => builder
      .AddRetry(new HttpRetryStrategyOptions
      {
          StrategyName = "my-retry",
          RetryCount = 3,
      })
      .AddTimeout(new HttpTimeoutStrategyOptions
      {
          StrategyName = "my-timeout",
          Timeout = TimeSpan.FromSeconds(4),
      }));

Benchmarks:

Before:

Method Mean Error StdDev Ratio RatioSD Gen0 Allocated Alloc Ratio
DefaultClient 236.4 ns 2.24 ns 3.28 ns 1.00 0.00 0.0191 488 B 1.00
StandardResilienceHandler 4,101.4 ns 21.96 ns 32.19 ns 17.35 0.22 0.1602 4152 B 8.51
StandardHedgingHandler 8,060.2 ns 58.09 ns 85.15 ns 34.10 0.35 0.2289 5880 B 12.05

After:

Method Mean Error StdDev Ratio RatioSD Gen0 Allocated Alloc Ratio
DefaultClient 237.4 ns 1.56 ns 2.69 ns 1.00 0.00 0.0191 488 B 1.00
StandardResilienceHandler 2,774.0 ns 44.61 ns 74.53 ns 11.69 0.32 0.0229 648 B 1.33
StandardHedgingHandler 7,132.6 ns 114.23 ns 197.03 ns 30.05 1.02 0.0458 1336 B 2.74
Microsoft Reviewers: Open in CodeFlow

@ghost ghost assigned martintmk Jun 23, 2023
@martintmk martintmk requested a review from geeknoid June 23, 2023 11:56
@martintmk
Copy link
Contributor Author

cc @martincostello


namespace Microsoft.Extensions.Http.Resilience.Internal.Validators;

[OptionsValidator]
internal sealed partial class HttpStandardResilienceOptionsValidator : IValidateOptions<HttpStandardResilienceOptions>
internal sealed class HttpStandardResilienceOptionsValidator : IValidateOptions<HttpStandardResilienceOptions>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@geeknoid I had to disable this validator because it materializes even the internal validation attributes we have in Polly and the PR won't compile.

I believe the generator should just skip those.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the option type in this package derive from the option type in the Polly package?

@tarekgh This looks like a case we hadn't seen before and since you know own the option validation generator, well, here you go :-) I don't know if there's a way to compose this together such that the derived option type gets full validation even for the properties inherited from the base class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the option type in this package derive from the option type in the Polly package?

Yes, this is the internal attribute we are using:

https://github.com/App-vNext/Polly/blob/a9bf3e95a37ff9c98ca02c8ca395f91645ffe5d8/src/Polly.Core/Timeout/TimeoutStrategyOptions.cs#L20

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is going to work without you publishing something more in the polly code. Otherwise, there is no way for the validation code generated in this assembly to validate the state in the base class.

If you make TimeoutAttribute public, then this will just work as the generated code will have access to the atribute definition.

In order to keep the attribute hidden, then you'd need to make public instead the generated options validation type from the Polly library so that it could be called from this assembly.

Perhaps there would be a way for Polly to register an option validator for its part of the state, and then this assembly only validates its part of the state?

In any case, @tarekgh, we should fix the option generator to ignore attributes that aren't reachable from the generated code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, it would be nice to get rid of it and use built-in attributes.

I need to validate that timespan is in range of -1 milliseconds to max. Also, zero or default value must be excluded.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you provide isolated reproduce for this problem so I can look at it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here a repro: 1bdd925

@martintmk I think your best bet for now is to make the attribute public in Polly and then it'll work correctly.

Copy link
Contributor Author

@martintmk martintmk Jun 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@geeknoid Thanks Martin for the repro!

I have actually took a second look at our custom attributes and realized we can actually drop them:
App-vNext/Polly#1351

Still, the generator should properly handle the internal attributes.

@martintmk martintmk force-pushed the mtomka/http-resilience branch from 2fec0a9 to 5212ce0 Compare June 26, 2023 08:24
@martintmk martintmk force-pushed the mtomka/http-resilience branch from 5caab39 to aaabea1 Compare June 26, 2023 09:24
@martintmk martintmk marked this pull request as ready for review June 26, 2023 11:27
@martintmk
Copy link
Contributor Author

martintmk commented Jun 26, 2023

@RussKie

The code coverage reports is really strange. I am pretty sure this call is covered (many tests that the AddResilienceHandler API).

image

https://dev.azure.com/dnceng-public/public/_build/results?buildId=320898&view=codecoverage-tab

Any idea on how to fix this?

edit: also the threshold check is strange. The code coverage reports 99.8% coverage while the check fails complaining the coverage is <99%.

using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Telemetry.Logging;

namespace Microsoft.Extensions.Http.Resilience.FaultInjection.Internal;

[ExcludeFromCodeCoverage]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't have this, the logging code should be tested too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't that mean adding exhaustive test coverage for everything the generated logging code does, which is effectively external and an implementation detail?

Seems wasteful to me to 100% cover someone else's code just because it's source-generated as long as it's being exercised by a test. I don't think you'd need it if the requirement wasn't 100% coverage as you could just tweak the threshold, but if it's all or nothing then this seems to be the sensible alternative to exhaustively re-testing the generated code to do logging.

By the same logic I exclude Request Delegate Generator code, otherwise you get fun behaviour like this: dotnet/aspnetcore#48376. I certainly wouldn't spend my time testing ASP.NET Code's error handling scenarios for my user-supplied delegate chasing a coverage metric.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What Martin says, we had this problem in Polly too. We had to artificially add test cases for code that was auto-generated.

https://github.com/App-vNext/Polly/blob/faaf2ee95b9b92386f8e091277604240f9ab05c0/test/Polly.Extensions.Tests/Telemetry/ResilienceTelemetryDiagnosticSourceTests.cs#L173

Not a big fan of that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martincostello That's not what this implies.

This is about ensuring the telemetry produced by this component matches expectations. Within the tests, you use a FakeLogger that captures the telemetry and the ensure that the right thing is coming out of the logging system at the time you expect it to.

As a general rule, the only things that we consider acceptable to exclude from code coverage involve some legacy design patterns that preclude it and some racy code where its impossible to ensure all paths are visited in a given test run. Both are very rare things.

Copy link
Member

@martincostello martincostello Jun 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But isn't that custom telemetry logging code called through the compiler generated code? There's no "real" code here in this class, it's just a partial method stub:

[LogMethod(0, LogLevel.Information,
"Fault-injection group name: {groupName}. " +
"Fault type: {faultType}. " +
"Injected value: {injectedValue}. " +
"Http content key: {httpContentKey}. ")]
public static partial void LogInjection(
ILogger logger,
string groupName,
string faultType,
string injectedValue,
string httpContentKey);

It not being covered doesn't preclude the assertions in the fake logger.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless of course [LogMethod] doesn't work the same as [LoggerMessage] which I initially thought this was.

Looking at some code of my own I have this:

[ExcludeFromCodeCoverage]
private static partial class Log
{
    [LoggerMessage(
       EventId = 1,
       Level = LogLevel.Information,
       Message = "Deployments have been frozen by {UserName}.")]
    public static partial void DeploymentsFrozen(ILogger logger, string? userName);
}

Which gets turned into this:

partial class Log
{
    [global::System.CodeDom.Compiler.GeneratedCodeAttribute("Microsoft.Extensions.Logging.Generators", "8.0.8.28008")]
    private static readonly global::System.Action<global::Microsoft.Extensions.Logging.ILogger, global::System.String?, global::System.Exception?> __DeploymentsFrozenCallback =
        global::Microsoft.Extensions.Logging.LoggerMessage.Define<global::System.String?>(global::Microsoft.Extensions.Logging.LogLevel.Information, new global::Microsoft.Extensions.Logging.EventId(1, nameof(DeploymentsFrozen)), "Deployments have been frozen by {UserName}.", new global::Microsoft.Extensions.Logging.LogDefineOptions() { SkipEnabledCheck = true }); 

    [global::System.CodeDom.Compiler.GeneratedCodeAttribute("Microsoft.Extensions.Logging.Generators", "8.0.8.28008")]
    public static partial void DeploymentsFrozen(global::Microsoft.Extensions.Logging.ILogger logger, global::System.String? userName)
    {
        if (logger.IsEnabled(global::Microsoft.Extensions.Logging.LogLevel.Information))
        {
            __DeploymentsFrozenCallback(logger, userName, null);
        }
    }
}

In that case it's not useful to add tests to test the coverage of the branching for IsEnabled() on all the logging we have, hence the coverage suppression in my case.

If it's a different use case here because that just looks similar to the above then disregard my comment 😃

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Polly we are correctly checking that message is logged:

https://github.com/App-vNext/Polly/blob/faaf2ee95b9b92386f8e091277604240f9ab05c0/test/Polly.Extensions.Tests/Telemetry/ResilienceTelemetryDiagnosticSourceTests.cs#L155

But for some strange reason, it want's us to cover the code-generated code too. For example, the auto-generated struct that backs the entry with all it's properties and methods.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created a bug #4124. In the meantime, I had to manually decrease code-coverage to 98. Once that bug is fixed, we can put it back to 100.

Copy link
Member

@geeknoid geeknoid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very nice. I still have ~20 files to go...

Copy link
Member

@geeknoid geeknoid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could RequestRoutingStrategyFactory.ReturnRoutingStrategy be removed and just replaced with the TryReset function to implement the IResettable interface?

And related, the name CreateRoutingStrategy is misleading, that should be GetRoutingStrategy.

@martintmk
Copy link
Contributor Author

Could RequestRoutingStrategyFactory.ReturnRoutingStrategy be removed and just replaced with the TryReset function to implement the IResettable interface?

And related, the name CreateRoutingStrategy is misleading, that should be GetRoutingStrategy.

Is it OK to do it in follow-up? I want to turn it to internal abstract class too.

@geeknoid
Copy link
Member

Could RequestRoutingStrategyFactory.ReturnRoutingStrategy be removed and just replaced with the TryReset function to implement the IResettable interface?
And related, the name CreateRoutingStrategy is misleading, that should be GetRoutingStrategy.

Is it OK to do it in follow-up? I want to turn it to internal abstract class too.

Of course...

@RussKie
Copy link
Member

RussKie commented Jun 26, 2023

@RussKie Igor Velikorossov FTE

The code coverage reports is really strange. I am pretty sure this call is covered (many tests that the AddResilienceHandler API).

image

dev.azure.com/dnceng-public/public/_build/results?buildId=320898&view=codecoverage-tab

Any idea on how to fix this?

edit: also the threshold check is strange. The code coverage reports 99.8% coverage while the check fails complaining the coverage is <99%.

@jakubch1 are you able to help here?

@jakubch1
Copy link
Member

@RussKie Igor Velikorossov FTE
The code coverage reports is really strange. I am pretty sure this call is covered (many tests that the AddResilienceHandler API).
image
dev.azure.com/dnceng-public/public/_build/results?buildId=320898&view=codecoverage-tab
Any idea on how to fix this?
edit: also the threshold check is strange. The code coverage reports 99.8% coverage while the check fails complaining the coverage is <99%.

@jakubch1 are you able to help here?

Issue here can be that Azure DevOps are using old version of Report Generator inside Publish Code Coverage task. I recommend upgrade to V2 version of the task: https://learn.microsoft.com/en-us/azure/devops/pipelines/tasks/reference/publish-code-coverage-results-v2?view=azure-pipelines

@martintmk
Copy link
Contributor Author

@RussKie Igor Velikorossov FTE
The code coverage reports is really strange. I am pretty sure this call is covered (many tests that the AddResilienceHandler API).
image
dev.azure.com/dnceng-public/public/_build/results?buildId=320898&view=codecoverage-tab
Any idea on how to fix this?
edit: also the threshold check is strange. The code coverage reports 99.8% coverage while the check fails complaining the coverage is <99%.

@jakubch1 are you able to help here?

Issue here can be that Azure DevOps are using old version of Report Generator inside Publish Code Coverage task. I recommend upgrade to V2 version of the task: https://learn.microsoft.com/en-us/azure/devops/pipelines/tasks/reference/publish-code-coverage-results-v2?view=azure-pipelines

Created an issue #4124.

RussKie added a commit to RussKie/dotnet-extensions-experimental that referenced this pull request Jun 27, 2023
@martintmk martintmk merged commit 81af726 into main Jun 27, 2023
@martintmk martintmk deleted the mtomka/http-resilience branch June 27, 2023 13:13
@ghost ghost added this to the 8.0 Preview7 milestone Jun 27, 2023
RussKie added a commit to RussKie/extensions that referenced this pull request Jun 28, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Jul 27, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants