Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API Proposal]: Introduce logging sampling #5123

Open
evgenyfedorov2 opened this issue Apr 23, 2024 · 19 comments
Open

[API Proposal]: Introduce logging sampling #5123

evgenyfedorov2 opened this issue Apr 23, 2024 · 19 comments
Assignees
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation work in progress 🚧

Comments

@evgenyfedorov2
Copy link
Contributor

Background and motivation

As a follow-up of this comment and this issue, I am proposing a more generalized solution below.

Note: a new optional State with log record timestamps which mentioned in the Risks section is not part of the proposal. It will be added later as soon as we agree on everything else included in this API proposal.

Background info provided by @geeknoid:

We propose augmenting the .NET logging feature set to expose mechanisms designed specifically to enable application developers to intelligently control the amount of log data pushed by their application to telemetry backends. By eliding redundant log state early in the log pipeline, application developers can dramatically reduce their telemetry COGs.

Log Record Matching

The mechanisms described below depend on the idea of log record matching. As the application produces log records, log record matching is used to determine the specific treatment to apply to each record.
Through configuration, the application specifies an ordered list of matcher functions which are invoked in sequence to determine whether the log record is a match or not. Applications can have custom matchers with whatever sophistication they want, and the platform includes a simple matcher to address the 99% use case. The simple matcher uses configuration state of the form:

class LogRecordPattern
{
    string? Category { get; set; }
    EventId? EventId { get; set; }
    LogLevel? LogLevel { get; set; }
    KeyValuePair<string, string>[]? Tags { get; set; }   
}

These properties are all optional, they are evaluated in order and have AND semantics, so all defined properties must match. When a match occurs, the corresponding configuration state describes what to do with the log record.
Whereas category, event id, and log level are used to match against specific types of log records produced by the application, the tags are used to match against runtime constraints. For example, using tags you can match log records produced from a particular region or cluster, or records that mention a specific artifact or path, etc. For the initial go at this, we propose to do strict equality matching. In other words, for a record to match, it would need to contain all the listed tags with the specific values. Thus, no support for wildcards, regexes, Boolean combinations, etc. This is not intended to be a SQL query, it’s meant to be relatively fast and simple.

Global Controls

These controls apply at the level of the whole application, independent of the context in which logging takes place.

Filtering

Matching records can be filtered. Filtering is implemented via a simple callback model that gives a yes/no response to discard or keep a record. Customers can use a built-in statical sampler to implement this filter or can roll their own.
The statistical sampler we provide works at the application level. It will disable or enable logging of matching records globally for the life of the application based on a random number chosen at startup. This allows a population of application instances to emit the matching records, while the rest of the application instances won’t emit those records.

Buffering

Matching records can be buffered in a named circular buffer. The size and number of circular buffers is determined through configuration.
Buffered records are normally not emitted by the application and just get discarded. To emit buffered records, the application makes an explicit API call, supplying the name of the buffer to flush. Through configuration, the application can control whether to temporarily suspend buffering after a flush has occurred. This makes it possible for an application to obtain log records N seconds before and N seconds after an incident.

Request-Oriented Controls

These controls are aware of the notion of ASP.NET requests and provide the ability to moderate log output within the scope of individual requests.

Filtering

Matching records are filtered or not within the scope of a single request. Like in the global case above, this filtering can use custom application logic or a simple statistical sampler to control which request’s logging records will be filtered out.
In addition to statistical sampling, or perhaps in combination with it, we can factor in hints delivered through distributed tracing to determine whether to filter out matching records.

Buffering

This is like the global buffering described above, except that the buffers being used are not circular and they are private to an individual request’s lifetime. The buffer expands to capture all matching log records for the duration of a request. The buffer’s contents are discarded at the end of a request’s lifetime, unless the application has signaled explicitly that it should be flushed instead.

How It Works

Here’s the flow of things:
• Configuration contains an ordered list of log record match conditions.
• When a record arrives, it is matched against the conditions until a match is found or the end of the list is reached.
• If a match is not found, then the log record is dispatched normally.
• If a match is found, then the configuration state associated with the match is used to determine how to handle the record. One of four things can happen:
o Global Filtering. The configuration state holds a list of filters to invoke, including statistical samplers. If any of the filters returns false, then the log record is discarded and processing ends.
o Global Buffering. The configuration state holds the name of the global buffer the record should be directed to. The log record is serialized and inserted into the named circular buffer and processing ends.
o Request-Level Filtering. The configuration state holds a list of request-oriented filters to invoke, including statistical samplers. If any of the filters returns false, then the log record is discarded and processing ends.
o Request-Level Buffering. The log record is serialized and inserted in the current request’s buffer.

API Proposal

/// <summary>
/// A pattern to match log records against.
/// </summary>
public class LogRecordPattern
{
    /// <summary>
    /// Gets or sets log record category.
    /// </summary>
    public string? Category { get; set; }

    /// <summary>
    /// Gets or sets log record event ID.
    /// </summary>
    public EventId? EventId { get; set; }

    /// <summary>
    /// Gets or sets log record log level.
    /// </summary>
    public LogLevel? LogLevel { get; set; }

    /// <summary>
    /// Gets or sets log records state tags.
    /// </summary>
    public KeyValuePair<string, string>[]? Tags { get; set; }
}

/// <summary>
/// Enumerates actions one of which can be executed on a matching log record.
/// </summary>
public enum ControlAction
{
    /// <summary>
    /// Filter log records globally.
    /// </summary>
    GlobalFilter,

    /// <summary>
    /// Buffer log records globally.
    /// </summary>
    GlobalBuffer,

    /// <summary>
    /// Filter log records withing an HTTP request flow.
    /// </summary>
    RequestFilter,

    /// <summary>
    /// Buffer log records for the duration of an HTTP requst flow.
    /// </summary>
    RequestBuffer
}

/// <summary>
/// Lets you register logging samplers in a dependency injection container.
/// </summary>
public static class LoggingSamplingExtensions
{
    /// <summary>
    /// Enable logging sampling.
    /// </summary>
    /// <param name="builder">An instance of <see cref="ILoggingBuilder"/> to enable sampling in.</param>
    /// <param name="configure">A delegate to fine-tune the sampling.</param>
    /// <returns>The value of <paramref name="builder"/>.</returns>
    public static ILoggingBuilder EnableSampling(this ILoggingBuilder builder, Action<ILoggingSamplingBuilder> configure)
    {
...
    }

    /// <summary>
    /// Set the built-in simple sampler.
    /// </summary>
    /// <param name="builder">An instance of <see cref="ILoggingSamplingBuilder"/> to set the simple sampler in.</param>
    /// <param name="configure">A delegate to fine-tune the sampling.</param>
    /// <returns>The value of <paramref name="builder"/>.</returns>
    public static ILoggingSamplingBuilder SetSimpleSampler(this ILoggingSamplingBuilder builder, Action<LogSamplingOptions> configure)
    {
...
    }

    /// <summary>
    /// Set a logging sampler.
    /// </summary>
    /// <typeparam name="T">A sampler type</typeparam>
    /// <param name="builder">An instance of <see cref="ILoggingSamplingBuilder"/> to set the logging sampler in.</param>
    /// <param name="configure">A delegate to fine-tune the sampling.</param>
    /// <returns>The value of <paramref name="builder"/>.</returns>
    public static ILoggingSamplingBuilder SetSampler<T>(this ILoggingSamplingBuilder builder, Action<LogSamplingOptions> configure)
        where T : class, ILoggingSampler
    {
...
    }
}

/// <summary>
/// Options to configure log sampling.
/// </summary>
public class LogSamplingOptions
{
    /// <summary>
    /// Gets or sets a list of log pattern matchers.
    /// </summary>
    public List<Matcher> Matchers { get; set; }

    /// <summary>
    /// Gets or sets a list of log buffers.
    /// </summary>
    public ISet<LogBuffer> Buffers { get; set; }
}

/// <summary>
/// Represents a component that samples log records.
/// </summary>
public interface ILoggingSampler
{
    /// <summary>
    /// Sample a log record if it matches the <paramref name="logRecordPattern"/>.
    /// </summary>
    /// <param name="logRecordPattern">A log record pattern to match against.</param>
    /// <returns>True, if the log record was sampled. False otherwise.</returns>
    public bool Sample(LogRecordPattern logRecordPattern);
}

/// <summary>
/// An interface for configuring logging sampling.
/// </summary>
public interface ILoggingSamplingBuilder
{
    /// <summary>
    /// Gets the <see cref="IServiceCollection"/> where logging sampling services are configured.
    /// </summary>
    public IServiceCollection Services { get; }
}

/// <summary>
/// Represents a circular log buffer configuration.
/// </summary>
public class LogBuffer
{
    /// <summary>
    /// Gets or sets log buffer name.
    /// </summary>
    public string Name { get; set; }

    /// <summary>
    /// Gets or sets duration to suspend buffering after the flush operation occurred.
    /// </summary>
    public TimeSpan? SuspendAfterFlushDuration { get; set; }

    /// <summary>
    /// Gets or sets a circular buffer duration.
    /// </summary>
    public TimeSpan? BufferingDuration { get; set; }

    /// <summary>
    /// Gets or sets buffer size.
    /// </summary>
    public long? BufferSize { get; set; }
}

// <summary>
/// A log pattern matcher.
/// </summary>
public class Matcher
{
    /// <summary>
    /// Gets a filtering delegate.
    /// </summary>
    public Func<LogRecordPattern, bool>? Filter { get; }

    /// <summary>
    /// Gets a buffering delegate.
    /// </summary>
    public Action<BufferingTool, LogRecordPattern>? Buffer { get; }

    /// <summary>
    /// Gets a control action to perform in case there is a match.
    /// </summary>
    public ControlAction ControlAction { get; }

    /// <summary>
    /// Initializes a new instance of the <see cref="Matcher"/> class.
    /// </summary>
    /// <param name="pattern">A log record pattern to match.</param>
    /// <param name="filter">A global filtering delegate.</param>
    public Matcher(LogRecordPattern pattern, Func<LogRecordPattern, bool> filter)
    {
        ControlAction = ControlAction.GlobalFilter;
        ...
    }

    /// <summary>
    /// Initializes a new instance of the <see cref="Matcher"/> class.
    /// </summary>
    /// <param name="pattern">A log record pattern to match.</param>
    /// <param name="buffer">A global buffering delegate.</param>
    public Matcher(LogRecordPattern pattern, Action<BufferingTool, LogRecordPattern> buffer)
    {
        ControlAction = ControlAction.GlobalBuffer;
        ...
    }

    /// <summary>
    /// Matches the log record pattern against the supplied <paramref name="pattern"/>.
    /// </summary>
    /// <param name="pattern">A log record pattern to match against.</param>
    /// <returns>True if there is a match. False otherwise.</returns>
    public bool Match(LogRecordPattern pattern)
    {
...
    }
}

API Usage

// implementation example in LoggerConfig.cs:
internal sealed class LoggerConfig
{
    public LoggerConfig(..., ILoggingSampler[] samplers)
    {
        Samplers = samplers;
        ...
    }

    public ILoggingSampler[] Samplers { get; }
}

// implementation example in ExtendedLogger.cs:
    public void Log<TState>(LogLevel logLevel, EventId eventId, TState state, Exception? exception, Func<TState, Exception?, string> formatter)
    {
...
        var pattern = new LogRecordPattern
        {
            Category = // get category,
            Tags = state,
            EventId = eventId,
            LogLevel = logLevel,
        };

        foreach (var sampler in _factory.Config.Samplers)
        {
            if (sampler.Sample(pattern))
            {
                return; // the record was sampled, hence we don't log it.
            }
        }
    }

// an example of the built-in simple sampler implementation:
internal class SimpleSampler : ILoggingSampler
{
    private readonly List<Matcher> _matchers;
    private readonly BufferingTool _bufferingTool;

    public SimpleSampler(IOptions<LogSamplingOptions> options, BufferingTool bufferingTool)
    {
        _matchers = options.Value.Matchers;
        _bufferingTool = bufferingTool;
    }

    public bool Sample(LogRecordPattern logRecordPattern)
    {
        foreach (var matcher in _matchers)
        {
            if (matcher.Match(logRecordPattern))
            {
                switch (matcher.ControlAction)
                {
                    case ControlAction.GlobalFilter:
                        if (!matcher.Filter(logRecordPattern))
                        {
                            return true;
                        }

                        break;
                    case ControlAction.GlobalBuffer:
                        matcher.Buffer(_bufferingTool, logRecordPattern);
                        return true;
                    case ControlAction.RequestFilter:
                        break;
                    case ControlAction.RequestBuffer:
                        break;
                }
            }
        }

        return false;
    }
}

// usage example:
LoggerFactory.Create(builder =>
{
    _ = builder.EnableSampling(samplingBuilder => samplingBuilder

            // add the built-in simple sampler:
            .SetSimpleSampler(o =>
            {
                o.Matchers = new List<Matcher>
                {
                    new Matcher(
                        new LogRecordPattern
                        {
                            LogLevel = LogLevel.Information,
                            Category = "Microsoft.Extensions.Hosting",
                        },

                        // drop 99% of logs of Hosting category:
                        (pattern) => Random.Shared.NextDouble() < 0.01),
                    new Matcher(
                        new LogRecordPattern
                        {
                            LogLevel = LogLevel.Error,
                        },

                       // buffer all error logs:
                        (tool, pattern) => tool.Buffer("MyBuffer")),
                };
                o.Buffers = new HashSet<LogBuffer>
                {
                    new LogBuffer
                    {
                        Name = "MyBuffer",
                        SuspendAfterFlushDuration = TimeSpan.FromSeconds(10),
                        BufferingDuration = TimeSpan.FromSeconds(10),
                        BufferSize = 1_000_000,
                    },
                };
            }));
});

Alternative Designs

Previous proposal is here

Risks

Provided by @geeknoid:

Buffering Challenges

The existing logging infrastructure was not designed to support buffering. In particular, if log records are buffered early in the infrastructure, the records will not have been timestamped, and would get timestamped only when they were emitted from the process. This is not acceptable.
To support the model, we will augment the logging infrastructure to pass new optional state down with every log record flowing through the system, which will include the timestamp at which the record was captured. Log processing engines, like OpenTelemetry, will need to be retrofitted to recognize this state and use the supplied timestamp instead of capturing a timestamp themselves.
If a log provider hasn’t been upgraded to recognize the new optional state, the log records flowing through that provider will be timestamped with an incorrect value. Developers will be made aware to watch for this issue when first enabling buffering and to upgrade their log provider if needed.
A second challenge with buffering is that the TState value supplied by the application to the logging infrastructure is expected to be fully consumed before the Log method returns to the application. If we simply held on to the TState instances during buffering, we could easily end up with invalid state by the time the buffer is flushed. To avoid this, we need to serialize the TState to a neutral form before returning to the application.

Performance Considerations

The performance profile of an application will be impacted by filtering and buffering behavior. In particular, buffering can lead to potentially substantial spikes in log output when problems occur.
This trade-off is unavoidable, application developers need to be made aware so that they can make informed choices. We need to make it easy to turn off all these mechanisms to enable application developers to experience/measure the worst-case scenario.
It’s interesting to consider the effects of global vs request-oriented controls:
• With global controls, an individual application process either outputs matching log records for its life time or it doesn’t. So some processes will have a higher computational burden then others. We could consider reevaluating whether a process should output those log records on a regular basis (say every hour), but it’s not clear whether this flexibility is needed/desirable.
• With request-level controls, the performance of individual requests processed by every application process varies. Some requests have more overhead than others.

@evgenyfedorov2 evgenyfedorov2 added untriaged api-suggestion Early API idea and discussion, it is NOT ready for implementation labels Apr 23, 2024
@geeknoid
Copy link
Member

@noahfalk This is as we discussed a little while ago.

@noahfalk
Copy link
Member

Cool! I've got this flagged to come and take a closer look but it might be a little bit.

@noahfalk
Copy link
Member

Thanks for putting this together!

A couple thoughts here:

  • For the timestamping issue, this sounds like something we'd want to firm up prior to doing major work. I think my main concern is that if we expect the commonly used ILoggerProviders (say ConsoleProvider and OTel's provider at least) to test the TState for some pre-existing timestamp information, that is going to impose a performance overhead on every log message being processed, regardless whether the application is using the features proposed here. If we are asking all .NET logging users to take a perf regression so that a small fraction of those users can use new features we should do due diligence to ensure that perf overhead is very small. If the mechanism to recognize log messages that are pre-timestamped requires any new shared APIs in M.E.L.A then we should also ensure we are going to be able to ship that in .NET 9.

    Specific things to firm up:

    1. How do we expect ILoggerProviders to recognize and treat buffered data?
    2. What is the performance overhead for checking a TState to determine it isn't buffered?
    3. What APIs (if any) need to be added to M.E.L.A to make this work?
  • How do we expect the sampling API interacts with the filtering API?
    Is the idea that filters happen first, then sampling happens on what remains?
    If I want some data for some new category to be buffered, do I have to both add a filter rule to include it and a buffer rule to get it buffered, or is it sufficient to just add the buffering rule?

  • From an API design perspective, filtering and sampling seem related so could we make their usage appear more unified? For example if I wanted to include only Microsoft.Extensions.Hosting informational and all errors I could write this:

builder // ILoggingBuilder
        .AddFilter("Microsoft.Extensions.Hosting", LogLevel.Informational)
        .AddFilter(null, LogLevel.Error);

If I wanted to select that same set of logs to be buffered and sampled could I do something like this?

builder // ILoggingBuilder
        .AddBuffer("MyBuffer", durationSecs:10, size: 1_000_000)
        .AddBufferedFilter("MyBuffer", "Microsoft.Extensions.Hosting", LogLevel.Informational,
             samplingRate: 0.01)
        .AddBufferedFilter("MyBuffer", null, LogLevel.Error);               

I'm not tied to that approach at all - its just an off-the-cuff example of what I am guessing would be possible. My broader hopes from the API design perspective:

  1. Can we make the buffering and sampling appear more unified with the filtering that is already there? Or if you think making them appear unified will create confusion lets discuss why.
  2. Can we make scenarios we expect to be common more compact? Reducing the number of lines of code, types, and method calls users need to interact with should make it easier to understand and author. This shouldn't preclude having verbose APIs available to handle more complex cases.

There is probably more fine-grained API feedback but I think we'd want to look at the high level stuff before the smaller stuff.

@tarekgh @samsp-msft - you guys also probably want to look at this.

@tarekgh
Copy link
Member

tarekgh commented May 1, 2024

I second @noahfalk for all points he raised. One more thing, why we are combining the filtering and buffering together. Buffering maybe used in more scenarios than just buffering. And should we follow the pattern we used in redaction? I means something like:

    builder.EnableRedaction(...)
               .EnableFilttering(....) // EnableSampler?
               .EnableBuffering(....)

And create a global filter class which can easily registered to support global filtering. For request filtering, can have local filter and can be a setting to enable in aspnet so anyone handle http requests? this can simplify the APIs I guess and remove ControlAction. just thoughts!

@samsp-msft
Copy link
Member

I have a number of thoughts, but I think this also needs to be fit into the larger picture with OpenTelemetry.

Filtering based on EventID

I have heard the need from customers that the granularity of the logging levels is not sufficient, and that they would like to be able to collect specific log messages from a lower log level (debug ==1, Critical ==5), or filter out higher level messages as they occur too frequently.

This should happen regardless of whether the log message is associated with a trace, and the frequency of its usage. Ideally is handled through configuration along with the rest of the log levels, so it provides an override of the more granular defaults.

Head-based sampling of per-request telemetry.

If your service is handing a large number of requests, collecting tracing and log messages from all of them use up a lot of data. Rather than sampling each source of data indepedently, you want to sample them using the same algorithm based on the traceID. Each source can be given a percentage for what to keep. If the same hash algorithm is used to decide what to keep, then requests that fall under the lowest common denominator will have all their records, others will be filtered out as applicable.

Ideally the same processing can happen to log messages with a traceid as the distributed tracing so both the trace and log messages will be emitted together. Because this is a head-based sampling algorithm, it doesn't depend on buffering and waiting for completion to be useful.

Rate limiting

The classic Application Insights SDK has a sampling algorithm that supports rate based limiting - so you can cap the tracing at 5 requests/sec (as an example). I believe its a semi-adaptive algorithm so its not just the first 5 requests that will be emitted, but it uses an average over a timespan to determine what to retain.

I think we need the same algorithm to be implemented as a processor for OpenTelemetry so its not dependent on being used with a specific exporter. When Aspire Apps are deployed to ACA, this mechanism could be used to ensure that the dashboard is not overloaded with results.
The great thing about this kind of algorithm is that its unlikely to filter out records when doing local debugging (unless load testing) so you are not wondering where your telemetry went.

  1. Hard limits
    The adaptive algorithm can be augmented with additional safeguards to ensure that you don't get too much data or too little data. For example, you could have overrides of a min sampling percentage, so you get 0.1% of requests regardless of the rate limit.

It could also have max limits to reduce the chance of coding/configuration errors causing high bills. For example have a max-log-messages-per-traceid, max-spans-per-traceid and max-events-per-span which would be calculated and limit the data per request, per process. That should be doable on a per-process basis without needing to do tail sampling as long as there is an event for when the request is completed, and therefore the count stats can be removed.

In the case that the limits are reached, ideally an additional log message is emitted that details that messages have been suppressed, and what their counts were. This can then be used to understand the patterns that have caused the suppression to occur and the extent of the problem.

Log messages not associated with a trace

Not all log messages are going to be associated with a request, and therefore have the context of a traceid. These should have the same concepts of fixed rate and time-based sampling applied to them. The values for those rates should be independent from those for trace-based sampling.

Processing order

  • I suspect the EventID based filtering should be applied first, and be performed regardless of the other options. If a message is elevated, or suppressed, it should then go through the rest of the algorithm to determine what happens.

  • Ideally the decision for whether a trace is sampled is made at the time the traceid is retrieved from the request headers, and the Recorded bit is set. There should be an option as to whether to:

    • always honor the Recorded bit from the incoming request
    • ignore the incoming Recorded bit and use the algorithm to set the value
    • only use the algorithm if the incoming request doesn't have the recorded bit

    This should then set the Recorded bit on the Activity so that code can use that bit to determine whether to write telemetry or not.

    • Ideally the IsEnabled() method for ILogger should use the Activity Recorded bit to determine if the message should be written. If no activity is present, it short-circuits that check

    • In the output processing for OpenTelemetry an additional filter is applied - that will remove Traces and Logs that are trace specific, so that the above is an optimization, but bad logging code may have wasted CPU, but will not affect the output.

Tail-based sampling

All of the above is applied on a per-process basis and is decided up front without knowlege of the final success/failure of the request. In most high performance systems, this is the most cost effective way to make decisions.
An alternative would be to use an out of process, tail based sampler typically implemented in an agent such as OTel Collector or Strato. As the implementation of that is external to the process, it's not specific to .NET and so not part of this feature set.

@tarekgh
Copy link
Member

tarekgh commented May 9, 2024

I'm just checking in to see if we're moving forward with this. The reason I ask is because we have the tracking issue dotnet/runtime#82465, and we need to determine whether we should close it if we're going ahead with the proposal mentioned here. Naturally, we can collaborate to decide what needs to be incorporated into the runtime, if anything.

@evgenyfedorov2
Copy link
Contributor Author

Yes, we will be moving forward. I've been busy with other things recently, and I still am, which is why there's been a delay.

@geeknoid
Copy link
Member

I think LogSamplngOptions.Matchers should be an IList instead of a List.

@evgenyfedorov2 evgenyfedorov2 self-assigned this May 28, 2024
@evgenyfedorov2
Copy link
Contributor Author

evgenyfedorov2 commented Jun 6, 2024

For the timestamping issue,

@noahfalk, my idea here is to add DateTime.UtcNow timestamp immediately to Tags (TState) for each buffered log record as a new key-value pair with the Timestamp key. Then log exporters will need to be able to recognize it. Not sure about other exporters, but we will definitely need to update Open Telemetry exporters to support that. E.g. when exporting, instead of setting timestamp, they will need to check the Timestamp tag first.

How do we expect the sampling API interacts with the filtering API?

Please see the update code snippets below. Basically, filtering is about throwing logs away forever, buffering - putting them aside temporarily. I think those should be separate as @tarekgh suggested.

Can we make the buffering and sampling appear more unified with the filtering that is already there? Or if you think making them appear unified will create confusion lets discuss why.

Yes, I moved the buffering to separate extension methods. So now it can be used together with the sampling, but not limited to it.

Can we make scenarios we expect to be common more compact?

I updated the log snippets and tried to make the API more compact, but still, it is not ideal. At the same time, I feel my current proposal aligns with patterns we have in the repository. We surely want to take another look at it at the API review session.

One more thing, why we are combining the filtering and buffering together

Good idea, thank you! I have updated the code snippets, please see below.

/// <summary>
/// Lets you register log samplers in a dependency injection container.
/// </summary>
public static class LogSamplingExtensions
{
    /// <summary>
    /// Enable log sampling.
    /// </summary>
    /// <param name="builder">An instance of <see cref="ILoggingBuilder"/> to enable sampling in.</param>
    /// <param name="configure">A delegate to fine-tune the sampling.</param>
    /// <returns>The value of <paramref name="builder"/>.</returns>
    public static ILoggingBuilder EnableSampling(this ILoggingBuilder builder, Action<ILogSamplingBuilder> configure)
    {
...
    }

    /// <summary>
    /// Add the built-in simple sampling filter.
    /// </summary>
    /// <param name="builder">An instance of <see cref="ILogSamplingBuilder"/> to set the simple sampling in.</param>
    /// <param name="configure">A delegate to fine-tune the sampling.</param>
    /// <returns>The value of <paramref name="builder"/>.</returns>
    public static ILogSamplingBuilder AddSimpleSamplingFilter(this ILogSamplingBuilder builder, Action<SamplingFilterOptions> configure)
    {
...
    }

    /// <summary>
    /// Add a log sampler.
    /// </summary>
    /// <typeparam name="T">A sampler type.</typeparam>
    /// <param name="builder">An instance of <see cref="ILogSamplingBuilder"/> to set the log sampler in.</param>
    /// <param name="configure">A delegate to fine-tune the sampling.</param>
    /// <returns>The value of <paramref name="builder"/>.</returns>
    public static ILogSamplingBuilder AddSamplingFilter<[DynamicallyAccessedMembers(DynamicallyAccessedMemberTypes.PublicConstructors)] T>(
        this ILogSamplingBuilder builder,
        Action<SamplingFilterOptions> configure)
        where T : class, ILogSampler
    {
...
    }
}


/// <summary>
/// Lets you register log buffers in a dependency injection container.
/// </summary>
public static class LogBufferingExtensions
{
    /// <summary>
    /// Add log buffering.
    /// </summary>
    /// <param name="builder">An instance of <see cref="ILoggingBuilder"/> to enable buffering in.</param>
    /// <param name="configure">A delegate to fine-tune the buffering.</param>
    /// <returns>The value of <paramref name="builder"/>.</returns>
    public static ILoggingBuilder AddBuffering(this ILoggingBuilder builder, Action<LogBufferingOptions> configure)
    {
...
    }

    /// <summary>
    /// Add log buffering.
    /// </summary>
    /// <param name="builder">An instance of <see cref="ILoggingBuilder"/> to enable buffering in.</param>
    /// <param name="section">Configuration section that contains <see cref="LogBufferingOptions"/>.</param>
    /// <returns>The value of <paramref name="builder"/>.</returns>
    public static ILoggingBuilder AddBuffering(this ILoggingBuilder builder, IConfigurationSection section)
    {
...
    }

    /// <summary>
    /// Add the built-in simple buffering filter.
    /// </summary>
    /// <param name="builder">An instance of <see cref="ILogSamplingBuilder"/> to set the simple buffering filter in.</param>
    /// <param name="configure">A delegate to fine-tune the sampling.</param>
    /// <returns>The value of <paramref name="builder"/>.</returns>
    public static ILogSamplingBuilder AddSimpleBufferingFilter(ILogSamplingBuilder builder, Action<BufferingFilterOptions> configure)
    {
...
    }
}


public class BufferingFilterOptions
{
    /// <summary>
    /// Gets or sets a list of log pattern matchers for buffering.
    /// </summary>
    public IList<BufferingMatcher> Matchers { get; set; }
}

public class SamplingFilterOptions
{
    /// <summary>
    /// Gets or sets a list of log pattern matchers for sampling.
    /// </summary>
    public IList<SamplingMatcher> Matchers { get; set; }
}

public class LogBufferConfig
{
    /// <summary>
    /// Gets or sets log buffer name.
    /// </summary>
    [Required]
    public string Name { get; set; } = string.Empty;

    /// <summary>
    /// Gets or sets duration to suspend buffering after the flush operation occurred.
    /// </summary>
    public TimeSpan? SuspendAfterFlushDuration { get; set; }

    /// <summary>
    /// Gets or sets a circular buffer duration.
    /// </summary>
    public TimeSpan? Duration { get; set; }

    /// <summary>
    /// Gets or sets buffer size.
    /// </summary>
    public long? Size { get; set; }
}

public class LogBufferingOptions
{
    /// <summary>
    /// Gets or sets a list of log buffers.
    /// </summary>
    public ISet<LogBufferConfig> Configs { get; set; }
}

internal class SimpleSamplingFilter : ILogSampler
{
    private readonly IList<SamplingMatcher> _matchers;

    public SimpleSamplingFilter(IOptions<SamplingFilterOptions> options)
    {
        _matchers = options.Value.Matchers;
    }

    public bool Sample(LogRecordPattern logRecordPattern)
    {
        foreach (var matcher in _matchers)
        {
            if (matcher.Match(logRecordPattern))
            {
                switch (matcher.ControlAction)
                {
                    case ControlAction.GlobalFilter:
                        if (!matcher.Filter(logRecordPattern))
                        {
                            return true;
                        }

                        break;
                    case ControlAction.RequestFilter:
                        break;
                }
            }
        }

        return false;
    }
}

internal class SimpleBufferingFilter : ILogSampler
{
    private readonly IList<BufferingMatcher> _matchers;
    private readonly LogBuffer _bufferingTool;

    public SimpleBufferingFilter(IOptions<BufferingFilterOptions> options, LogBuffer bufferingTool)
    {
        _matchers = options.Value.Matchers;
        _bufferingTool = bufferingTool;
    }

    public bool Sample(LogRecordPattern logRecordPattern)
    {
        foreach (var matcher in _matchers)
        {
            if (matcher.Match(logRecordPattern))
            {
                switch (matcher.ControlAction)
                {
                    case ControlAction.GlobalBuffer:
                        matcher!.Buffer(_bufferingTool, logRecordPattern);
                        return true;
                    case ControlAction.RequestBuffer:
                        break;
                }
            }
        }

        return false;
    }
}
//usage example:
const string myGlobalBufferName = "My buffer";
return LoggerFactory.Create(builder =>
{
    builder
        .EnableBuffering(opts =>
        {
            opts.Configs.Add(new LogBufferConfig
            {
                Name = myGlobalBufferName,
                SuspendAfterFlushDuration = TimeSpan.FromSeconds(10),
                Duration = TimeSpan.FromSeconds(10),
                Size = 1_000_000,
            });
        })
        .EnableSampling(samplingBuilder => samplingBuilder
            .EnableSimpleSamplingFilter(opts =>
            {
                opts.Matchers.Add(
                    new SamplingMatcher(
                        new LogRecordPattern
                        {
                            LogLevel = LogLevel.Information,
                            Category = "Microsoft.Extensions.Hosting",
                        },
                        (pattern) => Random.Shared.NextDouble() < 0.01));
            })
            .EnableSimpleBufferingFilter(opts =>
            {
                opts.Matchers.Add(
                    new BufferingMatcher(
                        new LogRecordPattern
                        {
                            LogLevel = LogLevel.Error,
                        },
                        (tool, pattern) => tool.Buffer(myGlobalBufferName, pattern)));
            }));
});

@evgenyfedorov2
Copy link
Contributor Author

evgenyfedorov2 commented Jun 6, 2024

Filtering based on EventID

This is possible with current design, LogRecordPattern has the EventID property.

Head-based sampling of per-request telemetry.

Tracing sampling is not in scope of this proposal, but I think this can be done later, this is just a matter of a few more extension matter which I believe should fit in this design.

@julealgon
Copy link

Has the idea of "converting" logs into activity events and then use the activity sampling mechanism to sample all telemetry been discussed?

Logs and Activity events almost seem redundant to me, conceptually. Makes me wonder if we could eventually just abandon logs completely in favor of activity events.

@tarekgh
Copy link
Member

tarekgh commented Jun 20, 2024

Has the idea of "converting" logs into activity events and then use the activity sampling mechanism to sample all telemetry been discussed?

In the context of the OpenTelemetry I heard multiple times discouraging users from using activity events and should be using logging. Converting the logs to Activity events will not be desirable as the specs recommend specific limits to the events and in addition to this will not be good performance-wise. Activity can also contains other kind of events which can complicate how to filter and manage it.

CC @noahfalk @reyang @cijothomas

@noahfalk
Copy link
Member

noahfalk commented Jun 21, 2024

@noahfalk, my idea here is to add DateTime.UtcNow timestamp immediately to Tags (TState) for each buffered log record as a new key-value pair with the Timestamp key. Then log exporters will need to be able to recognize it.

Although functionally this works, it probably doesn't have good performance characteristics. It requires enumerating through the IReadOnlyList doing a string comparison of every entry to check if that entry is the Timestamp entry. Also Timestamp probably isn't the only field you will want to buffer as log providers might want to emit thread id, Activity Trace Id, Activity Span Id, and potentially others in the future. Trying to match N well-known names to M key-value pairs on a TState requires N*M string comparisons. We've been working with @geeknoid on an alternate higher performance way of doing this using a new interface instead.

How do we expect the sampling API interacts with the filtering API?

Please see the update code snippets below. Basically, filtering is about throwing logs away forever, buffering - putting them aside temporarily. I think those should be separate as @tarekgh suggested.

I think I understand how each mechanism is supposed to behave in isolation, what I don't understand or see any information about is how do these mechanisms behave when they are all used together?

For example assume I code like this:

builder.AddFilter("Microsoft.Extensions.Hosting", LogLevel.Informational)
       .EnableSimpleBufferingFilter(opts =>
            {
                opts.Matchers.Add(
                    new BufferingMatcher(
                        new LogRecordPattern
                        {
                            LogLevel = LogLevel.Error,
                        },
                        (tool, pattern) => tool.Buffer(myGlobalBufferName, pattern)));
            }));
         .EnableSimpleSamplingFilter(opts =>
            {
                opts.Matchers.Add(
                    new SamplingMatcher(
                        new LogRecordPattern
                        {
                            LogLevel = LogLevel.Warning,
                            Category = "Microsoft.Extensions.Hosting",
                        },
                        (pattern) => Random.Shared.NextDouble() < 0.01));
            })

And assume I log various messages with different properties:

  1. Cateogory = Microsoft.Extensions.Hosting, Level=Error
  2. Cateogory = Microsoft.Extensions.Hosting, Level=Warning
  3. Cateogory = Microsoft.Extensions.Hosting, Level=Informational
  4. Cateogory = Microsoft.Extensions.Hosting, Level=Trace
  5. Category = Foo, Level = Error
  6. Category = Foo, Level = Info

What log messages end up where? What is the mental model for customers to apply so they know what behavior to expect?

At the same time, I feel my current proposal aligns with patterns we have in the repository.

I'm not sure which patterns you are referring to. API design certainly has a subjective element to it but at the moment this API proposal still feels like it deviates considerably from the style of the pre-existing API in this area.

Compare:

builder.AddFilter("Microsoft.Extensions.Hosting", LogLevel.Informational)

vs.

.EnableSimpleBufferingFilter(opts =>
            {
                opts.Matchers.Add(
                    new BufferingMatcher(
                        new LogRecordPattern
                        {
                            Category = "Microsoft.Extensions.Hosting",
                            LogLevel = LogLevel.Informational,
                        },
                        (tool, pattern) => tool.Buffer(myGlobalBufferName, pattern)));
            }));
  1. The existing filter API is compact using ~1 LOC per filter, the new API is verbose using ~7 LoC per filter. The number of lines isn't itself the issue, but it tends to be a proxy for the number of unique type names, method names, and concepts a user needs to understand in order to use the API. Going from 1 to 7 implies the patterns have significant differences in complexity or verbosity.

  2. The existing filter API primarily uses extension methods with parameter arguments to define filtering rules, the new API uses list appending, multiple nested object constructors and property setters.

  3. The existing filter API uses a type called LoggerFilterRule to store matching information (AddFilter() creates an instance of this type). The new API introduces a new LogRecordPattern type for roughly the same purpose.

  4. The existing filter API uses terminology 'Rule' and 'Rules' for the filtering criteria whereas the new API introduces different terminology, 'Matcher' and 'Matchers', for its filtering criteria.

We surely want to take another look at it at the API review session.

As it is right now, I don't believe the API is ready for review and it may need to change significantly before it becomes ready. Review is good for validation and minor design refinements but it is not a good forum for resolving large design decisions or exploring a broader design space. We need to resolve the majority of the issues ourselves, prior to review.

Filtering based on EventID

This is possible with current design, LogRecordPattern has the EventID property.

If my only goal was to filter on EventID, with no desire to sample or buffer, is there a straigtforward way to do that? As best I understand the current API I don't think that is possible without introducing sampling or buffering as an unneeded concept. Its OK to say this filtering is out-of-scope for the current feature but if that is the case then I'll want to have a good understanding of how we could potentially add it in the future in a way that feels consistent with the rest of the API.

Logs and Activity events almost seem redundant to me, conceptually. Makes me wonder if we could eventually just abandon logs completely in favor of activity events.

I agree, there is a degree of redundancy there. I believe distributed tracing events got added to the OpenTelemetry specification because at the time the spec didn't have a logging API and events served as a very simple form of logging. But hypothetically, if one of them was ever to take over the role of the other I expect it would be logs that get preserved. Logs have structure, categorization, filtering, configuration, pluggable sinks, enrichment, dedicated programming model, and attention to performance. Adding sampling to logs appears far easier than adding all the capabilities of logs to ActivityEvents.

@KalleOlaviNiemitalo
Copy link

Are ILogger.BeginScope calls buffered?

@alrz
Copy link
Member

alrz commented Aug 31, 2024

Would it be possible to filter individual fields as well? This is useful when the underlying library logs something that is too large or possibly a dupe of others in the log record.

@evgenyfedorov2
Copy link
Contributor Author

evgenyfedorov2 commented Oct 21, 2024

Improved API based on feedback

In this post, I am trying to accommodate all the feedback above and propose a new iteration of the logging sampling. In addition to logging sampling as such, it will also include logging buffering, leveraging its .NET 9 API dotnet/runtime#104129.

API Usage

Here is the main part - how the API usage might look like:

// BUFFERING:

// enable buffering using one of the options below:
// 1. If you have a non-ASP.NET Core app, and would like to buffer logs of the Warning level and below:
loggingBuilder.AddGlobalBuffer(LogLevel.Warning);

// 2. If you have an ASP.NET Core app, you can buffer logs
// for each HTTP request/response pair into a separate buffer
// and only for the lifetime of the respective HttpContext.
// If there is no active HttpContext, buffering will be done into the global buffer.
// again, only for the Warning level and below:
loggingBuilder.AddHttpRequestBuffering(LogLevel.Warning);

// and trigger flush:
public class MyClass()
{
    public MyClass(IBufferManager bufferManager) {...} // injected via constructor

    private void BusinessCriticalMethod()
    {
      try {}
      catch // something bad happened
      {
         bufferManager.Flush(); // emit all buffered logs
      }
    }
}

// SAMPLING:
// enable a sampler using one of the options below:

// 1. sample 10% of Information level logs and below using a built-in probabilistic sampler:
loggingBuilder.AddProbabilitySampler(0.1, LogLevel.Information);

// 2. or create and register your own sampler:
loggingBuilder.AddSampler<MyCustomSampler>();

// 4. or, in case you use OpenTelemetry Tracing Sampling,
// just apply same sampling decision to logs which is already made to the underlying Activity:
loggingBuilder.AddTraceBasedSampler();

Implementation

Here is how implementation might look like, just to give an idea:

Buffering implementation

/// <summary>
/// Interface for a global buffer manager.
/// </summary>
public interface IBufferManager
{
    /// <summary>
    /// Flushes the buffer and emits all buffered logs.
    /// </summary>
    void Flush();

    /// <summary>
    /// Enqueues a log record in the underlying buffer.
    /// </summary>
    /// <param name="bufferSink">Buffer sink.</param>
    /// <param name="logLevel">Log level.</param>
    /// <param name="category">Category.</param>
    /// <param name="eventId">Event ID.</param>
    /// <param name="attributes">Attributes.</param>
    /// <param name="exception">Exception.</param>
    /// <param name="formatter">Formatter delegate.</param>
    /// <typeparam name="TState">Type of the <paramref name="attributes"/> instance.</typeparam>
    /// <returns><see langword="true"/> if the log record was buffered; otherwise, <see langword="false"/>.</returns>
    bool TryEnqueue<TState>(
        IBufferSink bufferSink,
        LogLevel logLevel,
        string category,
        EventId eventId,
        TState attributes,
        Exception? exception,
        Func<TState, Exception?, string> formatter);
}

/// <summary>
/// Interface for a HTTP request buffer manager.
/// </summary>
public interface IHttpRequestBufferManager : IBufferManager
{
    /// <summary>
    /// Flushes the buffer and emits buffered logs for the current request.
    /// </summary>
    public void FlushCurrentRequestLogs();
}

/// <summary>
/// Represents a sink for buffered log records of all categories which can be forwarded
/// to all currently registered loggers.
/// </summary>
public interface IBufferSink
{
    /// <summary>
    /// Forwards the <paramref name="serializedRecords"/> to all currently registered loggers.
    /// </summary>
    /// <param name="serializedRecords">serialized log records.</param>
    /// <typeparam name="T">Type of the log records.</typeparam>
    void LogRecords<T>(IEnumerable<T> serializedRecords)
        where T : ISerializedLogRecord;
}

/// <summary>
/// Represents a buffered log record that has been serialized.
/// </summary>
public interface ISerializedLogRecord
{
    public DateTimeOffset Timestamp { get; }
    public LogLevel LogLevel { get; }
    public EventId EventId { get; }
    public string? Exception { get; }
    public string? FormattedMessage { get; }
    public IReadOnlyList<KeyValuePair<string, string>> Attributes { get; }
}

// Extension methods to register in ILoggingBuilder:
public static ILoggingBuilder AddGlobalBuffer(this ILoggingBuilder builder, IConfiguration configuration)
{
    _ = Throw.IfNull(builder);
    _ = Throw.IfNull(configuration);

    return builder
        .AddGlobalBufferConfiguration(configuration)
        .AddGlobalBufferManager();
}

public static ILoggingBuilder AddGlobalBuffer(this ILoggingBuilder builder, LogLevel? level = null, Action<GlobalBufferOptions>? configure = null)
{
    _ = Throw.IfNull(builder);

    _ = builder.Services
        .Configure<GlobalBufferOptions>(options => options.Rules.Add(new BufferFilterRule(null, level, null)))
        .Configure(configure ?? new Action<GlobalBufferOptions>(_ => { }));

    return builder.AddGlobalBufferManager();
}

public static ILoggingBuilder AddHttpRequestBuffering(this ILoggingBuilder builder, IConfiguration configuration)
{
    _ = Throw.IfNull(builder);
    _ = Throw.IfNull(configuration);

    return builder
        .AddHttpRequestBufferConfiguration(configuration)
        .AddHttpRequestBufferManager()
        .AddGlobalBufferConfiguration(configuration)
        .AddGlobalBufferManager();
}

public static ILoggingBuilder AddHttpRequestBuffering(this ILoggingBuilder builder, LogLevel? level = null, Action<HttpRequestBufferOptions>? configure = null)
{
    _ = Throw.IfNull(builder);

    _ = builder.Services
        .Configure<HttpRequestBufferOptions>(options => options.Rules.Add(new BufferFilterRule(null, level, null)))
        .Configure(configure ?? new Action<HttpRequestBufferOptions>(_ => { }));

    return builder
        .AddHttpRequestBufferManager()
        .AddGlobalBuffer(level)
        .AddGlobalBufferManager();
}

public class GlobalBufferOptions
{
    /// <summary>
    /// Gets or sets the time to suspend the buffer after flushing.
    /// </summary>
    /// <remarks>
    /// Use this to temporarily suspend buffering after a flush, e.g. in case of an incident you may want all logs to be emitted immediately,
    /// so the buffering will be suspended for the <see paramref="SuspendAfterFlushDuration"/> time.
    /// </remarks>
    public TimeSpan SuspendAfterFlushDuration { get; set; } = TimeSpan.FromSeconds(30);

    /// <summary>
    /// Gets or sets the duration to check and remove the buffered items exceeding the <see cref="Capacity"/>.
    /// </summary>
    public TimeSpan Duration { get; set; } = TimeSpan.FromSeconds(30);

    /// <summary>
    /// Gets or sets the size of the buffer.
    /// </summary>
    public int Capacity { get; set; } = 10_000;

    /// <summary>
    /// Gets or sets the collection of <see cref="BufferFilterRule"/> used for filtering log messages for the purpose of further buffering.
    /// </summary>
    public List<BufferFilterRule> Rules { get; set; } = [];
}

/// <summary>
/// The options for LoggerBuffer.
/// </summary>
public class HttpRequestBufferOptions
{
    /// <summary>
    /// Gets or sets the duration to check and remove the buffered items exceeding the <see cref="PerRequestCapacity"/>.
    /// </summary>
    public TimeSpan PerRequestDuration { get; set; } = TimeSpan.FromSeconds(10);

    /// <summary>
    /// Gets or sets the size of the buffer for a request.
    /// </summary>
    public int PerRequestCapacity { get; set; } = 1_000;

    /// <summary>
    /// Gets or sets the collection of <see cref="BufferFilterRule"/> used for filtering log messages for the purpose of further buffering.
    /// </summary>
    public List<BufferFilterRule> Rules { get; set; } = [];
}

public class BufferFilterRule
{
    /// <summary>
    /// Initializes a new instance of the <see cref="BufferFilterRule"/> class.
    /// </summary>
    public BufferFilterRule()
        : this(null, null, null)
    {
    }

    /// <summary>
    /// Initializes a new instance of the <see cref="BufferFilterRule"/> class.
    /// </summary>
    /// <param name="categoryName">The category name to use in this filter rule.</param>
    /// <param name="logLevel">The <see cref="LogLevel"/> to use in this filter rule.</param>
    /// <param name="eventId">The <see cref="EventId"/> to use in this filter rule.</param>
    public BufferFilterRule(string? categoryName, LogLevel? logLevel, int? eventId)
    {
        Category = categoryName;
        LogLevel = logLevel;
        EventId = eventId;
    }

    /// <inheritdoc/>
    public string? Category { get; set; }

    /// <inheritdoc/>
    public LogLevel? LogLevel { get; set; }

    /// <inheritdoc/>
    public int? EventId { get; set; }
}

Sampling implementation

/// <summary>
 /// Controls the number of samples of log records collected and sent to the backend.
 /// </summary>
 public abstract class LoggerSampler
 {
     /// <summary>
     /// Makes a sampling decision based on the provided <paramref name="parameters"/>.
     /// </summary>
     /// <param name="parameters">The parameters used to make the sampling decision.</param>
     /// <returns><see langword="true" /> if the log record should be sampled; otherwise, <see langword="false" />.</returns>
     public abstract bool ShouldSample(SamplingParameters parameters);
 }

/// <summary>
/// Contains the parameters helping make sampling decisions for logs.
/// </summary>
public readonly struct SamplingParameters : IEquatable<SamplingParameters>
{
    public SamplingParameters(LogLevel logLevel, string category, EventId eventId, IReadOnlyList<KeyValuePair<string, object?>>? state = null)
    {
        LogLevel = logLevel;
        Category = Throw.IfNull(category);
        EventId = eventId;
        State = state;
    }

    /// <summary>
    /// Gets the log category.
    /// </summary>
    public string Category { get; }

    /// <summary>
    /// Gets the event ID.
    /// </summary>
    public EventId EventId { get; }

    /// <summary>
    /// Gets the log level.
    /// </summary>
    public LogLevel LogLevel { get; }

    /// <summary>
    /// Gets the log message state.
    /// </summary>
    public IReadOnlyList<KeyValuePair<string, object?>>? State { get; }
}

// Extension methods to register in ILoggingBuilder:

/// <summary>
/// Adds Trace-based logging sampler to the logging infrastructure. Sampling decisions
/// for logs match exactly the sampling decisions for the underlying <see cref="System.Diagnostics.Activity"/>.
/// </summary>
/// <param name="builder">The dependency injection container to add logging to.</param>
/// <returns>The value of <paramref name="builder"/>.</returns>
/// <remarks>Please configure Tracing Sampling separately as part of OpenTelemetry .NET.</remarks>
/// <exception cref="ArgumentNullException"><paramref name="builder"/> is <see langword="null"/>.</exception>
public static ILoggingBuilder AddTraceBasedSampler(this ILoggingBuilder builder)
{
   _ = Throw.IfNull(builder);

   return builder.AddSampler<TraceBasedSampler>();
}

/// <summary>
/// Adds Probability sampler to the logging infrastructure. Matched logs will be sampled
/// according to the provided probability./>.
/// Higher the probability value, higher is the probability of a given log record to be sampled in.
/// </summary>
/// <param name="builder">The dependency injection container to add logging to.</param>
/// <param name="configuration">The <see cref="IConfiguration" /> to add.</param>
/// <returns>The value of <paramref name="builder"/>.</returns>
/// <exception cref="ArgumentNullException"><paramref name="builder"/> is <see langword="null"/>.</exception>
public static ILoggingBuilder AddProbabilitySampler(this ILoggingBuilder builder, IConfiguration configuration)
{
   _ = Throw.IfNull(builder);
   _ = Throw.IfNull(configuration);

   return builder
       .AddProbabilitySamplerConfiguration(configuration)
       .AddSampler<ProbabilitySampler>();
}

/// <summary>
/// Adds Probability sampler to the logging infrastructure. Matched logs will be sampled
/// according to the provided <paramref name="probability"/>.
/// Higher the probability value, higher is the probability of a given log record to be sampled in.
/// </summary>
/// <param name="builder">The dependency injection container to add logging to.</param>
/// <param name="probability">Probability from 0.0 to 1.0.</param>
/// <param name="level">The log level (and below) to apply the sampler to.</param>
/// <returns>The value of <paramref name="builder"/>.</returns>
/// <exception cref="ArgumentNullException"><paramref name="builder"/> is <see langword="null"/>.</exception>
public static ILoggingBuilder AddProbabilitySampler(this ILoggingBuilder builder, double probability, LogLevel? level = null)
{
   _ = Throw.IfNull(builder);
   _ = Throw.IfOutOfRange(probability, 0, 1, nameof(probability));

   _ = builder.Services.Configure<ProbabilitySamplerOptions>(options =>
           options.Rules.Add(new ProbabilitySamplerFilterRule(probability, null, level, null)));

   return builder.AddSampler<ProbabilitySampler>();
}

/// <summary>
/// Adds a lambda logging sampler to the logging infrastructure.
/// </summary>
/// <param name="builder">The dependency injection container to add logging to.</param>
/// <param name="samplingDecision">The delegate to be used to decide what to sample.</param>
/// <returns>The value of <paramref name="builder"/>.</returns>
/// <exception cref="ArgumentNullException"><paramref name="builder"/> or <paramref name="samplingDecision"/> is <see langword="null"/>.</exception>
public static ILoggingBuilder AddSampler(this ILoggingBuilder builder, Func<SamplingParameters, bool> samplingDecision)
{
   _ = Throw.IfNull(builder);
   _ = Throw.IfNull(samplingDecision);

   return builder.AddSampler(new FuncBasedSampler(samplingDecision));
}

/// <summary>
/// Adds a logging sampler type to the logging infrastructure.
/// </summary>
/// <typeparam name="T">Logging sampler type.</typeparam>
/// <param name="builder">The dependency injection container to add logging to.</param>
/// <returns>The value of <paramref name="builder"/>.</returns>
/// <exception cref="ArgumentNullException"><paramref name="builder"/> is <see langword="null"/>.</exception>
public static ILoggingBuilder AddSampler<[DynamicallyAccessedMembers(DynamicallyAccessedMemberTypes.PublicConstructors)] T>(
   this ILoggingBuilder builder)
   where T : LoggerSampler
{
   _ = Throw.IfNull(builder);

   builder.Services.TryAddEnumerable(ServiceDescriptor.Singleton<ILoggerFactory, ExtendedLoggerFactory>());
   _ = builder.Services.AddSingleton<LoggerSampler, T>();

   return builder;
}

/// <summary>
/// Adds a logging sampler instance to the logging infrastructure.
/// </summary>
/// <param name="builder">The dependency injection container to add logging to.</param>
/// <param name="sampler">The sampler instance to add.</param>
/// <returns>The value of <paramref name="builder"/>.</returns>
/// <exception cref="ArgumentNullException"><paramref name="builder"/> or <paramref name="sampler"/> is <see langword="null"/>.</exception>    
public static ILoggingBuilder AddSampler(this ILoggingBuilder builder, LoggerSampler sampler)
{
   _ = Throw.IfNull(builder);
   _ = Throw.IfNull(sampler);

   builder.Services.TryAddEnumerable(ServiceDescriptor.Singleton<ILoggerFactory, ExtendedLoggerFactory>());
   _ = builder.Services.AddSingleton(sampler);

   return builder;
}

@noahfalk @samsp-msft @geeknoid @tarekgh

Last update: 12.12.2024

@noahfalk
Copy link
Member

Definitely feels like an improvement over the earlier iteration :) Some more questions to keep refining the design:

buffering

Are you envisioning that users pick exactly one buffering option or they are combinable in some way? I wasn't sure the way you have it listed. I suspect 'pick one' will be considerably easier to conceptualize and implement.

Assuming that the buffering APIs are intended to be 'pick exactly one' then we need to determine what AddHttpRequestBuffering does when messages are logged while no HttpContext is active. For example it might not buffer them or it might buffer them in a singleton buffer.

For flushing the buffer, the usage example is getting the buffer from DI but do we expect the buffer to be registered in the DI container directly? Also if it is registered, I'm not sure how that would work with HttpRequest buffering where there are many buffers which are changing transiently. I suspect you might want to capture the ILoggingBufferProvider rather than the ILoggingBuffer.

Zooming out a little on the flushing, I think it would be good to capture what we expect to be a canonical use-case for flushing the buffer (@geeknoid not sure if you had specific expectations here?). I'm not as familiar with how other systems practically configure tail sampling but I'd guess the common approach is that the buffer is flushed when error messages are observed in the log stream, not by direct API calls. I'm not opposed to having a flush API but I want to make sure usage is capturing the behavior we will recommend everyone to do. If we want to recommend calling the flush API then we should understand where that occurs. Perhaps in some shared application fault handling logic or in error handling middleware on the ASP.NET pipeline?

For ILoggingBufferProvider and ILoggingBuffer, I'd suggest limit the public API to only what is needed for code to request flushing the buffer. I'd expect whatever component implements the IBufferedLogger is the same component that implements the ILoggingBuffer with no need to expose the private API between them to anything else.

The APIs for global buffering didn't seem to match up with the usage. For example I see AddGlobalBuffering with parameters category, eventId, level but the usage takes a Func<>. Perhaps things got out-of-sync? It seems like the main design decision that is lurking there is whether buffering activation is based on a single Func<...,bool> or if it is based on a list of FilterRule objects. The filters allow for slightly more optimization because they can be pre-sorted by category but the single Func<...,bool> is probably simpler to use and implement. I think the Func is probably fine to use as the plan of record.

Its unclear how the GlobalBufferingOptions are currently fitting in. Maybe this was just part of the usage/API sections seeming to be a bit out-of-sync.

sampling

For the ratio based sampler if there is a shortcut API that accepts a level I'd suggest it should sample that level and below, not that level and above. I struggle to come up with a scenario where an app developer would want to sample their high severity messages (ie errors) but not sample lower severity messages (Info).

@evgenyfedorov2
Copy link
Contributor Author

@noahfalk thank you for your comments! I updated the post directly, hopefully it is ok with you.

Are you envisioning that users pick exactly one buffering option or they are combinable in some way? I wasn't sure the way you have it listed. I suspect 'pick one' will be considerably easier to conceptualize and implement.

It is 'pick one', I updated the post. Also, not every app (or library) is an ASP.NET Core app, therefore I believe there must be API for those as well.

Assuming that the buffering APIs are intended to be 'pick exactly one' then we need to determine what AddHttpRequestBuffering does when messages are logged while no HttpContext is active. For example it might not buffer them or it might buffer them in a singleton buffer.

I am sticking to the 'buffer them in the singleton (global) buffer' for now.

For flushing the buffer, the usage example is getting the buffer from DI but do we expect the buffer to be registered in the DI container directly? Also if it is registered, I'm not sure how that would work with HttpRequest buffering where there are many buffers which are changing transiently. I suspect you might want to capture the ILoggingBufferProvider rather than the ILoggingBuffer.

Updated the post.

For ILoggingBufferProvider and ILoggingBuffer, I'd suggest limit the public API to only what is needed for code to request flushing the buffer. I'd expect whatever component implements the IBufferedLogger is the same component that implements the ILoggingBuffer with no need to expose the private API between them to anything else

Updated where possible for now. Will investigate further.

The APIs for global buffering didn't seem to match up with the usage

Updated the post, sticking to the Func<> only for now.

Its unclear how the GlobalBufferingOptions are currently fitting in. Maybe this was just part of the usage/API sections seeming to be a bit out-of-sync.

IMO, it is still relevant. Renamed to BufferingOptions, this is supposed to be options for the whole buffering feature.
Also added an optional Action<BufferingOptions> parameter to all extensions methods which I had missed to do initially. This is a subject for discussion, for instance, SuspendAfterFlushDuration and Filter properties are applicable to both Global and Http request buffering, but Capacity and ,probably, Duration as well might not make sense for the Http request buffering where customers would prefer to buffer everything as long as the request is alive (of course with some upper bound to avoid running out of memory, but it does not necessarily has to be public API)

For the ratio based sampler if there is a shortcut API that accepts a level I'd suggest it should sample that level and below,

Updated the post.

@noahfalk
Copy link
Member

noahfalk commented Oct 23, 2024

I updated the post directly, hopefully it is ok with you.

Yep!

Its unclear how the GlobalBufferingOptions are currently fitting in. Maybe this was just part of the usage/API sections seeming to be a bit out-of-sync.

IMO, it is still relevant.

Yeah, I wasn't trying to have it removed. I only meant it wasn't connecting with the rest of the API and you remedied that with your update.

Renamed to BufferingOptions, this is supposed to be options for the whole buffering feature

I don't think anything compels global buffering and http buffering to share the same Options type. If they happen to share all the same options they can, but I am guessing you'll find its useful to have some minor differences. In that case it would be straightforward to have GlobalBuffering use a GlobalBufferingOptions and HttpRequestBuffering can use a HttpRequestBufferingOptions. For config that is similar we'd just use similar naming and property types. For example if HttpRequestBuffering includes one global buffer and many per-request buffers then you might want PerRequestCapacity and GlobalCapacity as two separate properties on a HttpRequestBufferingOptions type.

At this point I think my main bigger picture question is around if and where do we expect people to use the Flush() API. After that I do still have some feedback on finer grained details for some of the APIs but I wanted to make sure we are covering the major elements before fretting the little stuff. I'd also like to see us get some validation from potential users that the design is aligning with their needs. Thanks @evgenyfedorov2!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation work in progress 🚧
Projects
None yet
Development

No branches or pull requests

8 participants