Skip to content

Commit

Permalink
.Net Agents - Fix polling cycle to properly evaluate failure mode on …
Browse files Browse the repository at this point in the history
…exception (#9581)

### Motivation and Context
<!-- Thank you for your contribution to the semantic-kernel repo!
Please help reviewers and future users, providing the following
information:
  1. Why is this change required?
  2. What problem does it solve?
  3. What scenario does it contribute to?
  4. If it fixes an open issue, please link to the issue here.
-->

Assistant polling not propertly responding to terminal error conditions.

Fixes: #9579

### Description
<!-- Describe your changes, the overall approach, the underlying design.
These notes will help understanding how your code works. Thanks! -->

- Added explicit check for cancelling state: Invoke, InvokeStreaming, &
Polling loops
- Discriminate on different exception modes when polling
- Ensure stale run state isn't evaluated in polling loop
- Remove initial polling delay

### Contribution Checklist
<!-- Before submitting this PR, please make sure: -->

- [X] The code builds clean without any errors or warnings
- [X] The PR follows the [SK Contribution
Guidelines](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md)
and the [pre-submission formatting
script](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md#development-scripts)
raises no violations
- [X] All unit tests pass, and I have added new tests where possible
- [X] I didn't break anyone 😄
  • Loading branch information
crickman authored Nov 6, 2024
1 parent 6b20b98 commit 874ee95
Show file tree
Hide file tree
Showing 2 changed files with 56 additions and 7 deletions.
49 changes: 42 additions & 7 deletions dotnet/src/Agents/OpenAI/Internal/AssistantThreadActions.cs
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
// Copyright (c) Microsoft. All rights reserved.
using System;
using System.ClientModel;
using System.Collections.Generic;
using System.Linq;
using System.Net;
Expand Down Expand Up @@ -179,9 +180,11 @@ public static async IAsyncEnumerable<ChatMessageContent> GetMessagesAsync(Assist
// Evaluate status and process steps and messages, as encountered.
HashSet<string> processedStepIds = [];
Dictionary<string, FunctionResultContent> functionSteps = [];

do
{
// Check for cancellation
cancellationToken.ThrowIfCancellationRequested();

// Poll run and steps until actionable
await PollRunStatusAsync().ConfigureAwait(false);

Expand Down Expand Up @@ -301,20 +304,49 @@ async Task PollRunStatusAsync()

do
{
// Reduce polling frequency after a couple attempts
await Task.Delay(agent.PollingOptions.GetPollingInterval(count), cancellationToken).ConfigureAwait(false);
cancellationToken.ThrowIfCancellationRequested();

if (count > 0)
{
// Reduce polling frequency after a couple attempts
await Task.Delay(agent.PollingOptions.GetPollingInterval(count), cancellationToken).ConfigureAwait(false);
}

++count;

#pragma warning disable CA1031 // Do not catch general exception types
try
{
run = await client.GetRunAsync(threadId, run.Id, cancellationToken).ConfigureAwait(false);
}
catch
// The presence of a `Status` code means the server responded with error...always fail in that case
catch (ClientResultException clientException) when (clientException.Status <= 0)
{
// Check maximum retry count
if (count >= agent.PollingOptions.MaximumRetryCount)
{
throw;
}

// Retry for potential transient failure
continue;
}
catch (AggregateException aggregateException) when (aggregateException.InnerException is ClientResultException innerClientException)
{
// Retry anyway..
// The presence of a `Status` code means the server responded with error
if (innerClientException.Status > 0)
{
throw;
}

// Check maximum retry count
if (count >= agent.PollingOptions.MaximumRetryCount)
{
throw;
}

// Retry for potential transient failure
continue;
}
#pragma warning restore CA1031 // Do not catch general exception types
}
while (s_pollingStatuses.Contains(run.Status));

Expand Down Expand Up @@ -373,6 +405,9 @@ public static async IAsyncEnumerable<StreamingChatMessageContent> InvokeStreamin
IAsyncEnumerable<StreamingUpdate> asyncUpdates = client.CreateRunStreamingAsync(threadId, agent.Id, options, cancellationToken);
do
{
// Check for cancellation
cancellationToken.ThrowIfCancellationRequested();

stepsToProcess.Clear();

await foreach (StreamingUpdate update in asyncUpdates.ConfigureAwait(false))
Expand Down
14 changes: 14 additions & 0 deletions dotnet/src/Agents/OpenAI/RunPollingOptions.cs
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,11 @@ namespace Microsoft.SemanticKernel.Agents.OpenAI;
/// </summary>
public sealed class RunPollingOptions
{
/// <summary>
/// The default maximum number or retries when monitoring thread-run status.
/// </summary>
public static int DefaultMaximumRetryCount { get; } = 3;

/// <summary>
/// The default polling interval when monitoring thread-run status.
/// </summary>
Expand All @@ -28,6 +33,15 @@ public sealed class RunPollingOptions
/// </summary>
public static TimeSpan DefaultMessageSynchronizationDelay { get; } = TimeSpan.FromMilliseconds(500);

/// <summary>
/// The maximum retry count when polling thread-run status.
/// </summary>
/// <remarks>
/// Only affects failures that have the potential to be transient. Explicit server error responses
/// will result in immediate failure.
/// </remarks>
public int MaximumRetryCount { get; set; } = DefaultMaximumRetryCount;

/// <summary>
/// The polling interval when monitoring thread-run status.
/// </summary>
Expand Down

0 comments on commit 874ee95

Please sign in to comment.