Add instrumentation for simple convenience chat calls (#107)
This PR builds the foundation for OpenAI SDK tracing and metrics instrumentation (using OTel-compatible .NET primitives).

It's limited to convenience `ChatClient` methods without streaming. The PR implements instrumentation according to [OpenTelemetry GenAI semantic conventions](https://github.com/open-telemetry/semantic-conventions/tree/main/docs/gen-ai). 

The intention is to add instrumentation to other methods and client types and to evolve it along with the OTel GenAI semantic conventions. 

TODO (in this PR):
- [x] add samples/docs
- [x] add an experimental feature flag required to enable instrumentation - we don't know when the OTel semantic conventions will be stable and expect breaking changes.

TODO (in next PRs):
- [ ] add instrumentation to streaming calls and protocol methods
- [ ] track prompts and completions in events
- ...
lmolkova authored Aug 9, 2024
1 parent 3284295 commit d5b5c60
Showing 15 changed files with 966 additions and 18 deletions.
9 changes: 7 additions & 2 deletions README.md
@@ -2,7 +2,7 @@

[![NuGet version](https://img.shields.io/nuget/vpre/openai.svg)](https://www.nuget.org/packages/OpenAI/absoluteLatest)

The OpenAI .NET library provides convenient access to the OpenAI REST API from .NET applications.

It is generated from our [OpenAPI specification](https://github.com/openai/openai-openapi) in collaboration with Microsoft.

@@ -26,6 +26,7 @@ It is generated from our [OpenAPI specification](https://github.com/openai/opena
- [Advanced scenarios](#advanced-scenarios)
- [Using protocol methods](#using-protocol-methods)
- [Automatically retrying errors](#automatically-retrying-errors)
- [Observability](#observability)

## Getting started

@@ -714,7 +715,7 @@ For example, to use the protocol method variant of the `ChatClient`'s `CompleteC
ChatClient client = new("gpt-4o", Environment.GetEnvironmentVariable("OPENAI_API_KEY"));

BinaryData input = BinaryData.FromBytes("""
{
"model": "gpt-4o",
"messages": [
{
@@ -749,3 +750,7 @@ By default, the client classes will automatically retry the following errors up
- 502 Bad Gateway
- 503 Service Unavailable
- 504 Gateway Timeout

## Observability

The OpenAI .NET library supports experimental distributed tracing and metrics with OpenTelemetry. Check out [Observability with OpenTelemetry](./docs/observability.md) for more details.
57 changes: 57 additions & 0 deletions docs/observability.md
@@ -0,0 +1,57 @@
## Observability with OpenTelemetry

> Note:
> OpenAI .NET SDK instrumentation is in development and is not complete. See the [Available sources and meters](#available-sources-and-meters) section for the list of covered operations.

The OpenAI .NET library is instrumented for distributed tracing and metrics using the .NET [tracing](https://learn.microsoft.com/dotnet/core/diagnostics/distributed-tracing)
and [metrics](https://learn.microsoft.com/dotnet/core/diagnostics/metrics-instrumentation) APIs and supports [OpenTelemetry](https://learn.microsoft.com/dotnet/core/diagnostics/observability-with-otel).

OpenAI .NET instrumentation follows [OpenTelemetry Semantic Conventions for Generative AI systems](https://github.com/open-telemetry/semantic-conventions/tree/main/docs/gen-ai).

### How to enable

The instrumentation is **experimental** - volume and semantics of the telemetry items may change.

To enable the instrumentation:

1. Set the instrumentation feature flag using one of the following options:

   - set the `OPENAI_EXPERIMENTAL_ENABLE_OPEN_TELEMETRY` environment variable to `"true"`, or
   - set the `OpenAI.Experimental.EnableOpenTelemetry` context switch to `true` in your application code at startup, before any OpenAI clients are initialized. For example:

```csharp
AppContext.SetSwitch("OpenAI.Experimental.EnableOpenTelemetry", true);
```

2. Enable OpenAI telemetry:

```csharp
builder.Services.AddOpenTelemetry()
.WithTracing(b =>
{
b.AddSource("OpenAI.*")
...
.AddOtlpExporter();
})
.WithMetrics(b =>
{
b.AddMeter("OpenAI.*")
...
.AddOtlpExporter();
});
```

Distributed tracing is enabled with `AddSource("OpenAI.*")`, which tells OpenTelemetry to listen to all [ActivitySources](https://learn.microsoft.com/dotnet/api/system.diagnostics.activitysource) whose names start with `OpenAI.`.

Similarly, metrics are configured with `AddMeter("OpenAI.*")` which enables all OpenAI-related [Meters](https://learn.microsoft.com/dotnet/api/system.diagnostics.metrics.meter).

Consider enabling [HTTP client instrumentation](https://www.nuget.org/packages/OpenTelemetry.Instrumentation.Http) to see all HTTP client
calls made by your application, including those made by the OpenAI SDK.
Check out [OpenTelemetry documentation](https://opentelemetry.io/docs/languages/net/getting-started/) for more details.
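If you are not using the generic host, the same sources and meters can be enabled directly with the OpenTelemetry SDK builder APIs. A minimal console-app sketch (assumes the `OpenTelemetry` and `OpenTelemetry.Exporter.Console` NuGet packages are referenced; swap the console exporter for OTLP in production):

```csharp
using OpenTelemetry;
using OpenTelemetry.Metrics;
using OpenTelemetry.Trace;

// Enable all OpenAI activity sources and meters and print telemetry to the
// console. Disposing the providers on shutdown flushes pending telemetry.
using var tracerProvider = Sdk.CreateTracerProviderBuilder()
    .AddSource("OpenAI.*")
    .AddConsoleExporter()
    .Build();

using var meterProvider = Sdk.CreateMeterProviderBuilder()
    .AddMeter("OpenAI.*")
    .AddConsoleExporter()
    .Build();
```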

### Available sources and meters

The following sources and meters are available:

- `OpenAI.ChatClient` - records traces and metrics for `ChatClient` operations (except streaming and protocol methods, which are not yet instrumented)
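For quick local debugging without the OpenTelemetry SDK, these sources can also be observed with the in-box [ActivityListener](https://learn.microsoft.com/dotnet/api/system.diagnostics.activitylistener). A minimal BCL-only sketch (the experimental feature flag must still be enabled as described above):

```csharp
using System;
using System.Diagnostics;

// Subscribe to every ActivitySource whose name starts with "OpenAI." and
// print each completed activity's display name and duration.
var listener = new ActivityListener
{
    ShouldListenTo = source => source.Name.StartsWith("OpenAI.", StringComparison.Ordinal),
    Sample = (ref ActivityCreationOptions<ActivityContext> _) => ActivitySamplingResult.AllDataAndRecorded,
    ActivityStopped = activity => Console.WriteLine($"{activity.DisplayName}: {activity.Duration}")
};
ActivitySource.AddActivityListener(listener);
```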
47 changes: 36 additions & 11 deletions src/Custom/Chat/ChatClient.cs
@@ -1,3 +1,4 @@
using OpenAI.Telemetry;
using System;
using System.ClientModel;
using System.ClientModel.Primitives;
@@ -14,6 +15,7 @@ namespace OpenAI.Chat;
public partial class ChatClient
{
private readonly string _model;
private readonly OpenTelemetrySource _telemetry;

/// <summary>
/// Initializes a new instance of <see cref="ChatClient"/> that will use an API key when authenticating.
@@ -62,6 +64,7 @@ protected internal ChatClient(ClientPipeline pipeline, string model, Uri endpoin
_model = model;
_pipeline = pipeline;
_endpoint = endpoint;
_telemetry = new OpenTelemetrySource(model, endpoint);
}

/// <summary>
@@ -77,11 +80,22 @@ public virtual async Task<ClientResult<ChatCompletion>> CompleteChatAsync(IEnume

options ??= new();
CreateChatCompletionOptions(messages, ref options);

using OpenTelemetryScope scope = _telemetry.StartChatScope(options);

try
{
using BinaryContent content = options.ToBinaryContent();

ClientResult result = await CompleteChatAsync(content, cancellationToken.ToRequestOptions()).ConfigureAwait(false);
ChatCompletion chatCompletion = ChatCompletion.FromResponse(result.GetRawResponse());
scope?.RecordChatCompletion(chatCompletion);
return ClientResult.FromValue(chatCompletion, result.GetRawResponse());
}
catch (Exception ex)
{
scope?.RecordException(ex);
throw;
}
}

/// <summary>
@@ -105,11 +119,22 @@ public virtual ClientResult<ChatCompletion> CompleteChat(IEnumerable<ChatMessage

options ??= new();
CreateChatCompletionOptions(messages, ref options);

using OpenTelemetryScope scope = _telemetry.StartChatScope(options);

try
{
using BinaryContent content = options.ToBinaryContent();
ClientResult result = CompleteChat(content, cancellationToken.ToRequestOptions());
ChatCompletion chatCompletion = ChatCompletion.FromResponse(result.GetRawResponse());

scope?.RecordChatCompletion(chatCompletion);
return ClientResult.FromValue(chatCompletion, result.GetRawResponse());
}
catch (Exception ex)
{
scope?.RecordException(ex);
throw;
}
}

/// <summary>
@@ -200,7 +225,7 @@ private void CreateChatCompletionOptions(IEnumerable<ChatMessage> messages, ref
{
options.Messages = messages.ToList();
options.Model = _model;
options.Stream = stream
? true
: null;
options.StreamOptions = stream ? options.StreamOptions : null;
8 changes: 4 additions & 4 deletions src/OpenAI.csproj
@@ -12,7 +12,7 @@

<!-- Generate an XML documentation file for the project. -->
<GenerateDocumentationFile>true</GenerateDocumentationFile>

<!-- Publish the repository URL in the built .nupkg (in the NuSpec <Repository> element) -->
<PublishRepositoryUrl>true</PublishRepositoryUrl>
<PackageIcon>OpenAI.png</PackageIcon>
@@ -21,15 +21,15 @@
<!-- Create a .snupkg file in addition to the .nupkg file. -->
<IncludeSymbols>true</IncludeSymbols>
<SymbolPackageFormat>snupkg</SymbolPackageFormat>

<!-- Embed source files that are not tracked by the source control manager in the PDB -->
<EmbedUntrackedSources>true</EmbedUntrackedSources>

<TreatWarningsAsErrors>true</TreatWarningsAsErrors>

<!-- Disable missing XML documentation warnings -->
<NoWarn>$(NoWarn),1570,1573,1574,1591</NoWarn>

<!-- Disable obsolete warnings -->
<NoWarn>$(NoWarn),0618</NoWarn>

@@ -63,7 +63,6 @@
<!-- Normalize stored file paths in symbols when in a CI build. -->
<ContinuousIntegrationBuild>true</ContinuousIntegrationBuild>
</PropertyGroup>

<ItemGroup>
<None Include="OpenAI.png" Pack="true" PackagePath="\" />
<None Include="..\CHANGELOG.md" Pack="true" PackagePath="\" />
@@ -73,5 +72,6 @@
<ItemGroup>
<PackageReference Include="Microsoft.SourceLink.GitHub" Version="8.0.0" PrivateAssets="All" />
<PackageReference Include="System.ClientModel" Version="1.1.0-beta.5" />
<PackageReference Include="System.Diagnostics.DiagnosticSource" Version="8.0.1" />
</ItemGroup>
</Project>
33 changes: 33 additions & 0 deletions src/Utility/AppContextSwitchHelper.cs
@@ -0,0 +1,33 @@
using System;

namespace OpenAI;

internal static class AppContextSwitchHelper
{
/// <summary>
/// Determines whether either an AppContext switch or its corresponding environment variable is set.
/// </summary>
/// <param name="appContextSwitchName">Name of the AppContext switch.</param>
/// <param name="environmentVariableName">Name of the environment variable.</param>
/// <returns>If the AppContext switch has been set, returns the value of the switch.
/// If the AppContext switch has not been set, returns the value of the environment variable.
/// False if neither is set.
/// </returns>
public static bool GetConfigValue(string appContextSwitchName, string environmentVariableName)
{
// First check for the AppContext switch, giving it priority over the environment variable.
if (AppContext.TryGetSwitch(appContextSwitchName, out bool value))
{
return value;
}
// AppContext switch wasn't used. Check the environment variable.
string envVar = Environment.GetEnvironmentVariable(environmentVariableName);
if (envVar != null && (envVar.Equals("true", StringComparison.OrdinalIgnoreCase) || envVar.Equals("1")))
{
return true;
}

// Default to false.
return false;
}
}
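The precedence rule above (AppContext switch wins over the environment variable) can be demonstrated with a small standalone sketch. This is a hypothetical copy of the helper's logic for illustration only, since the real method is `internal` to the OpenAI assembly:

```csharp
using System;

// Standalone copy of the helper's resolution logic, for illustration only.
static bool GetConfigValue(string appContextSwitchName, string environmentVariableName)
{
    // The AppContext switch, when set, takes priority over the environment variable.
    if (AppContext.TryGetSwitch(appContextSwitchName, out bool value))
    {
        return value;
    }

    string envVar = Environment.GetEnvironmentVariable(environmentVariableName);
    return envVar != null
        && (envVar.Equals("true", StringComparison.OrdinalIgnoreCase) || envVar.Equals("1"));
}

// An explicitly disabled switch wins even when the environment variable says "true".
Environment.SetEnvironmentVariable("DEMO_ENABLE_OTEL", "true");
AppContext.SetSwitch("Demo.EnableOpenTelemetry", false);
Console.WriteLine(GetConfigValue("Demo.EnableOpenTelemetry", "DEMO_ENABLE_OTEL")); // False
```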
33 changes: 33 additions & 0 deletions src/Utility/Telemetry/OpenTelemetryConstants.cs
@@ -0,0 +1,33 @@
namespace OpenAI.Telemetry;

internal static class OpenTelemetryConstants
{
// follow OpenTelemetry GenAI semantic conventions:
// https://github.com/open-telemetry/semantic-conventions/tree/v1.27.0/docs/gen-ai

public const string ErrorTypeKey = "error.type";
public const string ServerAddressKey = "server.address";
public const string ServerPortKey = "server.port";

public const string GenAiClientOperationDurationMetricName = "gen_ai.client.operation.duration";
public const string GenAiClientTokenUsageMetricName = "gen_ai.client.token.usage";

public const string GenAiOperationNameKey = "gen_ai.operation.name";

public const string GenAiRequestMaxTokensKey = "gen_ai.request.max_tokens";
public const string GenAiRequestModelKey = "gen_ai.request.model";
public const string GenAiRequestTemperatureKey = "gen_ai.request.temperature";
public const string GenAiRequestTopPKey = "gen_ai.request.top_p";

public const string GenAiResponseIdKey = "gen_ai.response.id";
public const string GenAiResponseFinishReasonKey = "gen_ai.response.finish_reasons";
public const string GenAiResponseModelKey = "gen_ai.response.model";

public const string GenAiSystemKey = "gen_ai.system";
public const string GenAiSystemValue = "openai";

public const string GenAiTokenTypeKey = "gen_ai.token.type";

public const string GenAiUsageInputTokensKey = "gen_ai.usage.input_tokens";
public const string GenAiUsageOutputTokensKey = "gen_ai.usage.output_tokens";
}
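To illustrate how these attribute keys are used, here is a sketch (not the SDK's actual implementation, and with hypothetical token counts) that records the `gen_ai.client.token.usage` histogram with the in-box `System.Diagnostics.Metrics` API; the meter name matches the `OpenAI.ChatClient` meter the SDK exposes:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Illustrative only: record the token-usage histogram with semantic-convention
// attribute keys. The real SDK does this internally when telemetry is enabled.
var meter = new Meter("OpenAI.ChatClient");
Histogram<long> tokenUsage = meter.CreateHistogram<long>(
    "gen_ai.client.token.usage", unit: "{token}");

// Hypothetical measurement for a single chat completion's prompt tokens.
tokenUsage.Record(42,
    new KeyValuePair<string, object?>("gen_ai.system", "openai"),
    new KeyValuePair<string, object?>("gen_ai.token.type", "input"),
    new KeyValuePair<string, object?>("gen_ai.request.model", "gpt-4o"));
```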