Python: New Feature: token usage in metrics #9909
Comments
@TaoChenOSU, could you look at this please?
@Druid-of-Luhn Thank you for bringing this up! Currently in Python we do not track token usage per connector as a metric. In .NET, it depends on the specific implementation of the connector: for example, the OpenAI connector emits three metrics to track token consumption, while the Bedrock connector doesn't.

To answer your question on whether you can create the metrics yourself: yes. When you call a chat completion service, we include the token usage information (when it's available) in the metadata of the returned object. Depending on how you use the AI service, you can create the metrics in different places: if you are calling the service directly, create the metrics at the call site; if you are using a kernel function, create them in a filter. Feel free to post further questions :)
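To make the filter route concrete, here is a minimal sketch (not an official implementation) of a function-invocation filter that records token usage as OpenTelemetry counters. The meter and counter names are made up, and the shape of the `usage` metadata entry (an object with `prompt_tokens`/`completion_tokens`) is an assumption based on what the OpenAI connectors return; other connectors may not populate it:

```python
# Sketch only: assumes the connector puts a "usage" object with
# prompt_tokens/completion_tokens into the result metadata (the OpenAI
# connectors do; others may not).
from opentelemetry import metrics

from semantic_kernel import Kernel
from semantic_kernel.filters import FilterTypes, FunctionInvocationContext

meter = metrics.get_meter("my_app")  # hypothetical meter name
prompt_tokens = meter.create_counter("my_app.tokens.prompt", unit="token")
completion_tokens = meter.create_counter("my_app.tokens.completion", unit="token")

kernel = Kernel()

@kernel.filter(FilterTypes.FUNCTION_INVOCATION)
async def track_token_usage(context: FunctionInvocationContext, next):
    # Run the function (and the underlying chat completion call) first.
    await next(context)

    usage = (context.result.metadata or {}).get("usage") if context.result else None
    if usage is not None:
        attrs = {"function": context.function.fully_qualified_name}
        prompt_tokens.add(usage.prompt_tokens, attributes=attrs)
        completion_tokens.add(usage.completion_tokens, attributes=attrs)
```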
Thanks, I am only using Azure OpenAI models at this stage, so I will add the necessary calls/filters. Are there plans (or will there be) to implement this in Python in the future, to bring it to parity with .NET?
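For the direct-call route with Azure OpenAI mentioned above, a minimal sketch under the same assumption about the `usage` metadata entry (the no-argument `AzureChatCompletion()` construction reads endpoint/key/deployment from environment variables; verify the details against your installed version):

```python
import asyncio

from semantic_kernel.connectors.ai.open_ai import (
    AzureChatCompletion,
    OpenAIChatPromptExecutionSettings,
)
from semantic_kernel.contents import ChatHistory

async def main() -> None:
    service = AzureChatCompletion()  # configured via AZURE_OPENAI_* env vars
    history = ChatHistory()
    history.add_user_message("Hello!")

    message = await service.get_chat_message_content(
        chat_history=history,
        settings=OpenAIChatPromptExecutionSettings(),
    )

    # "usage" key and its fields are assumptions based on the OpenAI connectors.
    usage = message.metadata.get("usage") if message else None
    if usage is not None:
        print(f"prompt={usage.prompt_tokens}, completion={usage.completion_tokens}")

asyncio.run(main())
```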
Yes, we do have plans to reach parity: #6750 |
Original issue (feature request):
Currently, the only metrics produced by Semantic Kernel (at least in Python) are function call durations.
The observability documentation does indeed mention this:
- `semantic_kernel.function.invocation.duration` (Histogram): function execution time (in seconds)
- `semantic_kernel.function.streaming.duration` (Histogram): function streaming execution time (in seconds)

However, above that it also says that more telemetry is planned.
From that I understand that the feature is planned but not yet implemented. It feels like token usage would be a very helpful metric to have (I can see that it is already logged), since it is a common question for solutions that use LLMs.
For the moment, is it possible for me to create some kind of filter and track the metric myself? Or does this depend on more internal implementation details?
I have also seen this issue: #6489
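For completeness, the built-in histograms listed above (and any custom counters like the ones sketched earlier) only show up once a meter provider is configured. A minimal sketch using the OpenTelemetry SDK's console exporter, assuming Semantic Kernel records its metrics through the global OpenTelemetry meter provider:

```python
# Sketch only: route metrics recorded via the global OpenTelemetry meter
# provider to the console every 5 seconds.
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import (
    ConsoleMetricExporter,
    PeriodicExportingMetricReader,
)

reader = PeriodicExportingMetricReader(
    ConsoleMetricExporter(), export_interval_millis=5000
)
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))
```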