Python: New Feature: token usage in metrics #9909
Comments
@TaoChenOSU, could you look at this please?
@Druid-of-Luhn Thank you for bringing this up! Currently in Python we do not track token usage per connector as a metric. In .NET, it depends on the specific implementation of the connector: for example, the OpenAI connector emits three metrics to track token consumption, while the Bedrock connector doesn't.

To answer your question on whether you can create the metrics yourself: yes. When you call a chat completion service, we include the token usage information (when it's available) in the metadata of the returned object. Depending on how you use the AI service, you can create the metrics in different places: if you are calling the service directly, create the metrics at the call site; if you are using a kernel function, create them in a filter. Feel free to post further questions :)
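To make the filter route concrete, here is a minimal sketch (not an official implementation) of a function-invocation filter that records token usage as OpenTelemetry counters. The meter and counter names are made up, and the shape of the `usage` metadata entry (an object with `prompt_tokens`/`completion_tokens`) is an assumption based on what the OpenAI connectors return; other connectors may not populate it:

```python
# Sketch only: assumes the connector puts a "usage" object with
# prompt_tokens/completion_tokens into the result metadata (the OpenAI
# connectors do; others may not).
from opentelemetry import metrics

from semantic_kernel import Kernel
from semantic_kernel.filters import FilterTypes, FunctionInvocationContext

meter = metrics.get_meter("my_app")  # hypothetical meter name
prompt_tokens = meter.create_counter("my_app.tokens.prompt", unit="token")
completion_tokens = meter.create_counter("my_app.tokens.completion", unit="token")

kernel = Kernel()

@kernel.filter(FilterTypes.FUNCTION_INVOCATION)
async def track_token_usage(context: FunctionInvocationContext, next):
    # Run the function (and the underlying chat completion call) first.
    await next(context)

    usage = (context.result.metadata or {}).get("usage") if context.result else None
    if usage is not None:
        attrs = {"function": context.function.fully_qualified_name}
        prompt_tokens.add(usage.prompt_tokens, attributes=attrs)
        completion_tokens.add(usage.completion_tokens, attributes=attrs)
```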
Thanks, I am only using Azure OpenAI models at this stage, so I will add the necessary calls/filters. Are there plans (or will there be) to implement this in Python in the future, to bring it to parity with .NET?
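For the direct-call route with Azure OpenAI mentioned above, a minimal sketch under the same assumption about the `usage` metadata entry (the no-argument `AzureChatCompletion()` construction reads endpoint/key/deployment from environment variables; verify the details against your installed version):

```python
import asyncio

from semantic_kernel.connectors.ai.open_ai import (
    AzureChatCompletion,
    OpenAIChatPromptExecutionSettings,
)
from semantic_kernel.contents import ChatHistory

async def main() -> None:
    service = AzureChatCompletion()  # configured via AZURE_OPENAI_* env vars
    history = ChatHistory()
    history.add_user_message("Hello!")

    message = await service.get_chat_message_content(
        chat_history=history,
        settings=OpenAIChatPromptExecutionSettings(),
    )

    # "usage" key and its fields are assumptions based on the OpenAI connectors.
    usage = message.metadata.get("usage") if message else None
    if usage is not None:
        print(f"prompt={usage.prompt_tokens}, completion={usage.completion_tokens}")

asyncio.run(main())
```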
Yes, we do have plans to reach parity: #6750 |
Original issue (feature request):
Currently, the only metrics produced by Semantic Kernel (at least in Python) are function call durations.
The observability documentation does indeed mention this:
- `semantic_kernel.function.invocation.duration` (Histogram): function execution time (in seconds)
- `semantic_kernel.function.streaming.duration` (Histogram): function streaming execution time (in seconds)

However, above that it also says that more telemetry is planned.
From that I understand that the feature is planned but not yet implemented. It feels like token usage would be a very helpful metric to have (I can see that it is already logged), since it is a common question for solutions that use LLMs.
For the moment, is it possible for me to create some kind of filter and track the metric myself? Or does this depend on more internal implementation details?
I have also seen this issue: #6489
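For completeness, the built-in histograms listed above (and any custom counters like the ones sketched earlier) only show up once a meter provider is configured. A minimal sketch using the OpenTelemetry SDK's console exporter, assuming Semantic Kernel records its metrics through the global OpenTelemetry meter provider:

```python
# Sketch only: route metrics recorded via the global OpenTelemetry meter
# provider to the console every 5 seconds.
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import (
    ConsoleMetricExporter,
    PeriodicExportingMetricReader,
)

reader = PeriodicExportingMetricReader(
    ConsoleMetricExporter(), export_interval_millis=5000
)
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))
```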