[8.x](backport #41516) x-pack/metricbeat/module/openai: Add new module #42033
Proposed commit message
Implement a new module for OpenAI usage collection. This module operates on https://api.openai.com/v1/usage by default (the URL is also configurable, e.g. for proxies) and collects the limited set of usage metrics emitted by this undocumented endpoint.
Example of how the usage endpoint emits metrics:
Given timestamps t0, t1, t2, ..., tn in ascending order:
t0 (first collection):
t1 (after new API usage):
t2 (continuous collection):
and so on.
Example response:
When the API is used, the corresponding usage data only appears after some delay. So if the module collected in real time, and several times a day at that, it would collect duplicates, which is bad for both storage and analysis of the usage data.
It's better to collect up to `time.Now()` (in UTC) minus 24h, so that each collection covers the full usage of the previous day (in UTC) and duplication is avoided. That's why I have introduced a config option `realtime` and set it to `false`: collection is delayed by 24h, so we now get daily data. `realtime: true` works like any other normal collection, where metrics are fetched at set intervals. Our recommendation is to keep `realtime: false`.
As this is a Metricbeat module, there is no existing package that supports storing a cursor. So, to avoid pulling data that has already been pulled, timestamps are stored per API key; comments in the code explain how this state is stored. New custom code persists this state as a cursor so that collection resumes from the next available date.
Checklist
`CHANGELOG.next.asciidoc` or `CHANGELOG-developer.next.asciidoc`.
Author's Checklist
How to test this PR locally
This is an automatic backport of pull request x-pack/metricbeat/module/openai: Add new module #41516 done by Mergify.