Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose platform properties in scheduler metrics #151

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

alexnovak
Copy link
Contributor

Currently, prometheus metrics on the scheduler report all platform properties as a json-serialized string under the "properties" label. This is useful for bb-autoscaler, which knows how to parse this value, but is very challenging to use with tooling which might already be available for autoscaling in your setup, like keda. It also makes it challenging to dashboard or analyze queue metrics.

This change modifies the metric reporting behavior in the scheduler, adding all entries found in the platform to the labels. Collisions are resolved with comma-delimiting the values, which will act as a consistent label due to the strict ordering on properties accepted by buildbarn. Existing labels (platform, instance_name_prefix, and size_class) are prioritized, and cannot be renamed to avoid breaking existing alerts, and to avoid interfering with bb-autoscaler.

Implementation-wise, the individual metrics have been converted to templates which are instantiated whenever a new size class queue is created. The size class queue provisions and registers a registry for its metrics, which it deregisters in its remove method.

)

// counterTemplate is a utility struct for generating and registering prometheus.CounterVec
// resources from a set of options, and required labels to appear in the resulting vec.
type counterTemplate struct {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if there's a better place for this to live? It feels a little divorced from the rest of the logic in this file, which is already massive. Open to suggestions here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant