-
Notifications
You must be signed in to change notification settings - Fork 386
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
预测查询自定义及prometheus-adapter替代metric-adapter的实现 #520
Conversation
🎉 Successfully Build Images. Docker RegistryOverview: https://hub.docker.com/u/gocrane
Quick Deploy - Helm helm repo add crane https://finops-helm.pkg.coding.net/gocrane/gocrane
helm install crane -n crane-system --create-namespace \
--set craned.image.repository=gocrane/craned \
--set craned.image.tag=pr-520-b2a3ba4 \
--set metricAdapter.image.repository=gocrane/metric-adapter \
--set metricAdapter.image.tag=pr-520-b2a3ba4 \
--set craneAgent.image.repository=gocrane/crane-agent \
--set craneAgent.image.tag=pr-520-b2a3ba4 \
--set cranedDashboard.image.repository=gocrane/dashboard \
--set cranedDashboard.image.tag=pr-520-b2a3ba4 crane/crane Coding RegistryOverview: https://finops.coding.net/public-artifacts/gocrane/crane/packages
Quick Deploy - Helm helm repo add crane https://finops-helm.pkg.coding.net/gocrane/gocrane
helm install crane -n crane-system --create-namespace \
--set craned.image.repository=finops-docker.pkg.coding.net/gocrane/crane/craned \
--set craned.image.tag=pr-520-b2a3ba4 \
--set metricAdapter.image.repository=finops-docker.pkg.coding.net/gocrane/crane/metric-adapter \
--set metricAdapter.image.tag=pr-520-b2a3ba4 \
--set craneAgent.image.repository=finops-docker.pkg.coding.net/gocrane/crane/crane-agent \
--set craneAgent.image.tag=pr-520-b2a3ba4 \
--set cranedDashboard.image.repository=finops-docker.pkg.coding.net/gocrane/crane/dashboard \
--set cranedDashboard.image.tag=pr-520-b2a3ba4 crane/crane Ghcr RegistryOverview: https://github.com/orgs/gocrane/packages?repo_name=crane
Quick Deploy - Helm helm repo add crane https://finops-helm.pkg.coding.net/gocrane/gocrane
helm install crane -n crane-system --create-namespace \
--set craned.image.repository=ghcr.io/gocrane/crane/craned \
--set craned.image.tag=pr-520-b2a3ba4 \
--set metricAdapter.image.repository=ghcr.io/gocrane/crane/metric-adapter \
--set metricAdapter.image.tag=pr-520-b2a3ba4 \
--set craneAgent.image.repository=ghcr.io/gocrane/crane/crane-agent \
--set craneAgent.image.tag=pr-520-b2a3ba4 \
--set cranedDashboard.image.repository=ghcr.io/gocrane/crane/dashboard \
--set cranedDashboard.image.tag=pr-520-b2a3ba4 crane/crane |
df9dc31
to
3271650
Compare
8940ee4
to
080600a
Compare
pkg/controller/ehpa/hpa.go
Outdated
var averageValue *resource.Quantity | ||
switch metric.Type { | ||
case autoscalingv2.ResourceMetricSourceType: | ||
switch metric.Resource.Name { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
metricIdentifier = utils.GetMetricIdentifier(metric, metric.Resource.name.String())
no need for switch case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
metricsForPrediction = append(metricsForPrediction, metricSpec) | ||
//first get annocation | ||
expressionQuery = utils.GetExpressionQueryAnnocation(metricIdentifier, ehpa.Annotations) | ||
//if annocation not matched, build expressionQuery by metric and ehpa.TargetName |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo for annocation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pkg/controller/ehpa/predict.go
Outdated
case autoscalingv2.ResourceMetricSourceType: | ||
switch metric.Resource.Name { | ||
case "cpu": | ||
metricIdentifier = utils.GetMetricIdentifier(metric, v1.ResourceCPU.String()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for switch case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -229,7 +235,9 @@ func GetPrediction(ctx context.Context, kubeclient client.Client, namespace stri | |||
matchingLabels := client.MatchingLabels(map[string]string{"app.kubernetes.io/managed-by": known.EffectiveHorizontalPodAutoscalerManagedBy}) | |||
// merge metric selectors | |||
for key, value := range labelSelector { | |||
matchingLabels[key] = value | |||
if key == "targetName" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's this works for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because labelSelector set from autoscaling.crane.io/effective-hpa-uid to
Selector: &metav1.LabelSelector{
MatchLabels: map[string]string{
"targetKind": ehpa.Spec.ScaleTargetRef.Kind,
"targetName": ehpa.Spec.ScaleTargetRef.Name,
"targetNamespace": ehpa.Namespace,
"resourceIdentifier": metricIdentifier,
},
},
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from parameter we get metricSelector, which need to matched prediction labels,
Labels: map[string]string{
"app.kubernetes.io/name": name,
"app.kubernetes.io/part-of": ehpa.Name,
"app.kubernetes.io/managed-by": known.EffectiveHorizontalPodAutoscalerManagedBy,
known.EffectiveHorizontalPodAutoscalerUidLabel: string(ehpa.UID),
},
so we can changed effective-hpa-uid to "app.kubernetes.io/part-of": ehpa.Name
we can also add a label like ["app.kubernetes.io/target-name": ehpa.OwnerReferences[0].Name]
This will be more accurate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -219,7 +225,7 @@ func IsLocalCustomMetric(metricInfo provider.CustomMetricInfo, client client.Cli | |||
return false | |||
} | |||
|
|||
func GetPrediction(ctx context.Context, kubeclient client.Client, namespace string, metricSelector labels.Selector) (*predictionapi.TimeSeriesPrediction, error) { | |||
func GetPredictions(ctx context.Context, kubeclient client.Client, namespace string, metricSelector labels.Selector) ([]predictionapi.TimeSeriesPrediction, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move to external_metric_provider.go
} | ||
|
||
return &predictionList.Items[0], nil | ||
return predictionList.Items, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why return a tsp list, is there some case that has more than one tsp for a metric query?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we query the TSP by effective-hpa-uid, we can achieve exact matching. However, when we query the TSP by Target, if the user misconfigures multiple ehpa, there may be multiple TSPS. In this case, matching by TSP and metricIdentifier is compatible. Subsequent processing can be handed over to the HPA
if v.Timestamp < timestampStart.Unix() || v.Timestamp > timestampEnd.Unix() { | ||
continue | ||
for _, prediction := range predictions { | ||
resourceIdentifier, bl := metricSelector.RequiresExactMatch("resourceIdentifier") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's meaning for bl
? better to change to found
for all metricSelector.RequiresExactMatch(xxxx)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change utils.GetReadyPredictionMetric(info.Metric, prediction) to
utils.GetReadyPredictionMetric(info.Metric, resourceIdentifier, &prediction)
,need to get resourceIdentifier, and same as resourceIdentifier == ""
if strings.HasPrefix(info.Metric, "crane") { | ||
prediction, err := GetPrediction(ctx, p.client, namespace, metricSelector) | ||
case known.MetricNamePrediction: | ||
predictions, err := GetPredictions(ctx, p.client, namespace, metricSelector) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible for returning more than one tsp?
@@ -149,48 +171,13 @@ func (p *ExternalMetricProvider) GetCronExternalMetrics(ctx context.Context, nam | |||
if len(errs) > 0 { | |||
return nil, fmt.Errorf("%v", errs) | |||
} | |||
replicas := DefaultCronTargetMetricValue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why remove this part?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DefaultCronTargetMetricValue set to 1
change to minScales
pkg/metrics/metric_collector.go
Outdated
"strings" | ||
"time" | ||
|
||
autoscalingapi "github.com/gocrane/api/autoscaling/v1alpha1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please format the imports by https://github.com/gocrane/crane/blob/main/docs/code-standards.md
pkg/metrics/metric_collector.go
Outdated
pmMap[pm.ResourceIdentifier] = pm | ||
} | ||
|
||
//* collected metric "crane_prediction_time_series_prediction_metric" { label:<name:"aggregateKey" value:"nodes-mem#instance=192.168.56.166:9100" > label:<name:"algorithm" value:"percentile" > label:<name:"expressionQuery" value:"" > label:<name:"rawQuery" value:"sum(node_memory_MemTotal_bytes{} - node_memory_MemAvailable_bytes{}) by (instance)" > label:<name:"resourceIdentifier" value:"nodes-mem" > label:<name:"resourceQuery" value:"" > label:<name:"targetKind" value:"Node" > label:<name:"targetName" value:"192.168.56.166" > label:<name:"targetNamespace" value:"" > label:<name:"type" value:"RawQuery" > gauge:<value:1.82784510645e+06 > timestamp_ms:1639466220000 } was collected before with the same name and label values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update comments to newest design
pkg/metrics/metric_collector.go
Outdated
if err != nil { | ||
klog.Errorf("Failed to list ehpa: %v", err) | ||
} | ||
var uniqPredictionMetrics []string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implement unique metrics by map
// ContainerMemUsageExprTemplate is used to query container cpu usage by promql, param is namespace,pod,container | ||
ContainerMemUsageExprTemplate = `container_memory_working_set_bytes{container!="POD",namespace="%s",pod=~"^%s.*$",container="%s"}` | ||
"github.com/gocrane/crane/pkg/utils" | ||
"k8s.io/apimachinery/pkg/util/sets" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please format imports
case "memory": | ||
expressionQuery = GetWorkloadMemUsageExpression(namespace, name) | ||
} | ||
case autoscalingv2.PodsMetricSourceType: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't use PodsMetricSourceType now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just for ExpressionQuery
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/LGTM good job!
What type of PR is this?
Prometheus多级集群场景应用
prometheus-adapter设置为一级prometheus,craned设置为二级prometheus
扩展:若一级prometheus数据汇总至二级prometheus时增加了label,如{cluster="test"},可能导致模型数据源异常
此时可以基于TSP的expressionQuery修改后添加至annocation
使用示例
APIservice调整
将external的apiservice调整为原生的prometheus-adapter
craned指标查询
注:resourceIdentifier为metricType.metricName, 其中metricName为原始指标
prometheus-adapter配置示例
TSP模型指标查询
参考配置:
EHPA配置示例
此配置示例中开启了cron以及prediction,prediction部分metric配置将生成相应tsp指标
应用该yaml后,将自动生成以ehpa前缀的hpa以及tsp[ehpa-project-test]
TSP查看
kubectl -n test get tsp ehpa-project-test -o yaml
resourceIdentifier值将作为预测指标的matchLabel,实现预测指标匹配
查看hpa状态
此时可以看到hpa共有五个指标
查看hpa配置
# kubectl get HorizontalPodAutoscaler.v2beta2.autoscaling -ntest ehpa-project-test -o yaml
相应指标如下,通过prometheus-adapter获取
结束