
Onboard ACStor targets #976 (Open)

shlokshah-dev wants to merge 11 commits into main from shlok/acstor-onboarding
Conversation

shlokshah-dev (Member)

PR Description

This PR onboards the scrape targets for Azure Container Storage (ACStor).

New Feature Checklist

  • List telemetry added about the feature.
  • Link to the one-pager about the feature.
  • List any tasks necessary for release (3P docs, AKS RP chart changes, etc.) after merging the PR.
  • Attach results of scale and perf testing.

Tests Checklist

  • Have end-to-end Ginkgo tests been run on your cluster and passed? To bootstrap your cluster to run the tests, follow these instructions.
    • Labels used when running the tests on your cluster:
      • operator
      • windows
      • arm64
      • arc-extension
      • fips
  • Have new tests been added? For features, have tests been added for this feature? For fixes, is there a test that could have caught this issue and could validate that the fix works?

@shlokshah-dev shlokshah-dev requested a review from a team as a code owner September 18, 2024 20:32
scrape_configs:
  - job_name: acstor-capacity-provisioner
    honor_labels: true
    scrape_interval: 10s
bragi92 (Member) commented Sep 18, 2024

I see that you're using 10s as the scrape interval instead of the default that we set using $$SCRAPE_INTERVAL$$. Any specific reason for this?

shlokshah-dev (Member, Author)

No specific reason as such; I used 10s because that's what we use in our current setup. Do you anticipate any concerns with deviating from the default?

bragi92 (Member)

I think using the default makes sense to keep everything consistent unless there is a good reason to deviate from it. The pros of 30-second scrapes are reduced resource load on the pod and reduced data volume (which can also help with query performance).

Based on the list of metrics in the minimal ingestion profile, I would recommend using the 30-second value, i.e. the $$SCRAPE_INTERVAL$$ parameter, which we replace during config map parsing.

shlokshah-dev (Member, Author)

@bragi92 Updated the scrape_interval to use the default!
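
For reference, a minimal sketch of what the updated job would look like, assuming only the $$SCRAPE_INTERVAL$$ substitution described above (the other fields are carried over from the original snippet):

scrape_configs:
  - job_name: acstor-capacity-provisioner
    honor_labels: true
    # placeholder replaced with the 30s default during config map parsing
    scrape_interval: $$SCRAPE_INTERVAL$$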

@shlokshah-dev shlokshah-dev force-pushed the shlok/acstor-onboarding branch 2 times, most recently from d054b41 to 04664f9 Compare September 19, 2024 22:50
        action: keep
        regex: metrics

      # If prometheus.io/path is specified, scrape this path instead of /metrics
Contributor

Please remove all the commented-out parts of the config. I put them in for future reference when I converted them...

shlokshah-dev (Member, Author)

Done
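
For context, the commented-out line above refers to the common pod-annotation relabeling convention; a hedged sketch of what such a rule typically looks like upstream (not necessarily the exact rule that was removed here):

relabel_configs:
  # If prometheus.io/path is set on the pod, scrape that path instead of /metrics
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)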

@@ -29,6 +29,8 @@ data:
   controlplane-kube-scheduler = false
   controlplane-kube-controller-manager = false
   controlplane-etcd = true
+  acstor-capacity-provisioner = false
Contributor

Don't we want to default both of these new targets to "true"? That way, the customer doesn't have to go through an additional step to enable them through the config map.

shlokshah-dev (Member, Author)

But what if a cluster doesn't have ACStor enabled? Would keeping the default as true produce any errors when the targets aren't found?

Contributor

Since these are pod discoveries, if those pods aren't there, nothing will be discovered and hence nothing will be scraped.
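
To illustrate the point being made: with pod-based discovery, targets only exist while matching pods exist. A minimal sketch of the pattern (the label selector below is illustrative, not the actual relabel rule from this PR):

scrape_configs:
  - job_name: acstor-capacity-provisioner
    kubernetes_sd_configs:
      - role: pod              # targets come from live pods; no pods, no targets
    relabel_configs:
      # keep only the capacity-provisioner pods (label name/value illustrative)
      - source_labels: [__meta_kubernetes_pod_label_app]
        action: keep
        regex: capacity-provisioner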

shlokshah-dev (Member, Author)

Makes sense. So if ACStor is installed later, will the customer need to re-apply the config map, or does the discovery happen automatically?

shlokshah-dev (Member, Author)

Updated
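
(On the follow-up question above, a known Prometheus behavior: kubernetes_sd_configs watches the API server continuously, so pods created later are discovered automatically without re-applying the config map.) After this update, the defaults excerpt above presumably reads as follows; only the key visible in the diff is shown, since the second ACStor target discussed in the thread does not appear in the excerpts:

controlplane-kube-scheduler = false
controlplane-kube-controller-manager = false
controlplane-etcd = true
acstor-capacity-provisioner = true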

@@ -25,6 +25,8 @@ func (fcl *FilesystemConfigLoader) SetDefaultScrapeSettings() (map[string]string
 	config["networkobservabilityHubble"] = "true"
 	config["networkobservabilityCilium"] = "true"
 	config["noDefaultsEnabled"] = "false"
+	config["acstor-capacity-provisioner"] = "false"
Contributor

default to true for both?

shlokshah-dev (Member, Author)

Updated

@@ -93,7 +95,8 @@ func processConfigMap() map[string]string {
 	"WINDOWSKUBEPROXY_SCRAPE_INTERVAL", "PROMETHEUS_COLLECTOR_HEALTH_SCRAPE_INTERVAL",
 	"POD_ANNOTATION_SCRAPE_INTERVAL", "KAPPIEBASIC_SCRAPE_INTERVAL",
 	"NETWORKOBSERVABILITYRETINA_SCRAPE_INTERVAL", "NETWORKOBSERVABILITYHUBBLE_SCRAPE_INTERVAL",
-	"NETWORKOBSERVABILITYCILIUM_SCRAPE_INTERVAL",
+	"NETWORKOBSERVABILITYCILIUM_SCRAPE_INTERVAL", "ACSTORCAPACITYPROVISIONER_SCRAPE_INTERVAL",
vishiy (Contributor) commented Sep 25, 2024

Can you pick up these 2 new environment vars and add them to our telemetry in telemetry.go (under fluent-bit/src)? This will help us determine whether these targets are enabled for any given cluster where managed Prometheus is enabled. You could also add your keep lists to telemetry, so we can see which other metrics customers are adding to the defaults and extend our defaults in the future as needed. This telemetry is also in the same file as in the above comment...

shlokshah-dev (Member, Author)

Added the env variables and keep lists to telemetry.go

bragi92 (Member) commented Oct 7, 2024

/azp run


Azure Pipelines successfully started running 1 pipeline(s).
