-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Collect system info and metrics for Azure #591
Conversation
return | ||
} | ||
|
||
// Server metrics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Starting from here, it's getting the metrics. Available metrics are different between Cosmos DB and Flexible Server, but we can use the same logic here.
input/system/azure/system.go
Outdated
} else if system.Info.Azure.CoordinatorStorageMB != 0 { | ||
// Cosmos DB | ||
// TODO: check if we need to sum up the node storage MB too | ||
// (e.g. system.Info.Azure.NodeStorageMB * system.Info.Azure.NodeCount) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lfittl Do you know the answer for this, by any chance? I haven't tried it since making a node with Cosmos DB is really expensive 🙈
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My assumption is no, i.e. we don't need to consider the worker node storage as part of the coordinator's storage (they can be considered separate servers, and my assumption is that Azure Monitor reports the per-server metrics, not anything across the whole cluster).
b605159
to
c79ff1e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some questions, and did not actually try this out, but looks good.
…of Reader to clarify what is needed
Redeployed this version of collector to test instances, looking good including tags: This is ready for re-review! |
|
||
if resourceID == "" { | ||
server.SelfTest.MarkCollectionAspectWarning(state.CollectionAspectSystemStats, "unable to find the database server info") | ||
server.SelfTest.HintCollectionAspect(state.CollectionAspectSystemStats, "Make sure the Monitoring Reader permission of the database is granted to the managed identity.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be moved to the earlier parts where we actually make the list API calls? (since we return early on error) -- or do we just not get any results if the permissions are insufficient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to close the loop: I added the comment above, but this can be reached when the managed identity is assigned to some db but not this db, so I think the wording is still accurate. When it's not assigned to any db resources, it'll raise an error when we do page, err := pager.NextPage(ctx)
earlier in line 46.
I tweaked that part of code, to make sure that it'll early return when it receives errors (and show the same hint). It'll look like the following. It's not super good looking sadly (I think %v is just showing too much), but I think it does show good info for debugging so I'd say it's a good start.
@@ -102,6 +102,7 @@ func getCollectorConfig(c config.ServerConfig) state.CollectorConfig { | |||
AzureAdTenantId: c.AzureADTenantID, | |||
AzureAdClientId: c.AzureADClientID, | |||
AzureHasAdCertificate: c.AzureADCertificatePath != "", | |||
AzureSubscriptionID: c.AzureSubscriptionID, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also needs to be emitted on output
// value is total of 1min | ||
diffedNetworkStats.ReceiveThroughputBytesPerSecond = uint64(*metricValue.Average / 60) | ||
case "read_iops": | ||
diffedDiskStats.ReadOperationsPerSecond = *metricValue.Average |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sentry reported a nil pointer dereference here https://sentry.io/issues/5910271246
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it's already fixed actually, but thanks for the ping!
Addressed Lukas's feedback, also double checked that this works with both Flexible Server and Cosmos DB. Merging. |
Collect system info and metrics for Azure Flexible Server and Cosmos DB for PostgreSQL. Note that we won't support this for Single Server as it's close to EOL.
This uses 3 different SDK to collect information:
To start using this, the customer needs to supply a new config variable
AZURE_SUBSCRIPTION_ID
to the collector. Also, they'll need to assign the managed identity to the collector VM (if not already with Log Insights), then grant the Read permission of the database to that managed identity. This requires docs update, likely including some setup process change.When
AZURE_SUBSCRIPTION_ID
is not supplied, the collector doesn't error out and simply won't collect the system info.