-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[receiver/vcenterreceiver] Optimize vCenter VM network calls #32201
[receiver/vcenterreceiver] Optimize vCenter VM network calls #32201
Conversation
receiver/vcenterreceiver/client.go
Outdated
@@ -89,6 +93,7 @@ func (vc *vcenterClient) Datacenters(ctx context.Context) ([]*object.Datacenter, | |||
return datacenters, nil | |||
} | |||
|
|||
// Computes returns the ComputeResources (Hosts & Clusters) of the vSphere SDK for a given datacenter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is my original comment, but I would modify the parenthesis to "(and ClusterComputeResources)"
component: vcenterreceiver | ||
|
||
# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`). | ||
note: "Changes the method for collecting VMs used by the `vccenterreceiver` to the more time efficient `CreateContainerView` method." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would modify this a bit. really we are using a container view's "retrieve" method now
receiver/vcenterreceiver/scraper.go
Outdated
errs.AddPartial(1, err) | ||
hostRef := vm.Summary.Runtime.Host | ||
if hostRef == nil { | ||
errs.AddPartial(1, errors.New("VM doesn't have a HostSystem")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should actually be more specific and include what VM has this issue.
@@ -349,12 +342,6 @@ func (v *vcenterMetricScraper) collectVMs( | |||
} | |||
} | |||
|
|||
hostname, err := vmHost.ObjectName(ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor and wouldn't hold up PR: since hostname is references a few times, it's still probably worth it to make a variable out of it after creating the hwSum
variable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I can see, there are fewer VMs being returned than before with this logic. Did you manually check this?
@Caleb-Hurshman ignore this. I was looking at a wrong metric which only reported when VMs were active |
4c5e92e
to
4c0c6e1
Compare
With a few additional changes, collection times in our environment dropped from 90s to 23s-30s. Rather than make a single call to get "shell" VM objects, and then making additional network calls for each VM's properties and performance metrics, we now are up front making a single network call for all VMs along with only the specific properties that we care about as well as a single network call for all of the performance metrics for all VMs. This data is stored and matched up as we cycle through each VM to build the various VM metrics/resources. A follow up PR will be made to follow a similar pattern for Hosts/Clusters/Resource Pools, & Datastores |
7618c4c
to
94c08f9
Compare
- key: vcenter.resource_pool.name | ||
value: | ||
stringValue: Resources | ||
- key: vcenter.resource_pool.inventory_path | ||
value: | ||
stringValue: /Datacenter/host/Cluster/Resources |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a bug fix or other change here, in addition to the optimization? If not, why would the expected result change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@djaglowski I think I would argue that I improved the tests here by having better XML responses being returned (which resulted in updated expected test results). Now the tests show the new resource pool attributes being added to VMs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for explaining. Would it be possible to submit the changes to the mock server separately? I'm sure they're good changes but it would be helpful to validate the refactoring independently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@djaglowski I can do this.
I do want to call out that the reason that these changes are there are because I think they pertain to the testing of this block of work (the VM code).
I made changes, the tests were broken.
So I started to update the tests, with new recordings from my new requests. But then I noticed that the expected results didn't seem to match what the code was doing (which was what I WAS expecting....if that makes any sense 😄 ).
Like I said, I can work around this and modify this PR to remove the expected results modifications. But it will make the tests in a less than optimal state IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will the test changes work without the other changes in this PR? I was thinking that we would first update tests, then rebase this PR so it is only refactoring for performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@djaglowski let me give that a try. I think that should work and not be too difficult to implement. 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@djaglowski rebased off the main which contained the merged in fixes from the previous PR
Makes a few small refactors for the vcenterreceiver scraper and metrics files. Fixes unit tests. Makes a single performance metrics network call for VMs rather than one per VM.
013fc38
to
9ba5070
Compare
9ba5070
to
ad9bd99
Compare
Description:
Changes the method for collecting VMs used by the
vccenterreceiver
to the more time-efficientCreateContainerView
method. This is the first step to addressing the issue linked below.Link to tracking Issue: #31837
Testing: These changes were tested on an environment with 200+ virtual machines. The original collection time was ~80 seconds. Collection times with these changes are ~40 seconds.
Documentation: N/A