Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[receiver/vcenterreceiver] Optimize vCenter VM network calls #32201

Merged

Conversation

Caleb-Hurshman
Copy link
Contributor

Description:

Changes the method for collecting VMs used by the vccenterreceiver to the more time-efficient CreateContainerView method. This is the first step to addressing the issue linked below.

Link to tracking Issue: #31837

Testing: These changes were tested on an environment with 200+ virtual machines. The original collection time was ~80 seconds. Collection times with these changes are ~40 seconds.

Documentation: N/A

@@ -89,6 +93,7 @@ func (vc *vcenterClient) Datacenters(ctx context.Context) ([]*object.Datacenter,
return datacenters, nil
}

// Computes returns the ComputeResources (Hosts & Clusters) of the vSphere SDK for a given datacenter
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is my original comment, but I would modify the parenthesis to "(and ClusterComputeResources)"

component: vcenterreceiver

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: "Changes the method for collecting VMs used by the `vccenterreceiver` to the more time efficient `CreateContainerView` method."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would modify this a bit. really we are using a container view's "retrieve" method now

errs.AddPartial(1, err)
hostRef := vm.Summary.Runtime.Host
if hostRef == nil {
errs.AddPartial(1, errors.New("VM doesn't have a HostSystem"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should actually be more specific and include what VM has this issue.

@@ -349,12 +342,6 @@ func (v *vcenterMetricScraper) collectVMs(
}
}

hostname, err := vmHost.ObjectName(ctx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor and wouldn't hold up PR: since hostname is references a few times, it's still probably worth it to make a variable out of it after creating the hwSum variable

Copy link
Contributor

@StefanKurek StefanKurek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I can see, there are fewer VMs being returned than before with this logic. Did you manually check this?

@StefanKurek
Copy link
Contributor

From what I can see, there are fewer VMs being returned than before with this logic. Did you manually check this?

@Caleb-Hurshman ignore this. I was looking at a wrong metric which only reported when VMs were active

@StefanKurek StefanKurek force-pushed the feat/faster-vcenter-vm-network-calls branch from 4c5e92e to 4c0c6e1 Compare April 10, 2024 04:48
@StefanKurek
Copy link
Contributor

StefanKurek commented Apr 10, 2024

With a few additional changes, collection times in our environment dropped from 90s to 23s-30s.

Rather than make a single call to get "shell" VM objects, and then making additional network calls for each VM's properties and performance metrics, we now are up front making a single network call for all VMs along with only the specific properties that we care about as well as a single network call for all of the performance metrics for all VMs. This data is stored and matched up as we cycle through each VM to build the various VM metrics/resources.

A follow up PR will be made to follow a similar pattern for Hosts/Clusters/Resource Pools, & Datastores

@StefanKurek StefanKurek force-pushed the feat/faster-vcenter-vm-network-calls branch 2 times, most recently from 7618c4c to 94c08f9 Compare April 10, 2024 05:46
Comment on lines 2639 to 2644
- key: vcenter.resource_pool.name
value:
stringValue: Resources
- key: vcenter.resource_pool.inventory_path
value:
stringValue: /Datacenter/host/Cluster/Resources
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a bug fix or other change here, in addition to the optimization? If not, why would the expected result change?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@djaglowski I think I would argue that I improved the tests here by having better XML responses being returned (which resulted in updated expected test results). Now the tests show the new resource pool attributes being added to VMs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explaining. Would it be possible to submit the changes to the mock server separately? I'm sure they're good changes but it would be helpful to validate the refactoring independently.

Copy link
Contributor

@StefanKurek StefanKurek Apr 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@djaglowski I can do this.

I do want to call out that the reason that these changes are there are because I think they pertain to the testing of this block of work (the VM code).

I made changes, the tests were broken.
So I started to update the tests, with new recordings from my new requests. But then I noticed that the expected results didn't seem to match what the code was doing (which was what I WAS expecting....if that makes any sense 😄 ).

Like I said, I can work around this and modify this PR to remove the expected results modifications. But it will make the tests in a less than optimal state IMO.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the test changes work without the other changes in this PR? I was thinking that we would first update tests, then rebase this PR so it is only refactoring for performance.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@djaglowski let me give that a try. I think that should work and not be too difficult to implement. 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@djaglowski rebased off the main which contained the merged in fixes from the previous PR

Caleb-Hurshman and others added 5 commits April 16, 2024 15:17
Makes a few small refactors for the vcenterreceiver scraper and metrics files.
Fixes unit tests.
Makes a single performance metrics network call for VMs rather than one per VM.
@StefanKurek StefanKurek force-pushed the feat/faster-vcenter-vm-network-calls branch 2 times, most recently from 013fc38 to 9ba5070 Compare April 16, 2024 21:55
@StefanKurek StefanKurek force-pushed the feat/faster-vcenter-vm-network-calls branch from 9ba5070 to ad9bd99 Compare April 16, 2024 21:58
@Caleb-Hurshman Caleb-Hurshman marked this pull request as ready for review April 17, 2024 12:57
@Caleb-Hurshman Caleb-Hurshman requested review from a team and mx-psi April 17, 2024 12:57
@djaglowski djaglowski merged commit 11d2988 into open-telemetry:main Apr 17, 2024
158 checks passed
@github-actions github-actions bot added this to the next release milestone Apr 17, 2024
@StefanKurek StefanKurek deleted the feat/faster-vcenter-vm-network-calls branch April 18, 2024 04:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants