[Metricbeat][vSphere] Support for configurable IntervalId for performance API #40678

Merged
Changes from 5 commits
36 commits
04ae955
initial commit for intervalId supports for performance metrics
kush-elastic Sep 3, 2024
2e5d6a1
Merge branch 'main' into 36-vsphere-support-for-configurable-interval…
kush-elastic Sep 4, 2024
831b070
Merge branch 'main' into 36-vsphere-support-for-configurable-interval…
kush-elastic Sep 5, 2024
1704e6e
update docs and fix CI
kush-elastic Sep 5, 2024
2e995ba
Add changelog entry
kush-elastic Sep 5, 2024
3727a3b
fix CI
kush-elastic Sep 5, 2024
416177c
resolve review comments
kush-elastic Sep 5, 2024
d2ac1b6
fix loggers
kush-elastic Sep 5, 2024
23fd9e4
resolved review comments
kush-elastic Sep 5, 2024
8da1d2a
update versions
kush-elastic Sep 6, 2024
ab175fd
update UTs
kush-elastic Sep 6, 2024
6b08b73
update integration tests
kush-elastic Sep 6, 2024
48e72b7
10s -> 20s
kush-elastic Sep 6, 2024
24bfb61
Merge branch 'main' into 36-vsphere-support-for-configurable-interval…
kush-elastic Sep 6, 2024
7b38b73
Update CHANGELOG.next.asciidoc
kush-elastic Sep 6, 2024
587aac6
Update metricbeat/docs/modules/vsphere.asciidoc
kush-elastic Sep 6, 2024
d5cd481
make update
kush-elastic Sep 6, 2024
0c4cb23
add recover for ToMetricSeries panic
kush-elastic Sep 6, 2024
7907f68
return error instead just logging it.
kush-elastic Sep 6, 2024
1806e5c
remove restriction of interval IDs
kush-elastic Sep 9, 2024
417703a
remove unnecessary validations
kush-elastic Sep 9, 2024
7fc5691
Merge branch 'main' into 36-vsphere-support-for-configurable-interval…
kush-elastic Sep 9, 2024
ad1c095
remove recover and add empty condition
kush-elastic Sep 9, 2024
c538f15
update changelog entry
kush-elastic Sep 9, 2024
0a156c8
Merge branch 'main' into 36-vsphere-support-for-configurable-interval…
kush-elastic Sep 9, 2024
9808c1e
Fix wrapping of errors in loggers
kush-elastic Sep 9, 2024
5a79756
update data.json
kush-elastic Sep 10, 2024
a2ee751
update data.json
kush-elastic Sep 10, 2024
09f540c
fix CI and loggers
kush-elastic Sep 10, 2024
d17fa70
Merge branch 'main' into 36-vsphere-support-for-configurable-interval…
kush-elastic Sep 10, 2024
4013033
update changelog entries
kush-elastic Sep 10, 2024
3edf243
make update
kush-elastic Sep 10, 2024
2c55bfe
Merge branch 'main' of https://github.com/kush-elastic/beats into 36-…
kush-elastic Sep 10, 2024
8f162b3
fix changelog entries
kush-elastic Sep 10, 2024
99be228
update changelog entry
kush-elastic Sep 10, 2024
1002544
Merge branch 'main' into 36-vsphere-support-for-configurable-interval…
kush-elastic Sep 10, 2024
1 change: 1 addition & 0 deletions CHANGELOG.next.asciidoc
@@ -298,6 +298,7 @@ https://github.com/elastic/beats/compare/v8.8.1\...main[Check the HEAD diff]
- Disable event normalization for netflow input {pull}40635[40635]
- Allow attribute selection in the Active Directory entity analytics provider. {issue}40482[40482] {pull}40662[40662]
- Improve error quality when CEL program does not correctly return an events array. {pull}40580[40580]
- Update support for period based intervalID in vSphere host and datastore metric sets {pull}40678[40678]

*Auditbeat*

78 changes: 75 additions & 3 deletions metricbeat/docs/modules/vsphere.asciidoc
@@ -9,14 +9,77 @@ This file is generated! See scripts/mage/docs_collector.go
[[metricbeat-module-vsphere]]
== vSphere module

The vSphere module uses the https://github.com/vmware/govmomi[Govmomi] library to collect metrics from any Vmware SDK URL (ESXi/VCenter). This library is built for and tested against ESXi and vCenter 5.5, 6.0 and 6.5.
The vSphere module uses the https://github.com/vmware/govmomi[Govmomi] library to collect metrics from any VMware SDK URL (ESXi/VCenter). This library is built for and tested against ESXi and vCenter 5.5, 6.0, and 6.5.
Contributor:
Can we update the latest version of the govmomi library here?

kush-elastic (Collaborator, Author), Sep 5, 2024:
This still holds true. The default version is 6.5, which is also the one used for testing.

Contributor:
Broken link

Collaborator, Author:
updated

Contributor:
We have now tested our code against the customer vSphere Client, which is version 7.0.3.
I think we can mention that testing has been done up to version 7.0.3.
@kush-elastic

Collaborator, Author:
Sure, got it.


By default it enables the metricsets `cluster`, `datastore`, `datastorecluster`, `host`, `network`, `resourcepool` and `virtualmachine`.
By default, the vSphere module enables the following metric sets:

1. cluster

2. datastore

3. datastorecluster

4. host

5. network

6. resourcepool

7. virtualmachine

[float]
=== Supported Periods:
The Datastore and Host metric sets support performance data collection using the vSphere performance API. Since the performance API has usage restrictions based on data collection intervals, users should ensure that the period is configured optimally to receive real-time data. This configuration can be determined based on the https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.monitoring.doc/GUID-247646EA-A04B-411A-8DD4-62A3DCFCF49B.html[Data Collection Intervals] and https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.monitoring.doc/GUID-25800DE4-68E5-41CC-82D9-8811E27924BC.html[Data Collection Levels].

[IMPORTANT]

Only host and datastore metric sets have limitation of period restrictions.

[float]
==== Real-time data collection interval:
- 20

[float]
==== Historical data collection interval:
- 300
- 1800
- 7200
- 86400

[float]
=== Example:
If you need to configure multiple metric sets with different periods, you can achieve this by setting up multiple vSphere modules with different metric sets as demonstrated below:

[source,yaml]
----
- module: vsphere
  metricsets:
    - cluster
    - datastorecluster
    - network
    - resourcepool
    - virtualmachine
  period: 10s
  hosts: ["https://localhost/sdk"]
  username: "user"
  password: "password"
  insecure: false

- module: vsphere
  metricsets:
    - datastore
    - host
  period: 300s
  hosts: ["https://localhost/sdk"]
  username: "user"
  password: "password"
  insecure: false
----

[float]
=== Dashboard

The vsphere module comes with a predefined dashboard. For example:
The vSphere module includes a predefined dashboard. For example:

image::./images/metricbeat_vsphere_dashboard.png[]
image::./images/metricbeat_vsphere_vm_dashboard.png[]
@@ -36,7 +99,16 @@ metricbeat.modules:
- module: vsphere
enabled: true
metricsets: ["cluster", "datastore", "datastorecluster", "host", "network", "resourcepool", "virtualmachine"]

# Real-time data collection – An ESXi Server collects data for each performance counter every 20 seconds.
# Supported Periods:
# The Datastore and Host metric sets support performance data collection using the vSphere performance API.
# Since the performance API has usage restrictions based on data collection intervals,
# users should ensure that the period is configured optimally to receive real-time data.
# This configuration can be determined based on the Data Collection Intervals and Data Collection Levels.
# Reference Links:
# Data Collection Intervals: https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.monitoring.doc/GUID-247646EA-A04B-411A-8DD4-62A3DCFCF49B.html
# Data Collection Levels: https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.monitoring.doc/GUID-25800DE4-68E5-41CC-82D9-8811E27924BC.html
period: 20s
hosts: ["https://localhost/sdk"]

9 changes: 9 additions & 0 deletions metricbeat/metricbeat.reference.yml
@@ -1008,7 +1008,16 @@ metricbeat.modules:
- module: vsphere
enabled: true
metricsets: ["cluster", "datastore", "datastorecluster", "host", "network", "resourcepool", "virtualmachine"]

# Real-time data collection – An ESXi Server collects data for each performance counter every 20 seconds.
# Supported Periods:
# The Datastore and Host metric sets support performance data collection using the vSphere performance API.
# Since the performance API has usage restrictions based on data collection intervals,
# users should ensure that the period is configured optimally to receive real-time data.
# This configuration can be determined based on the Data Collection Intervals and Data Collection Levels.
# Reference Links:
# Data Collection Intervals: https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.monitoring.doc/GUID-247646EA-A04B-411A-8DD4-62A3DCFCF49B.html
# Data Collection Levels: https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.monitoring.doc/GUID-25800DE4-68E5-41CC-82D9-8811E27924BC.html
period: 20s
hosts: ["https://localhost/sdk"]

9 changes: 9 additions & 0 deletions metricbeat/module/vsphere/_meta/config.reference.yml
@@ -1,7 +1,16 @@
- module: vsphere
enabled: true
metricsets: ["cluster", "datastore", "datastorecluster", "host", "network", "resourcepool", "virtualmachine"]

# Real-time data collection – An ESXi Server collects data for each performance counter every 20 seconds.
# Supported Periods:
# The Datastore and Host metric sets support performance data collection using the vSphere performance API.
# Since the performance API has usage restrictions based on data collection intervals,
# users should ensure that the period is configured optimally to receive real-time data.
# This configuration can be determined based on the Data Collection Intervals and Data Collection Levels.
# Reference Links:
# Data Collection Intervals: https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.monitoring.doc/GUID-247646EA-A04B-411A-8DD4-62A3DCFCF49B.html
# Data Collection Levels: https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.monitoring.doc/GUID-25800DE4-68E5-41CC-82D9-8811E27924BC.html
period: 20s
hosts: ["https://localhost/sdk"]

9 changes: 9 additions & 0 deletions metricbeat/module/vsphere/_meta/config.yml
@@ -7,7 +7,16 @@
# - network
# - resourcepool
# - virtualmachine

# Real-time data collection – An ESXi Server collects data for each performance counter every 20 seconds.
# Supported Periods:
# The Datastore and Host metric sets support performance data collection using the vSphere performance API.
# Since the performance API has usage restrictions based on data collection intervals,
# users should ensure that the period is configured optimally to receive real-time data.
# This configuration can be determined based on the Data Collection Intervals and Data Collection Levels.
# Reference Links:
# Data Collection Intervals: https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.monitoring.doc/GUID-247646EA-A04B-411A-8DD4-62A3DCFCF49B.html
# Data Collection Levels: https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.monitoring.doc/GUID-25800DE4-68E5-41CC-82D9-8811E27924BC.html
period: 20s
hosts: ["https://localhost/sdk"]

69 changes: 66 additions & 3 deletions metricbeat/module/vsphere/_meta/docs.asciidoc
@@ -1,11 +1,74 @@
The vSphere module uses the https://github.com/vmware/govmomi[Govmomi] library to collect metrics from any Vmware SDK URL (ESXi/VCenter). This library is built for and tested against ESXi and vCenter 5.5, 6.0 and 6.5.
The vSphere module uses the https://github.com/vmware/govmomi[Govmomi] library to collect metrics from any VMware SDK URL (ESXi/VCenter). This library is built for and tested against ESXi and vCenter 5.5, 6.0, and 6.5.

By default it enables the metricsets `cluster`, `datastore`, `datastorecluster`, `host`, `network`, `resourcepool` and `virtualmachine`.
By default, the vSphere module enables the following metric sets:

1. cluster

2. datastore

3. datastorecluster

4. host

5. network

6. resourcepool

7. virtualmachine

[float]
=== Supported Periods:
The Datastore and Host metric sets support performance data collection using the vSphere performance API. Since the performance API has usage restrictions based on data collection intervals, users should ensure that the period is configured optimally to receive real-time data. This configuration can be determined based on the https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.monitoring.doc/GUID-247646EA-A04B-411A-8DD4-62A3DCFCF49B.html[Data Collection Intervals] and https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.monitoring.doc/GUID-25800DE4-68E5-41CC-82D9-8811E27924BC.html[Data Collection Levels].

[IMPORTANT]

Only host and datastore metric sets have limitation of period restrictions.

[float]
==== Real-time data collection interval:
- 20

[float]
==== Historical data collection interval:
- 300
- 1800
- 7200
- 86400

[float]
=== Example:
If you need to configure multiple metric sets with different periods, you can achieve this by setting up multiple vSphere modules with different metric sets as demonstrated below:

[source,yaml]
----
- module: vsphere
  metricsets:
    - cluster
    - datastorecluster
    - network
    - resourcepool
    - virtualmachine
  period: 10s
  hosts: ["https://localhost/sdk"]
  username: "user"
  password: "password"
  insecure: false

- module: vsphere
  metricsets:
    - datastore
    - host
  period: 300s
  hosts: ["https://localhost/sdk"]
  username: "user"
  password: "password"
  insecure: false
----

[float]
=== Dashboard

The vsphere module comes with a predefined dashboard. For example:
The vSphere module includes a predefined dashboard. For example:

image::./images/metricbeat_vsphere_dashboard.png[]
image::./images/metricbeat_vsphere_vm_dashboard.png[]
99 changes: 67 additions & 32 deletions metricbeat/module/vsphere/datastore/datastore.go
@@ -80,6 +80,12 @@ func (m *DataStoreMetricSet) Fetch(ctx context.Context, reporter mb.ReporterV2)
ctx, cancel := context.WithCancel(ctx)
defer cancel()

period := m.Module().Config().Period
if !isValidPeriod(period.Seconds()) {
m.Logger().Errorf("Invalid period %v. Please provide one of the following values: 20, 300, 1800, 7200, 86400", period)
return nil
}

client, err := govmomi.NewClient(ctx, m.HostURL, m.Insecure)
if err != nil {
return fmt.Errorf("error in NewClient: %w", err)
@@ -144,38 +150,9 @@ func (m *DataStoreMetricSet) Fetch(ctx context.Context, reporter mb.ReporterV2)
continue
}

spec := types.PerfQuerySpec{
Entity: dst[i].Reference(),
MetricId: metricIds,
MaxSample: 1,
IntervalId: 20, // right now we are only grabbing real time metrics from the performance manager
}

// Query performance data
samples, err := perfManager.Query(ctx, []types.PerfQuerySpec{spec})
if err != nil {
m.Logger().Debugf("Failed to query performance data for host %s: %v", dst[i].Name, err)
continue
}

if len(samples) == 0 {
m.Logger().Debugf("No samples returned from performance manager")
continue
}

results, err := perfManager.ToMetricSeries(ctx, samples)
metricMap, err := m.getPerfMetrics(ctx, perfManager, dst[i], metricIds)
if err != nil {
m.Logger().Debugf("Failed to query performance data to metric series for host %s: %v", dst[i].Name, err)
continue
}

metricMap := make(map[string]interface{})
for _, result := range results[0].Value {
if len(result.Value) > 0 {
metricMap[result.Name] = result.Value[0]
continue
}
m.Logger().Debugf("For host %s,Metric %v: No result found", dst[i].Name, result.Name)
m.Logger().Errorf("Failed to retrieve performance metrics from host %s: %w", dst[i].Name, err)
}

reporter.Event(mb.Event{
@@ -191,7 +168,6 @@ func (m *DataStoreMetricSet) Fetch(ctx context.Context, reporter mb.ReporterV2)
}

func getAssetNames(ctx context.Context, pc *property.Collector, ds *mo.Datastore) (*assetNames, error) {

outputVmNames := make([]string, 0, len(ds.Vm))
if len(ds.Vm) > 0 {
var objects []mo.ManagedEntity
@@ -235,3 +211,62 @@ func getAssetNames(ctx context.Context, pc *property.Collector, ds *mo.Datastore
outputVmNames: outputVmNames,
}, nil
}

func (m *DataStoreMetricSet) getPerfMetrics(ctx context.Context, perfManager *performance.Manager, dst mo.Datastore, metricIds []types.PerfMetricId) (map[string]interface{}, error) {
	metricMap := make(map[string]interface{})
	summary, err := perfManager.ProviderSummary(ctx, dst.Reference())
	if err != nil {
		return metricMap, fmt.Errorf("failed to get summary: %w", err)
	}

	var refreshRate = int32(m.Module().Config().Period.Seconds())
	if summary.CurrentSupported {
		refreshRate = summary.RefreshRate
		if int32(m.Module().Config().Period.Seconds()) != refreshRate {
			m.Logger().Warnf("User-provided period %v does not match system's refresh rate %v. Risk of data duplication. Consider adjusting period.", m.Module().Config().Period, refreshRate)
		}
	} else {
		m.Logger().Warnf("Live data collection not supported. Use one of the system's historical interval (300, 1800, 7200, 86400). Risk of data duplication. Consider adjusting period.")
	}

	spec := types.PerfQuerySpec{
		Entity:     dst.Reference(),
		MetricId:   metricIds,
		MaxSample:  1,
		IntervalId: refreshRate, // using refreshRate as interval
	}

	// Query performance data
	samples, err := perfManager.Query(ctx, []types.PerfQuerySpec{spec})
	if err != nil {
		return metricMap, fmt.Errorf("failed to query performance data: %w", err)
	}

	if len(samples) == 0 {
		m.Logger().Debug("No samples returned from performance manager")
		return metricMap, nil
	}

	results, err := perfManager.ToMetricSeries(ctx, samples)
	if err != nil {
		m.Logger().Errorf("failed to convert performance data to metric series: %v", err)
	}

	for _, result := range results[0].Value {
		if len(result.Value) > 0 {
			metricMap[result.Name] = result.Value[0]
			continue
		}
		m.Logger().Debugf("For datastore %s, Metric %v: No result found", dst.Name, result.Name)
	}

	return metricMap, nil
}
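
The interval selection at the top of `getPerfMetrics` can be sketched in isolation, without a vSphere connection. This is a simplified model of the logic in the diff, not the module's code: `providerSummary` is a hypothetical stand-in for the two fields the diff reads from the provider summary (`CurrentSupported` and `RefreshRate`), and `chooseIntervalID` is an illustrative name.

```go
package main

import "fmt"

// providerSummary is a hypothetical stand-in for the summary returned by
// perfManager.ProviderSummary; only the two fields used in the diff are modeled.
type providerSummary struct {
	CurrentSupported bool
	RefreshRate      int32
}

// chooseIntervalID mirrors the selection in getPerfMetrics: start from the
// configured period, switch to the provider's refresh rate when real-time
// data is supported, and return a warning text when the two disagree or
// when real-time collection is unavailable. An empty warning means no warning.
func chooseIntervalID(periodSeconds int32, s providerSummary) (int32, string) {
	if !s.CurrentSupported {
		// The configured period is kept and used as the interval ID.
		return periodSeconds, "live data collection not supported; use a historical interval"
	}
	if periodSeconds != s.RefreshRate {
		return s.RefreshRate, fmt.Sprintf(
			"period %ds does not match refresh rate %ds; risk of data duplication",
			periodSeconds, s.RefreshRate)
	}
	return s.RefreshRate, ""
}

func main() {
	interval, warning := chooseIntervalID(300, providerSummary{CurrentSupported: true, RefreshRate: 20})
	fmt.Println(interval, warning)
}
```

One consequence of this design is that, when real-time data is supported, the provider's refresh rate always takes precedence over the configured period, which is why the code warns about possible data duplication when the two differ.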

func isValidPeriod(period float64) bool {
	switch period {
	case 20, 300, 1800, 7200, 86400:
		return true
	}
	return false
}