Integrate IO Usage Tracker to the Resource Usage Collector Service and Emit IO Usage Stats #11880

Merged (3 commits, merged Mar 7, 2024)
3 changes: 1 addition & 2 deletions CHANGELOG.md
@@ -11,13 +11,12 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
- Add events correlation engine plugin ([#6854](https://github.com/opensearch-project/OpenSearch/issues/6854))
- Implement on behalf of token passing for extensions ([#8679](https://github.com/opensearch-project/OpenSearch/pull/8679), [#10664](https://github.com/opensearch-project/OpenSearch/pull/10664))
- Provide service accounts tokens to extensions ([#9618](https://github.com/opensearch-project/OpenSearch/pull/9618))
- [AdmissionControl] Added changes for AdmissionControl Interceptor and AdmissionControlService for RateLimiting ([#9286](https://github.com/opensearch-project/OpenSearch/pull/9286))
- GHA to verify checklist items completion in PR descriptions ([#10800](https://github.com/opensearch-project/OpenSearch/pull/10800))
- Allow passing list settings through environment variables (like [], ["a", "b", "c"], ...) ([#10625](https://github.com/opensearch-project/OpenSearch/pull/10625))
- [Admission Control] Integrate CPU AC with ResourceUsageCollector and add CPU AC stats to nodes/stats ([#10887](https://github.com/opensearch-project/OpenSearch/pull/10887))
- [S3 Repository] Add setting to control connection count for sync client ([#12028](https://github.com/opensearch-project/OpenSearch/pull/12028))
- Views, simplify data access and manipulation by providing a virtual layer over one or more indices ([#11957](https://github.com/opensearch-project/OpenSearch/pull/11957))
- Add Remote Store Migration Experimental flag and allow mixed mode clusters under same ([#11986](https://github.com/opensearch-project/OpenSearch/pull/11986))
- [Admission Control] Integrate IO Usage Tracker to the Resource Usage Collector Service and Emit IO Usage Stats ([#11880](https://github.com/opensearch-project/OpenSearch/pull/11880))

### Dependencies
- Bump `log4j-core` from 2.18.0 to 2.19.0
@@ -667,6 +667,7 @@ public void apply(Settings value, Settings current, Settings previous) {
// Settings related to resource trackers
ResourceTrackerSettings.GLOBAL_CPU_USAGE_AC_WINDOW_DURATION_SETTING,
ResourceTrackerSettings.GLOBAL_JVM_USAGE_AC_WINDOW_DURATION_SETTING,
ResourceTrackerSettings.GLOBAL_IO_USAGE_AC_WINDOW_DURATION_SETTING,

// Settings related to Searchable Snapshots
Node.NODE_SEARCH_CACHE_SIZE_SETTING,
server/src/main/java/org/opensearch/monitor/fs/FsInfo.java
@@ -448,9 +448,13 @@
return (currentIOTime - previousIOTime);
}

public String getDeviceName() {
return deviceName;
}

@Override
public XContentBuilder toXContent(XContentBuilder builder, Params params) throws IOException {
builder.field("device_name", deviceName);
builder.field("device_name", getDeviceName());
Codecov (codecov/patch): added line #L457 in server/src/main/java/org/opensearch/monitor/fs/FsInfo.java was not covered by tests.
builder.field(IoStats.OPERATIONS, operations());
builder.field(IoStats.READ_OPERATIONS, readOperations());
builder.field(IoStats.WRITE_OPERATIONS, writeOperations());
69 changes: 69 additions & 0 deletions server/src/main/java/org/opensearch/node/IoUsageStats.java
@@ -0,0 +1,69 @@
/*
* SPDX-License-Identifier: Apache-2.0
*
* The OpenSearch Contributors require contributions made to
* this file be licensed under the Apache-2.0 license or a
* compatible open source license.
*/

package org.opensearch.node;

import org.opensearch.core.common.io.stream.StreamInput;
import org.opensearch.core.common.io.stream.StreamOutput;
import org.opensearch.core.common.io.stream.Writeable;
import org.opensearch.core.xcontent.ToXContentFragment;
import org.opensearch.core.xcontent.XContentBuilder;

import java.io.IOException;
import java.util.Locale;

/**
* Stores a node's IO usage stats, which are returned as part of the node stats API.
*/
public class IoUsageStats implements Writeable, ToXContentFragment {

private double ioUtilisationPercent;

public IoUsageStats(double ioUtilisationPercent) {
this.ioUtilisationPercent = ioUtilisationPercent;
}

/**
* Creates an instance by reading it from a stream.
*
* @param in the stream to read from
* @throws IOException if an error occurs while reading from the StreamInput
*/
public IoUsageStats(StreamInput in) throws IOException {
this.ioUtilisationPercent = in.readDouble();
}

/**
* Write this into the {@linkplain StreamOutput}.
*
* @param out the output stream to write entity content to
*/
@Override
public void writeTo(StreamOutput out) throws IOException {
out.writeDouble(this.ioUtilisationPercent);
}

public double getIoUtilisationPercent() {
return ioUtilisationPercent;
}

public void setIoUtilisationPercent(double ioUtilisationPercent) {
this.ioUtilisationPercent = ioUtilisationPercent;
}

@Override
public XContentBuilder toXContent(XContentBuilder builder, Params params) throws IOException {
builder.startObject();
builder.field("max_io_utilization_percent", String.format(Locale.ROOT, "%.1f", this.ioUtilisationPercent));
return builder.endObject();
}

@Override
public String toString() {
return "IO utilization percent: " + String.format(Locale.ROOT, "%.1f", this.ioUtilisationPercent);
}
}
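IoUsageStats puts a single double on the wire and renders it with one decimal place under Locale.ROOT, emitted as a string under `max_io_utilization_percent`. The sketch below mirrors that round trip, assuming java.io's DataOutputStream/DataInputStream as stand-ins for OpenSearch's StreamOutput/StreamInput so it runs without OpenSearch on the classpath; the class name IoUsageStatsSketch and its `formatted()` method are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Locale;

// Hypothetical stand-in for IoUsageStats: java.io Data streams replace
// StreamOutput/StreamInput (an assumption made for a self-contained example).
public class IoUsageStatsSketch {
    private double ioUtilisationPercent;

    public IoUsageStatsSketch(double ioUtilisationPercent) {
        this.ioUtilisationPercent = ioUtilisationPercent;
    }

    // Mirrors IoUsageStats(StreamInput in): read back the single double field.
    public IoUsageStatsSketch(DataInputStream in) throws IOException {
        this.ioUtilisationPercent = in.readDouble();
    }

    // Mirrors writeTo(StreamOutput out): the wire format is one 8-byte double.
    public void writeTo(DataOutputStream out) throws IOException {
        out.writeDouble(ioUtilisationPercent);
    }

    public double getIoUtilisationPercent() {
        return ioUtilisationPercent;
    }

    // Same formatting as toXContent/toString: one decimal place under Locale.ROOT.
    public String formatted() {
        return String.format(Locale.ROOT, "%.1f", ioUtilisationPercent);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        new IoUsageStatsSketch(42.26).writeTo(new DataOutputStream(bytes));

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        IoUsageStatsSketch copy = new IoUsageStatsSketch(in);
        System.out.println(copy.formatted()); // prints 42.3
    }
}
```

Formatting with Locale.ROOT keeps the decimal separator a `.` regardless of the JVM's default locale, so the reported string does not vary between nodes configured with different locales.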
1 change: 1 addition & 0 deletions server/src/main/java/org/opensearch/node/Node.java
@@ -922,6 +922,7 @@ protected Node(
final RestController restController = actionModule.getRestController();

final NodeResourceUsageTracker nodeResourceUsageTracker = new NodeResourceUsageTracker(
monitorService.fsService(),
threadPool,
settings,
clusterService.getClusterSettings()
server/src/main/java/org/opensearch/node/NodeResourceUsageStats.java
@@ -8,6 +8,7 @@

package org.opensearch.node;

import org.opensearch.Version;
import org.opensearch.core.common.io.stream.StreamInput;
import org.opensearch.core.common.io.stream.StreamOutput;
import org.opensearch.core.common.io.stream.Writeable;
@@ -24,19 +25,32 @@
long timestamp;
double cpuUtilizationPercent;
double memoryUtilizationPercent;
private IoUsageStats ioUsageStats;

public NodeResourceUsageStats(String nodeId, long timestamp, double memoryUtilizationPercent, double cpuUtilizationPercent) {
public NodeResourceUsageStats(
String nodeId,
long timestamp,
double memoryUtilizationPercent,
double cpuUtilizationPercent,
IoUsageStats ioUsageStats
) {
this.nodeId = nodeId;
this.timestamp = timestamp;
this.cpuUtilizationPercent = cpuUtilizationPercent;
this.memoryUtilizationPercent = memoryUtilizationPercent;
this.ioUsageStats = ioUsageStats;
}

public NodeResourceUsageStats(StreamInput in) throws IOException {
this.nodeId = in.readString();
this.timestamp = in.readLong();
this.cpuUtilizationPercent = in.readDouble();
this.memoryUtilizationPercent = in.readDouble();
if (in.getVersion().onOrAfter(Version.V_3_0_0)) {
this.ioUsageStats = in.readOptionalWriteable(IoUsageStats::new);
} else {
this.ioUsageStats = null;
Codecov (codecov/patch): added line #L52 in server/src/main/java/org/opensearch/node/NodeResourceUsageStats.java was not covered by tests.
}
}

@Override
@@ -45,15 +59,21 @@
out.writeLong(this.timestamp);
out.writeDouble(this.cpuUtilizationPercent);
out.writeDouble(this.memoryUtilizationPercent);
if (out.getVersion().onOrAfter(Version.V_3_0_0)) {
out.writeOptionalWriteable(this.ioUsageStats);
}
}

@Override
public String toString() {
StringBuilder sb = new StringBuilder("NodeResourceUsageStats[");
sb.append(nodeId).append("](");
sb.append("Timestamp: ").append(timestamp);
sb.append(", CPU utilization percent: ").append(String.format(Locale.ROOT, "%.1f", cpuUtilizationPercent));
sb.append(", Memory utilization percent: ").append(String.format(Locale.ROOT, "%.1f", memoryUtilizationPercent));
sb.append(", CPU utilization percent: ").append(String.format(Locale.ROOT, "%.1f", this.getCpuUtilizationPercent()));
sb.append(", Memory utilization percent: ").append(String.format(Locale.ROOT, "%.1f", this.getMemoryUtilizationPercent()));
Codecov (codecov/patch): added lines #L72-L73 in server/src/main/java/org/opensearch/node/NodeResourceUsageStats.java were not covered by tests.
if (this.ioUsageStats != null) {
sb.append(", ").append(this.getIoUsageStats());
Codecov (codecov/patch): added line #L75 in server/src/main/java/org/opensearch/node/NodeResourceUsageStats.java was not covered by tests.
}
sb.append(")");
return sb.toString();
}
@@ -63,7 +83,8 @@
nodeResourceUsageStats.nodeId,
nodeResourceUsageStats.timestamp,
nodeResourceUsageStats.memoryUtilizationPercent,
nodeResourceUsageStats.cpuUtilizationPercent
nodeResourceUsageStats.cpuUtilizationPercent,
nodeResourceUsageStats.ioUsageStats
);
}

@@ -75,6 +96,14 @@
return cpuUtilizationPercent;
}

public IoUsageStats getIoUsageStats() {
return ioUsageStats;
}

public void setIoUsageStats(IoUsageStats ioUsageStats) {
this.ioUsageStats = ioUsageStats;
}

public long getTimestamp() {
return timestamp;
}
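NodeResourceUsageStats only writes ioUsageStats when the receiving node is on Version.V_3_0_0 or later, and its StreamInput constructor applies the same check when reading, falling back to null for older peers. The following is a minimal sketch of that pattern in plain java.io, assuming versions can be modeled as major-version ints and that an optional value is encoded as a presence flag followed by the payload (modeled on, not copied from, writeOptionalWriteable); VersionGatedFieldSketch, `write`, and `readIo` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class VersionGatedFieldSketch {
    // Hypothetical: versions modeled as major-version ints; 3 stands in for Version.V_3_0_0.
    static final int V_3 = 3;

    // Writer: the pre-existing field always goes out; the new optional field only
    // goes out to peers that can read it, as a presence flag plus payload.
    static byte[] write(int peerVersion, double memoryPercent, Double ioPercent) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeDouble(memoryPercent);
        if (peerVersion >= V_3) {
            out.writeBoolean(ioPercent != null);
            if (ioPercent != null) {
                out.writeDouble(ioPercent);
            }
        }
        out.flush();
        return bytes.toByteArray();
    }

    // Reader: branches on the same version as the writer; older peers never sent
    // the field, so it stays null.
    static Double readIo(byte[] data, int peerVersion) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        in.readDouble(); // memoryPercent, not needed here
        Double ioPercent = null;
        if (peerVersion >= V_3 && in.readBoolean()) {
            ioPercent = in.readDouble();
        }
        if (in.available() != 0) {
            throw new IllegalStateException("writer and reader disagree: unread bytes left");
        }
        return ioPercent;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readIo(write(3, 55.0, 12.5), 3)); // 12.5
        System.out.println(readIo(write(3, 55.0, null), 3)); // null
        System.out.println(readIo(write(2, 55.0, 12.5), 2)); // null: not sent to an older peer
    }
}
```

The invariant is that writer and reader branch on the same version. If they disagree, the reader either leaves bytes behind or consumes bytes belonging to the next field, which is why the sketch fails loudly when anything is left unread.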
server/src/main/java/org/opensearch/node/NodesResourceUsageStats.java
@@ -60,6 +60,9 @@
"memory_utilization_percent",
String.format(Locale.ROOT, "%.1f", resourceUsageStats.memoryUtilizationPercent)
);
if (resourceUsageStats.getIoUsageStats() != null) {
builder.field("io_usage_stats", resourceUsageStats.getIoUsageStats());
Codecov (codecov/patch): added line #L64 in server/src/main/java/org/opensearch/node/NodesResourceUsageStats.java was not covered by tests.
}
}
builder.endObject();
}
@@ -78,14 +78,16 @@ public void collectNodeResourceUsageStats(
String nodeId,
long timestamp,
double memoryUtilizationPercent,
double cpuUtilizationPercent
double cpuUtilizationPercent,
IoUsageStats ioUsageStats
) {
nodeIdToResourceUsageStats.compute(nodeId, (id, resourceUsageStats) -> {
if (resourceUsageStats == null) {
return new NodeResourceUsageStats(nodeId, timestamp, memoryUtilizationPercent, cpuUtilizationPercent);
return new NodeResourceUsageStats(nodeId, timestamp, memoryUtilizationPercent, cpuUtilizationPercent, ioUsageStats);
} else {
resourceUsageStats.cpuUtilizationPercent = cpuUtilizationPercent;
resourceUsageStats.memoryUtilizationPercent = memoryUtilizationPercent;
resourceUsageStats.setIoUsageStats(ioUsageStats);
resourceUsageStats.timestamp = timestamp;
return resourceUsageStats;
}
@@ -129,7 +131,8 @@ private void collectLocalNodeResourceUsageStats() {
clusterService.state().nodes().getLocalNodeId(),
System.currentTimeMillis(),
nodeResourceUsageTracker.getMemoryUtilizationPercent(),
nodeResourceUsageTracker.getCpuUtilizationPercent()
nodeResourceUsageTracker.getCpuUtilizationPercent(),
nodeResourceUsageTracker.getIoUsageStats()
);
}
}
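collectNodeResourceUsageStats upserts per-node stats with the map's `compute()`: the first report for a node creates the entry, and later reports overwrite its fields in place, now including the IO stats via setIoUsageStats. A trimmed-down sketch of that pattern follows; it assumes the map is a ConcurrentHashMap (whose `compute()` runs the remapping function atomically per key), and the Stats holder and CollectorUpsertSketch class are hypothetical, keeping only a timestamp and an IO percentage.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CollectorUpsertSketch {
    // Hypothetical holder; the real NodeResourceUsageStats also carries CPU, memory and node ID.
    static final class Stats {
        long timestamp;
        Double ioPercent; // null when the reporting node sent no IO stats

        Stats(long timestamp, Double ioPercent) {
            this.timestamp = timestamp;
            this.ioPercent = ioPercent;
        }
    }

    // Assumption: a ConcurrentHashMap backs the collector.
    private final Map<String, Stats> nodeIdToStats = new ConcurrentHashMap<>();

    void collect(String nodeId, long timestamp, Double ioPercent) {
        nodeIdToStats.compute(nodeId, (id, existing) -> {
            if (existing == null) {
                // First report for this node: create the entry.
                return new Stats(timestamp, ioPercent);
            }
            // Later reports update the existing object in place. The IO value is
            // overwritten unconditionally, so a later null clears an earlier value.
            existing.timestamp = timestamp;
            existing.ioPercent = ioPercent;
            return existing;
        });
    }

    Stats get(String nodeId) {
        return nodeIdToStats.get(nodeId);
    }

    public static void main(String[] args) {
        CollectorUpsertSketch collector = new CollectorUpsertSketch();
        collector.collect("node-1", 1000L, 12.5);
        Stats first = collector.get("node-1");
        collector.collect("node-1", 2000L, null);
        Stats second = collector.get("node-1");
        System.out.println(first == second); // true: same object, updated in place
        System.out.println(second.timestamp); // 2000
        System.out.println(second.ioPercent); // null
    }
}
```

Because entries are mutated in place, a caller that keeps a reference obtained outside `compute()` sees later updates to that same object; the diff above includes a separate copy path for NodeResourceUsageStats.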
@@ -24,12 +24,12 @@
public abstract class AbstractAverageUsageTracker extends AbstractLifecycleComponent {
private static final Logger LOGGER = LogManager.getLogger(AbstractAverageUsageTracker.class);

private final ThreadPool threadPool;
private final TimeValue pollingInterval;
protected final ThreadPool threadPool;
protected final TimeValue pollingInterval;
private TimeValue windowDuration;
private final AtomicReference<MovingAverage> observations = new AtomicReference<>();

private volatile Scheduler.Cancellable scheduledFuture;
protected volatile Scheduler.Cancellable scheduledFuture;

public AbstractAverageUsageTracker(ThreadPool threadPool, TimeValue pollingInterval, TimeValue windowDuration) {
this.threadPool = threadPool;
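The tracker keeps its samples in `observations` (an `AtomicReference<MovingAverage>`), and AverageIoUsageTracker publishes -1 until `isReady()` returns true. The ring buffer below is a hypothetical stand-in for that behavior, not OpenSearch's MovingAverage class: it assumes the window holds `windowDuration / pollingInterval` samples, that `isReady()` means the window has filled, and it uses `java.time.Duration` in place of TimeValue.

```java
import java.time.Duration;

public class MovingAverageSketch {
    private final long[] window; // ring buffer of the most recent samples
    private long count;          // total samples recorded so far
    private long sum;            // sum of the samples currently in the window

    // Assumption: window size = windowDuration / pollingInterval.
    MovingAverageSketch(Duration windowDuration, Duration pollingInterval) {
        int size = (int) (windowDuration.toMillis() / pollingInterval.toMillis());
        if (size <= 0) {
            throw new IllegalArgumentException("window must hold at least one sample");
        }
        this.window = new long[size];
    }

    void record(long value) {
        int slot = (int) (count % window.length);
        // The slot holds 0 until the window first fills, then the oldest sample.
        sum += value - window[slot];
        window[slot] = value;
        count++;
    }

    boolean isReady() {
        return count >= window.length;
    }

    double getAverage() {
        return count == 0 ? 0.0 : (double) sum / Math.min(count, window.length);
    }

    // Mirrors updateIoUsageStats(): publish -1 until a full window has been observed.
    double reportedValue() {
        return isReady() ? getAverage() : -1;
    }

    public static void main(String[] args) {
        // 30s window polled every 10s -> 3 samples.
        MovingAverageSketch avg = new MovingAverageSketch(Duration.ofSeconds(30), Duration.ofSeconds(10));
        avg.record(10);
        avg.record(20);
        System.out.println(avg.reportedValue()); // -1.0
        avg.record(30);
        System.out.println(avg.reportedValue()); // 20.0
        avg.record(60); // evicts 10
        System.out.println(avg.reportedValue()); // (20 + 30 + 60) / 3, about 36.67
    }
}
```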
@@ -0,0 +1,102 @@
/*
* SPDX-License-Identifier: Apache-2.0
*
* The OpenSearch Contributors require contributions made to
* this file be licensed under the Apache-2.0 license or a
* compatible open source license.
*/

package org.opensearch.node.resource.tracker;

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.apache.lucene.util.Constants;
import org.opensearch.common.ValidationException;
import org.opensearch.common.unit.TimeValue;
import org.opensearch.monitor.fs.FsInfo.DeviceStats;
import org.opensearch.monitor.fs.FsService;
import org.opensearch.node.IoUsageStats;
import org.opensearch.threadpool.ThreadPool;

import java.util.HashMap;
import java.util.Optional;

/**
* AverageIoUsageTracker tracks the IO usage by polling the FS Stats for IO metrics every (pollingInterval)
* and keeping track of the rolling average over a defined time window (windowDuration).
*/
public class AverageIoUsageTracker extends AbstractAverageUsageTracker {

private static final Logger LOGGER = LogManager.getLogger(AverageIoUsageTracker.class);
private final FsService fsService;
private final HashMap<String, Long> prevIoTimeDeviceMap;
private long prevTimeInMillis;
private IoUsageStats ioUsageStats;

public AverageIoUsageTracker(FsService fsService, ThreadPool threadPool, TimeValue pollingInterval, TimeValue windowDuration) {
super(threadPool, pollingInterval, windowDuration);
this.fsService = fsService;
this.prevIoTimeDeviceMap = new HashMap<>();
this.prevTimeInMillis = -1;
this.ioUsageStats = null;
}

/**
* Get current IO usage percentage calculated using fs stats
*/
@Override
public long getUsage() {
long usage = 0;
Optional<ValidationException> validationException = this.preValidateFsStats();
if (validationException != null && validationException.isPresent()) {
throw validationException.get();
}
// Note: currently, even with a RAID setup, only one mount device is reported, and it shows 0 IO time in /proc/diskstats
DeviceStats[] devicesStats = fsService.stats().getIoStats().getDevicesStats();
long latestTimeInMillis = fsService.stats().getTimestamp();
for (DeviceStats devicesStat : devicesStats) {
long devicePreviousIoTime = prevIoTimeDeviceMap.getOrDefault(devicesStat.getDeviceName(), (long) -1);
long deviceCurrentIoTime = devicesStat.ioTimeInMillis();
if (prevTimeInMillis > 0 && (latestTimeInMillis - this.prevTimeInMillis > 0) && devicePreviousIoTime > 0) {
long absIoTime = (deviceCurrentIoTime - devicePreviousIoTime);
long deviceCurrentIoUsage = absIoTime * 100 / (latestTimeInMillis - this.prevTimeInMillis);
// Report the maximum IO usage across all attached devices
usage = Math.max(usage, deviceCurrentIoUsage);
}
prevIoTimeDeviceMap.put(devicesStat.getDeviceName(), devicesStat.ioTimeInMillis());
}
this.prevTimeInMillis = latestTimeInMillis;
return usage;
}

@Override
protected void doStart() {
if (Constants.LINUX) {
this.ioUsageStats = new IoUsageStats(-1);
scheduledFuture = threadPool.scheduleWithFixedDelay(() -> {
long usage = getUsage();
recordUsage(usage);
updateIoUsageStats();
}, pollingInterval, ThreadPool.Names.GENERIC);
}
}

public Optional<ValidationException> preValidateFsStats() {
ValidationException validationException = new ValidationException();
if (fsService == null
|| fsService.stats() == null
|| fsService.stats().getIoStats() == null
|| fsService.stats().getIoStats().getDevicesStats() == null) {
validationException.addValidationError("FSService IoStats Or DeviceStats are Missing");
}
return validationException.validationErrors().isEmpty() ? Optional.empty() : Optional.of(validationException);
}

private void updateIoUsageStats() {
this.ioUsageStats.setIoUtilisationPercent(this.isReady() ? this.getAverage() : -1);
}

public IoUsageStats getIoUsageStats() {
return this.ioUsageStats;
}
}
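`getUsage()` turns each device's cumulative IO-time counter into a utilization percentage: busy milliseconds accrued since the previous poll, times 100, divided by the wall-clock milliseconds between the two FS stats timestamps, keeping the maximum across devices. The sketch below reproduces that arithmetic on plain maps so it runs in isolation; IoUtilizationSketch and `sample()` are hypothetical names, and the device-name-to-IO-time map stands in for FsService's DeviceStats.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class IoUtilizationSketch {
    // Previous cumulative IO time (ms) per device, keyed by device name.
    private final Map<String, Long> prevIoTimeByDevice = new HashMap<>();
    private long prevTimestampMillis = -1;

    // ioTimeByDevice: cumulative milliseconds each device has spent doing IO.
    long sample(long timestampMillis, Map<String, Long> ioTimeByDevice) {
        long usage = 0;
        long elapsedMillis = timestampMillis - prevTimestampMillis;
        for (Map.Entry<String, Long> device : ioTimeByDevice.entrySet()) {
            long prevIoTime = prevIoTimeByDevice.getOrDefault(device.getKey(), -1L);
            // Same guards as getUsage(): skip the first poll, a non-advancing clock,
            // and devices without a positive previous reading.
            if (prevTimestampMillis > 0 && elapsedMillis > 0 && prevIoTime > 0) {
                long busyMillis = device.getValue() - prevIoTime;
                long deviceUsage = busyMillis * 100 / elapsedMillis; // integer percent, truncated
                usage = Math.max(usage, deviceUsage); // keep the busiest device
            }
            prevIoTimeByDevice.put(device.getKey(), device.getValue());
        }
        prevTimestampMillis = timestampMillis;
        return usage;
    }

    public static void main(String[] args) {
        IoUtilizationSketch tracker = new IoUtilizationSketch();
        Map<String, Long> first = new LinkedHashMap<>();
        first.put("sda", 5_000L);
        first.put("sdb", 2_000L);
        System.out.println(tracker.sample(1_000L, first)); // 0: no previous sample yet

        Map<String, Long> second = new LinkedHashMap<>();
        second.put("sda", 7_500L); // 2500 ms busy over 5000 ms -> 50%
        second.put("sdb", 2_500L); //  500 ms busy over 5000 ms -> 10%
        System.out.println(tracker.sample(6_000L, second)); // 50
    }
}
```

Taking the maximum rather than the mean means a single saturated device is enough to raise the reported value, which presumably suits admission control, where one busy disk can be the bottleneck. In the PR, this polling is only scheduled when `Constants.LINUX` is true.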