Fix exceptions in IntervalCalculation and ResultIndexingHandler (opensearch-project#1379)

* Fix race condition in PageListener

This PR
- Introduced an `AtomicInteger` called `pagesInFlight` to track the number of pages currently being processed. 
- Incremented `pagesInFlight` before processing each page and decremented it after processing completed.
- Adjusted the condition in `scheduleImputeHCTask` to check both `pagesInFlight.get() == 0` (all pages have been processed) and `sentOutPages.get() == receivedPages.get()` (all responses have been received) before scheduling the `imputeHC` task. 
- Removed the previous final check in `onResponse` that decided when to schedule `imputeHC`, relying instead on the updated counters for accurate synchronization.

These changes address the race condition where `sentOutPages` might not have been incremented in time before checking whether to schedule the `imputeHC` task. By accurately tracking the number of in-flight pages and sent pages, we ensure that `imputeHC` is executed only after all pages have been fully processed and all responses have been received.
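The counter scheme described above can be sketched as follows. This is a minimal, single-threaded illustration, not the actual `PageListener` code: the class and method names (`PageCounters`, `startPage`, `finishPage`) are hypothetical, and `imputeScheduled` stands in for scheduling the real `imputeHC` task.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the synchronization fix: imputeHC may be scheduled only when no
// page is still in flight AND every sent page has a matching response.
public class PageCounters {
    public final AtomicInteger pagesInFlight = new AtomicInteger();
    public final AtomicInteger sentOutPages = new AtomicInteger();
    public final AtomicInteger receivedPages = new AtomicInteger();
    public volatile boolean imputeScheduled = false;

    // page processing begins: mark it in flight and count it as sent out
    public void startPage() {
        pagesInFlight.incrementAndGet();
        sentOutPages.incrementAndGet();
    }

    // page fully processed: leave the in-flight set, then re-check the condition
    public void finishPage() {
        pagesInFlight.decrementAndGet();
        maybeScheduleImputeHC();
    }

    // a response arrived for a previously sent page
    public void onResponse() {
        receivedPages.incrementAndGet();
        maybeScheduleImputeHC();
    }

    private void maybeScheduleImputeHC() {
        // both conditions must hold before imputeHC may run
        if (pagesInFlight.get() == 0 && sentOutPages.get() == receivedPages.get()) {
            imputeScheduled = true; // stand-in for scheduling the real imputeHC task
        }
    }
}
```

With this shape, a response that arrives while its page is still being processed (`pagesInFlight > 0`) can no longer trigger `imputeHC` prematurely, which is exactly the window the original code left open.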

Testing done:
1. Reproduced the race condition by starting two detectors with imputation. The race causes an out-of-order `IllegalArgumentException` from RCF. Also verified that the change fixed the problem.
2. Added an integration test for the above scenario.

Signed-off-by: Kaituo Li <kaituo@amazon.com>

* Fix exceptions in IntervalCalculation and ResultIndexingHandler

- **IntervalCalculation**: Prevent an `ArrayIndexOutOfBoundsException` by returning early when there are fewer than two timestamps. Previously, the code assumed at least two timestamps, causing an exception when only one was present.

- **ResultIndexingHandler**: Handle exceptions from asynchronous calls by logging error messages instead of throwing exceptions. Since the caller does not wait for these asynchronous operations, throwing exceptions had no effect and could lead to unhandled exceptions. Logging provides visibility without disrupting the caller's flow.
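A toy demonstration of the point above, using a plain `ExecutorService` rather than the actual OpenSearch listener API: an exception thrown inside an asynchronous task never propagates to the caller's try/catch, because the caller has already returned by the time the task runs; a log statement inside the task, by contrast, still records the failure.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AsyncThrowDemo {
    public static volatile boolean callerCaught = false;
    public static volatile boolean logged = false;

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();

        // 1) Throwing inside the async task: submit() returns immediately, so
        //    the caller's catch block never sees the exception.
        try {
            pool.submit((Runnable) () -> { throw new RuntimeException("cannot create result index"); });
        } catch (RuntimeException e) {
            callerCaught = true; // never reached
        }

        // 2) Logging inside the async task: the failure is still surfaced.
        pool.submit(() -> {
            try {
                throw new RuntimeException("cannot create result index");
            } catch (RuntimeException e) {
                logged = true; // stand-in for LOG.error(...)
            }
        });

        pool.shutdown();
        pool.awaitTermination(2, TimeUnit.SECONDS);
    }
}
```

This is why the handler now logs instead of throwing: in the first case the exception is silently swallowed by the executor (nobody calls `Future.get()`), while in the second the error is visibly recorded.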

Testing done:
1. Added unit tests and integration tests.

Signed-off-by: Kaituo Li <kaituo@amazon.com>

kaituo committed Dec 5, 2024
1 parent c95c430 commit 6440d77
Showing 12 changed files with 1,664 additions and 131 deletions.
3 changes: 0 additions & 3 deletions build.gradle
@@ -699,9 +699,6 @@ List<String> jacocoExclusions = [

// TODO: add test coverage (kaituo)
'org.opensearch.forecast.*',
-    'org.opensearch.timeseries.transport.ResultBulkTransportAction',
-    'org.opensearch.timeseries.transport.handler.IndexMemoryPressureAwareResultHandler',
-    'org.opensearch.timeseries.transport.handler.ResultIndexingHandler',
'org.opensearch.timeseries.ml.Sample',
'org.opensearch.timeseries.ratelimit.FeatureRequest',
'org.opensearch.ad.transport.ADHCImputeNodeRequest',
@@ -7,6 +7,7 @@ Compatible with OpenSearch 2.18.0

### Bug Fixes
* Bump RCF Version and Fix Default Rules Bug in AnomalyDetector ([#1334](https://github.com/opensearch-project/anomaly-detection/pull/1334))
+* Fix race condition in PageListener ([#1351](https://github.com/opensearch-project/anomaly-detection/pull/1351))

### Infrastructure
* forward port flaky test fix and add forecasting security tests ([#1329](https://github.com/opensearch-project/anomaly-detection/pull/1329))
@@ -58,7 +58,7 @@ public ForecastResultBulkTransportAction(
}

@Override
-    protected BulkRequest prepareBulkRequest(float indexingPressurePercent, ForecastResultBulkRequest request) {
+    public BulkRequest prepareBulkRequest(float indexingPressurePercent, ForecastResultBulkRequest request) {
BulkRequest bulkRequest = new BulkRequest();
List<ForecastResultWriteRequest> results = request.getResults();

@@ -252,8 +252,9 @@ private void findMinimumInterval(LongBounds timeStampBounds, ActionListener<Inte
.createSearchRequest(new IntervalTimeConfiguration(1, ChronoUnit.MINUTES), timeStampBounds, topEntity);
final ActionListener<SearchResponse> searchResponseListener = ActionListener.wrap(response -> {
List<Long> timestamps = aggregationPrep.getTimestamps(response);
-            if (timestamps.isEmpty()) {
-                logger.warn("empty data, return one minute by default");
+            if (timestamps.size() < 2) {
+                // to calculate the difference we need at least 2 timestamps
+                logger.warn("not enough data, return one minute by default");
listener.onResponse(new IntervalTimeConfiguration(1, ChronoUnit.MINUTES));
return;
}
@@ -34,7 +34,6 @@
import org.opensearch.core.concurrency.OpenSearchRejectedExecutionException;
import org.opensearch.core.xcontent.XContentBuilder;
import org.opensearch.threadpool.ThreadPool;
-import org.opensearch.timeseries.common.exception.EndRunException;
import org.opensearch.timeseries.common.exception.TimeSeriesException;
import org.opensearch.timeseries.indices.IndexManagement;
import org.opensearch.timeseries.indices.TimeSeriesIndex;
@@ -109,88 +108,78 @@ public void setFixedDoc(boolean fixedDoc) {
    }

    // TODO: check if user has permission to index.
    /**
     * Run async index operation. Cannot guarantee index is done after finishing executing the function as several calls
     * in the method are asynchronous.
     * @param toSave Result to save
     * @param configId config id
     * @param indexOrAliasName custom index or alias name
     */
    public void index(ResultType toSave, String configId, String indexOrAliasName) {
        if (indexOrAliasName != null) {
            if (indexUtils.checkIndicesBlocked(clusterService.state(), ClusterBlockLevel.WRITE, indexOrAliasName)) {
                LOG.warn(String.format(Locale.ROOT, CANNOT_SAVE_ERR_MSG, configId));
                return;
            }
            // We create custom result index when creating a detector. Custom result index can be rolled over and thus we may need to
            // create a new one.
            if (!timeSeriesIndices.doesIndexExist(indexOrAliasName) && !timeSeriesIndices.doesAliasExist(indexOrAliasName)) {
                timeSeriesIndices.initCustomResultIndexDirectly(indexOrAliasName, ActionListener.wrap(response -> {
                    if (response.isAcknowledged()) {
                        save(toSave, configId, indexOrAliasName);
                    } else {
                        LOG
                            .error(
                                String
                                    .format(
                                        Locale.ROOT,
                                        "Creating custom result index %s with mappings call not acknowledged",
                                        indexOrAliasName
                                    )
                            );
                    }
                }, exception -> {
                    if (ExceptionsHelper.unwrapCause(exception) instanceof ResourceAlreadyExistsException) {
                        // It is possible the index has been created while we sending the create request
                        save(toSave, configId, indexOrAliasName);
                    } else {
                        LOG.error(String.format(Locale.ROOT, "cannot create result index %s", indexOrAliasName), exception);
                    }
                }));
            } else {
                timeSeriesIndices
                    .validateResultIndexMapping(indexOrAliasName, ActionListener.wrap(valid -> {
                        if (!valid) {
                            LOG.error("wrong index mapping of custom result index");
                        } else {
                            save(toSave, configId, indexOrAliasName);
                        }
                    }, exception -> { LOG.error(String.format(Locale.ROOT, "cannot validate result index %s", indexOrAliasName), exception); })
                    );
            }
        } else {
            if (indexUtils.checkIndicesBlocked(clusterService.state(), ClusterBlockLevel.WRITE, this.defaultResultIndexName)) {
                LOG.warn(String.format(Locale.ROOT, CANNOT_SAVE_ERR_MSG, configId));
                return;
            }
            if (!timeSeriesIndices.doesDefaultResultIndexExist()) {
                timeSeriesIndices
                    .initDefaultResultIndexDirectly(
                        ActionListener.wrap(initResponse -> onCreateIndexResponse(initResponse, toSave, configId), exception -> {
                            if (ExceptionsHelper.unwrapCause(exception) instanceof ResourceAlreadyExistsException) {
                                // It is possible the index has been created while we sending the create request
                                save(toSave, configId);
                            } else {
                                LOG
                                    .error(
                                        String.format(Locale.ROOT, "Unexpected error creating index %s", defaultResultIndexName),
                                        exception
                                    );
                            }
                        })
                    );
            } else {
                save(toSave, configId);
            }
        }
    }

