[Remote Store] Adding Segment download stats to remotestore stats API #8440

Closed

shourya035 (Member) commented Jul 5, 2023

Description

  • Adding segment download stats to the _remotestore/stats API
  • Refactored API output to accommodate both segment and translog stats
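For orientation, the refactored shard entry can be pictured as a `segment` object that holds the existing upload stats next to the new download stats, with translog stats alongside it. The sketch below is only an illustration of that nesting: the key names are hypothetical, and it uses a tiny hand-rolled JSON renderer rather than OpenSearch's XContentBuilder.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the refactored _remotestore/stats shard entry:
// segment stats split into "upload" and "download", translog stats alongside.
// Key names are illustrative, not the actual OpenSearch field names.
class RemoteStoreStatsShape {

    static Map<String, Object> shardEntry(Map<String, Object> upload,
                                          Map<String, Object> download,
                                          Map<String, Object> translog) {
        Map<String, Object> segment = new LinkedHashMap<>();
        segment.put("upload", upload);
        segment.put("download", download);

        Map<String, Object> entry = new LinkedHashMap<>();
        entry.put("segment", segment);
        entry.put("translog", translog);
        return entry;
    }

    // Minimal JSON renderer for nested maps of strings and numbers (no escaping).
    @SuppressWarnings("unchecked")
    static String toJson(Object value) {
        if (value instanceof Map) {
            StringBuilder sb = new StringBuilder("{");
            boolean first = true;
            for (Map.Entry<String, Object> e : ((Map<String, Object>) value).entrySet()) {
                if (!first) {
                    sb.append(',');
                }
                first = false;
                sb.append('"').append(e.getKey()).append("\":").append(toJson(e.getValue()));
            }
            return sb.append('}').toString();
        }
        if (value instanceof Number) {
            return value.toString();
        }
        return "\"" + value + "\"";
    }
}
```

Keeping upload and download as siblings under `segment` lets the translog section evolve independently of segment stats.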

Sample API outputs from running in a local dev setup

Related Issues

Resolves #8395

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

github-actions bot commented Jul 5, 2023

Gradle Check (Jenkins) Run Completed with:

github-actions bot commented Jul 6, 2023

Gradle Check (Jenkins) Run Completed with:

ashking94 (Member) left a comment

Is it possible to move the segment download logic to a separate Java class? That would keep the IndexShard class from growing any further.
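One way to read this suggestion, sketched with hypothetical names (`RemoteSegmentDownloader`, `ShardSketch`) and an in-memory map standing in for the remote segment store: the download loop lives in its own class, and the shard only holds a reference to it and delegates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: segment-download logic lives in its own class and the
// shard only delegates to it. Class and method names are illustrative.
class RemoteSegmentDownloader {
    private final Map<String, byte[]> remoteStore; // stand-in for the remote segment store

    RemoteSegmentDownloader(Map<String, byte[]> remoteStore) {
        this.remoteStore = remoteStore;
    }

    /** Copies every remote file missing from localStore and returns the names copied. */
    List<String> syncMissingSegments(Map<String, byte[]> localStore) {
        List<String> copied = new ArrayList<>();
        for (Map.Entry<String, byte[]> e : remoteStore.entrySet()) {
            if (!localStore.containsKey(e.getKey())) {
                localStore.put(e.getKey(), e.getValue().clone());
                copied.add(e.getKey());
            }
        }
        return copied;
    }
}

// The shard keeps only the wiring, so its own class does not grow.
class ShardSketch {
    final Map<String, byte[]> localStore = new TreeMap<>();
    private final RemoteSegmentDownloader downloader;

    ShardSketch(RemoteSegmentDownloader downloader) {
        this.downloader = downloader;
    }

    List<String> syncSegmentsFromRemote() {
        return downloader.syncMissingSegments(localStore);
    }
}
```

A side benefit of this split is that the download logic, and any stats it records, can be unit-tested without constructing a full shard.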

Comment on lines 4819 to 4830
if (!indexSettings.isRemoteStoreEnabled()) {
return;
Member

Why is this check required? The code in question only runs for remote-store-enabled indices.

Member Author

As discussed, the same copySegmentFiles method is also used during snapshot restores from remote store. The check handles indices that are restored from a snapshot with an overridden index setting that disables remote store. I will add a comment here explaining this.
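A minimal sketch of the guard being described, assuming a hypothetical stats hook whose `remoteStoreEnabled` flag stands in for `indexSettings.isRemoteStoreEnabled()`: because the copy path is shared with snapshot restore, the hook records nothing for an index whose remote store setting has been overridden to disabled.

```java
// Hypothetical sketch: the stats hook returns early for indices whose settings
// disable remote store (for example, a snapshot restore that overrides the
// remote store setting), so only remote-store-backed downloads are counted.
class DownloadStatsHook {
    private final boolean remoteStoreEnabled; // stand-in for indexSettings.isRemoteStoreEnabled()
    private long totalDownloadsStarted;

    DownloadStatsHook(boolean remoteStoreEnabled) {
        this.remoteStoreEnabled = remoteStoreEnabled;
    }

    void onDownloadStarted() {
        if (!remoteStoreEnabled) {
            return; // shared copy path invoked for a non-remote-store index: record nothing
        }
        totalDownloadsStarted++;
    }

    long totalDownloadsStarted() {
        return totalDownloadsStarted;
    }
}
```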

if (!indexSettings.isRemoteStoreEnabled()) {
return;
}
downloadStatsTracker.incrementTotalDownloadsStarted();
Member

This counter is meant to track overall downloads, not individual files. Let's fix this.

Member

downloadStatsTracker.incrementTotalDownloadsStarted(); should be called only once per invocation of syncSegmentsFromRemoteSegmentStore.

Member Author

Ack
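The requested change, sketched under assumptions: `PerSyncDownloadStats` and its fields are hypothetical stand-ins for the real tracker, and the `copyFile` predicate stands in for the actual file copy. The download counters move once per sync invocation, while the byte counters are still accumulated per file.

```java
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch: "downloads started/succeeded/failed" are counted once per
// sync invocation; byte counters are still updated for every file.
class PerSyncDownloadStats {
    long totalDownloadsStarted;
    long totalDownloadsSucceeded;
    long totalDownloadsFailed;
    long downloadBytesStarted;
    long downloadBytesSucceeded;
    long downloadBytesFailed;

    /**
     * fileSizes maps each segment file to its size in bytes; copyFile performs
     * the copy and returns false on failure.
     */
    void syncSegments(Map<String, Long> fileSizes, Predicate<String> copyFile) {
        totalDownloadsStarted++; // once per sync invocation, not once per file
        boolean allCopied = true;
        for (Map.Entry<String, Long> file : fileSizes.entrySet()) {
            long size = file.getValue();
            downloadBytesStarted += size; // byte accounting stays per file
            if (copyFile.test(file.getKey())) {
                downloadBytesSucceeded += size;
            } else {
                downloadBytesFailed += size;
                allCopied = false;
            }
        }
        if (allCopied) {
            totalDownloadsSucceeded++;
        } else {
            totalDownloadsFailed++;
        }
    }
}
```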

return;
}
long currentTimeInNs = System.nanoTime();
downloadStatsTracker.updateLastDownloadTimestampMs(System.currentTimeMillis());
Member

Do we need this at the file level or at the per-sync level?
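If the answer is per sync, the timing could be captured once around the whole sync rather than inside the per-file loop. A sketch with hypothetical field names, using `System.nanoTime()` for elapsed time and `System.currentTimeMillis()` for the wall-clock timestamp, mirroring the two clocks in the snippet above:

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: the wall-clock timestamp and the elapsed time are recorded
// once per sync rather than once per file. Field names are illustrative.
class PerSyncTiming {
    long lastDownloadTimestampMs = -1;
    long lastSyncDurationMs = -1;

    void syncSegments(Runnable downloadAllFiles) {
        long startNs = System.nanoTime();                      // monotonic clock for the duration
        lastDownloadTimestampMs = System.currentTimeMillis();  // wall clock for "last download at"
        downloadAllFiles.run();                                // all per-file copies happen inside one sync
        lastSyncDurationMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNs);
    }
}
```

`nanoTime()` is used for the duration because it is monotonic, while `currentTimeMillis()` is only suitable for reporting a point in time.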

ashking94 (Member)

builder.endObject();
builder.endObject();
} else {
builder.startObject(SubFields.DOWNLOAD);
Member

So for primary shards, the download field will be shown as {}.
Can we validate this and point to an existing API that does the same? We just need to make sure users are aware that these stats will be empty for primary shards, and that this will not cause issues in the future.

Member Author

This is in line with the other APIs as well, where null or 0 values are shown as a simple {}. The shard routing section of _cluster/state is one example.
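To make the behaviour concrete, here is a hypothetical renderer (not the actual XContent code) that always emits the `download` object but only fills it for shards that download segments, so a primary ends up with `"download":{}`. The field names are illustrative.

```java
// Hypothetical sketch: the "download" object is always present, but its fields
// are only written for shards that download segments (replicas), so a primary
// renders "download":{}.
class DownloadSectionRenderer {
    static String render(boolean isPrimary, long downloadsStarted, long bytesSucceeded) {
        StringBuilder sb = new StringBuilder("\"download\":{");
        if (!isPrimary) {
            sb.append("\"total_downloads_started\":").append(downloadsStarted)
              .append(",\"download_size_in_bytes\":{\"succeeded\":").append(bytesSucceeded).append('}');
        }
        return sb.append('}').toString();
    }
}
```

Emitting the key with an empty object, rather than omitting it, keeps the response shape identical for primaries and replicas, which is easier for clients that parse it.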

.field(SubFields.FAILED, remoteSegmentShardStats.downloadBytesFailed);
builder.endObject();
builder.startObject(DownloadStatsFields.DOWNLOAD_SIZE_IN_BYTES)
.field(SubFields.LAST_SUCCESSFUL, remoteSegmentShardStats.lastSuccessfulSegmentDownloadBytes)
Member

Is there a specific reason to expose the size of the last successfully downloaded segment file? How does this help the user from a stats perspective?

remoteSegmentShardStats.localRefreshNumber - remoteSegmentShardStats.remoteRefreshNumber
)
.field(UploadStatsFields.BYTES_LAG, remoteSegmentShardStats.bytesLag)
.field(UploadStatsFields.BACKPRESSURE_REJECTION_COUNT, remoteSegmentShardStats.rejectionCount)
Member

Question: Will there be a scenario where we need backpressure during downloads?
Is this something we need, or are we fine as is? Thinking ahead to cases where n segments are being restored or recovered, which might lead to some delay or congestion.

Member Author

As of now, we are not applying any backpressure to downloads. IMO, the indices.recovery.max_bytes_per_sec setting already acts as a check on downloads.
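For context on how a byte-rate cap throttles downloads without a separate backpressure mechanism, here is a minimal, hypothetical limiter in the spirit of `indices.recovery.max_bytes_per_sec`. After each chunk it computes how long the copier should pause so that cumulative throughput stays under the limit. It is an illustration, not OpenSearch's actual rate-limiting code; times are passed in explicitly so the arithmetic is easy to check.

```java
// Hypothetical sketch of byte-rate throttling: after each chunk, compute how
// long to pause so cumulative throughput stays at or below maxBytesPerSec.
class SimpleByteRateLimiter {
    private final double maxBytesPerSec;
    private final long startNs;
    private long bytesSoFar;

    SimpleByteRateLimiter(double maxBytesPerSec, long startNs) {
        if (maxBytesPerSec <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        this.maxBytesPerSec = maxBytesPerSec;
        this.startNs = startNs;
    }

    /** Records a chunk and returns how many nanoseconds to pause (0 if under the limit). */
    long pauseNanos(long chunkBytes, long nowNs) {
        bytesSoFar += chunkBytes;
        // Earliest time at which bytesSoFar may have been transferred at the capped rate.
        long earliestAllowedNs = startNs + (long) (bytesSoFar / maxBytesPerSec * 1_000_000_000L);
        return Math.max(0L, earliestAllowedNs - nowNs);
    }
}
```

With a 1000 B/s cap, sending 500 bytes instantly yields a 0.5 s pause, while a copier that is already slower than the cap never pauses.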

shourya035 force-pushed the segment-download-stats branch 2 times, most recently from 6d49924 to d826579 on July 11, 2023 10:44
github-actions bot

Gradle Check (Jenkins) Run Completed with:

shourya035 force-pushed the segment-download-stats branch from 14d8b2e to f282d54 on July 11, 2023 13:07
github-actions bot

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.cluster.allocation.AwarenessAllocationIT.testThreeZoneOneReplicaWithForceZoneValueAndLoadAwareness

github-actions bot

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.cluster.allocation.AwarenessAllocationIT.testThreeZoneOneReplicaWithForceZoneValueAndLoadAwareness

github-actions bot

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.indices.replication.SegmentReplicationRelocationIT.testPrimaryRelocation
      1 org.opensearch.http.SearchRestCancellationIT.testAutomaticCancellationDuringFetchPhase

github-actions bot

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      2 org.opensearch.remotestore.SegmentReplicationRemoteStoreIT.classMethod
      1 org.opensearch.remotestore.multipart.RemoteStoreMultipartIT.testStaleCommitDeletionWithInvokeFlush
      1 org.opensearch.remotestore.SegmentReplicationRemoteStoreIT.testScrollWithConcurrentIndexAndSearch
      1 org.opensearch.remotestore.CreateRemoteIndexClusterDefaultDocRep.testRemoteStoreTranslogDisabledByUser

github-actions bot

Gradle Check (Jenkins) Run Completed with:

github-actions bot

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testPitCreatedOnReplica
      1 org.opensearch.remotestore.SegmentReplicationRemoteStoreIT.testPressureServiceStats

shourya035 closed this Jul 13, 2023
shourya035 force-pushed the segment-download-stats branch 2 times, most recently from 9daeb74 to 2f4545a on July 13, 2023 11:09
github-actions bot

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.indices.replication.SegmentReplicationIT.testPitCreatedOnReplica

Development

Successfully merging this pull request may close these issues.

[Remote Store] RFC - Adding segment download metrics to remotestore stats API
4 participants