Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring GatedAutoCloseable and moving RecoveryState.Timer #2965

Merged
merged 7 commits into from
Apr 20, 2022

Conversation

kartg
Copy link
Member

@kartg kartg commented Apr 18, 2022

Signed-off-by: Kartik Ganesh gkart@amazon.com

Description

This PR is one of the first steps in merging the feature branch feature/segment-replication back into main by re-PRing key changes. It covers part of the code changes in #2926 . The breakdown of the plan to merge segment-replication to main is detailed in #2355

The changes included in this PR are as follows:

  1. GatedAutoCloseable currently wraps a subclass of RefCounted. Segment replication adds another subclass, but this also wraps RefCounted. Both subclasses have the same shutdown hook - decRef. Therefore, GatedAutoCloseable has been made less generic (and renamed to AutoCloseableRefCounted) to increase code convergence.
  2. The RecoveryState.Timer inner class has been moved to a top-level class. This will eventually be reused across both replication use-cases - peer recovery and segment replication.
  3. In the RecoveryTargetTests test class, the test methods for Timer classes have been separated into distinct unit tests for each subclass. Common test logic is shared via a private method. The use of RandomBoolean has been removed to avoid non-deterministic code paths in testing. While doing so, the usage of assertThat (deprecated) has been replaced in these code paths.
  4. Finally, a minor change removes the readRecoveryState API from RecoveryState, replacing it by a direct invocation of the constructor.

Segment replication design proposal - #2229

Issues Resolved

N/A

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

kartg added 5 commits April 18, 2022 14:55
This is a part of the process of merging our feature branch - feature/segment-replication - back into main by re-PRing our changes from the feature branch.
GatedAutoCloseable currently wraps a subclass of RefCounted. Segment replication adds another subclass, but this also wraps RefCounted. Both subclasses have the same shutdown hook - decRef. This change makes the superclass less generic to increase code convergence.

The breakdown of the plan to merge segment-replication to main is detailed in opensearch-project#2355
Segment replication design proposal - opensearch-project#2229

Signed-off-by: Kartik Ganesh <gkart@amazon.com>
This change makes two minor updates to RecoveryState -
1. The readRecoveryState API is removed because it can be replaced by an invocation of the constructor
2. The class members of the Timer inner class are changed to private, and accesses are only through the public APIs

Signed-off-by: Kartik Ganesh <gkart@amazon.com>
This change removes the use of RandomBoolean in testing the Timer classes and creates a dedicated unit test for each. The common test logic is shared via a private method.

Signed-off-by: Kartik Ganesh <gkart@amazon.com>
This will eventually be reused across both replication use-cases - peer recovery and segment replication.

Signed-off-by: Kartik Ganesh <gkart@amazon.com>
Removes a non-deterministic code path around stopping the timer, and avoids assertThat (deprecated)

Signed-off-by: Kartik Ganesh <gkart@amazon.com>
@opensearch-ci-bot
Copy link
Collaborator

✅   Gradle Check success e5d632b
Log 4602

Reports 4602

Copy link
Member

@andrross andrross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated reviews are unhappy. Maybe missing a trailing newline?

* A serializable timer that is used to measure the time taken for
* file replication operations like recovery.
*/
public class Timer implements Writeable {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Timer" is a super generic name, used in lots of places (e.g. java.util.Timer, org.opensearch.search.profile.Timer, etc). Is there a more specific name that makes sense now that it is no longer an inner class? Maybe ReplicationTimer?

Signed-off-by: Kartik Ganesh <gkart@amazon.com>
@kartg kartg requested a review from reta as a code owner April 20, 2022 01:41
@opensearch-ci-bot
Copy link
Collaborator

❌   Gradle Check failure ee79dcc
Log 4633

Reports 4633

@kartg
Copy link
Member Author

kartg commented Apr 20, 2022

Failure due to a flaky assertion on a timer. I'm currently looking into how we can improve these tests.

org.opensearch.indices.recovery.RecoveryTargetTests > testIndexTimer FAILED
    java.lang.AssertionError: expected:<20> but was:<14>
        at __randomizedtesting.SeedInfo.seed([1AB462596BA6129E:29DE4F529849686D]:0)
        at org.junit.Assert.fail(Assert.java:89)
        at org.junit.Assert.failNotEquals(Assert.java:835)
        at org.junit.Assert.assertEquals(Assert.java:647)
        at org.junit.Assert.assertEquals(Assert.java:633)
        at org.opensearch.indices.recovery.RecoveryTargetTests.doTimerTest(RecoveryTargetTests.java:180)
        at org.opensearch.indices.recovery.RecoveryTargetTests.testIndexTimer(RecoveryTargetTests.java:146)

Trying to serialize and deserialize a running Timer instance, and then checking for equality leads to flaky test failures when the ser/deser takes time.

Signed-off-by: Kartik Ganesh <gkart@amazon.com>
@opensearch-ci-bot
Copy link
Collaborator

❌   Gradle Check failure 97fa82e
Log 4643

Reports 4643

@kartg
Copy link
Member Author

kartg commented Apr 20, 2022

Flaky test failure #1703

Refiring gradle check

@kartg
Copy link
Member Author

kartg commented Apr 20, 2022

start gradle check

@opensearch-ci-bot
Copy link
Collaborator

✅   Gradle Check success 97fa82e
Log 4650

Reports 4650

@dblock dblock requested a review from nknize April 20, 2022 21:15
Copy link
Member

@dblock dblock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks straightforward, thank you. Will leave it for @andrross to do another pass.

@dblock dblock removed the request for review from nknize April 20, 2022 21:18
@dblock dblock merged commit c7c410a into opensearch-project:main Apr 20, 2022
@dblock dblock added the backport 2.x Backport to 2.x branch label Apr 20, 2022
@dblock
Copy link
Member

dblock commented Apr 20, 2022

I labelled to backport to 2.x (2.1), nothing breaking here right?

opensearch-trigger-bot bot pushed a commit that referenced this pull request Apr 20, 2022
* Refactoring GatedAutoCloseable to AutoCloseableRefCounted

This is a part of the process of merging our feature branch - feature/segment-replication - back into main by re-PRing our changes from the feature branch.
GatedAutoCloseable currently wraps a subclass of RefCounted. Segment replication adds another subclass, but this also wraps RefCounted. Both subclasses have the same shutdown hook - decRef. This change makes the superclass less generic to increase code convergence.

The breakdown of the plan to merge segment-replication to main is detailed in #2355
Segment replication design proposal - #2229

Signed-off-by: Kartik Ganesh <gkart@amazon.com>

* Minor refactoring in RecoveryState

This change makes two minor updates to RecoveryState -
1. The readRecoveryState API is removed because it can be replaced by an invocation of the constructor
2. The class members of the Timer inner class are changed to private, and accesses are only through the public APIs

Signed-off-by: Kartik Ganesh <gkart@amazon.com>

* Update RecoveryTargetTests to test Timer subclasses deterministically

This change removes the use of RandomBoolean in testing the Timer classes and creates a dedicated unit test for each. The common test logic is shared via a private method.

Signed-off-by: Kartik Ganesh <gkart@amazon.com>

* Move the RecoveryState.Timer class to a top-level class

This will eventually be reused across both replication use-cases - peer recovery and segment replication.

Signed-off-by: Kartik Ganesh <gkart@amazon.com>

* Further update of timer tests in RecoveryTargetTests

Removes a non-deterministic code path around stopping the timer, and avoids assertThat (deprecated)

Signed-off-by: Kartik Ganesh <gkart@amazon.com>

* Rename to ReplicationTimer

Signed-off-by: Kartik Ganesh <gkart@amazon.com>

* Remove RecoveryTargetTests assert on a running timer

Trying to serialize and deserialize a running Timer instance, and then checking for equality leads to flaky test failures when the ser/deser takes time.

Signed-off-by: Kartik Ganesh <gkart@amazon.com>
(cherry picked from commit c7c410a)
@kartg
Copy link
Member Author

kartg commented Apr 21, 2022

I labelled to backport to 2.x (2.1), nothing breaking here right?

Correct, nothing breaking; safe to backport to 2.x

kartg added a commit that referenced this pull request Apr 21, 2022
…#3014)

* Refactoring GatedAutoCloseable to AutoCloseableRefCounted

This is a part of the process of merging our feature branch - feature/segment-replication - back into main by re-PRing our changes from the feature branch.
GatedAutoCloseable currently wraps a subclass of RefCounted. Segment replication adds another subclass, but this also wraps RefCounted. Both subclasses have the same shutdown hook - decRef. This change makes the superclass less generic to increase code convergence.

The breakdown of the plan to merge segment-replication to main is detailed in #2355
Segment replication design proposal - #2229

Signed-off-by: Kartik Ganesh <gkart@amazon.com>

* Minor refactoring in RecoveryState

This change makes two minor updates to RecoveryState -
1. The readRecoveryState API is removed because it can be replaced by an invocation of the constructor
2. The class members of the Timer inner class are changed to private, and accesses are only through the public APIs

Signed-off-by: Kartik Ganesh <gkart@amazon.com>

* Update RecoveryTargetTests to test Timer subclasses deterministically

This change removes the use of RandomBoolean in testing the Timer classes and creates a dedicated unit test for each. The common test logic is shared via a private method.

Signed-off-by: Kartik Ganesh <gkart@amazon.com>

* Move the RecoveryState.Timer class to a top-level class

This will eventually be reused across both replication use-cases - peer recovery and segment replication.

Signed-off-by: Kartik Ganesh <gkart@amazon.com>

* Further update of timer tests in RecoveryTargetTests

Removes a non-deterministic code path around stopping the timer, and avoids assertThat (deprecated)

Signed-off-by: Kartik Ganesh <gkart@amazon.com>

* Rename to ReplicationTimer

Signed-off-by: Kartik Ganesh <gkart@amazon.com>

* Remove RecoveryTargetTests assert on a running timer

Trying to serialize and deserialize a running Timer instance, and then checking for equality leads to flaky test failures when the ser/deser takes time.

Signed-off-by: Kartik Ganesh <gkart@amazon.com>
(cherry picked from commit c7c410a)

Co-authored-by: Kartik Ganesh <gkart@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants