ILM: add searchable snapshot action #52585

Contributor

@andreidan andreidan commented Feb 20, 2020

Add ILM support for searchable snapshots in the cold phase. Searchable snapshots are part of the lazy snapshot restores effort that's being tracked here

The API for configuring a searchable snapshot is:

"cold": {
  "searchable_snapshot" : { 
    "snapshot_repository" : "snapshotRepositoryName"
  }
}

@andreidan andreidan added the :Data Management/ILM+SLM Index and Snapshot lifecycle management label Feb 20, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-core-features (:Core/Features/ILM+SLM)

@andreidan
Contributor Author

@elasticmachine update branch

Member

@dakrone dakrone left a comment

Thanks for working on this @andreidan, I know it's still a WIP, but I left a bunch of comments!

Comment on lines 33 to 34
Objects.nonNull(indexPrefix);
Objects.nonNull(settingsKeys);
Member

These need to be Objects.requireNonNull; currently the call just returns true or false and doesn't throw an exception.

Contributor Author

Good catch (I hadn't even acknowledged the existence of an API that just does obj != null - TIL :) )
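
For reference, a minimal sketch of the difference discussed above. The class name `NullCheckDemo` and the error messages are made up for illustration and are not taken from the PR; only `Objects.nonNull` and `Objects.requireNonNull` are real JDK calls.

```java
import java.util.Objects;

// Hypothetical class, only here to contrast the two JDK null-check helpers.
final class NullCheckDemo {
    private final String indexPrefix;
    private final String[] settingsKeys;

    NullCheckDemo(String indexPrefix, String[] settingsKeys) {
        // Objects.nonNull(x) merely evaluates x != null; when the result is
        // ignored, as in the reviewed lines, nothing is actually enforced.
        // Objects.requireNonNull(x, message) throws a NullPointerException
        // when x is null, which is what a constructor guard needs.
        this.indexPrefix = Objects.requireNonNull(indexPrefix, "indexPrefix must not be null");
        this.settingsKeys = Objects.requireNonNull(settingsKeys, "settingsKeys must not be null");
    }

    // Returns true when constructing with a null prefix is rejected.
    static boolean rejectsNullPrefix() {
        try {
            new NullCheckDemo(null, new String[0]);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }
}
```
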

settings.put(key, value);
}

UpdateSettingsRequest updateSettingsRequest = new UpdateSettingsRequest(indexName)
Member

Currently we do this as an AsyncActionStep, but should we rather do this as a ClusterStateActionStep to avoid having to actually issue the request and wait for it? I believe it would be a bit better then, and it would also allow us to copy settings that may not be settable (if we need that functionality in the future)

Contributor Author

Oh interesting. I agree it'd be a better avenue

"the ilm policy but has " + lifecycleState.getSnapshotName();
LifecycleExecutionState.Builder newCustomData = LifecycleExecutionState.builder(lifecycleState);
String policy = indexMetaData.getSettings().get(LifecycleSettings.LIFECYCLE_NAME);
String snapshotName = generateSnapshotName(generateSnapshotName("<{now/M}-" + index.getName() + "-" + policy + ">"));
Member

Why are we generating the snapshot name twice here?

Member

Also, do we think {now/M} is enough granularity? We could probably go with /d to be a bit more granular?

Member

Also, this should use the same validation that SnapshotLifecyclePolicy.validate() does for validating things (for example, no # in the name). We should factor that validation out into a method and use it here.
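
To make the three points above concrete, here is a rough, self-contained sketch: the name is generated once, at day granularity, and then passed through a factored-out validation method. Everything in it is an assumption for illustration. `SnapshotNameSketch`, `generate`, and `validate` are hypothetical names, the date is formatted with `java.time` instead of Elasticsearch date math, and of the checks shown only the `#` rule is mentioned in this thread; the whitespace and lowercase rules are guesses at what a shared validator might include.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper, not the PR's GenerateSnapshotNameStep.
final class SnapshotNameSketch {

    // Builds "<yyyy.MM.dd>-<index>-<policy>" exactly once (day granularity).
    static String generate(LocalDate today, String indexName, String policyName) {
        String date = today.format(DateTimeFormatter.ofPattern("yyyy.MM.dd", Locale.ROOT));
        return date + "-" + indexName + "-" + policyName;
    }

    // Returns a reason when the name is invalid, or empty when it passes.
    static Optional<String> validate(String name) {
        if (name == null || name.isEmpty()) {
            return Optional.of("must not be empty");
        }
        if (name.indexOf('#') >= 0) {
            return Optional.of("must not contain '#'"); // the rule named in the review
        }
        if (name.chars().anyMatch(Character::isWhitespace)) {
            return Optional.of("must not contain whitespace"); // assumed rule
        }
        if (!name.equals(name.toLowerCase(Locale.ROOT))) {
            return Optional.of("must be lowercase"); // assumed rule
        }
        return Optional.empty();
    }
}
```

Returning a reason instead of throwing lets both callers (the snapshot lifecycle policy and the ILM step) decide how to surface the error, which is one way the shared method could be shaped.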

getClient().admin().indices().aliases(aliasesRequest, new ActionListener<>() {

@Override
public void onResponse(AcknowledgedResponse response) {
Member

We should check that the request was actually acknowledged and throw an exception if it wasn't so that it can be retried
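
Something along these lines, as a sketch only. The `Listener` interface below is a hypothetical stand-in for the real listener type, and reporting the problem through `onFailure` (rather than throwing directly) is an assumption about how the retry would be triggered.

```java
// Hypothetical sketch; AckCheckSketch and Listener are not classes from the PR.
final class AckCheckSketch {

    // Minimal stand-in for a listener with success and failure callbacks.
    interface Listener {
        void onResponse(boolean complete);
        void onFailure(Exception e);
    }

    // Treats an unacknowledged response as a failure so the step can be retried.
    static void handleAliasResponse(boolean acknowledged, String indexName, Listener listener) {
        if (acknowledged == false) {
            listener.onFailure(new IllegalStateException(
                "alias update request for index [" + indexName + "] was not acknowledged"));
            return;
        }
        listener.onResponse(true);
    }
}
```
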


import static org.elasticsearch.xpack.core.ilm.LifecycleExecutionState.fromIndexMetadata;

public class TakeSnapshotStep extends AsyncActionStep {
Member

I think a better name might be CreateSnapshotStep

Comment on lines 52 to 53
// TODO should we expose this?
request.includeGlobalState(false);
Member

I don't think we should expose this (meaning we should keep it hardcoded as false, like you have); these are specific snapshots used only for searchable snapshots.

Comment on lines 56 to 57
public void onResponse(CreateSnapshotResponse createSnapshotResponse) {
listener.onResponse(true);
Member

We should probably check the status() of the response; it's possible it could be a failure, in which case I think we could retry again immediately?

/**
* Optionally derives the index name using the provided prefix (if any) and waits for the status of the index to be GREEN.
*/
public class WaitForGreenIndexHealthStep extends ClusterStateWaitStep {
Member

I think we could refactor the users of this class to use WaitForIndexColorStep and then we wouldn't need to add this class

Contributor Author

Totally, thanks for the tip. I didn't know of its existence

@elasticmachine
Collaborator

Pinging @elastic/es-ui (:ES-UI)

@cjcenizal
Contributor

@andreidan Would you mind adding some details to the PR description? The ES UI team would be particularly interested in changes to the API and an explanation of the use cases for new features.

- renamed TakeSnapshotStep to CreateSnapshotStep
- made the Create and RestoreSnapshotStep retry-able using the AsyncRetryDuringSnapshotActionStep infrastructure
- CreateSnapshotStep checks the response status for internal server error and fails the step in this case so we retry before we go to the next steps that’ll discover the snapshot was not created successfully
- CleanupSnapshotStep reports missing repository in a more detailed way
- CopySettingsStep copies the settings using the cluster state directly
- GenerateSnapshotNameStep validates the snapshot name

This redesign is meant to allow the client to specify that it wants to stop waiting and move to the “condition unfulfilled” side of the branch.
@andreidan
Contributor Author

@elasticmachine update branch

andreidan and others added 6 commits March 25, 2020 21:20
This speeds up the test a bit and also avoids a race condition where the
explain API will not get a response as the ILM is busy executing cluster
action steps for the restored index.
@andreidan
Contributor Author

@elasticmachine update branch

@andreidan
Contributor Author

@elasticmachine update branch

@andreidan
Contributor Author

@elasticmachine update branch

Member

@dakrone dakrone left a comment

Okay whew, sorry about the delay on this, Andrei. I left a few more really minor comments; I think the only major thing is flipping the boolean for the null check in WaitForSnapshotInProgressStep.java

Comment on lines 75 to 77
if (settingsKeys == null || settingsKeys.length == 0) {
return clusterState;
}
Member

Super minor, but we can probably stick this before the targetIndexMetadata == null check to avoid an error where no settings are copied?
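
Roughly what I mean, as a self-contained sketch. The cluster state is modelled here as a plain map from index name to settings, and `CopySettingsSketch` and `copySettings` are hypothetical names, so this only illustrates the ordering of the two checks, not the real step.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; the "state" map stands in for the real cluster state.
final class CopySettingsSketch {

    static Map<String, Map<String, String>> copySettings(
            Map<String, Map<String, String>> state,
            String sourceIndex, String targetIndex, String... settingsKeys) {
        // Nothing to copy: return the state untouched before looking up the
        // target, so a missing target index is not reported as an error.
        if (settingsKeys == null || settingsKeys.length == 0) {
            return state;
        }
        Map<String, String> source = state.get(sourceIndex);
        Map<String, String> target = state.get(targetIndex);
        if (source == null || target == null) {
            throw new IllegalStateException(
                "unable to copy settings from [" + sourceIndex + "] to [" + targetIndex + "]: index does not exist");
        }
        // Copy only the requested keys that are present on the source index.
        Map<String, String> updatedTarget = new HashMap<>(target);
        for (String key : settingsKeys) {
            String value = source.get(key);
            if (value != null) {
                updatedTarget.put(key, value);
            }
        }
        // Build a new state instead of mutating the one passed in.
        Map<String, Map<String, String>> newState = new HashMap<>(state);
        newState.put(targetIndex, updatedTarget);
        return newState;
    }
}
```
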

public DeleteAction() {
this(true);
Member

It might be worth having a separate discussion (I can't remember if we discussed it) with David & Co about whether we want the default for this to be true or false (can be had after this PR)

Comment on lines 77 to 78
return new Result(false, new Info(String.format(Locale.ROOT,
"snapshot [%s] generated by policy [%s] for index [%s] is still in progress", snapshotName, policyName, indexName)));
Member

I actually think we need to return true here. It's possible that the snapshot was taken and completed (prior to the step being updated) and then the cluster was restarted, so SnapshotsInProgress is null and won't ever exist because the snapshot was successful. I think it's best to err on the side of moving forward and doing a check rather than being wedged.

Contributor Author

Deleted the WaitForSnapshotInProgressStep altogether as we'll design a step that waits for a generation change in the repo metadata in a subsequent PR.

@andreidan
Contributor Author

@elasticmachine update branch

elasticmachine and others added 7 commits March 30, 2020 11:53
Co-Authored-By: Lee Hinman <dakrone@users.noreply.github.com>
Co-Authored-By: Lee Hinman <dakrone@users.noreply.github.com>
We'll add a new way to save snapshot status api calls based on the
repository data generation in a future PR.
@andreidan
Contributor Author

@elasticmachine update branch

@andreidan andreidan requested a review from dakrone March 31, 2020 16:53
Member

@dakrone dakrone left a comment

LGTM! I left two really minor typo change comments, thanks for iterating so much on this @andreidan!

* newly created searchable snapshot backed index.
*/
public class SearchableSnapshotAction implements LifecycleAction {
public static final String NAME = "searchable_snapshot";
Member

@dakrone dakrone Mar 31, 2020

I think this one may have been lost in some of the other reviews, but I still think it should be changed (to searchable-snapshot)

import static org.elasticsearch.xpack.core.ilm.SearchableSnapshotAction.getCheckSnapshotStatusAsyncAction;
import static org.hamcrest.Matchers.is;

public class SearchableSnaposhotActionTests extends AbstractActionTestCase<SearchableSnapshotAction> {
Member

Super minor, but there's a typo here (and in the name of the file):
SearchableSnaposhotActionTests vs
SearchableSnapshotActionTests

Contributor Author

Ah, it was hard to spot the typo, great eyes, thanks @dakrone !

Regarding the action name, I renamed it to "searchable_snapshot" on purpose when we renamed "snapshot-repository" to "snapshot_repository", as I thought it was confusing to have a configuration that mixes hyphens and underscores, e.g.:

  "searchable-snapshot" : { 
    "snapshot_repository" : "snapshotRepositoryName"
  }

I think using underscores everywhere imposes less cognitive load:

  "searchable_snapshot" : { 
    "snapshot_repository" : "snapshotRepositoryName"
  }

Contributor Author

The other "composed" action we have is set_priority, so it would be inconsistent to use hyphens. If you don't have a strong opinion on this, I think I'd prefer we use underscores everywhere.

Member

I read the other one as the step name rather than the action name, I agree that the action name should use underscores (like our other ones), sorry about that!

@andreidan andreidan merged commit a5c7bec into elastic:feature/searchable-snapshots Apr 1, 2020
@andreidan
Contributor Author

Thanks for reviewing this rather huge chunk of functionality Lee

@cjcenizal cjcenizal added the Team:Deployment Management Meta label for Management Experience - Deployment Management team label Jun 9, 2020
Labels
:Data Management/ILM+SLM Index and Snapshot lifecycle management >feature Team:Deployment Management Meta label for Management Experience - Deployment Management team

4 participants