Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Transform] Redirect transform actions to transform&remote_cluster_client node when needed #70125

Merged

Conversation

przemekwitek
Copy link
Contributor

@przemekwitek przemekwitek commented Mar 9, 2021

This PR makes a number of transform actions redirect to nodes with the transform role, and, if required, the remote_cluster_client role.

remote_cluster_client role is required to access remote indices in case CCS (cross-cluster search) is used.

The following actions are changed as part of this PR:

  1. PreviewTransformAction
  2. ValidateTransformAction
  3. GetTransformStatsAction

Relates #72292

@przemekwitek przemekwitek force-pushed the source_dest_validation_as_action branch from 339f04f to 9b8f405 Compare March 9, 2021 10:02
@przemekwitek przemekwitek changed the title [DRAFT] Introduce new internal action [DRAFT] Redirect _preview action when it needs remote_cluster_client role to execute Mar 10, 2021
@przemekwitek przemekwitek force-pushed the source_dest_validation_as_action branch 12 times, most recently from 48a99ab to e87c8d7 Compare March 17, 2021 09:26
@przemekwitek przemekwitek changed the title [DRAFT] Redirect _preview action when it needs remote_cluster_client role to execute [DRAFT] Redirect transform actions to remote_cluster_client node when needed Mar 17, 2021
@przemekwitek przemekwitek force-pushed the source_dest_validation_as_action branch 10 times, most recently from 25d75dd to f40a6ed Compare March 22, 2021 15:38
@przemekwitek przemekwitek changed the title [DRAFT] Redirect transform actions to remote_cluster_client node when needed Redirect transform actions to transform&remote_cluster_client node when needed Mar 22, 2021
@przemekwitek przemekwitek removed the WIP label Mar 22, 2021
@przemekwitek przemekwitek marked this pull request as ready for review March 22, 2021 15:41
@przemekwitek przemekwitek added the :ml/Transform Transform label Mar 22, 2021
Copy link
Contributor

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main comment is about the clarity of when a remote_cluster_client is needed.

The PR description currently says:

This PR makes a number of transform actions redirected to nodes with transform and remote_cluster_client roles.

We shouldn't be forcing users who are not using CCS at all to have remote_cluster_client nodes, and I don't think this PR does. But then the messaging needs to be clearer, for example:

This PR makes a number of transform actions redirect to nodes with the transform role, and, if required, the remote_cluster_client role.

There are also some code comments that might confuse future maintainers for the same reason.

My other observation is that there might be a nasty problem in the backport, unless all the modified integration tests suites solely use REST clients.

Comment on lines +42 to +44
nodes."javaRestTest-1".setting 'node.roles', '[data,ingest,master,transform]'
nodes."javaRestTest-2".setting 'node.roles', '[data,ingest,master,transform,remote_cluster_client]'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding the transform role here is fine for master, but might cause problems in 7.x. The reason is that 7.x still uses the transport client for half of integration tests that are not REST tests, and the transform client only understands roles that are defined in core Elasticsearch. I thought the fix would be simple, so tried it in #68226, and that fixed it for the external cluster tests (and end users too), but unfortunately there is a big problem with the internal cluster tests - #69693 (comment) has the details.

So you can leave the code like this in master but will need to plan what to do in the 7.x backport. If all the transform multi-node tests are REST tests then you might get away with what you've got here in 7.x. But make sure you run the backport PR CI at least 3 times to confirm the client randomisation doesn't sometimes cause problems. If some of the transform multi-node tests use the node/transport client then it will be necessary to do quite a lot more refactoring in the backport.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all multinode tests use ESRestTestCase, it sounds like we are fine in this case as interaction happens using HLRC.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all multinode tests use ESRestTestCase, it sounds like we are fine in this case as interaction happens using HLRC.

Ok, I'll run the backport PR a few times to confirm this is the case.

/**
* Select any node among provided nodes that satisfies all of the following:
* - is a transform node
* - is a remote cluster client node
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't strictly true given the implementation of this method. It's only true because this method only gets called after determining that a remote cluster node is required.

nodeCanRunThisTransform() only enforces remote cluster client nodes if the transform requires one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't strictly true given the implementation of this method. It's only true because this method only gets called after determining that a remote cluster node is required.

You're right. Knowing that remote_cluster_client is required is a precondition to call this method.
I've changed wording to reflect that.

new ActionListenerResponseHandler<>(finalListener, Response::new));
} else {
// There are no remote_cluster_client nodes in the cluster, fail
finalListener.onFailure(ExceptionsHelper.badRequestException("No remote_cluster_client node to run on"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that the real reason could be that there are no transform nodes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's true. I updated the message.
Additionally, I extracted redirection to a separate method (TransformNodes.redirectToAnotherNodeIfNeeded) in order to avoid code duplication.

Comment on lines +36 to +37
nodes."mixed-cluster-0".setting 'node.roles', '[data,ingest,master,transform]'
nodes."mixed-cluster-1".setting 'node.roles', '[data,ingest,master,transform,remote_cluster_client]'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment to above about backporting complications. Do any tests in this suite use the node client rather than the REST client on master? If so those tests will randomly use either the node or transport client on 7.x and when they choose the transport client there will be problems with specifying the transform role here in 7.x.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inheritance path is: MultiClusterYamlTestSuiteIT > ESClientYamlSuiteTestCase > ESRestTestCase which AFAIU (by what Hendrik said in the other comment) means that transport client will never be used so we should be fine here as well.

Copy link

@hendrikmuhs hendrikmuhs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some more comments. The change is trickier than originally thought.

public static class Request extends AcknowledgedRequest<Request> {

private final TransformConfig config;
private final boolean deferValidation;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems confusing for a user to call _validate?defer_validation=true

This parameter originates from PUT and disables some validation parts which require data access. In the context of PUT, I still think it makes sense, but for _validate it seems odd to me. We could:

  • only rename it for _validate
  • rename it for PUT and _validate with BWC in PUT

I think we should do the 2nd option as its confusing to have different parameters. Naming is hard, maybe defer_data_validation? This is rather long, however, this is an expert parameter, therefore it seems acceptable to me to have a long name.

However: For the purpose of this PR, lets not do the naming change now, its better to commit this with the old name and have a follow up PR which only does the naming change (+BWC).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However: For the purpose of this PR, lets not do the naming change now, its better to commit this with the old name and have a follow up PR which only does the naming change (+BWC).

+1
Added a TODO

A side question though: Should we even expose this method via REST? My idea was to keep it as internal (transport-only) action used by other actions.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see no harm having _validate as REST endpoint. It might be useful for the UI, calling the lightweight _validate instead of _preview after the data part has been finished.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's postpone making _validate a REST endpoint. Although I agree it will be useful in the UI eventually, it's not going to be used in 7.13.

Comment on lines +42 to +44
nodes."javaRestTest-1".setting 'node.roles', '[data,ingest,master,transform]'
nodes."javaRestTest-2".setting 'node.roles', '[data,ingest,master,transform,remote_cluster_client]'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all multinode tests use ESRestTestCase, it sounds like we are fine in this case as interaction happens using HLRC.

final TransformNodeAssignments transformNodeAssignments = TransformNodes.transformTaskNodes(hitsAndIds.v2(), state);

if (hitsAndIds.v2().v2().stream().anyMatch(config -> config.getSource().requiresRemoteCluster())
&& (isRemoteClusterClientNode == false)) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is not necessary at this level:

if a transform is running, the call is redirected to the node that runs the transform to get live data from that task. If the transform requires remote client, the transform running node must be transform and remote client by definition.

But we hit a problem for collectStatsForTransformsWithoutTasks: This is executed on this node.

transformNodeAssignments tells you which transforms are waitingForAssignment or stopped. I would use your added selectAnyTransformRemoteNode method to send a request only with non running transforms to an appropriate transform node with or without remote client.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if a transform is running, the call is redirected to the node that runs the transform to get live data from that task. If the transform requires remote client, the transform running node must be transform and remote client by definition.

Right, got it.

But we hit a problem for collectStatsForTransformsWithoutTasks: This is executed on this node.

Yes, this is exactly where the tests failed.

I would use your added selectAnyTransformRemoteNode method to send a request only with non running transforms to an appropriate transform node with or without remote client.

Should I only send the ids of waitingForAssignment or stopped too?

@przemekwitek przemekwitek force-pushed the source_dest_validation_as_action branch 2 times, most recently from ca539b8 to ea6b917 Compare April 1, 2021 14:30
/**
* Select any node among provided nodes that satisfies all of the following:
* - is a transform node
* - is a remote cluster client node
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't strictly true given the implementation of this method. It's only true because this method only gets called after determining that a remote cluster node is required.

You're right. Knowing that remote_cluster_client is required is a precondition to call this method.
I've changed wording to reflect that.

@@ -39,25 +39,6 @@ teardown:
username: "joe"
ignore: 404

---
"Search remote cluster":
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: This is not a transform-specific test.
Multi-cluster search is covered in qa/multi-cluster-search/src/test/resources/rest-api-spec/test/multi_cluster/10_basic.yml

Comment on lines +36 to +37
nodes."mixed-cluster-0".setting 'node.roles', '[data,ingest,master,transform]'
nodes."mixed-cluster-1".setting 'node.roles', '[data,ingest,master,transform,remote_cluster_client]'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inheritance path is: MultiClusterYamlTestSuiteIT > ESClientYamlSuiteTestCase > ESRestTestCase which AFAIU (by what Hendrik said in the other comment) means that transport client will never be used so we should be fine here as well.

final TransformNodeAssignments transformNodeAssignments = TransformNodes.transformTaskNodes(hitsAndIds.v2(), state);

if (hitsAndIds.v2().v2().stream().anyMatch(config -> config.getSource().requiresRemoteCluster())
&& (isRemoteClusterClientNode == false)) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if a transform is running, the call is redirected to the node that runs the transform to get live data from that task. If the transform requires remote client, the transform running node must be transform and remote client by definition.

Right, got it.

But we hit a problem for collectStatsForTransformsWithoutTasks: This is executed on this node.

Yes, this is exactly where the tests failed.

I would use your added selectAnyTransformRemoteNode method to send a request only with non running transforms to an appropriate transform node with or without remote client.

Should I only send the ids of waitingForAssignment or stopped too?

new ActionListenerResponseHandler<>(finalListener, Response::new));
} else {
// There are no remote_cluster_client nodes in the cluster, fail
finalListener.onFailure(ExceptionsHelper.badRequestException("No remote_cluster_client node to run on"));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's true. I updated the message.
Additionally, I extracted redirection to a separate method (TransformNodes.redirectToAnotherNodeIfNeeded) in order to avoid code duplication.

@przemekwitek przemekwitek force-pushed the source_dest_validation_as_action branch from ea6b917 to f5d0927 Compare April 2, 2021 10:39
@przemekwitek przemekwitek force-pushed the source_dest_validation_as_action branch 3 times, most recently from a3745fe to 3b465c3 Compare April 15, 2021 07:41
@przemekwitek przemekwitek force-pushed the source_dest_validation_as_action branch from 3b465c3 to ca3147e Compare April 19, 2021 10:23
Copy link

@hendrikmuhs hendrikmuhs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@przemekwitek przemekwitek merged commit 319548c into elastic:master Apr 19, 2021
@przemekwitek przemekwitek deleted the source_dest_validation_as_action branch April 19, 2021 15:22
@przemekwitek przemekwitek changed the title Redirect transform actions to transform&remote_cluster_client node when needed [Transform] Redirect transform actions to transform&remote_cluster_client node when needed Apr 19, 2021
przemekwitek added a commit that referenced this pull request Apr 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants