-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Transform] Redirect transform actions to transform&remote_cluster_client node when needed #70125
[Transform] Redirect transform actions to transform&remote_cluster_client node when needed #70125
Conversation
339f04f
to
9b8f405
Compare
48a99ab
to
e87c8d7
Compare
25d75dd
to
f40a6ed
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main comment is about the clarity of when a remote_cluster_client
is needed.
The PR description currently says:
This PR makes a number of transform actions redirected to nodes with
transform
andremote_cluster_client
roles.
We shouldn't be forcing users who are not using CCS at all to have remote_cluster_client
nodes, and I don't think this PR does. But then the messaging needs to be clearer, for example:
This PR makes a number of transform actions redirect to nodes with the
transform
role, and, if required, theremote_cluster_client
role.
There are also some code comments that might confuse future maintainers for the same reason.
My other observation is that there might be a nasty problem in the backport, unless all the modified integration tests suites solely use REST clients.
...n/core/src/main/java/org/elasticsearch/xpack/core/common/validation/SourceDestValidator.java
Outdated
Show resolved
Hide resolved
nodes."javaRestTest-1".setting 'node.roles', '[data,ingest,master,transform]' | ||
nodes."javaRestTest-2".setting 'node.roles', '[data,ingest,master,transform,remote_cluster_client]' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding the transform
role here is fine for master, but might cause problems in 7.x. The reason is that 7.x still uses the transport client for half of integration tests that are not REST tests, and the transform client only understands roles that are defined in core Elasticsearch. I thought the fix would be simple, so tried it in #68226, and that fixed it for the external cluster tests (and end users too), but unfortunately there is a big problem with the internal cluster tests - #69693 (comment) has the details.
So you can leave the code like this in master but will need to plan what to do in the 7.x backport. If all the transform multi-node tests are REST tests then you might get away with what you've got here in 7.x. But make sure you run the backport PR CI at least 3 times to confirm the client randomisation doesn't sometimes cause problems. If some of the transform multi-node tests use the node/transport client then it will be necessary to do quite a lot more refactoring in the backport.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all multinode tests use ESRestTestCase
, it sounds like we are fine in this case as interaction happens using HLRC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all multinode tests use ESRestTestCase, it sounds like we are fine in this case as interaction happens using HLRC.
Ok, I'll run the backport PR a few times to confirm this is the case.
/** | ||
* Select any node among provided nodes that satisfies all of the following: | ||
* - is a transform node | ||
* - is a remote cluster client node |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't strictly true given the implementation of this method. It's only true because this method only gets called after determining that a remote cluster node is required.
nodeCanRunThisTransform()
only enforces remote cluster client nodes if the transform requires one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't strictly true given the implementation of this method. It's only true because this method only gets called after determining that a remote cluster node is required.
You're right. Knowing that remote_cluster_client
is required is a precondition to call this method.
I've changed wording to reflect that.
new ActionListenerResponseHandler<>(finalListener, Response::new)); | ||
} else { | ||
// There are no remote_cluster_client nodes in the cluster, fail | ||
finalListener.onFailure(ExceptionsHelper.badRequestException("No remote_cluster_client node to run on")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible that the real reason could be that there are no transform nodes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's true. I updated the message.
Additionally, I extracted redirection to a separate method (TransformNodes.redirectToAnotherNodeIfNeeded
) in order to avoid code duplication.
nodes."mixed-cluster-0".setting 'node.roles', '[data,ingest,master,transform]' | ||
nodes."mixed-cluster-1".setting 'node.roles', '[data,ingest,master,transform,remote_cluster_client]' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar comment to above about backporting complications. Do any tests in this suite use the node client rather than the REST client on master? If so those tests will randomly use either the node or transport client on 7.x and when they choose the transport client there will be problems with specifying the transform
role here in 7.x.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The inheritance path is: MultiClusterYamlTestSuiteIT
> ESClientYamlSuiteTestCase
> ESRestTestCase
which AFAIU (by what Hendrik said in the other comment) means that transport client will never be used so we should be fine here as well.
...n/core/src/main/java/org/elasticsearch/xpack/core/common/validation/SourceDestValidator.java
Outdated
Show resolved
Hide resolved
.../plugin/transform/src/main/java/org/elasticsearch/xpack/transform/action/TransformNodes.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some more comments. The change is trickier than originally thought.
public static class Request extends AcknowledgedRequest<Request> { | ||
|
||
private final TransformConfig config; | ||
private final boolean deferValidation; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems confusing for a user to call _validate?defer_validation=true
This parameter originates from PUT and disables some validation parts which require data access. In the context of PUT, I still think it makes sense, but for _validate
it seems odd to me. We could:
- only rename it for
_validate
- rename it for PUT and
_validate
with BWC in PUT
I think we should do the 2nd option as its confusing to have different parameters. Naming is hard, maybe defer_data_validation
? This is rather long, however, this is an expert parameter, therefore it seems acceptable to me to have a long name.
However: For the purpose of this PR, lets not do the naming change now, its better to commit this with the old name and have a follow up PR which only does the naming change (+BWC).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However: For the purpose of this PR, lets not do the naming change now, its better to commit this with the old name and have a follow up PR which only does the naming change (+BWC).
+1
Added a TODO
A side question though: Should we even expose this method via REST? My idea was to keep it as internal (transport-only) action used by other actions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see no harm having _validate
as REST endpoint. It might be useful for the UI, calling the lightweight _validate
instead of _preview
after the data part has been finished.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's postpone making _validate
a REST endpoint. Although I agree it will be useful in the UI eventually, it's not going to be used in 7.13
.
.../java/org/elasticsearch/xpack/core/transform/transforms/TransformDestIndexSettingsTests.java
Show resolved
Hide resolved
nodes."javaRestTest-1".setting 'node.roles', '[data,ingest,master,transform]' | ||
nodes."javaRestTest-2".setting 'node.roles', '[data,ingest,master,transform,remote_cluster_client]' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all multinode tests use ESRestTestCase
, it sounds like we are fine in this case as interaction happens using HLRC.
.../plugin/transform/src/main/java/org/elasticsearch/xpack/transform/action/TransformNodes.java
Outdated
Show resolved
Hide resolved
final TransformNodeAssignments transformNodeAssignments = TransformNodes.transformTaskNodes(hitsAndIds.v2(), state); | ||
|
||
if (hitsAndIds.v2().v2().stream().anyMatch(config -> config.getSource().requiresRemoteCluster()) | ||
&& (isRemoteClusterClientNode == false)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is not necessary at this level:
if a transform is running, the call is redirected to the node that runs the transform to get live data from that task. If the transform requires remote client, the transform running node must be transform and remote client by definition.
But we hit a problem for collectStatsForTransformsWithoutTasks
: This is executed on this node.
transformNodeAssignments
tells you which transforms are waitingForAssignment
or stopped
. I would use your added selectAnyTransformRemoteNode
method to send a request only with non running transforms to an appropriate transform node with or without remote client.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if a transform is running, the call is redirected to the node that runs the transform to get live data from that task. If the transform requires remote client, the transform running node must be transform and remote client by definition.
Right, got it.
But we hit a problem for collectStatsForTransformsWithoutTasks: This is executed on this node.
Yes, this is exactly where the tests failed.
I would use your added selectAnyTransformRemoteNode method to send a request only with non running transforms to an appropriate transform node with or without remote client.
Should I only send the ids of waitingForAssignment
or stopped
too?
.../src/main/java/org/elasticsearch/xpack/transform/action/TransportPreviewTransformAction.java
Outdated
Show resolved
Hide resolved
.../src/main/java/org/elasticsearch/xpack/transform/action/TransportPreviewTransformAction.java
Outdated
Show resolved
Hide resolved
...src/main/java/org/elasticsearch/xpack/transform/action/TransportValidateTransformAction.java
Outdated
Show resolved
Hide resolved
ca539b8
to
ea6b917
Compare
...n/core/src/main/java/org/elasticsearch/xpack/core/common/validation/SourceDestValidator.java
Outdated
Show resolved
Hide resolved
.../plugin/transform/src/main/java/org/elasticsearch/xpack/transform/action/TransformNodes.java
Outdated
Show resolved
Hide resolved
/** | ||
* Select any node among provided nodes that satisfies all of the following: | ||
* - is a transform node | ||
* - is a remote cluster client node |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't strictly true given the implementation of this method. It's only true because this method only gets called after determining that a remote cluster node is required.
You're right. Knowing that remote_cluster_client
is required is a precondition to call this method.
I've changed wording to reflect that.
.../plugin/transform/src/main/java/org/elasticsearch/xpack/transform/action/TransformNodes.java
Outdated
Show resolved
Hide resolved
@@ -39,25 +39,6 @@ teardown: | |||
username: "joe" | |||
ignore: 404 | |||
|
|||
--- | |||
"Search remote cluster": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: This is not a transform-specific test.
Multi-cluster search is covered in qa/multi-cluster-search/src/test/resources/rest-api-spec/test/multi_cluster/10_basic.yml
.../src/main/java/org/elasticsearch/xpack/transform/action/TransportPreviewTransformAction.java
Outdated
Show resolved
Hide resolved
...src/main/java/org/elasticsearch/xpack/transform/action/TransportValidateTransformAction.java
Outdated
Show resolved
Hide resolved
nodes."mixed-cluster-0".setting 'node.roles', '[data,ingest,master,transform]' | ||
nodes."mixed-cluster-1".setting 'node.roles', '[data,ingest,master,transform,remote_cluster_client]' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The inheritance path is: MultiClusterYamlTestSuiteIT
> ESClientYamlSuiteTestCase
> ESRestTestCase
which AFAIU (by what Hendrik said in the other comment) means that transport client will never be used so we should be fine here as well.
final TransformNodeAssignments transformNodeAssignments = TransformNodes.transformTaskNodes(hitsAndIds.v2(), state); | ||
|
||
if (hitsAndIds.v2().v2().stream().anyMatch(config -> config.getSource().requiresRemoteCluster()) | ||
&& (isRemoteClusterClientNode == false)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if a transform is running, the call is redirected to the node that runs the transform to get live data from that task. If the transform requires remote client, the transform running node must be transform and remote client by definition.
Right, got it.
But we hit a problem for collectStatsForTransformsWithoutTasks: This is executed on this node.
Yes, this is exactly where the tests failed.
I would use your added selectAnyTransformRemoteNode method to send a request only with non running transforms to an appropriate transform node with or without remote client.
Should I only send the ids of waitingForAssignment
or stopped
too?
new ActionListenerResponseHandler<>(finalListener, Response::new)); | ||
} else { | ||
// There are no remote_cluster_client nodes in the cluster, fail | ||
finalListener.onFailure(ExceptionsHelper.badRequestException("No remote_cluster_client node to run on")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's true. I updated the message.
Additionally, I extracted redirection to a separate method (TransformNodes.redirectToAnotherNodeIfNeeded
) in order to avoid code duplication.
ea6b917
to
f5d0927
Compare
a3745fe
to
3b465c3
Compare
3b465c3
to
ca3147e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This PR makes a number of transform actions redirect to nodes with the
transform
role, and, if required, theremote_cluster_client
role.remote_cluster_client
role is required to access remote indices in case CCS (cross-cluster search) is used.The following actions are changed as part of this PR:
PreviewTransformAction
ValidateTransformAction
GetTransformStatsAction
Relates #72292