Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Transform] add new exclude_generated flag to GET transform #63093

Merged

Conversation

benwtrent
Copy link
Member

@benwtrent benwtrent commented Sep 30, 2020

This adds a new flag exclude_generated for GET transform API.

This flag is useful for when a transform needs to be cloned within a cluster or exported/imported between clusters.

It removes certain fields that are not able to be set via the PUT api (e.g. version, create_time).

relates #63055

@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core (:ml/Transform)

Copy link
Contributor

@przemekwitek przemekwitek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines 33 to 34
public static final String FOR_EXPORT = "for_export";
public static final String ALLOW_NO_MATCH = "allow_no_match";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public static final String FOR_EXPORT = "for_export";
public static final String ALLOW_NO_MATCH = "allow_no_match";
public static final String ALLOW_NO_MATCH = "allow_no_match";
public static final String FOR_EXPORT = "for_export";

@hendrikmuhs
Copy link

Code look good, added 2 suggestions. I wonder about the end2end usage.

The export removes the id, I think this limits usability, you might want to store transform exported configs in VCS. Without the name this isn't practical.

The parser actually allows setting the id, however it must be equal to the one in the path. That means technically we could keep the id.

Therefore the bigger story seems incomplete to me, we need to define how import is going to work. If we e.g. allow PUT _transform/_import the id could be parsed from the body (we do not allow _ as prefix, that means _import would not be a breaking change).

However, single import isn't useful either, you likely want a bulk import that takes the exact output of _export.

@benwtrent
Copy link
Member Author

The parser actually allows setting the id, however it must be equal to the one in the path. That means technically we could keep the id.

Technically we could, but this defeats the idea of clone. Exporting and then easily importing in the same cluster seems like a vital use case here.

The configuration object itself shouldn't reference an ID. If the users want to store the config in a VCS, the file name itself should have the ID, not the enclosed object. If they are using some VCS system, a user is either copying and setting a file name anyways, or using some automation to do this.

Additionally, we already have a cluster backup mechanism. Snapshots should be used for cluster or index backup and restore.

we need to define how import is going to work.

We already have an import, it's PUT. The goal of this work is to make two situations easier:

  • Cloning a single transform in the same cluster
  • Getting a transform that can be easily moved to a new cluster

I would think that when experimenting with configurations in staging and then moving them to production, this is done one at a time.

_transforms/_import

I think having an _import and a PUT that effectively do the same thing would cause confusion for most users.

We could easily update PUT to not only take an object but also a {"transforms": []}, but I don't think that is necessary for now.

@benwtrent benwtrent requested a review from hendrikmuhs October 2, 2020 12:27
@andreidan andreidan added v7.11.0 and removed v7.10.0 labels Oct 7, 2020
@benwtrent benwtrent changed the title [Transform] add new for_export flag to GET transform [Transform] add new exclude_generated flag to GET transform Oct 19, 2020
@benwtrent benwtrent requested a review from davidkyle October 20, 2020 13:22
Copy link
Member

@davidkyle davidkyle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link

@hendrikmuhs hendrikmuhs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, 1 minor comment

if (params.paramAsBoolean(TransformField.EXCLUDE_GENERATED, false) == false) {
builder.field(QUERY.getPreferredName(), queryConfig);
} else if(queryConfig.equals(QueryConfig.matchAll()) == false) {
builder.field(QUERY.getPreferredName(), queryConfig);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this could be done with an or and 1 if

@benwtrent benwtrent merged commit ebcac2d into elastic:master Oct 20, 2020
@benwtrent benwtrent deleted the feature/transform-add-for-export-flag branch October 20, 2020 15:38
benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request Oct 20, 2020
…63093)

This adds a new flag `exclude_generated` for GET transform API.

This flag is useful for when a transform needs to be cloned within a cluster or exported/imported between clusters.

It removes certain fields that are not able to be set via the PUT api (e.g. version, create_time).

relates elastic#63055
benwtrent added a commit that referenced this pull request Oct 20, 2020
…63947)

This adds a new flag `exclude_generated` for GET transform API.

This flag is useful for when a transform needs to be cloned within a cluster or exported/imported between clusters.

It removes certain fields that are not able to be set via the PUT api (e.g. version, create_time).

relates #63055
pugnascotia pushed a commit to pugnascotia/elasticsearch that referenced this pull request Oct 21, 2020
…63093)

This adds a new flag `exclude_generated` for GET transform API.

This flag is useful for when a transform needs to be cloned within a cluster or exported/imported between clusters.

It removes certain fields that are not able to be set via the PUT api (e.g. version, create_time).

relates elastic#63055
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants