Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Transform] Fail checkpoint on missing clusters #106793

Merged
merged 3 commits into from
Mar 27, 2024

Conversation

prwhelan
Copy link
Member

When there are no remote or local clusters for a given source index, we call the listener's onFailure method with a CheckpointException. A running transform will fail and retry, eventually moving into an unhealthy and failed state. Any call to the stats API will note the checkpoint failure and return.

This fixes a timeout issue calling the Transform stats API and prevents the Transform from being stuck in indexing.

Fix #106790
Fix #104533

When there are no remote or local clusters for a given source index, we
call the listener's `onFailure` method with a `CheckpointException`.
A running transform will fail and retry, eventually moving into an
unhealthy and failed state.  Any call to the stats API will note the
checkpoint failure and return.

This fixes a timeout issue calling the Transform stats API and prevents
the Transform from being stuck in indexing.

Fix elastic#106790
Fix elastic#104533
@prwhelan prwhelan added >bug :ml/Transform Transform Team:ML Meta label for the ML team v8.14.0 labels Mar 26, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/ml-core (Team:ML)

@elasticsearchmachine
Copy link
Collaborator

Hi @prwhelan, I've created a changelog YAML for you.

Copy link
Contributor

@przemekwitek przemekwitek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@prwhelan prwhelan merged commit eea8d1d into elastic:main Mar 27, 2024
14 checks passed
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.13

prwhelan added a commit to prwhelan/elasticsearch that referenced this pull request Mar 27, 2024
When there are no remote or local clusters for a given source index, we
call the listener's `onFailure` method with a `CheckpointException`.
A running transform will fail and retry, eventually moving into an
unhealthy and failed state.  Any call to the stats API will note the
checkpoint failure and return.

This fixes a timeout issue calling the Transform stats API and prevents
the Transform from being stuck in indexing.

Fix elastic#106790
Fix elastic#104533
elasticsearchmachine pushed a commit that referenced this pull request Mar 27, 2024
When there are no remote or local clusters for a given source index, we
call the listener's `onFailure` method with a `CheckpointException`.
A running transform will fail and retry, eventually moving into an
unhealthy and failed state.  Any call to the stats API will note the
checkpoint failure and return.

This fixes a timeout issue calling the Transform stats API and prevents
the Transform from being stuck in indexing.

Fix #106790
Fix #104533
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :ml/Transform Transform Team:ML Meta label for the ML team v8.13.1 v8.14.0
Projects
None yet
3 participants