[ML] Prevent job creation/opening if .ml indices cannot be created #34430

Closed
droberts195 opened this issue Oct 13, 2018 · 6 comments
Labels
>bug :ml Machine learning

Comments

@droberts195
Contributor

In https://www.elastic.co/guide/en/elasticsearch/reference/current/zip-targz.html#zip-targz-enable-indices we document that auto-creation must be permitted for .ml* indices even if auto-creation of indices is disabled in general. Unfortunately, if a user does not follow the guidance then the resulting failure is very hard to diagnose.

We should explicitly check that auto-creation of .ml* indices is permitted at a few key points where we have the ability to fail early and report a clear error message. If we wait until results and/or state cannot be written then we cannot report the problem in a REST response.
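A minimal sketch of such an early check, in plain Java. It assumes the value of the `action.auto_create_index` setting is either `true`, `false`, or a comma-separated list of wildcard patterns with an optional `+` or `-` prefix, where the first matching pattern decides and no match means "not permitted". The class and method names are hypothetical, and this approximates the behaviour rather than reproducing Elasticsearch's own logic.

```java
/** Hypothetical helper: evaluates an auto-create setting value for one index name. */
final class AutoCreateCheckSketch {

    /** Returns true if the (assumed) setting semantics allow indexName to be auto-created. */
    static boolean isPermitted(String settingValue, String indexName) {
        String value = settingValue.trim();
        if (value.equals("true")) {
            return true;
        }
        if (value.equals("false")) {
            return false;
        }
        for (String raw : value.split(",")) {
            String expr = raw.trim();
            if (expr.isEmpty()) {
                continue;
            }
            boolean allow = true;
            if (expr.charAt(0) == '-') {
                allow = false;              // "-pattern" forbids matching indices
                expr = expr.substring(1);
            } else if (expr.charAt(0) == '+') {
                expr = expr.substring(1);   // "+pattern" is the same as "pattern"
            }
            if (globMatches(expr, indexName)) {
                return allow;               // first matching pattern decides
            }
        }
        return false;                       // nothing matched: assume not permitted
    }

    /** Wildcard match supporting only '*' (any run of characters, possibly empty). */
    static boolean globMatches(String pattern, String name) {
        int p = 0, n = 0, star = -1, mark = 0;
        while (n < name.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                star = p++;                 // remember the star and try matching zero chars
                mark = n;
            } else if (p < pattern.length() && pattern.charAt(p) == name.charAt(n)) {
                p++;
                n++;
            } else if (star >= 0) {
                p = star + 1;               // backtrack: let the last star absorb one more char
                n = ++mark;
            } else {
                return false;
            }
        }
        while (p < pattern.length() && pattern.charAt(p) == '*') {
            p++;
        }
        return p == pattern.length();
    }
}
```

A caller at job creation or job open could evaluate `isPermitted(settingValue, ".ml-state")` and, if it returns false, reject the request with a message that names the setting, rather than letting a later background write fail silently.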

droberts195 added the >bug and :ml Machine learning labels on Oct 13, 2018
@elasticmachine
Collaborator

Pinging @elastic/ml-core

@droberts195
Contributor Author

The other option here is that we switch to a strategy of creating our indices without relying on the auto-create mechanism at all. Security does this:

public void prepareIndexIfNeededThenExecute(final Consumer<Exception> consumer, final Runnable andThen) {

However, there is only ever one active security index, whereas ML has several, and in future we'll want to use rollover. Rollover works best with the standard auto-create mechanism, so that points to us continuing to use auto-create.
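For illustration, the explicit-creation alternative can be sketched as a "prepare, then execute" wrapper. This is a synchronous simplification under stated assumptions: `IndexStore` and `InMemoryStore` are hypothetical stand-ins for a cluster client, not Elasticsearch APIs, and the real Security method quoted above is not reproduced here beyond the shape of its signature.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

final class ExplicitIndexCreationSketch {

    /** Hypothetical stand-in for the parts of a cluster client this sketch needs. */
    interface IndexStore {
        boolean exists(String index);
        void create(String index);
    }

    /**
     * Ensures the index exists before running andThen. Any failure while
     * checking or creating is handed to onFailure and andThen is not run.
     */
    static void prepareIndexIfNeededThenExecute(IndexStore store, String index,
                                                Consumer<Exception> onFailure, Runnable andThen) {
        try {
            if (!store.exists(index)) {
                store.create(index);        // explicit creation, no reliance on auto-create
            }
        } catch (RuntimeException e) {
            onFailure.accept(e);            // fail early, while the caller can still report it
            return;
        }
        andThen.run();
    }

    /** In-memory store used only to exercise the sketch. */
    static final class InMemoryStore implements IndexStore {
        private final Set<String> indices = new HashSet<>();
        private final boolean rejectCreates;

        InMemoryStore(boolean rejectCreates) {
            this.rejectCreates = rejectCreates;
        }

        @Override
        public boolean exists(String index) {
            return indices.contains(index);
        }

        @Override
        public void create(String index) {
            if (rejectCreates) {
                throw new IllegalStateException("index creation rejected: " + index);
            }
            indices.add(index);
        }
    }
}
```

The point of the shape is that the failure surfaces in the same call that triggered the write, so it can be returned in a REST response. It does not address the rollover concern raised above.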

@dimitris-athanasiou
Contributor

Don't we win this for free once we need to store the job config in an index?

@droberts195
Contributor Author

> Don't we win this for free once we need to store the job config in an index?

I don't think it's completely free.

For the support case that caused this issue to be raised, some .ml indices had already been created, so I imagine they restricted auto-create after having previously run ML. The same thing could happen in the future: a user could create their first job, thus causing the .ml-config index to be created, then restrict auto-creation of indices, leading to subsequent problems with ML. So it would be nice to have some other checks that can report the underlying problem with restricted auto-creation.

@davidkyle
Member

The .ml-anomalies-{shared|specific} index is created explicitly in JobResultsProvider.createJobResultIndex. We do this because of the need to update mappings and check the number of fields in the mapping. The other indices (state, notifications, meta) are created automatically on write.

The chain of events that led to the failure is:

  1. Create job and the results index
  2. Run job writing results and model snapshots to the results index
  3. Close job causing the model state to be persisted which then fails because .ml-state could not be auto-created
  4. Re-opening the job fails because there is a model snapshot doc saying model state exists but that state cannot be found
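The sequence above suggests a pre-flight check at job open: every index the job will later write to should either already exist or be auto-creatable. A rough sketch follows, assuming the caller can supply the set of existing indices and a predicate for whether auto-creation is allowed for a given name. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

final class MlIndexPreflightSketch {

    /** Returns the required indices that neither exist nor can be auto-created. */
    static List<String> findUnwritableIndices(List<String> required, Set<String> existing,
                                              Predicate<String> autoCreateAllowed) {
        List<String> blocked = new ArrayList<>();
        for (String index : required) {
            if (!existing.contains(index) && !autoCreateAllowed.test(index)) {
                blocked.add(index);
            }
        }
        return blocked;
    }

    /** Throws with an actionable message instead of letting a later write fail silently. */
    static void checkBeforeOpen(List<String> required, Set<String> existing,
                                Predicate<String> autoCreateAllowed) {
        List<String> blocked = findUnwritableIndices(required, existing, autoCreateAllowed);
        if (!blocked.isEmpty()) {
            throw new IllegalStateException("Cannot open job: indices " + blocked
                + " do not exist and auto-creation is not permitted for them;"
                + " check the [action.auto_create_index] setting");
        }
    }
}
```

In the scenario described, the results index exists but .ml-state does not, so a check like this would reject the open request at step 2 rather than losing model state at step 3.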

This is important to consider for the config migration change #32905. If the config cannot be written to .ml-config due to an auto-create restriction, the background migration task will fail repeatedly. The only effect will be to spam the log files, and the user won't have any feedback on how to remedy the issue. For this reason the migrator must explicitly create the index prior to migration.

@sophiec20
Contributor

closing as out of date

5 participants