Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG in multiple plugins] Using CreateIndexRequest.mapping() without _doc wrapper fails with v1 index templates #14984

Open
dbwiddis opened this issue Jul 26, 2024 · 13 comments
Labels
bug Something isn't working Plugins

Comments

@dbwiddis
Copy link
Member

dbwiddis commented Jul 26, 2024

Describe the bug

** NOTE: This is not a bug in OpenSearch. **

This is existing behavior with legacy/deprecated code, with a Javadoc instructing what to do that many plugins have ignored by copying each other's code. I'm posting here for visibility to as many plugin maintainers as can see it and add the appropriate fix.

The bug (in at least 7 plugins that I've seen with a brief search):

Using new CreateIndexRequest(indexName).mapping(mapping) with a JSON string that does not wrap the mapping in a _doc field as specified in the Javadocs, fails to apply if a v1 Index Template (/_template API, deprecated) is present which matches that index. This causes the index to use dynamic mapping which can cause follow-on failures such as IllegalArgumentException[mapper [master_key] cannot be changed from type [text] to [keyword]].

Related component

Plugins

To Reproduce

  1. Create a (deprecated) v1 template, for example:
PUT /_template/repro-template
{
  "index_patterns": ["*"],
  "order": 0,
  "settings": { },
  "mappings": { }
}

(Note: this is roughly what a "default" template in pre-7.8 ES actually contained and may still exist in clusters built then and later upgraded.)

  1. Define a mapping in a JSON string, for example
String mapping= "{\"dynamic\":false,\"_meta\":{\"schema_version\":1},"
 + "\"properties\":{\"master_key\":{\"type\":\"keyword\"},"
 + "\"create_time\":{\"type\":\"date\",\"format\":\"strict_date_time||epoch_millis\"}}}";
  1. Instantiate a CreateIndexRequest with this mapping:
CreateIndexRequest request = new CreateIndexRequest(indexName).mapping(mapping);
  1. Create the index via IndicesClient:
client.admin().indices().create(request, actionListener);
  1. Observe the index mappings are the default dynamic mappings and not the specified mappings
{
  "properties" : {
    "create_time" : {
      "type" : "long"
    },
    "master_key" : {
      "type" : "text",
      "fields" : {
        "keyword" : {
          "type" : "keyword",
          "ignore_above" : 256
        }
      }
    }
  }
}

Expected behavior

The index should be created with the user-specified mapping.

Additional Details

Plugins

I found thus bug in flow-framework and ml-commons plugins while debugging. It looks like it also exists in, as a minimum, index-management, alerting, k-nn, security-analytics, job-scheduler, and potentially more that I have not verified

Additional context

The Javadoc for CreateIndexRequest clearly states that a _doc field wrapper is required:

/**
* Set the mapping for this index
* <p>
* The mapping should be in the form of a JSON string, with an outer _doc key
* <pre>
* .mapping("{\"_doc\":{\"properties\": ... }}")
* </pre>
*/
public CreateIndexRequest mapping(String mapping) {
this.mappings = mapping;
return this;
}

When a v1 template matching the index is present, the code path followed requires this _doc field.

V1 templates were deprecated in ElasticSearch 7.8.0, so may not be familiar to OpenSearch users, or in common use. However, for users who have regularly upgraded, they may still have some of these existing templates and as the repro case shows, they can still create them even if they're not documented except in legacy documentation/tutorials/examples.

@dbwiddis
Copy link
Member Author

dbwiddis commented Jul 26, 2024

Additional information for other plugins on seeing if you are impacted:

  1. Search for "CreateIndexRequest" and "mapping" in your repo. (Subset of this search)
  2. Look for where the mapping is specified with .mapping().
    • If you use the .mapping(String json) method, you may be impacted.
      • Inspect where that string comes from and if it is not wrapped in _doc field, it's a problem. Either add it (per javadoc) or add the MediaType argument as shown below.
    • If you use .mapping(Map<String,?>) you are ok, code adds the _doc
    • If you use .mapping(BytesReference bytes, XContentType type) you are ok, code adds the _doc
    • If you use .mapping(String source, MediaType mediatype) you are ok because it delegates to BytesReference (this is probably the best fix if you're impacted above)
    • If you use a deprecated .mapping() method, it requires further investigation

Based on my review of the above search link:

My review may be incomplete, other plugin owners should confirm.

@dblock
Copy link
Member

dblock commented Jul 29, 2024

@dbwiddis How about we make this a meta issue and open child issues (you can use https://github.com/opensearch-project/project-meta) that use this one as a parent to track all the instances? We can also move this one to https://github.com/opensearch-project/opensearch-plugins.

@prudhvigodithi
Copy link
Member

prudhvigodithi commented Jul 29, 2024

[Triage]

For the job-scheduler .opendistro-job-scheduler-lock index uses the mapping opensearch_job_scheduler_lock.json which gets created once the dependant plugin uses job-scheduler. From the following output I can see it honors the declared mapping opensearch_job_scheduler_lock.json

{
  ".opendistro-job-scheduler-lock": {
    "aliases": {},
    "mappings": {
      "dynamic": "strict",
      "properties": {
        "job_id": {
          "type": "keyword"
        },
        "job_index_name": {
          "type": "keyword"
        },
        "lock_duration_seconds": {
          "type": "long"
        },
        "lock_time": {
          "type": "date",
          "format": "epoch_second"
        },
        "released": {
          "type": "boolean"
        }
      }
    },
    "settings": {
      "index": {
        "replication": {
          "type": "DOCUMENT"
        },
        "number_of_shards": "1",
        "provided_name": ".opendistro-job-scheduler-lock",
        "creation_date": "1722273784739",
        "number_of_replicas": "1",
        "uuid": "bHh3wovtQXqzwiiFDjRvvg",
        "version": {
          "created": "136367827"
        }
      }
    }
  }
}

The other mapping opensearch_job_scheduler_job_details.json is used with the extension project which is not being used at the moment.

Adding @cwperks @joshpalis @dbwiddis @getsaurabh02

@owaiskazi19
Copy link
Member

@prudhvigodithi for JS I see in CreateIndexRequest mapping beings passed as String here and here. All we have to do is to change to

final CreateIndexRequest request = new CreateIndexRequest(LOCK_INDEX_NAME).mapping(lockMapping(), XContentType.JSON);
 CreateIndexRequest request = new CreateIndexRequest(JOB_DETAILS_INDEX_NAME).mapping(jobDetailsMapping(), XContentType.JSON);

@prudhvigodithi
Copy link
Member

Thanks @owaiskazi19 we can add XContentType.JSON, but what I was trying to say is with the current code I dont see mappings are created with default dynamic mappings, the index is created with the specified mappings.

@owaiskazi19
Copy link
Member

Thanks @owaiskazi19 we can add XContentType.JSON, but what I was trying to say is with the current code I dont see mappings are created with default dynamic mappings, the index is created with the specified mappings.

Mappings won't be created dynamic from the json. It would be if a user initialized the index with v1 index templates.

@prudhvigodithi
Copy link
Member

Mappings won't be created dynamic from the json. It would be if a user initialized the index with v1 index templates.

Right, Coming from here and here, for JS the index creation is handled internally by the plugin and not user initiated.

@eirsep
Copy link
Member

eirsep commented Jul 29, 2024

Using new CreateIndexRequest(indexName).mapping(mapping) with a JSON string that does not wrap the mapping in a _doc field as specified in the Javadocs, fails to apply if a v1 Index Template (/_template API, deprecated) is present which matches that index. This causes the index to use dynamic mapping which can cause follow-on failures such as IllegalArgumentException[mapper [master_key] cannot be changed from type [text] to [keyword]].

I am little confused due to my lack of knowledge on templates. Apologies if my question is naive but I ddint understand this
fails to apply if a v1 Index Template (/_template API, deprecated) is present which matches that index.

I am trying to understand why would system indices created by plugin and not discoverable by user be affected by user's v1 templates?

@dbwiddis
Copy link
Member Author

From the following output I can see it honors the declared mapping opensearch_job_scheduler_lock.json

Try deleting that index and adding the template in step 1 of the original issue and then recreating a job.

@dbwiddis
Copy link
Member Author

I am trying to understand why would system indices created by plugin and not discoverable by user be affected by user's v1 templates?

Because the specific mapping() method called by the plugin has a javadoc describing how it must be used, and that is not followed.

In many of the links above to specific plugins you can use git blame to go back and find the PR that removed the second XContentType argument which uses a different mapping() method, which adds that _doc automatically.

The presence of a v1 template which matches the system index will cause the code to follow a path which loses the user-specified mapping. In particular, v2 mappings process in a list and there is code to add the missing _doc type if it's not there; but in the v1 code path there is a map-based merge, so the template mapping provides the _doc and the code never looks for the provided mapping.

It's a rare edge case:

  • User has v1 templates which means they may have been upgrading regularly since ES 7.7 or earlier
  • User has not used an affected plugin after the bug was introduced
  • User uses the plugin for the first time, and on installation, the mappings are created improperly by the plugin

@vikasvb90
Copy link
Contributor

@dbwiddis In your description you mentioned that mappings from v1 template won't get applied only if following two conditions are met:

  1. A v1 index template matching the index to be created is already present.
  2. A new index is created via the deprecated mapping method of CreateIndexRequest without wrapping the mapping string in _doc.

My question to you is if both conditions are required for the occurrence of this issue then how come plugins you mentioned are impacted currently? For e.g., in ISM I don't think there is any template created first to create the ISM system indices. I will let @bowenlan-amzn further confirm this. Are you suggesting that this can happen in future if we start using templates for the creation of system indices?

@vikasvb90
Copy link
Contributor

One scenario I can think of where this issue can occur today is where there's a template present which unintentionally resolves to system indices used by plugins and then a new system index is created.

@dbwiddis
Copy link
Member Author

@dbwiddis In your description you mentioned that mappings from v1 template won't get applied only if following two conditions are met:

  1. A v1 index template matching the index to be created is already present.
  2. A new index is created via the deprecated mapping method of CreateIndexRequest without wrapping the mapping string in _doc.

My question to you is if both conditions are required for the occurrence of this issue then how come plugins you mentioned are impacted currently?

In the specific examples, we discovered this when people were activating an agent using ML Commons Agent Framework, using a Flow Framework template. In both cases they had used neither plugin before, so the indices did not previously exist.

For e.g., in ISM I don't think there is any template created first to create the ISM system indices. I will let @bowenlan-amzn further confirm this. Are you suggesting that this can happen in future if we start using templates for the creation of system indices?

I am not an expert in any other plugin which is why I did my best to describe the specific conditions required. As an older plugin, users who have upgraded from ES7.7 to the present likely already have the ISM indices created before the bug was introduced (a lot of refactoring was done prior to 2.x to remove typed mappings which introduced this bug).

One scenario I can think of where this issue can occur today is where there's a template present which unintentionally resolves to system indices used by plugins and then a new system index is created.

That is exactly the scenario we are facing. It seems that ES 7.7 and previous used these templates for defaults such as numbers of shards, etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Plugins
Projects
None yet
Development

No branches or pull requests

6 participants