Make ignore_malformed more vocal about skipped fields #29494

Closed
jpountz opened this issue Apr 12, 2018 · 3 comments
Assignees
Labels
>enhancement :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch

Comments

@jpountz
Contributor

jpountz commented Apr 12, 2018

ignore_malformed is sometimes used to deal with messy data, so that indexing an entire document does not fail when only one or two fields are malformed. However, once you start using it, this option doesn't give you any feedback about which documents succeeded or failed indexing, which is trappy. It makes it possible to think you are querying all your data when actually the queried field only has the correct format in a minority of documents. It can also make it hard to answer questions like "why does this document not match this query", e.g. if the date field has a hard-to-spot typo. I'm especially worried about this as we are considering widening the scope of the ignore_malformed option (#12366).

We had a discussion about it with @clintongormley and thought that maybe we should add feedback about parsing failures back to the _source document, similar to how Logstash's grok plugin can add tags to failed documents.

Exact details are up for discussion, but for instance we could add an _ignored field with the list of fields that failed parsing. This would never collide with a document's field, since we reject fields that start with an underscore.
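To make the proposal concrete, here is a minimal sketch in plain Python of the behaviour being discussed: malformed values are skipped rather than failing the whole document, and the names of the skipped fields are recorded under `_ignored`. This is only an illustration, not Elasticsearch code; `PARSERS`, `index_document`, and the sample mapping are hypothetical names made up for the example.

```python
from datetime import datetime

# Hypothetical per-type parsers; real Elasticsearch field mappers are far richer.
PARSERS = {
    "integer": int,
    "date": lambda value: datetime.strptime(value, "%Y-%m-%d"),
}


def index_document(source, mapping):
    """Parse each mapped field; skip malformed values and record their names."""
    indexed = {}
    ignored = []
    for field, value in source.items():
        parser = PARSERS.get(mapping.get(field))
        if parser is None:
            # Unmapped field (or unknown type): keep the value as-is.
            indexed[field] = value
            continue
        try:
            indexed[field] = parser(value)
        except (ValueError, TypeError):
            # Equivalent of ignore_malformed: drop the field, keep the document.
            ignored.append(field)
    if ignored:
        # The feedback proposed above: list the fields that failed parsing.
        indexed["_ignored"] = sorted(ignored)
    return indexed


mapping = {"age": "integer", "created": "date"}
result = index_document({"age": "forty", "created": "2018-04-12", "name": "a"}, mapping)
print(result["_ignored"])
```

With this in place, a document whose date has a hard-to-spot typo (say `2018-O4-12` with a letter O) would still be indexed, but would carry `created` in its `_ignored` list.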

@jpountz jpountz added >enhancement discuss :Search Foundations/Mapping Index mappings, including merging and defining field types labels Apr 12, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-search-aggs

@clintongormley
Contributor

See #12366 (comment) for more

@jpountz
Contributor Author

jpountz commented Apr 17, 2018

We discussed this in the search meeting and agreed that we should do something. @jimczi suggested that we could do this with a meta field instead, which we would return with the search API alongside the _index or _id fields. If we follow that path, there is also an option to make this feature available through a plugin to make users even more aware that they are doing something unusual that may lead to data loss. The latter point still needs discussion.

jpountz added a commit to jpountz/elasticsearch that referenced this issue Apr 23, 2018
This adds a new `_ignored` meta field which indexes and stores fields that have
been ignored at index time because of the `ignore_malformed` option. It makes
malformed documents easier to identify by using `exists` or `term(s)` queries
on the `_ignored` field.

Closes elastic#29494
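
As a rough illustration of the `exists` and `term` queries the commit message mentions, the sketch below builds the two search request bodies as Python dictionaries. The helper names and the example field name `created` are hypothetical; the bodies follow the usual Query DSL shape for `exists` and `term` queries and would be sent to an index's `_search` endpoint.

```python
import json


def ignored_exists_query():
    """Match every document in which at least one field was ignored."""
    return {"query": {"exists": {"field": "_ignored"}}}


def ignored_term_query(field_name):
    """Match documents in which the given field was ignored."""
    return {"query": {"term": {"_ignored": field_name}}}


# Print the request bodies; "created" is an assumed field name.
print(json.dumps(ignored_exists_query()))
print(json.dumps(ignored_term_query("created")))
```

The first query answers "which documents have any malformed field at all", while the second narrows that down to a single field.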
@colings86 colings86 removed the discuss label Apr 26, 2018
jpountz added a commit that referenced this issue May 2, 2018
This adds a new `_ignored` meta field which indexes and stores fields that have
been ignored at index time because of the `ignore_malformed` option. It makes
malformed documents easier to identify by using `exists` or `term(s)` queries
on the `_ignored` field.

Closes #29494
@javanna javanna added the Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch label Jul 16, 2024