Make `ignore_malformed` more vocal about skipped fields #29494
Comments
Pinging @elastic/es-search-aggs
See #12366 (comment) for more
We discussed this in the search meeting and agreed that we should do something. @jimczi suggested that we could do this with a meta field instead, which we would return with the search API alongside the
This adds a new `_ignored` meta field which indexes and stores fields that have been ignored at index time because of the `ignore_malformed` option. It makes malformed documents easier to identify by using `exists` or `term(s)` queries on the `_ignored` field. Closes elastic#29494
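As a rough illustration of the `exists` and `term(s)` queries mentioned in this commit message, the request bodies could be built as below. This is a minimal sketch: the helper names and the example field name `timestamp` are hypothetical, and the bodies are only printed, not sent to a cluster.

```python
import json


def ignored_exists_query():
    """Body matching every document that had at least one field ignored."""
    return {"query": {"exists": {"field": "_ignored"}}}


def ignored_terms_query(field_names):
    """Body matching documents where any of the given fields was ignored."""
    return {"query": {"terms": {"_ignored": list(field_names)}}}


if __name__ == "__main__":
    # "timestamp" is a made-up field name used only for illustration.
    print(json.dumps(ignored_exists_query(), indent=2))
    print(json.dumps(ignored_terms_query(["timestamp"]), indent=2))
```

Either body would be passed as the body of a search request against the index of interest.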
`ignore_malformed` is sometimes used to deal with messy data, so that indexing an entire document does not fail when only one or two fields are malformed. However, once you start using it, this option doesn't give you any feedback about which documents succeeded or failed indexing, which is trappy. It makes it possible to think you are querying all your data when actually the queried field only has the correct format in a minority of documents. It can also make it hard to answer questions like "why does this document not match this query", e.g. if the date field has a hard-to-spot typo. I'm especially worried about this as we are considering widening the scope of the `ignore_malformed` option (#12366).
We had a discussion about it with @clintongormley and thought that maybe we should add feedback about parsing failures back to the `_source` document, similarly to how Logstash's grok plugin can add tags to failed documents.
Exact details are up for discussion, but for instance we could add an `_ignored` field with the list of fields that failed parsing. This can never collide with a document's field since we reject fields that start with an underscore.
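To make the proposal concrete, here is a toy simulation of lenient parsing that records skipped fields. It is not Elasticsearch code: the `PARSERS` table, the field names `age` and `created`, and the `index_leniently` function are all hypothetical stand-ins for a mapping and the indexing step.

```python
from datetime import datetime

# Hypothetical per-field parsers standing in for a mapping.
PARSERS = {
    "age": int,
    "created": lambda value: datetime.strptime(value, "%Y-%m-%d"),
}


def index_leniently(doc):
    """Parse mapped fields; skip malformed ones but record their names."""
    indexed, ignored = {}, []
    for name, value in doc.items():
        parser = PARSERS.get(name)
        if parser is None:
            # Unmapped fields are kept as-is in this toy model.
            indexed[name] = value
            continue
        try:
            indexed[name] = parser(value)
        except (ValueError, TypeError):
            # Malformed value: drop the field instead of failing the document.
            ignored.append(name)
    if ignored:
        indexed["_ignored"] = sorted(ignored)
    return indexed
```

In this sketch, a document whose `created` value has a typo still gets indexed, and the `_ignored` list explains why a date query on that field would not match it.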