Handle extremely large values during field mapping #403

Closed
ghukill opened this issue May 4, 2019 · 1 comment

ghukill commented May 4, 2019

The following error was observed when a single value for a field is very large:

Caused by: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: Found unrecoverable error [141.211.168.139:9200] returned Bad Request(400) - Document contains at least one immense term in field="dc_subject.keyword" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[72, 111, 103, 103, 44, 32, 74, 97, 109, 101, 115, 32, 82, 46, 59, 32, 85, 110, 105, 118, 101, 114, 115, 105, 116, 121, 32, 69, 118, 101]...', original message: bytes can be at most 32766 in length; got 38743; Bailing out.

It sounds like removing the long value from the XML fixed this indexing error, but field mapping should gracefully handle even extremely long fields (e.g. a full-text transcription of a book).

Possible approaches:

  • add a flag to xml2kvp that sets a truncation length for a field
    • not a bad option, if we assume that index mapping is largely about analysis and that ES is not designed for storing data in that way
  • hard limit: skip the flag and always truncate values to below the 32766-byte limit
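
For illustration, the hard-limit approach could look roughly like the sketch below. The limit in the error is on the UTF-8 encoded length in bytes, not on the character count, so the cut has to be made on bytes without leaving a partial multi-byte character at the end. The function name `truncate_utf8` and the constant `ES_MAX_TERM_BYTES` are hypothetical; this is not the xml2kvp implementation and not necessarily what the eventual fix does.

```python
# Limit quoted in the error message above (bytes of UTF-8, not characters).
ES_MAX_TERM_BYTES = 32766


def truncate_utf8(value: str, max_bytes: int = ES_MAX_TERM_BYTES) -> str:
    """Return `value` shortened so its UTF-8 encoding fits in `max_bytes`.

    Hypothetical helper for illustration only.
    """
    encoded = value.encode("utf-8")
    if len(encoded) <= max_bytes:
        return value
    # Slicing bytes may cut a multi-byte character in half;
    # errors="ignore" drops that trailing partial sequence.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


# A 38743-byte ASCII value, the size reported in the error above.
oversized = "a" * 38743
print(len(truncate_utf8(oversized).encode("utf-8")))  # 32766
```

A per-field flag, as in the first approach, would amount to passing a smaller `max_bytes` for selected fields instead of relying on the default.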

ghukill commented May 6, 2019

Closed: b697a55

ghukill closed this as completed May 6, 2019