Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid NPE in more_like_this when field has zero tokens #30365

Merged
merged 5 commits into from
May 8, 2018

Conversation

aditya-agrawal
Copy link
Contributor

@aditya-agrawal aditya-agrawal commented May 3, 2018

Closes #30148

At this point ->
https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/action/termvectors/TermVectorsWriter.java#L69
fieldTermVector is getting null in case of empty field. I initialized fieldTermVector with EMPTY_TERMS in case of null. This solves both the given issue given in the ticket.

@karmi
Copy link
Contributor

karmi commented May 3, 2018

Hi @aditya-agrawal, we have found your signature in our records, but it seems like you have signed with a different e-mail than the one used in yout Git commit. Can you please add both of these e-mails into your Github profile (they can be hidden), so we can match your e-mails to your Github profile?

@aditya-agrawal
Copy link
Contributor Author

Done.

@aditya-agrawal
Copy link
Contributor Author

@karmi can you please review the PR and let me know if I need to change anything?

Copy link
Member

@cbuescher cbuescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @aditya-agrawal, thanks for opening this PR. In order to get this merged, we would need at least some unit tests that show that this fixes the reported issue. I'm not sure if TermVectorsWriter#setFields can be tested individually, for the NPE this might be possible.
For the whole two fields scenario described in the issue it probably makes sense to add something similar to MoreLikeThisIT, although a unit test would be prefered as well if possible.

@aditya-agrawal
Copy link
Contributor Author

Okay @cbuescher I am working on it.

@aditya-agrawal
Copy link
Contributor Author

@cbuescher I have written the integration test for the two field scenario given in the ticket.
Since I have given a default value to the field that was giving null it will no more throw NPE. Not sure how can I write unit test here to prove this.

@javanna javanna added the :Search/Search Search-related issues that do not fall into other categories label May 7, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search-aggs

Copy link
Member

@cbuescher cbuescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the test, LGTM. I left one super tiny nit-pick, nothing to really worry about. I will kick of a full CI test run to see if anything else turns up.
Could you add a line mentioning this PRs title and number similar to e.g. https://github.com/elastic/elasticsearch/pull/29641/files#diff-297418c07078d2b31d8e94a7d34b5255 to the docs/CHANGELOG.asciidoc. I think this fix should also be backported to the next 6.x branches. At the moment I believe this means putting the same entry in two places (under 7.0 and 6.4). I will edit the title slightly so make it a bit shorter. Thanks a lot.

client().index(indexRequest("test").type("type").id("1").source(jsonBuilder().startObject()
.field("myField", "and_foo").field("empty", "").endObject())).actionGet();


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could you remove the second empty line (or maybe even both?), I think a few empty lines to separate blocks don't hurt but too many are also just bloating the code.

@cbuescher cbuescher changed the title fix more like this queries return 0 results if field in source document has 0 tokens in analyzed field Avoid NPE in more_like_this` when field has zero tokens May 7, 2018
@cbuescher cbuescher changed the title Avoid NPE in more_like_this` when field has zero tokens Avoid NPE in more_like_this when field has zero tokens May 7, 2018
@cbuescher
Copy link
Member

@elasticmachine test this please

@aditya-agrawal
Copy link
Contributor Author

aditya-agrawal commented May 7, 2018

@cbuescher I have made the suggested changes.

@cbuescher
Copy link
Member

@elasticmachine test this please

@cbuescher cbuescher merged commit 27ddb4f into elastic:master May 8, 2018
@aditya-agrawal aditya-agrawal deleted the bug-fix/mlt-empty-field branch May 8, 2018 13:25
cbuescher pushed a commit that referenced this pull request May 8, 2018
Fixes and edge case when using `more_like_this` where TermVectorsWriter
could throw an NPE when a field produced zero tokens after analysis. This
changes the implementation to use an empty list of tokens in this case.

Closes #30148
@cbuescher
Copy link
Member

@aditya-agrawal thanks a lot, merged to master and the upcoming 6.4 branch.

jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request May 8, 2018
* master:
  [Docs] Fix typo in cardinality-aggregation.asciidoc (elastic#30434)
  Avoid NPE in `more_like_this` when field has zero tokens (elastic#30365)
  Build: Switch to building javadoc with html5 (elastic#30440)
colings86 pushed a commit that referenced this pull request May 8, 2018
Fixes and edge case when using `more_like_this` where TermVectorsWriter
could throw an NPE when a field produced zero tokens after analysis. This
changes the implementation to use an empty list of tokens in this case.

Closes #30148
dnhatn added a commit that referenced this pull request May 8, 2018
* master:
  Mute ML upgrade test (#30458)
  Stop forking javac (#30462)
  Client: Deprecate many argument performRequest (#30315)
  Docs: Use task_id in examples of tasks (#30436)
  Security: Rename IndexLifecycleManager to SecurityIndexManager (#30442)
  [Docs] Fix typo in cardinality-aggregation.asciidoc (#30434)
  Avoid NPE in `more_like_this` when field has zero tokens (#30365)
  Build: Switch to building javadoc with html5 (#30440)
  Add a quick tour of the project to CONTRIBUTING (#30187)
  Reindex: Use request flavored methods (#30317)
  Silence SplitIndexIT.testSplitIndexPrimaryTerm test failure. (#30432)
  Auto-expand replicas when adding or removing nodes (#30423)
  Docs: fix changelog merge
  Fix line length violation in cache tests
  Add stricter geohash parsing (#30376)
  Add failing test for core cache deadlock
  [DOCS] convert forcemerge snippet
  Update forcemerge.asciidoc (#30113)
  Added zentity to the list of API extension plugins (#29143)
  Fix the search request default operation behavior doc (#29302) (#29405)
dnhatn added a commit that referenced this pull request May 8, 2018
* 6.x:
  Stop forking javac (#30462)
  Fix tribe tests
  Docs: Use task_id in examples of tasks (#30436)
  Security: Rename IndexLifecycleManager to SecurityIndexManager (#30442)
  Packaging: Set elasticsearch user to have non-existent homedir (#29007)
  [Docs] Fix typo in cardinality-aggregation.asciidoc (#30434)
  Avoid NPE in `more_like_this` when field has zero tokens (#30365)
  Build: Switch to building javadoc with html5 (#30440)
  Add a quick tour of the project to CONTRIBUTING (#30187)
  Add stricter geohash parsing (#30376)
  Reindex: Use request flavored methods (#30317)
  Silence SplitIndexIT.testSplitIndexPrimaryTerm test failure.  (#30432)
  Auto-expand replicas when adding or removing nodes (#30423)
  Silence IndexUpgradeIT test failures. (#30430)
  Fix line length violation in cache tests
  Add failing test for core cache deadlock
  [DOCS] convert forcemerge snippet
  Update forcemerge.asciidoc (#30113)
  Added zentity to the list of API extension plugins (#29143)
  Fix the search request default operation behavior doc (#29302) (#29405)
  Watcher: Mark watcher as started only after loading watches (#30403)
  Correct wording in log message (#30336)
  Do not fail snapshot when deleting a missing snapshotted file (#30332)
  AwaitsFix testCreateShrinkIndexToN
  DOCS: Correct mapping tags in put-template api
  DOCS: Fix broken link in the put index template api
  Add put index template api to high level rest client (#30400)
  [Docs] Add snippets for POS stop tags default value
  Remove entry inadvertently picked into changelog
  Move respect accept header on no handler to 6.3.1
  Respect accept header on no handler (#30383)
  [Test] Add analysis-nori plugin to the vagrant tests
  [Rollup] Validate timezone in range queries (#30338)
  [Docs] Fix bad link
  [Docs] Fix end of section in the korean plugin docs
  add the Korean nori plugin to the change logs
  Expose the Lucene Korean analyzer module in a plugin (#30397)
  Docs: remove transport_client from CCS role example (#30263)
  Test: remove cluster permission from CCS user (#30262)
  Watcher: Remove unneeded index deletion in tests
  fix docs branch version
  fix lucene snapshot version
  Upgrade to 7.4.0-snapshot-1ed95c097b (#30357)
  [ML][TEST] Clean up jobs in ModelPlotIT
  Watcher: Ensure trigger service pauses execution (#30363)
  [DOCS] Fixes ordering of changelog sections
  [DOCS] Commented out empty sections in the changelog to fix the doc build. (#30372)
  Make RepositoriesMetaData contents unmodifiable (#30361)
  Change signature of Get Repositories Response (#30333)
  6.x Backport: Terms query validate bug  (#30319)
  InternalEngineTests.testConcurrentOutOfOrderDocsOnReplica should use two documents (#30121)
  Security: reduce garbage during index resolution (#30180)
  Test: use trial license in qa tests with security
  [ML] Add integration test for model plots (#30359)
  SQL: Fix bug caused by empty composites (#30343)
  [ML] Account for gaps in data counts after job is reopened (#30294)
  [ML] Refactor DataStreamDiagnostics to use array (#30129)
  Make licensing FIPS-140 compliant (#30251)
  Do not load global state when deleting a snapshot (#29278)
  Don't load global state when only restoring indices (#29239)
  Tests: Use different watch ids per test in smoke test (#30331)
  Watcher: Make start/stop cycle more predictable and synchronous (#30118)
  [Docs] Add term query with normalizer example
  Adds Eclipse config for xpack licence headers (#30299)
  Fix message content in users tool (#30293)
  [DOCS] Removed X-Pack breaking changes page
  [DOCS] Added security breaking change
  [DOCS] Fixes link to TLS LDAP info
  [DOCS] Merges X-Pack release notes into changelog (#30350)
  [DOCS] Fixes broken links to bootstrap user (#30349)
  [Docs] Remove errant changelog line
  Fix NPE when CumulativeSum agg encounters null/empty bucket (#29641)
  [DOCS] Reorganizes authentication details in Stack Overview (#30280)
  Tests: Simplify VersionUtils released version splitting (#30322)
  Fix merging logic of Suggester Options (#29514)
  ReplicationTracker.markAllocationIdAsInSync may hang if allocation is cancelled (#30316)
  [DOCS] Adds LDAP realm configuration details (#30214)
  [DOCS] Adds native realm configuration details (#30215)
  Disable SSL on testing old BWC nodes (#30337)
  [DOCS] Enables edit links for X-Pack pages
  Cancelling a peer recovery on the source can leak a primary permit (#30318)
  SQL: Reduce number of ranges generated for comparisons (#30267)
  [DOCS] Adds links to changelog sections
  Convert server javadoc to html5 (#30279)
  REST Client: Add Request object flavored methods (#29623)
  Create default ES_TMPDIR on Windows (#30325)
  [Docs] Clarify `fuzzy_like_this` redirect (#30183)
  Fix docs of the `_ignored` meta field.
  Add a new `_ignored` meta field. (#29658)
  Move repository-azure fixture test to QA project (#30253)
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request May 9, 2018
…or-you

* elastic/master: (22 commits)
  Docs: Test examples that recreate lang analyzers  (elastic#29535)
  BulkProcessor to retry based on status code (elastic#29329)
  Add GET Repository High Level REST API (elastic#30362)
  add a comment explaining the need for RetryOnReplicaException on missing mappings
  Add `coordinating_only` node selector (elastic#30313)
  Stop forking groovyc (elastic#30471)
  Avoid setting connection request timeout (elastic#30384)
  Use date format in `date_range` mapping before fallback to default (elastic#29310)
  Watcher: Increase HttpClient parallel sent requests (elastic#30130)
  Mute ML upgrade test (elastic#30458)
  Stop forking javac (elastic#30462)
  Client: Deprecate many argument performRequest (elastic#30315)
  Docs: Use task_id in examples of tasks (elastic#30436)
  Security: Rename IndexLifecycleManager to SecurityIndexManager (elastic#30442)
  [Docs] Fix typo in cardinality-aggregation.asciidoc (elastic#30434)
  Avoid NPE in `more_like_this` when field has zero tokens (elastic#30365)
  Build: Switch to building javadoc with html5 (elastic#30440)
  Add a quick tour of the project to CONTRIBUTING (elastic#30187)
  Reindex: Use request flavored methods (elastic#30317)
  Silence SplitIndexIT.testSplitIndexPrimaryTerm test failure. (elastic#30432)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Search/Search Search-related issues that do not fall into other categories v6.4.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants