Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix and optimize "Add XContentFieldFilter (#81970)" #83152

Merged
merged 13 commits into from
Feb 2, 2022

Conversation

mushao999
Copy link
Contributor

@mushao999 mushao999 commented Jan 26, 2022

In "Add XContentFieldFilter (#81970)",I introduced XContentFieldFilter to improve filtering performance for SourceFieldMapper and ShardGetService. But no tests is added. Something unexpected was then found in issue #82891 .So changes in that PR was reverted in #82897 .

  • This PR fix the bug reported in @82891 and add some tests that similiar to XContentMapValuesTests to make new filter behave in the same way with the old map filter.
  • We found that new filter will not preserve empty array and object. @nik9000 had fixed this in jackson-core repository : Allow TokenFilter to preserve empty FasterXML/jackson-core#729. We marked the related tests to @AwaitsFix to wait for this patch to be added into ES.
  • We found that dot in fieldName is not supported in the new filter. So we made some changes to support this feature.(moved to FilterPathBasedFilter support match fieldname with dot #83178 )
  • We also add some null and empty check for the new filter.

This commit introduces XContentFieldFilter, which applies field includes/excludes to 
XContent without having to realise the xcontent itself as a java map.  SourceFieldMapper
and ShardGetService are cut over to use this class
@elasticsearchmachine elasticsearchmachine added v8.1.0 external-contributor Pull request authored by a developer outside the Elasticsearch team labels Jan 26, 2022
@nik9000
Copy link
Member

nik9000 commented Jan 26, 2022

Neat! This really is almost the same as #83148. Cool.

@elasticmachine, ok to tests.

Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we're stuck behind:

We found that new filter will not preserve empty array and object. @nik9000 had fixed this in jackson-core repository : Allow TokenFilter to preserve empty FasterXML/jackson-core#729. We marked the related tests to @AwaitsFix to wait for this patch to be added into ES.

But, otherwise, it makes sense to me.

);
}

public XContentParserConfiguration withSupportDotInFieldName(boolean supportDotInFieldName) {
return new XContentParserConfiguration(registry, deprecationHandler, restApiVersion, includes, excludes, supportDotInFieldName);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the name I used for this variable more. And I like forcing folks to configure it when configuring filtering. That way then they build the filter they have to think about it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like yours too. Forgive my poor Englisth, LOL

@@ -71,6 +71,17 @@ private boolean isFinalNode() {
* @return true if the name equal a final node, otherwise return false
*/
boolean matches(String name, List<FilterPath> nextFilters) {
return matches(name, nextFilters, false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to keep the two argument version of this method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly! I did this because I want to make the changes as smaller as possible. I will change it.

@@ -99,6 +110,23 @@ boolean matches(String name, List<FilterPath> nextFilters) {
nextFilters.add(this);
}

// support dot in path
if (supportDotInFieldName && nextFilters.isEmpty()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't really wrapped my head around when nextFilters is empty. That's why I did mine in the caller. Could you explain what's up here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NextFilters are used to filter nested sub objects if current fieldName is matched and current FilterPathNode is not a final node. So if nextFilters is not empty, current fieldName must have been matched.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And if nextFilters is not empty and our code reach here, current fieldName is not matched. We can then check whether it is the dot case or not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see! So if nextFilters is non-empty that means that some other filter has already matched - it could be from a previous pattern.

Now, here's a question! Is it valid to skip? What if we had an object like {"foo.bar": {"baz": 2}} and two patterns foo.bar.baz and f*r.bort. If the second one matches first we won't try to do the dots in field names thing and never notice the second one.

Now, my PR has similar but different problems for sure. If you turn on the mode it'll never match f*r for foo.bar keys. I'm realizing now this is wrong. Your approach here of running the test both times is good.

@@ -42,6 +42,13 @@ public String toString() {

private final boolean inclusive;

private boolean supportDotInFieldName = false;

public FilterPathBasedFilter supportDotInFieldName(boolean supportDotInFieldName) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer adding another boolean parameter to the ctor.

filterPaths = FilterPath.compile(singleton("foo.bar.test"));
assertFalse(filterPaths[0].matches("foo.bar.text", nextFilters, true));
assertEquals(nextFilters.size(), 0);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's worth taking some of the tests from my PR into this one. Just to test the filtering "above" this layer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks , that's what I intended to do.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@nik9000 nik9000 self-assigned this Jan 26, 2022
@nik9000 nik9000 added the :Search Foundations/Mapping Index mappings, including merging and defining field types label Jan 26, 2022
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Jan 26, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@mushao999
Copy link
Contributor Author

The match fieldname with dot part was reverted and move to PR #83178 @nik9000 @romseygeek

elasticsearchmachine pushed a commit that referenced this pull request Jan 28, 2022
Current `FilterPathBasedFilter` does not support match fieldName with
dot. This PR merges the changes of #83148 and #83152 together to add the
ability of `matching fieldName with dot` to the `FilterPathBasedFilter`.
@nik9000
Copy link
Member

nik9000 commented Jan 28, 2022

Now that #83178 is in I think you'll probably want to merge it into this.

@mushao999
Copy link
Contributor Author

Now that #83178 is in I think you'll probably want to merge it into this.

@nik9000 @romseygeek #83178 is merged and related code and tests are updated, thanks.

Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @mushao999, thanks for working on this. I left a couple of comments but it's looking very close.

Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for your work on this. I will merge tomorrow morning.

@romseygeek romseygeek merged commit 919ed3a into elastic:master Feb 2, 2022
@@ -253,15 +249,11 @@ private GetResult innerGetLoadFromStoredFields(
if (fetchSourceContext.fetchSource() == false) {
source = null;
} else if (fetchSourceContext.includes().length > 0 || fetchSourceContext.excludes().length > 0) {
Map<String, Object> sourceAsMap;
// TODO: The source might be parsed and available in the sourceLookup but that one uses unordered maps so different.
// Do we care?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nik9000 @romseygeek is this TODO still valid here? It seems like the code around it changed quite a bit.

Copy link
Contributor Author

@mushao999 mushao999 Apr 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SourceLookup in ShardGetSevice has been removed in #20158, so this todo is invalid now.
I open #85921 to remove another similar todo in this file, please help to reivew it.@javanna

elasticsearchmachine pushed a commit that referenced this pull request Apr 20, 2022
668d557
added some `todo` in `ShardGetService` suggest that source might parsed
and available in the sourceLookup.
https://github.com/elastic/elasticsearch/blob/668d55758b751e848f2b82299e2c5e74af55cd12/src/main/java/org/elasticsearch/index/get/ShardGetService.java#L315
https://github.com/elastic/elasticsearch/blob/668d55758b751e848f2b82299e2c5e74af55cd12/src/main/java/org/elasticsearch/index/get/ShardGetService.java#L431
However ,`SourceLookup` in `ShardGetService` has been remove in #20158
which made these `todo` invalid.  One of them has been removed in #83152
This PR remove the left invalid `todo` in `ShardGetService`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement external-contributor Pull request authored by a developer outside the Elasticsearch team :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Meta label for search team v8.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants