Ability to differentiate between nested fields and those whose names contain "." #1793
Draft
abhinavdangeti wants to merge 2 commits into blevesearch:master from abhinavdangeti:mb55699
abhinavdangeti force-pushed the mb55699 branch from 2b2d531 to 99aa596 (February 24, 2023 21:59)
+ Reference: https://issues.couchbase.com/browse/MB-55699
+ bleve uses "." as the path separator for nested field names. This can conflict with fields whose names themselves contain ".", which is permitted.
+ So if one were to index a document such as this:
  ```
  {
    "x": {
      "y": "1"
    },
    "x.y": "2"
  }
  ```
  The field "x.y" will contain the tokens "1" and "2". The real problem arises when different analyzers are used for these 2 fields: at search time, an analytic query cannot reliably pick which analyzer to apply to the search criteria.
+ The proposal here is to decorate field names under the hood with backticks to preserve their true meaning. So for example:
  - ``` `a.b` ``` is a single unnested field name
  - ``` `a`.`b` ``` is a nested field name, with ``` `b` ``` being a child field of ``` `a` ```
+ Here are the ramifications of this approach:
  - While indexing, users can still specify field names as they appear in their JSON documents. Under the hood, however, these field names will now be registered in their decorated form to avoid ambiguity.
  - While querying, users can still specify fields as they expect to see them within their JSON documents. Note that it is the user's responsibility to differentiate between nested field names and others. Consider an index mapping that indexes the document shown earlier. The searches that would work here are:
    1. ```{"field": "`x.y`", "match": 2}```
    2. ```{"field": "x.y", "match": 1}```
    3. ```{"field": "`x`.`y`", "match": 1}```
  - Users are also responsible for specifying sort keys, facet fields and highlight fields accordingly in their search requests. For example:
    ```
    x        : interpreted as `x`
    `x`      : interpreted as `x`
    x.y      : interpreted as `x`.`y`
    `x.y`    : interpreted as `x.y`
    `x`.`y`  : interpreted as `x`.`y`
    ```
  - In the search response, users will now see decorated names for fragments, locations and facets, to avoid any ambiguous interpretation of the field names.
abhinavdangeti force-pushed the mb55699 branch from 99aa596 to c4245c8 (February 24, 2023 22:24)
abhinavdangeti requested review from Thejas-bhat, metonymic-smokey and moshaad7 (February 24, 2023 22:26)
Is this change backward-compatible?
Not yet, need to think about it.
abhinavdangeti changed the title from "Ability to differentiate between nested fields and those with ." to "Ability to differentiate between nested fields and those whose names contain ." (May 4, 2023)
Reference: https://issues.couchbase.com/browse/MB-55699
bleve uses "." as the path separator for nested field names.
This can conflict with fields whose names themselves contain
".", which is permitted.
So if one were to index a document such as this:
    { "x": { "y": "1" }, "x.y": "2" }
The field "x.y" will contain the tokens "1" and "2". The real
problem arises when different analyzers are used for these
2 fields: at search time, an analytic query cannot reliably
pick which analyzer to apply to the search criteria. It is
also problematic when the data types of the 2 fields differ.
The proposal here is to decorate field names under the hood
with backticks to preserve their true meaning.
So for example:
- `a.b` is a single unnested field name
- `a`.`b` is a nested field name, with `b` being a child
  field of `a`
Here are the ramifications of this approach:
While indexing, users can still specify field names as
they appear in their JSON documents. Under the hood,
however, these field names will now be registered in
their decorated form to avoid ambiguity.
While querying, users can still specify fields as they
expect to see them within their JSON documents. Note that
it is the user's responsibility to differentiate
between nested field names and others.
Consider an index mapping that indexes the document
shown earlier. The searches that would work here are:
1. {"field": "`x.y`", "match": 2}
2. {"field": "x.y", "match": 1}
3. {"field": "`x`.`y`", "match": 1}
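Users are also responsible for specifying sort keys,
facet fields and highlight fields accordingly in their
search requests. For example:
    x        : interpreted as `x`
    `x`      : interpreted as `x`
    x.y      : interpreted as `x`.`y`
    `x.y`    : interpreted as `x.y`
    `x`.`y`  : interpreted as `x`.`y`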
In the search response, users will now see decorated
names for fragments, locations and facets to avoid any
ambiguous interpretation of the field names.