Ability to differentiate between nested fields and those whose names contain . #1793

Draft. Wants to merge 2 commits into base: master

@abhinavdangeti
Member

commented Feb 24, 2023

  • Reference: https://issues.couchbase.com/browse/MB-55699

  • bleve uses "." as the path separator for nested field names.
    This can conflict with fields whose names themselves contain
    ".", which is allowed.

  • So if one were to index a document such as this ..

      {
          "x": {
              "y": "1"
          },
          "x.y": "2"
      }
    

    The field "x.y" will contain tokens "1" and "2". The real
    problem arises when different analyzers are used for these
    2 fields: at search time, an analytic query cannot reliably
    pick which analyzer to apply to the search criteria. It is
    also a problem when the data types of the 2 fields differ.

  • The proposal here is to decorate field names under the hood
    within backticks to preserve their true meaning.
    So for example ..

    • `a.b` is a single unnested field name
    • `a`.`b` is a nested field name with `b` being a child field of `a`
  • Here're the ramifications with this approach:

    • While indexing, users can still specify field names as
      they appear in their JSON documents. Under the hood,
      however, these field names will now be registered with
      their decorated versions to avoid ambiguity.

    • While querying, users can still specify fields as they
      expect to see them within their JSON documents. Note that
      it will be the user's responsibility to differentiate
      between nested field names and others.
      Consider an index mapping that indexes the document
      shown earlier. The searches that would work here are ..
      1. {"field": "`x.y`", "match": 2}
      2. {"field": "x.y", "match": 1}
      3. {"field": "`x`.`y`", "match": 1}

    • Users will also be responsible for specifying sort keys,
      facet fields, and highlight fields accordingly in their
      search requests. For example ..

        x        : interpreted as `x`
        `x`      : interpreted as `x`
        x.y      : interpreted as `x`.`y`
        `x.y`    : interpreted as `x.y`
        `x`.`y`  : interpreted as `x`.`y`
      
    • In the search response, users will now see decorated
      names for fragments, locations and facets to avoid any
      ambiguous interpretation of the field names.
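
The interpretation rules listed above can be sketched as a small normalizer. This is only an illustration of the proposal, not the code in this PR: the names `splitFieldPath`, `decorateFieldPath` and `normalizeField` are hypothetical, and the sketch assumes field names never contain a backtick themselves.

```go
package main

import (
	"errors"
	"strings"
)

// splitFieldPath (hypothetical helper) splits a user-supplied field
// reference into path segments. Text inside backticks is taken literally,
// dots included; outside backticks, "." separates nested segments.
// Assumption of this sketch: a single trailing "." is tolerated.
func splitFieldPath(field string) ([]string, error) {
	var segments []string
	var cur strings.Builder
	inQuote := false
	for _, r := range field {
		switch {
		case r == '`':
			// Toggle quoting; the backtick itself is not part of the name.
			inQuote = !inQuote
		case r == '.' && !inQuote:
			if cur.Len() == 0 {
				return nil, errors.New("empty path segment")
			}
			segments = append(segments, cur.String())
			cur.Reset()
		default:
			cur.WriteRune(r)
		}
	}
	if inQuote {
		return nil, errors.New("unterminated backtick")
	}
	if cur.Len() > 0 {
		segments = append(segments, cur.String())
	}
	if len(segments) == 0 {
		return nil, errors.New("empty field name")
	}
	return segments, nil
}

// decorateFieldPath (hypothetical helper) wraps every segment in backticks
// and joins the segments with ".", producing the unambiguous form.
func decorateFieldPath(segments []string) string {
	quoted := make([]string, len(segments))
	for i, s := range segments {
		quoted[i] = "`" + s + "`"
	}
	return strings.Join(quoted, ".")
}

// normalizeField maps a user-supplied field reference to its decorated form.
func normalizeField(field string) (string, error) {
	segments, err := splitFieldPath(field)
	if err != nil {
		return "", err
	}
	return decorateFieldPath(segments), nil
}
```

Under these assumptions, `normalizeField("x.y")` and ``normalizeField("`x`.`y`")`` both produce the nested form, while ``normalizeField("`x.y`")`` keeps the dot inside a single segment.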

@iredmail
Contributor

Is this change backward-compatible?

@abhinavdangeti
Member Author

Not yet, need to think about it.

@abhinavdangeti abhinavdangeti modified the milestone: v2.4.0 Feb 27, 2023
@abhinavdangeti abhinavdangeti changed the title Ability to differentiate between nested fields and those with . Ability to differentiate between nested fields and those whose names contain . May 4, 2023
@abhinavdangeti abhinavdangeti removed this from the v2.4.0 milestone Oct 31, 2023
@abhinavdangeti abhinavdangeti marked this pull request as draft October 31, 2023 18:13