Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable support for default model id on HybridQueryBuilder #541

Conversation

vibrantvarun
Copy link
Member

@vibrantvarun vibrantvarun commented Jan 17, 2024

Description

This PR enables the support for Default model Id in Hybrid Query Builder.

Premise:
A customer is facing an issue while using the feature of default model id when neural query is a part of hybrid query.
See example below

query{
     hybrid{
            queries[
                  neural {
                       // do not pass model id here as you expect that neural query enricher will add it here.
                  }
           ]  
     }
}
{
  "from" : 0,
  "size" : 1000,
  "query": {
    "hybrid": {
      "queries": [
        {
          "match": {
            "passage": {
              "query": "how many camels are there in the world."
            }
          }
        },
        {
          "neural": {
            "content_vector": {
              "query_text": "how many camels are there in the world.",
              "k": 10,
              "filter" :
              {
                "bool" :
                {
                  "must": [
                    {
                      "term": {
                      "publisher_id": {
                        "value": "abc"
                        }
                      }
                    },
                    {
                      "term": {
                      "published": {
                        "value": true
                        }
                      }
                    }
                  ]
                }
              }
            }
          }
        }
      ]
    }
  },
  "_source": true
}

Reason:
NeuralQueryEnricher processor triggers a neural query visitor to visit all the querybuilders present in the search query and find the neuralquerybuilder and update the model id in it if not present. Visitor finds the visit method in the querybuilder and parses them. However, in hybridquerybuilder the definition of visit method was not there, therefore it was not able to parse the inner sub queries in it.

Issues Resolved

#539

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Varun Jain <varunudr@amazon.com>
Signed-off-by: Varun Jain <varunudr@amazon.com>
@vibrantvarun
Copy link
Member Author

Test method name testVisit has been kept in hybridQueryBuilder to maintain name consistency with OpenSearch
https://github.com/opensearch-project/OpenSearch/pull/10110/files

@vibrantvarun vibrantvarun self-assigned this Jan 17, 2024
Signed-off-by: Varun Jain <varunudr@amazon.com>
Signed-off-by: Varun Jain <varunudr@amazon.com>
Copy link

codecov bot commented Jan 17, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (7320cd3) 84.33% compared to head (2697e1b) 84.39%.

Additional details and impacted files
@@             Coverage Diff              @@
##               main     #541      +/-   ##
============================================
+ Coverage     84.33%   84.39%   +0.05%     
- Complexity      533      535       +2     
============================================
  Files            40       40              
  Lines          1564     1570       +6     
  Branches        244      245       +1     
============================================
+ Hits           1319     1325       +6     
  Misses          133      133              
  Partials        112      112              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@martin-gaievski
Copy link
Member

martin-gaievski commented Jan 17, 2024

can we add integ test that would fail without changes in this PR? It can be something like: processor is set with the model id, we use hybrid query with neural query as one of sub-queries, do not specify model id in the query. My understanding without this change that is failing with a "missing model id" exception. We can even extend this to a re-indexing scenario, where I suppose similar error will be in reindexing call

@martin-gaievski
Copy link
Member

can we add integ test that would fail without changes in this PR? It can be something like: processor is set with the model id, we use hybrid query with neural query as one of sub-queries, do not specify model id in the query. My understanding without this change that is failing with a "missing model id" exception. We can even extend this to a re-indexing scenario, where I suppose similar error will be in reindexing call

discussed offline with @vibrantvarun, test is already a part of this PR

@martin-gaievski martin-gaievski added Bug Fixes Changes to a system or product designed to handle a programming bug/glitch backport 2.x Label will add auto workflow to backport PR to 2.x branch labels Jan 17, 2024
@vibrantvarun
Copy link
Member Author

vibrantvarun commented Jan 17, 2024

As per discussion with @martin-gaievski offline,
I have tested the scenario where the test(testNeuralQueryEnricherProcessor_whenHybridQueryBuilderAndNoModelIdPassed_thenSuccess) added in NeuralQueryEnricherProcessorIT was failing with an error("Model cannot be non-null") on my local machine If I do not add a visit method in the HybridQueryBuilder.

Therefore after adding the visit method the visitor is able to parse inner subqueries of the hybrid query builder and add the missing model id in it.

Thanks

@vibrantvarun vibrantvarun requested a review from heemin32 January 17, 2024 20:36
Signed-off-by: Varun Jain <varunudr@amazon.com>
@vibrantvarun
Copy link
Member Author

This PR merge is dependent on opensearch-project/security#3959

@heemin32 heemin32 merged commit 98e5534 into opensearch-project:main Jan 23, 2024
54 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jan 23, 2024
* Enable support for default model id on HybridQueryBuilder

Signed-off-by: Varun Jain <varunudr@amazon.com>

* Adding tests and updating changelog.md

Signed-off-by: Varun Jain <varunudr@amazon.com>

* Optimizing code

Signed-off-by: Varun Jain <varunudr@amazon.com>

* modyfing the tests4

Signed-off-by: Varun Jain <varunudr@amazon.com>

* Addressing Heemin comment

Signed-off-by: Varun Jain <varunudr@amazon.com>

---------

Signed-off-by: Varun Jain <varunudr@amazon.com>
(cherry picked from commit 98e5534)
heemin32 pushed a commit that referenced this pull request Jan 23, 2024
* Enable support for default model id on HybridQueryBuilder

Signed-off-by: Varun Jain <varunudr@amazon.com>

* Adding tests and updating changelog.md

Signed-off-by: Varun Jain <varunudr@amazon.com>

* Optimizing code

Signed-off-by: Varun Jain <varunudr@amazon.com>

* modyfing the tests4

Signed-off-by: Varun Jain <varunudr@amazon.com>

* Addressing Heemin comment

Signed-off-by: Varun Jain <varunudr@amazon.com>

---------

Signed-off-by: Varun Jain <varunudr@amazon.com>
(cherry picked from commit 98e5534)

Co-authored-by: Varun Jain <varunudr@amazon.com>
martin-gaievski pushed a commit that referenced this pull request Jan 24, 2024
* Enable support for default model id on HybridQueryBuilder

Signed-off-by: Varun Jain <varunudr@amazon.com>

* Adding tests and updating changelog.md

Signed-off-by: Varun Jain <varunudr@amazon.com>

* Optimizing code

Signed-off-by: Varun Jain <varunudr@amazon.com>

* modyfing the tests4

Signed-off-by: Varun Jain <varunudr@amazon.com>

* Addressing Heemin comment

Signed-off-by: Varun Jain <varunudr@amazon.com>

---------

Signed-off-by: Varun Jain <varunudr@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Label will add auto workflow to backport PR to 2.x branch Bug Fixes Changes to a system or product designed to handle a programming bug/glitch
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

3 participants