Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Support for Default Model Id #337

Merged
merged 19 commits into from
Sep 29, 2023

Conversation

vibrantvarun
Copy link
Member

@vibrantvarun vibrantvarun commented Sep 25, 2023

Description

This PR provides a functionality to client to skip passing model id in the neural search query request.

Issues Resolved

70

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Varun Jain <varunudr@amazon.com>
Signed-off-by: Varun Jain <varunudr@amazon.com>
Signed-off-by: Varun Jain <varunudr@amazon.com>
Signed-off-by: Varun Jain <varunudr@amazon.com>
Signed-off-by: Varun Jain <varunudr@amazon.com>
Signed-off-by: Varun Jain <varunudr@amazon.com>
Signed-off-by: Varun Jain <varunudr@amazon.com>
Signed-off-by: Varun Jain <varunudr@amazon.com>
@codecov
Copy link

codecov bot commented Sep 26, 2023

Codecov Report

Merging #337 (ce28954) into main (7bef7a0) will increase coverage by 0.05%.
Report is 1 commits behind head on main.
The diff coverage is 87.50%.

@@             Coverage Diff              @@
##               main     #337      +/-   ##
============================================
+ Coverage     84.56%   84.62%   +0.05%     
- Complexity      427      445      +18     
============================================
  Files            35       38       +3     
  Lines          1289     1333      +44     
  Branches        189      199      +10     
============================================
+ Hits           1090     1128      +38     
- Misses          118      119       +1     
- Partials         81       86       +5     
Files Coverage Δ
...rch/neuralsearch/util/NeuralSearchClusterUtil.java 100.00% <100.00%> (ø)
...g/opensearch/neuralsearch/plugin/NeuralSearch.java 71.42% <50.00%> (-2.26%) ⬇️
...search/processor/NeuralQueryEnricherProcessor.java 92.85% <92.85%> (ø)
...nsearch/neuralsearch/query/NeuralQueryBuilder.java 88.88% <91.66%> (-0.13%) ⬇️
...search/query/visitor/NeuralSearchQueryVisitor.java 76.92% <76.92%> (ø)

@navneet1v
Copy link
Collaborator

@vibrantvarun remove the skip changelog tag and put the info in change log.

@navneet1v
Copy link
Collaborator

@vibrantvarun fill the checklist too

@navneet1v
Copy link
Collaborator

@vibrantvarun add documentation on all public classes and public functions

Signed-off-by: Varun Jain <varunudr@amazon.com>
Signed-off-by: Varun Jain <varunudr@amazon.com>
Signed-off-by: Varun Jain <varunudr@amazon.com>
@navneet1v
Copy link
Collaborator

@vibrantvarun can we fix the github action: codecov/patch and codecov/project

seems like the coverage has dropped

Signed-off-by: Varun Jain <varunudr@amazon.com>
Copy link
Member

@martin-gaievski martin-gaievski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

few minor things, over looks good


public class NeuralQueryProcessorTests extends OpenSearchTestCase {

public void testFactory() throws Exception {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use naming convention from other tests - whatYouTesting_conditions_expectation, I think may be optional sometimes, when it's obvious, but other two are must

assertEquals("vasdcvkcjkbldbjkd", processor.getModelId());
assertEquals("bahbkcdkacb", processor.getNeuralFieldDefaultIdMap().get("fieldName").toString());

// Missing "query" parameter:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To check exception

expectThrows(IllegalArgumentException.class, () -> factory.create(Collections.emptyMap(), null, null, false, configMap, null));
}

public void testProcessRequest() throws Exception {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you can use @SneakyThrow for exceptions in tests

*/
@Setter
@Getter
public class NeuralQueryProcessor extends AbstractProcessor implements SearchRequestProcessor {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rename class to something closer to a processor type name, although I left the final call to you

Signed-off-by: Varun Jain <varunudr@amazon.com>
/**
* Key to reference this processor type from a search pipeline.
*/
public static final String TYPE = "enriching_query_defaults";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

neural_query_enricher name seems better here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martin-gaievski I am finally going with neural_query_enricher

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree, that sounds reasonable, please update the class with that new name

) throws IllegalArgumentException {
String modelId;
try {
modelId = (String) config.remove(DEFAULT_MODEL_ID);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why config.remove?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took reference from OS core

) throws IllegalArgumentException {
String modelId;
try {
modelId = (String) config.remove(DEFAULT_MODEL_ID);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -80,6 +80,7 @@ public Collection<Object> createComponents(
final IndexNameExpressionResolver indexNameExpressionResolver,
final Supplier<RepositoriesService> repositoriesServiceSupplier
) {
NeuralSearchClusterUtil.instance().initialize(clusterService);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should start getting rid of these kind of inits and move towards singleton pattern without this kind of inits.

May be an AI for maintainers

Comment on lines +70 to +76
},
"passage_embedding": {
"type": "knn_vector",
"dimension": 768
},
"passage_text": {
"type": "text"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why we need these new fields? why we cannot use older fields?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because they don't have text matching

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need text matching?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i wrote a test case where I am passing querytext as hello world. If I don't add this field here then it will fail

Signed-off-by: Varun Jain <varunudr@amazon.com>
Signed-off-by: Varun Jain <varunudr@amazon.com>
@navneet1v navneet1v merged commit 9c37b0e into opensearch-project:main Sep 29, 2023
16 checks passed
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-337-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 9c37b0e9dbaac0c4607385c728196f36164f705d
# Push it to GitHub
git push --set-upstream origin backport/backport-337-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-337-to-2.x.

@vibrantvarun vibrantvarun deleted the Default-Processor branch September 29, 2023 19:37
vibrantvarun added a commit to vibrantvarun/neural-search that referenced this pull request Sep 29, 2023
* Support for default Model Id

Signed-off-by: Varun Jain <varunudr@amazon.com>

* Support for Default Model id

Signed-off-by: Varun Jain <varunudr@amazon.com>

* Support for default model Id

Signed-off-by: Varun Jain <varunudr@amazon.com>

* Removing wildcard Imports

Signed-off-by: Varun Jain <varunudr@amazon.com>

* Typo fix

Signed-off-by: Varun Jain <varunudr@amazon.com>

* Integ test cases

Signed-off-by: Varun Jain <varunudr@amazon.com>

* Fixing Integ Test case

Signed-off-by: Varun Jain <varunudr@amazon.com>

* Addressing Comments

Signed-off-by: Varun Jain <varunudr@amazon.com>

* Added Visitor test cases and addressed comments

Signed-off-by: Varun Jain <varunudr@amazon.com>

* Comments Addressed of Jack

Signed-off-by: Varun Jain <varunudr@amazon.com>

* Addressed changes requested by Martin

Signed-off-by: Varun Jain <varunudr@amazon.com>

* Addressed changes requested by Martin

Signed-off-by: Varun Jain <varunudr@amazon.com>

* Fixing test cases

Signed-off-by: Varun Jain <varunudr@amazon.com>

* Increasing test coverage

Signed-off-by: Varun Jain <varunudr@amazon.com>

* Renaming and addressing comments of Martin

Signed-off-by: Varun Jain <varunudr@amazon.com>

* Addressing Comments of Navneet

Signed-off-by: Varun Jain <varunudr@amazon.com>

* Updating tests

Signed-off-by: Varun Jain <varunudr@amazon.com>

---------

Signed-off-by: Varun Jain <varunudr@amazon.com>
vibrantvarun added a commit to vibrantvarun/neural-search that referenced this pull request Sep 29, 2023
Signed-off-by: Varun Jain <varunudr@amazon.com>
martin-gaievski pushed a commit that referenced this pull request Oct 3, 2023
* [Feature] Support for Default Model Id (#337)

Signed-off-by: Varun Jain <varunudr@amazon.com>
},
"passage_embedding": {
"type": "knn_vector",
"dimension": 768
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this need to have a lucene as engine, otherwise it uses default which is nmslib and that fails, as nmslib is not part of the knn zip fopr integ tests @vibrantvarun

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this field used by the way?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Label will add auto workflow to backport PR to 2.x branch Features Introduces a new unit of functionality that satisfies a requirement v2.11.0 Issues targeting release v2.11.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants