Add index for storing template models #34

Merged
merged 21 commits into from
Jun 14, 2021

jmazanec15
Member

@jmazanec15 jmazanec15 commented Jun 1, 2021

Description

This PR adds a hidden model index that will be used to store the serialized template indices used during index creation. Some of faiss's index types require a training step before indexing can begin. To support these index types, we need a way for a user to train a model index template and serialize it. This template will then be retrieved during segment creation to initialize the faiss index.

The mapping is fairly straightforward:

{
  "properties": {
    "engine": {
      "type": "keyword"
    },
    "model_blob": {
      "type": "binary"
    }
  }
}

The engine is required to identify which engine the model is intended to be used with. The model_blob stores the binary representation of the model.
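
To illustrate the two fields, here is a minimal sketch of building a document source for this mapping. It assumes the model bytes are carried as a Base64-encoded string, which is how OpenSearch `binary` fields accept values; the class and method names (`ModelDocumentSketch`, `toSource`, `blobFrom`) are hypothetical and not taken from this PR.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the PR: shows the shape of one model document.
class ModelDocumentSketch {
    // Field names taken from the mapping above.
    static final String ENGINE = "engine";
    static final String MODEL_BLOB = "model_blob";

    /** Builds a document source; the binary field carries a Base64-encoded string. */
    static Map<String, Object> toSource(String engine, byte[] serializedModel) {
        Map<String, Object> source = new LinkedHashMap<>();
        source.put(ENGINE, engine);
        source.put(MODEL_BLOB, Base64.getEncoder().encodeToString(serializedModel));
        return source;
    }

    /** Recovers the raw model bytes from a document source. */
    static byte[] blobFrom(Map<String, Object> source) {
        Object blob = source.get(MODEL_BLOB);
        if (blob == null) {
            throw new IllegalArgumentException("Document has no \"" + MODEL_BLOB + "\" field");
        }
        return Base64.getDecoder().decode((String) blob);
    }

    public static void main(String[] args) {
        // Placeholder bytes standing in for a serialized faiss template index.
        byte[] fakeModel = "not-a-real-faiss-index".getBytes(StandardCharsets.UTF_8);
        Map<String, Object> source = toSource("faiss", fakeModel);
        System.out.println(source.get(ENGINE));
        System.out.println(new String(blobFrom(source), StandardCharsets.UTF_8));
    }
}
```

Storing the engine next to the blob lets a reader of the document decide which native library should deserialize the bytes before touching them.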

Each model maps to exactly one OpenSearch document in this index. Models are identified by the OpenSearch id field. Users can either provide a custom id or let OpenSearch generate one for them. Documents in this index cannot be updated; however, they can be removed.
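
The add, get, and remove semantics described above can be sketched with an in-memory stand-in. This is not the PR's `ModelIndex` implementation: the class `InMemoryModelStoreSketch`, its method names, and the choice of exception types are assumptions made for illustration only. Against a real cluster, the "no update" rule resembles indexing with `op_type=create`, though whether the PR uses that mechanism is not stated here.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for the model index: one entry per model.
class InMemoryModelStoreSketch {
    private final Map<String, byte[]> models = new ConcurrentHashMap<>();

    /** Stores a model under the caller's id, or under a generated id when id is null. */
    String put(String id, byte[] blob) {
        String modelId = (id != null) ? id : UUID.randomUUID().toString();
        // putIfAbsent never overwrites, mirroring "documents cannot be updated".
        if (models.putIfAbsent(modelId, blob.clone()) != null) {
            throw new IllegalStateException(
                "Model \"" + modelId + "\" already exists and cannot be updated");
        }
        return modelId;
    }

    /** Returns a copy of the stored blob, or fails if the id is unknown. */
    byte[] get(String modelId) {
        byte[] blob = models.get(modelId);
        if (blob == null) {
            throw new IllegalArgumentException(
                "No model is available with the provided id \"" + modelId + "\"");
        }
        return blob.clone();
    }

    /** Removes a model; returns true if something was actually removed. */
    boolean delete(String modelId) {
        return models.remove(modelId) != null;
    }

    public static void main(String[] args) {
        InMemoryModelStoreSketch store = new InMemoryModelStoreSketch();
        String custom = store.put("my-model", new byte[] {1, 2, 3});
        String generated = store.put(null, new byte[] {4, 5, 6});
        System.out.println(custom + " " + generated);
        System.out.println(store.delete(custom));
    }
}
```

Replacing a model under these rules means deleting the old document and adding a new one, so a reader never observes a blob that changed underneath an existing id.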

For implementing the index, I referred to:

  1. Anomaly Detection plugin
  2. Index Management plugin

Additionally, I added test cases for each operation.

Note: changes related to JNI can be ignored, as they are reviewed in #28. The files to be reviewed are:

  1. ModelIndex.java
  2. KNNSettings.java
  3. KNNConstants.java
  4. KNNPlugin.java
  5. model-mapping.json
  6. ModelIndexTests.java

Issues Resolved

#27

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

jmazanec15 added 15 commits May 24, 2021 15:42
Signed-off-by: John Mazanec <jmazane@amazon.com>
@jmazanec15 jmazanec15 requested review from VijayanB and vamshin June 1, 2021 19:53
Signed-off-by: John Mazanec <jmazane@amazon.com>
public static final String MODEL_BLOB_PARAMETER = "model_blob";

public static final String MODEL_INDEX_MAPPING_PATH = "mappings/model-index.json";
public static final String MODEL_INDEX_NAME = ".knn-model-index";
Member

Should the file name start with opensearch? Maybe confirm the naming convention that other plugins use for system indices after the OpenSearch changes.

Member Author

In Conventions, the only requirement is that the name is prefixed with ".".

  1. AD is still using "opendistro-"
  2. ISM also prefixes with ".opendistro".

That being said, I think ".opensearch-knn-model-index" is good. I will update.

Setting.Property.NodeScope,
Setting.Property.Dynamic);

public static final Setting<Integer> MODEL_INDEX_NUMBER_OF_REPLICAS_SETTING = Setting.intSetting(
Member

Default should be 1 replica?

Member Author

Sure, will update.

* @throws IOException thrown when get mapping fails
*/
public void create(ActionListener<CreateIndexResponse> actionListener) throws IOException {
if (isCreated()) {
Member

Should we log a message here and return?

Member Author

I am a little worried this may blow up the logs if isCreated is not called by the caller. I'd prefer not to log here.


private String getMapping() throws IOException {
URL url = ModelIndex.class.getClassLoader().getResource(MODEL_INDEX_MAPPING_PATH);
assert url != null;
Member

Asserts are sometimes disabled on production hosts. Can we do a manual check and throw an exception?

Member Author

Sure, will update.
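
For reference, a minimal sketch of the explicit check discussed in this thread. Java assertions only run when the JVM is started with `-ea`, so a plain `if` plus a thrown exception keeps the failure visible in production. The class name `MappingResourceSketch`, the method name `readResource`, and the use of `IllegalStateException` are assumptions; the exception type actually chosen in the PR is not shown in this conversation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not the PR's code: loads a classpath resource as a string.
class MappingResourceSketch {
    static String readResource(String path) throws IOException {
        InputStream in = MappingResourceSketch.class.getClassLoader().getResourceAsStream(path);
        // Explicit check instead of `assert in != null`, which is skipped without -ea.
        if (in == null) {
            throw new IllegalStateException(
                "Unable to find resource \"" + path + "\" on the classpath");
        }
        try (InputStream stream = in) {
            return new String(stream.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        try {
            readResource("mappings/does-not-exist.json");
        } catch (IllegalStateException e) {
            System.out.println("Caught: " + e.getMessage());
        }
    }
}
```

With the check in place, a missing mapping file fails with a descriptive message at the lookup itself rather than with a later NullPointerException.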

Object blob = getResponse.getSourceAsMap().get(KNNConstants.MODEL_BLOB_PARAMETER);

if (blob == null) {
throw new IllegalArgumentException("ModelID: \"" + modelId + "\" is not present in index");
Member

Should this message be more user-focused? Something like "There is no model available with the provided Id"?

Member Author

Sure, I can update.

Signed-off-by: John Mazanec <jmazane@amazon.com>
Member

@vamshin vamshin left a comment

LGTM! Thanks

@jmazanec15 jmazanec15 merged commit 1e4e176 into opensearch-project:faiss-develop Jun 14, 2021
jmazanec15 added a commit to jmazanec15/k-NN-1 that referenced this pull request Oct 22, 2021
Signed-off-by: Jack Mazanec <jmazane1@nd.edu>
jmazanec15 added a commit that referenced this pull request Oct 22, 2021
Signed-off-by: Jack Mazanec <jmazane1@nd.edu>
martin-gaievski pushed a commit to martin-gaievski/k-NN that referenced this pull request Mar 7, 2022
Signed-off-by: Jack Mazanec <jmazane1@nd.edu>
martin-gaievski pushed a commit to martin-gaievski/k-NN that referenced this pull request Mar 7, 2022
Signed-off-by: Jack Mazanec <jmazane1@nd.edu>
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
martin-gaievski pushed a commit to martin-gaievski/k-NN that referenced this pull request Mar 30, 2022
Signed-off-by: Jack Mazanec <jmazane1@nd.edu>
3 participants